Gemini breaks new ground with a faster model, longer context, AI agents and more (2024)

AI

May 14, 2024

[[read-time]] min read

We’re introducing a series of updates across the Gemini family of models, including the new 1.5 Flash, our lightweight model for speed and efficiency, and Project Astra, our vision for the future of AI assistants.

Demis Hassabis CEO of Google DeepMind, on behalf of the Gemini team

Gemini breaks new ground with a faster model, longer context, AI agents and more (2)

In December, we launched our first natively multimodal model Gemini 1.0 in three sizes: Ultra, Pro and Nano. Just a few months later we released 1.5 Pro, with enhanced performance and a breakthrough long context window of 1 million tokens.

Developers and enterprise customers have been putting 1.5 Pro to use in incredible ways and finding its long context window, multimodal reasoning capabilities and impressive overall performance incredibly useful.

We know from user feedback that some applications need lower latency and a lower cost to serve. This inspired us to keep innovating, so today, we’re introducing Gemini 1.5 Flash: a model that’s lighter-weight than 1.5 Pro, and designed to be fast and efficient to serve at scale.

Both 1.5 Pro and 1.5 Flash are available in public preview with a 1 million token context window in Google AI Studio and Vertex AI. And now, 1.5 Pro is also available with a 2 million token context window via waitlist to developers using the API and to Google Cloud customers.

We’re also introducing updates across the Gemini family of models, announcing our next generation of open models, Gemma 2, and sharing progress on the future of AI assistants, with Project Astra.

Gemini family of model updates

The new 1.5 Flash, optimized for speed and efficiency

1.5 Flash is the newest addition to the Gemini model family and the fastest Gemini model served in the API. It’s optimized for high-volume, high-frequency tasks at scale, is more cost-efficient to serve and features our breakthrough long context window.

While it’s a lighter weight model than 1.5 Pro, it’s highly capable of multimodal reasoning across vast amounts of information and delivers impressive quality for its size.

Gemini breaks new ground with a faster model, longer context, AI agents and more (3)

1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. This is because it’s been trained by 1.5 Pro through a process called “distillation,” where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model.

Read more about 1.5 Flash in our updated Gemini 1.5 technical report, on the Gemini technology page, and learn about 1.5 Flash’s availability and pricing.

Significantly improving 1.5 Pro

Over the last few months, we’ve significantly improved 1.5 Pro, our best model for general performance across a wide range of tasks.

Beyond extending its context window to 2 million tokens, we’ve enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances. We see strong improvements on public and internal benchmarks for each of these tasks.

1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behavior involving role, format and style. We’ve improved control over the model’s responses for specific use cases, like crafting the persona and response style of a chat agent or automating workflows through multiple function calls. And we’ve enabled users to steer model behavior by setting system instructions.

We added audio understanding in the Gemini API and Google AI Studio, so 1.5 Pro can now reason across image and audio for videos uploaded in Google AI Studio. And we’re now integrating 1.5 Pro into Google products, including Gemini Advanced and in Workspace apps.

Read more about 1.5 Pro in our updated Gemini 1.5 technical report and on the Gemini technology page.

Gemini Nano understands multimodal inputs

Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do — not just through text, but also through sight, sound and spoken language.

Read more about Gemini 1.0 Nano on Android.

Next generation of open models

Today, we’re also sharing a series of updates to Gemma, our family of open models built from the same research and technology used to create the Gemini models.

We’re announcing Gemma 2, our next generation of open models for responsible AI innovation. Gemma 2 has a new architecture designed for breakthrough performance and efficiency, and will be available in new sizes.

The Gemma family is also expanding with PaliGemma, our first vision-language model inspired by PaLI-3. And we’ve upgraded our Responsible Generative AI Toolkit with LLM Comparator for evaluating the quality of model responses.

Read more on the Developer blog.

Progress developing universal AI agents

As part of Google DeepMind’s mission to build AI responsibly to benefit humanity, we’ve always wanted to develop universal AI agents that can be helpful in everyday life. That’s why today, we’re sharing our progress in building the future of AI assistants with Project Astra (advanced seeing and talking responsive agent).

To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay.

While we’ve made incredible progress developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge. Over the past few years, we've been working to improve how our models perceive, reason and converse to make the pace and quality of interaction feel more natural.

Gemini breaks new ground with a faster model, longer context, AI agents and more (4)

10:25

A two-part demo of Project Astra, our vision for the future of AI assistants. Each part was captured in a single take, in real time.

Building on Gemini, we’ve developed prototype agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall.

By leveraging our leading speech models, we also enhanced how they sound, giving the agents a wider range of intonations. These agents can better understand the context they’re being used in, and respond quickly, in conversation.

With technology like this, it’s easy to envision a future where people could have an expert AI assistant by their side, through a phone or glasses. And some of these capabilities are coming to Google products, like the Gemini app and web experience, later this year.

Continued exploration

We’ve made incredible progress so far with our family of Gemini models, and we’re always striving to advance the state-of-the-art even further. By investing in a relentless production line of innovation, we’re able to explore new ideas at the frontier, while also unlocking the possibility of new and exciting Gemini use cases.

Learn more about Gemini and its capabilities.

Collection Collection I/O 2024 Here’s a look at everything we announced at Google I/O 2024. See more

Gemini breaks new ground with a faster model, longer context, AI agents and more (6)

Gemini breaks new ground with a faster model, longer context, AI agents and more (7)

Gemini breaks new ground with a faster model, longer context, AI agents and more (8)

Gemini breaks new ground with a faster model, longer context, AI agents and more (9)

Done. Just one step more.

Check your inbox to confirm your subscription.

You are already subscribed to our newsletter.

You can also subscribe with a .

POSTED IN:

Gemini breaks new ground with a faster model, longer context, AI agents and more (2024)

FAQs

Gemini breaks new ground with a faster model, longer context, AI agents and more? ›

Gemini breaks new ground with a faster model, longer context, AI agents and more. We're introducing a series of updates across the Gemini family of models, including the new 1.5 Flash, our lightweight model for speed and efficiency, and Project Astra, our vision for the future of AI assistants.

Is Gemini or ChatGPT better? ›

ChatGPT offers a more reliable and advanced voice mode

Both AI chatbots allow you to verbally interact with them. In Gemini, this feature operates more like voice-to-text: you speak (and manually click send), Gemini transcribes it, it responds with text, and you click play to listen to Gemini's responses.

What is Gemini AI used for? ›

What is Google Gemini (formerly Bard)? Google Gemini -- formerly known as Bard -- is an artificial intelligence (AI) chatbot tool designed by Google to simulate human conversations using natural language processing (NLP) and machine learning.

Is Gemini better than GPT-4? ›

Gemini edges out GPT-4 in broader comprehension, logical reasoning, and creative text generation. GPT-4 is better for commonsense reasoning and everyday tasks.

Why did Bard change to Gemini? ›

Here's why: From Persona to Platform: The name "Bard" evoked a specific AI persona, a conversational creative partner. "Gemini" symbolizes versatility and reflects Google's broader AI mission beyond a single chatbot interface.

Is OpenAI better than Gemini? ›

OpenAI in Azure is well-suited for developers and businesses looking for versatile AI capabilities that can be integrated into a wide range of applications. Google Gemini targets industries requiring specialized AI solutions, focusing on delivering high-impact results tailored to specific sectors.

What is the downside of Geminis? ›

Gemini, the third sign in the list of zodiac signs, is known for its lively and adaptable nature. However, Geminis can struggle with decision-making, tend to gossip, get easily bored, lack sincerity, and act impulsively without considering the consequences.

What are the disadvantages of Gemini AI? ›

Gemini may produce inaccurate responses. In the AI world, these are known as hallucinations. Since generative AI tools work by making predictions, it's possible that sometimes these predictions will be incorrect. This means that a tool like Gemini can make errors even when summarizing information directly from the web.

Is Gemini app good or bad? ›

Gemini is a user-friendly cryptocurrency exchange that could be a good choice for beginners and experienced traders alike. With industry-leading security features, its own hot wallet, and a comprehensive support center, Gemini is worth considering if you're interested in crypto investing or trading.

Should I use Gemini AI? ›

Who is Google Gemini AI Best For? Google Gemini is a very powerful tool, but it isn't packed with features and templates, which can often act as guides for people who are less experienced at prompt engineering. As a result, the tool is probably best for someone who is confident in prompting AI tools.

Why did Google take down Gemini? ›

Google halted Gemini's image generation feature nearly two weeks ago after users on social media flagged that it was creating inaccurate historical images that sometimes replaced White people with images of Black, Native American and Asian people.

Is Copilot better than Gemini? ›

Microsoft Copilot scores ahead of Gemini in AI quality because its output is more accurate and consistent. Copilot's Pro version is powered by OpenAI's GPT-4, which so far outstrips most of its AI rivals in precision and responsiveness to feedback.

Is Google Gemini free? ›

Is Google Gemini free? The standard version of Google Gemini is free, but it's more limited than the paid spin on the AI. As we've already discussed, the free Gemini AI is based on a simpler model, whereas those who pay a subscription for Gemini Advanced get a lot more depth in terms of features and capabilities.

Will Google Gemini replace Bard? ›

You can still visit bard.google.com, but the experience is now called Gemini. But it's more than just a name change. Google is also rolling out a new AI experience, and a new app on Android. Look, I think we can all agree that Bard was not a great name for an AI chatbot.

Is Gemini better than Bard? ›

Clearly, Google Gemini is far more than just a renamed version of Google Bard. It surpasses the previous chatbot solution in everything from accuracy to nuance and real-world integration. The advanced models powering Gemini mean it's more powerful, versatile, and capable of supporting users with a wide range of tasks.

Is Gemini Advanced Ultra? ›

Access our most capable AI model with Gemini Advanced

Today we're launching Gemini Advanced — a new experience that gives you access to Ultra 1.0, our largest and most capable state-of-the-art AI model.

Which AI is better than ChatGPT? ›

I test AI chatbots for a living and these are the best ChatGPT...
  • Best overall: Claude 3.
  • Best for Live Data: Google Gemini.
  • Most Creative: Microsoft Copilot.
  • Best for Research: Perplexity.
  • Most personal: Inflection Pi.
  • Best for Social: xAI Grok.
  • Best for open source: Llama 3.
  • Most fun: MetaAI.
5 days ago

Is Bard or ChatGPT better? ›

Final thoughts. Ultimately, the best chatbot for you will depend on your specific needs and preferences. If you need up-to-date information and a conversational tone, Bard may be more suited. If you need to summarise text or want to generate creative writing, however, ChatGPT may be a better fit.

Which is better, ChatGPT or Copilot? ›

You should use Copilot if...

One of the biggest problems with ChatGPT is the inability to confirm the accuracy of its responses, as the tool does not provide sources. Even though the May update to ChatGPT made it possible for the chatbot to browse the internet, ChatGPT still only provides links in some instances.

How much is Gemini vs ChatGPT API? ›

Gemini Chatbot Pricing

Free: Try Gemini Pro for free in supported countries or opt for Gemini Advanced via Google One AI Premium plan for $19.99/month. ChatGPT Chatbot Pricing: Free access to GPT-3.5 or opt for paid options like ChatGPT Plus starting at $20/month.

Top Articles
Latest Posts
Article information

Author: Moshe Kshlerin

Last Updated:

Views: 6630

Rating: 4.7 / 5 (77 voted)

Reviews: 84% of readers found this page helpful

Author information

Name: Moshe Kshlerin

Birthday: 1994-01-25

Address: Suite 609 315 Lupita Unions, Ronnieburgh, MI 62697

Phone: +2424755286529

Job: District Education Designer

Hobby: Yoga, Gunsmithing, Singing, 3D printing, Nordic skating, Soapmaking, Juggling

Introduction: My name is Moshe Kshlerin, I am a gleaming, attractive, outstanding, pleasant, delightful, outstanding, famous person who loves writing and wants to share my knowledge and understanding with you.