Google launches Gemini—a powerful AI model it says can surpass GPT-4

Key Takeaways:

– Google has announced Gemini, a multimodal AI model family that aims to rival OpenAI’s GPT-4.
– Gemini can handle multiple types of input, including text, code, images, and audio.
– The goal is to create an AI that can solve problems, give advice, and answer questions in various fields.
– Gemini will be available in three sizes: Ultra, Pro, and Nano, each designed for different tasks and computational requirements.
– Google claims that Gemini outperforms current state-of-the-art results on 30 out of 32 academic benchmarks.
– The mid-level Gemini model is currently available in over 170 countries as part of the Google Bard chatbot.
– Gemini is more scalable and efficient when run on Google’s custom Tensor Processing Units (TPUs).
– Google also trained a coding-centric version of Gemini called AlphaCode 2, which excels at solving competitive programming problems.
– Gemini represents a significant science and engineering effort for Google, according to CEO Sundar Pichai.

Ars Technica:

The Google Gemini logo. (Credit: Google)

On Wednesday, Google announced Gemini, a multimodal AI model family it hopes will rival OpenAI’s GPT-4, which powers the paid version of ChatGPT. Google claims that the largest version of Gemini exceeds “current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.” It’s a follow-up to PaLM 2, an earlier AI model that Google hoped would match GPT-4 in capability.

A specially tuned, English-language version of its mid-level Gemini model is available now in over 170 countries as part of the Google Bard chatbot, although not in the EU or the UK due to potential regulatory issues.

Like GPT-4, Gemini can handle multiple types (or “modes”) of input, making it multimodal. That means it can process text, code, images, and even audio. The goal is to make a type of artificial intelligence that can accurately solve problems, give advice, and answer questions in a variety of fields—from the mundane to the scientific. Google says this will power a new era in computing, and it hopes to tightly integrate the technology into its products.

“Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information,” writes Google. “Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering, and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.”

Google says Gemini will be available in three sizes: Gemini Ultra (“for highly complex tasks”), Gemini Pro (“for scaling across a wide range of tasks”), and Gemini Nano (“for on-device tasks,” such as on Google’s Pixel 8 Pro smartphone). The three are likely separated in complexity by parameter count: more parameters mean a bigger neural network, which is generally capable of more complex tasks but requires more computational power to run. That means Nano, the smallest, is designed to run locally on consumer devices, while Ultra can only run on data center hardware.
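To make that scale concrete, here is a rough back-of-envelope sketch in Python of how memory for model weights grows with parameter count. Google has not disclosed parameter counts for the larger Gemini models, so the sizes below are hypothetical placeholders, not actual Gemini figures.

```python
# Illustrative arithmetic only: the model sizes are hypothetical examples,
# not Google's (undisclosed) Gemini parameter counts.
def est_weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the weights (2 bytes each in
    fp16/bf16), ignoring activations, KV cache, and runtime overhead."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, size_b in [("small on-device model", 3),
                     ("mid-size server model", 70),
                     ("very large frontier model", 500)]:
    print(f"{name} (~{size_b}B params): ~{est_weight_memory_gb(size_b):.0f} GB of weights")
```

A few gigabytes fits on a phone; hundreds of gigabytes (or more) only fits on data center accelerators, which is the basic reason the Nano/Pro/Ultra split maps onto different hardware targets.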

Google’s promotional video for Gemini.

“These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year,” wrote Google CEO Sundar Pichai in a prepared statement. “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead and for the opportunities Gemini will unlock for people everywhere.”

Although Gemini will come in three sizes, only the mid-level model is available for public use as of today. As mentioned above, Google Bard now runs a specially tuned version of Gemini Pro. From our informal testing so far, Gemini Pro does appear to perform much better than the previous version of Bard, which was based on Google’s PaLM 2 language model.

Google also claims that Gemini is more scalable and efficient than its previous AI models when run on Google’s custom Tensor Processing Units (TPUs). “On TPUs,” Google says, “Gemini runs significantly faster than earlier, smaller and less-capable models.”

And it’s purportedly great at coding. Google trained a special coding-centric version of Gemini called AlphaCode 2, which “excels at solving competitive programming problems that go beyond coding to involve complex math and theoretical computer science,” according to Google. Gemini is also excellent at inflating Google’s PR language—if the models were any less capable and revolutionary, would the marketing copy be any less breathless? It’s doubtful.

AI Eclipse TLDR:

Google has announced the launch of Gemini, a multimodal AI model family that aims to rival OpenAI’s GPT-4. Google claims that the largest version of Gemini outperforms current state-of-the-art results on 30 out of 32 widely used academic benchmarks in large language model research. Gemini is a follow-up to PaLM 2, an earlier AI model that Google developed to match GPT-4’s capabilities. The mid-level Gemini model, specially tuned for English, is currently available in over 170 countries as part of the Google Bard chatbot, excluding the EU and the UK due to potential regulatory concerns. Like GPT-4, Gemini is capable of processing multiple types of input, including text, code, images, and audio. Google envisions Gemini as an AI model that can solve problems, provide advice, and answer questions across various fields. It hopes to integrate the technology tightly into its products and believes that Gemini will usher in a new era in computing. Gemini will come in three sizes: Gemini Ultra for highly complex tasks, Gemini Pro for a wide range of tasks, and Gemini Nano for on-device tasks. Each size is likely differentiated by parameter count, with Nano designed to run locally on consumer devices and Ultra requiring data center hardware. Although Gemini will come in three sizes, only the mid-level model is currently accessible to the public. Google claims that Gemini is more scalable and efficient than its previous AI models when running on its custom Tensor Processing Units (TPUs), and it excels at coding tasks. The company has trained a coding-centric version of Gemini called AlphaCode 2, which is capable of solving competitive programming problems involving complex math and theoretical computer science. Google CEO Sundar Pichai states that Gemini represents one of the biggest science and engineering efforts the company has undertaken and expresses excitement for the opportunities it will unlock.