Google’s GPT-4 competitor, the multimodal LLM Gemini, is finally here, with its most powerful variant slated for a 2024 release.
Google finally released the long-awaited Gemini model on Wednesday, touting it as the start of a new era. Until now, Google’s consumer-facing apps such as Bard have relied on the PaLM model (and still do, until the transition is complete). Gemini is set to change the game and compete with OpenAI’s GPT-4, against which it posts strong gains on several benchmarks.
The LLM was first teased back in June, and it comes in three versions for different use cases. The smallest, Gemini Nano, will be built into Pixel phones. The mid-tier Gemini Pro will power apps like Bard (and has already started rolling out). The largest and most capable of the trio, Gemini Ultra, is slated for release next year.
This model will sit under the hood of most of Google’s products, including Google Search. It has already been over a year since ChatGPT launched, setting off a pop-culture boom around AI. Google, at the forefront of AI for over a decade through DeepMind and its research, was caught off guard by OpenAI’s overnight success. Now the score is even.
Compared with GPT-4, Gemini scores well across benchmarks. It is also natively multimodal, meaning it can work with photos, videos, voice, and text. Native multimodality isn’t something you get with ChatGPT (OpenAI relies on separate models, DALL-E for images and Whisper for audio), and it may well be the future, since connecting robotics, sensors, and the internet requires models that understand more than text.
In some ways, Gemini has been created to meet the larger requirements of an integrated world and a seamless transition to robotics.
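To make the multimodal idea concrete, here is a minimal sketch of how a developer might send both text-only and image-plus-text prompts to Gemini through Google’s `google-generativeai` Python SDK. The model names (`gemini-pro`, `gemini-pro-vision`), the placeholder API key, and the image file are assumptions for illustration; the article itself does not describe the developer API.

```python
# Minimal sketch: querying Gemini via Google's google-generativeai Python SDK.
# Model names and API-key setup below are assumptions, not confirmed by the article.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder key

# Text-only request against the mid-tier model that powers Bard.
text_model = genai.GenerativeModel("gemini-pro")
reply = text_model.generate_content("Summarize the Gemini launch in one sentence.")
print(reply.text)

# Mixed image + text request, illustrating the multimodal input described above.
vision_model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("photo.jpg")  # any local image file
response = vision_model.generate_content([image, "What is happening in this photo?"])
print(response.text)
```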
Notably, Gemini is cheaper and more power-efficient to run this time around, having been trained on Google’s Tensor Processing Units (TPUs). According to The Verge, conversations with the CEOs of Google and Google DeepMind suggest this launch is only the beginning of a larger project.
Here are the head-to-head stats:
| Benchmark | Gemini Ultra | GPT-4 |
| --- | --- | --- |
| MMLU (general knowledge) | 90.0% | 86.4% |
| DROP (reading comprehension, F1) | 82.3 | 80.9 |
| GSM8K (grade-school math) | 94.4% | 92.0% |
| HumanEval (code generation) | 74.4% | 67.0% |