Third-Party Testing of Gemini Pro Reveals Lower Performance than Free ChatGPT

Researchers find that Gemini Pro is slightly less accurate than the free GPT 3.5 Turbo and much less accurate than GPT 4.

A group of researchers has published a paper on arXiv that compares Google’s Gemini LLM with OpenAI’s GPT 3.5 and GPT 4 models. The researchers concluded that the Gemini Pro model is comparable to GPT 3.5 Turbo, while being “somewhat inferior” and “much worse than GPT 4.” Gemini Pro did, however, outperform Mixtral. The team found that Gemini Pro trailed GPT 3.5 Turbo mainly in a few key areas, ranging from multiple-choice questions to mathematical reasoning. On the bright side, Gemini Pro beat GPT 3.5 Turbo on “particularly long and complex reasoning tasks, and also was adept multilingually in tasks where responses were not filtered.”

GPT 3.5 Turbo is the model used in the free version of ChatGPT and was released months ago. Note that arXiv is a platform for publishing research before it has been peer reviewed.

The researchers found Gemini Pro to be weakest specifically on the topics of human sexuality, formal logic, elementary math, and professional medicine. In the test, Gemini Pro’s accuracy was 64.12/60.63, compared with 67.75/70.07 for GPT 3.5 Turbo and 80.48/78.95 for GPT 4.
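To put those accuracy figures in perspective, the gaps can be worked out directly. The short Python sketch below uses only the numbers quoted above; the article does not say what the two scores per model stand for, so the names "first" and "second" (and the helper `gap_to_gemini`) are placeholders introduced here for illustration.

```python
# Accuracy figures as quoted in the article (two scores per model).
# The article does not label the two scores, so they are treated
# simply as "first" and "second"; those names are placeholders.
scores = {
    "Gemini Pro": (64.12, 60.63),
    "GPT 3.5 Turbo": (67.75, 70.07),
    "GPT 4": (80.48, 78.95),
}


def gap_to_gemini(model):
    """Return (first, second) score differences: model minus Gemini Pro."""
    base = scores["Gemini Pro"]
    other = scores[model]
    return tuple(round(o - b, 2) for o, b in zip(other, base))


for name in ("GPT 3.5 Turbo", "GPT 4"):
    first, second = gap_to_gemini(name)
    print(f"{name} leads Gemini Pro by {first:.2f} and {second:.2f} points")
```

On these figures, GPT 3.5 Turbo is ahead of Gemini Pro by 3.63 and 9.44 points, while GPT 4 is ahead by 16.36 and 18.32 points.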

VentureBeat reached out to Google for comment. According to its article, Google’s own internal research and benchmarks show that Gemini Pro is better than GPT 3.5 and that the upcoming version, Gemini Ultra, is superior to GPT 4. Gemini Ultra is scheduled to launch in 2024. In the VentureBeat article, the Google spokesperson also offers further clarifications on what could have caused Gemini Pro’s lower accuracy relative to GPT 3.5 Turbo in this test.

By Abhimanyu