The LMSys Chatbot Arena leaderboard is widely regarded as one of the most comprehensive and reliable rankings of the relative proficiency of different LLMs. Until now, OpenAI’s GPT-4 has been the leader. Claude 3 Opus overtook GPT-4 shortly after its release.
In a major setback to OpenAI’s GPT-4, Claude 3 Opus has finally dethroned the long-time champion of AI language models. The LMSys Chatbot Arena leaderboard compares the outputs of different chatbots. It is compiled by researchers from human votes rather than from automated benchmarks. Unlike spec-sheet one-upmanship (larger context window, more tokens, better score on some high school exam, etc.), this is a much more dependable measure of how well an LLM actually performs for the people using it.
As I’ve said before, we’re living in a transitional phase. LLMs are going to get incrementally better month after month. Major new updates will dethrone previous champions in different areas before being beaten themselves. This is a cycle we’re going to be in for a while.
Topping the LMSys Chatbot Arena leaderboard, which is hosted on Hugging Face, is a big step. Claude 3 Opus has proven to be the top-rated chatbot yet, with an Elo rating of 1255. GPT-4-1106-preview scored 1252 and GPT-4-0125-preview scored 1249. Gemini Pro ranks fourth with 1204 points. Claude 3 Sonnet ranks #5 and Claude 3 Haiku ranks #7.
The leaderboard tracks a total of 76 models and is based on 500K+ votes as of March 29, 2024.
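To put those Elo gaps in perspective, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the textbook Elo formula (base 10, scale 400), which may not match exactly how the leaderboard computes or calibrates its ratings, and the function name `elo_expected_score` is just an illustrative choice. The ratings are the ones quoted above.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the textbook Elo model.

    Assumption: base-10 logistic curve with a 400-point scale. A tie
    counts as half a win, so the result reads as an approximate win rate.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the article (as of March 29, 2024).
ratings = {
    "Claude 3 Opus": 1255,
    "GPT-4-1106-preview": 1252,
    "GPT-4-0125-preview": 1249,
    "Gemini Pro": 1204,
}

leader = "Claude 3 Opus"
for name, rating in ratings.items():
    if name == leader:
        continue
    # Expected share of head-to-head votes going to the leader.
    p = elo_expected_score(ratings[leader], rating)
    print(f"{leader} vs {name}: {p:.1%}")
```

Under this model, a 3-point lead works out to an expected win rate of only about 50.4%, which is close to a coin flip, while the 51-point gap to Gemini Pro comes to roughly 57%. In other words, the top three models are nearly tied, and the order among them could easily change as more votes come in.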