Open-Source LLM Beats GPT-4 & Claude 3 Haiku

A new open-source model has outperformed proprietary models such as an earlier GPT-4 release and Claude 3 Haiku in the latest leaderboard standings.

LMSYS maintains a leaderboard that ranks large language models based on human votes. A new LLM from Cohere called Command R+ has outperformed models that previously ranked above it, such as GPT-4-0314, Claude 3 Haiku, and Mistral-Large-2402. Notably, it has not beaten state-of-the-art LLMs such as GPT-4-Turbo and Claude 3 Opus. Command R+, the largest of the Command family, currently ranks #7 on the leaderboard.

The website also hosts an arena where you can test various chatbots side by side yourself.

In a list dominated by for-profit companies like Google, Anthropic, and OpenAI, it’s quite a surprise to see an open-source LLM outcompete others. Cohere’s family includes the Command, Command R, and Command R+ models. Command R+ offers strong reasoning, text generation quality, and coding capabilities, along with a long context window, multi-step tool use for building agents, and support for RAG workflows.

According to the official documentation, Command R+ has been fine-tuned for critical business use cases. Combine that with the fact that it now performs at the level of an earlier version of GPT-4, and it’s easy to see how it could shift the enterprise market.

To date, companies have more or less relied on API calls, paying providers like Google, OpenAI, and Anthropic for all of their AI needs. Earlier open models such as Meta’s Llama 2 can get the job done for tasks like working with data, but they’re not complete replacements. Mixtral is definitely another notable open-source challenger, but by beating GPT-4-0314, Command R+ has come out on top as a viable alternative that’s free to deploy for large businesses. And deploy it, they will.

That said, GPT-5 will more or less reset all of these benchmarks when it arrives. And Gemini Ultra 2.0 or the next version of Claude Opus will reset them once more.

Cohere has been developing enterprise-grade AI tools that are already being used by companies such as Jasper, Notion, Salesforce, and Oracle. The organization has focused on business clients from the start.

You can download the model from Hugging Face free of charge. Enterprise use is billed at $3 per million input tokens and $15 per million output tokens.
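To get a feel for what those rates mean in practice, here is a minimal Python sketch that estimates a bill from token counts. The function name and the example volumes are made up for illustration, and it assumes the two quoted rates are the only charges; actual billing may differ.

```python
# Illustrative only: the rates are the per-million-token prices quoted above.
# The helper name and the example workload are hypothetical.
INPUT_USD_PER_MILLION = 3.0
OUTPUT_USD_PER_MILLION = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for a given token volume."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Example: 2 million input tokens and 500,000 output tokens.
print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")  # prints $13.50
```

Because output tokens cost five times as much as input tokens, workloads that generate long responses will be dominated by the output side of the bill.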

By Abhimanyu

Unwrapping the fast-evolving AI popular culture.