xAI’s new chatbot Grok is built on the large language model (LLM) Grok-1. It is positioned to compete with ChatGPT and will be available to X/Twitter Premium+ subscribers.
Benchmark | Grok-0 (33B) | Llama 2 70B | Inflection-1 | GPT-3.5 | Grok-1 | PaLM 2 | Claude 2 | GPT-4 |
---|---|---|---|---|---|---|---|---|
GSM8k | 56.8% 8-shot | 56.8% 8-shot | 62.9% 8-shot | 57.1% 8-shot | 62.9% 8-shot | 80.7% 8-shot | 88.0% 8-shot | 92.0% 8-shot |
MMLU | 65.7% 5-shot | 68.9% 5-shot | 72.7% 5-shot | 70.0% 5-shot | 73.0% 5-shot | 78.0% 5-shot | 75.0% 5-shot + CoT | 86.4% 5-shot |
HumanEval | 39.7% 0-shot | 29.9% 0-shot | 35.4% 0-shot | 48.1% 0-shot | 63.2% 0-shot | – | 70% 0-shot | 67% 0-shot |
MATH | 15.7% 4-shot | 13.5% 4-shot | 16.0% 4-shot | 23.5% 4-shot | 23.9% 4-shot | 34.6% 4-shot | – | 42.5% 4-shot |
xAI also reported results on a “real-life” test: the 2023 Hungarian national high school mathematics finals, which it hand-graded for Grok-1, Claude-2, and GPT-4. Without any tuning for the exam, Grok-1 passed with 59%, whereas Claude-2 scored 55% and GPT-4 scored 69%.
Grok will be available only to X/Twitter Premium+ subscribers, a tier that costs $16 per month.
Elon Musk is a fan of science fiction, and the word “grok” was coined by Robert A. Heinlein in his 1961 sci-fi novel *Stranger in a Strange Land*. It means “to understand intuitively or by empathy, to establish rapport with” and “to empathize or communicate sympathetically (with); also, to experience enjoyment.”
Notably, the announcement says the chatbot has a “rebellious streak” and that its fundamental advantage is real-time knowledge of the world via the X/Twitter platform (stylized as 𝕏). Commenting on the humor that Grok is supposed to have, Elon Musk posted this tweet: