4 LLMs Face Off in Chess, GPT-4 Beats Gemini Pro and Mixtral 8x7B

GPT-4, GPT-3.5 Turbo, Gemini Pro, and Mixtral 8x7B were programmed to play chess against each other in an experiment by a Redditor. GPT-4 won by a large margin.

The LLM World Chess Championship pitted four major LLMs against one another: GPT-4 Turbo, Mixtral 8x7B, GPT-3.5 Turbo, and Gemini Pro. OpenAI’s GPT-4 came out on top by a large margin, while Google’s Gemini Pro finished last, suggesting weaknesses in reasoning and chain-of-thought problems. Mistral AI’s Mixtral 8x7B came second and GPT-3.5 Turbo third.

The experiment used the python-chess library to enforce the official rules of chess. Each model played 30 games against every other model, alternating between black and white.
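The format described above can be sketched as a simple round-robin schedule. This is only an illustration under stated assumptions: the Redditor's actual code is not reproduced here, and the names `MODELS`, `GAMES_PER_PAIRING`, and `build_schedule` are hypothetical. It shows how four models, 30 games per pairing, and alternating colors add up to 180 games.

```python
from itertools import combinations

# Model names as listed in the article; the constants and helper are hypothetical.
MODELS = ["GPT-4 Turbo", "Mixtral 8x7B", "GPT-3.5 Turbo", "Gemini Pro"]
GAMES_PER_PAIRING = 30


def build_schedule(models, games_per_pairing):
    """Return a list of (white, black) tuples for a full round-robin."""
    schedule = []
    for first, second in combinations(models, 2):
        for game_index in range(games_per_pairing):
            # Swap colors on every other game so each side plays White equally often.
            if game_index % 2 == 0:
                schedule.append((first, second))
            else:
                schedule.append((second, first))
    return schedule


schedule = build_schedule(MODELS, GAMES_PER_PAIRING)
print(len(schedule))  # 6 pairings x 30 games = 180
```

Under these assumptions each model plays 90 games in total, 45 of them as White.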

Notably, GPT-4 was reportedly trained on chess games from high-skill players (Elo rating above 1800).

This was a pet project by a Reddit user who says in his original post that he will publish a detailed blog post, an arXiv paper, a GitHub repository, the PGN files, and additional data. He also shared, in a comment, the exact prompt used for all models:

"""You are a chess engine playing a match against another engine. Your color in this game is {color}. Your objective is to select a move that maximizes your chances of winning.
    
    Current Game Status:
    PGN: {str(game)} (Portable Game Notation - a standard chess notation used to record the moves in a game)
    FEN: {board.fen()} (Forsyth-Edwards Notation - a method to describe a particular board position)
    List of Valid Moves: {moves_str}
    
    Guidelines for Selecting Your Move:

    1. Mandatory Adherence to Valid Moves: You must STRICTLY choose your move ONLY from the provided List of Valid Moves. Even if you think you can't win still pick a valid move from the provided List of Valid Moves. Any selection outside this list is INVALID and should NOT be considered.
    2. Strategic Approach: Select a move that significantly improves your chances of winning. Consider the implications of your move for future turns.
    3. Response Format: Present your response exactly as follows: "Move: [Your Move selected from the provided List of Valid Moves];; Explanation: [Your Thought Process Step by Step]". Example: "Move: Nd7;; Explanation: Choosing Nd7 as it threatens the c7 pawn, puts the black king in check, and opens a path to victory."
    4. Final Step - Self-Reflection and Compliance Check. Before finalizing your response, conduct a thorough self-review to ensure:
            - Your chosen move is from the List of Valid Moves.
            - Your response strictly adheres to the outlined format.
            - Your explanation clearly justifies your move in a step-by-step manner.
    """
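The prompt asks for replies in the form "Move: ...;; Explanation: ...". A harness would need to parse that reply and check the move against the list of valid moves. Below is a minimal sketch of one way to do that using only the standard library. The names `RESPONSE_PATTERN` and `parse_reply` are hypothetical, and how the original experiment handled malformed or illegal replies (retrying, forfeiting, or something else) is not stated in the source.

```python
import re

# Matches the format requested in guideline 3: "Move: <move>;; Explanation: <text>".
RESPONSE_PATTERN = re.compile(
    r"Move:\s*(?P<move>.+?)\s*;;\s*Explanation:\s*(?P<explanation>.*)",
    re.DOTALL,
)


def parse_reply(reply, valid_moves):
    """Return (move, explanation) if the reply is well formed and the move
    is in valid_moves; otherwise return None."""
    match = RESPONSE_PATTERN.search(reply)
    if match is None:
        return None  # reply did not follow the requested format
    move = match.group("move").strip()
    if move not in valid_moves:
        return None  # move is not in the provided list of valid moves
    return move, match.group("explanation").strip()


reply = "Move: Nf3;; Explanation: Develops the knight and controls the center."
print(parse_reply(reply, ["e4", "d4", "Nf3"]))
```

Returning `None` for both failure modes keeps the sketch short; a real harness would likely distinguish a formatting error from an illegal move so it could log or retry accordingly.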

By Abhimanyu