A team of Apple researchers has published a paper describing the best practices they identified for building a multimodal large language model that delivers high-quality results and, they say, outperforms comparable models.
In the paper, the researchers describe how they worked out what does and doesn't work when building a multimodal large language model (MLLM). They built a "performant" MLLM called MM1 to advance the AI research community's understanding of these systems. The paper contains many practical lessons that could inform the future development of LLMs. For example, the researchers found that the image encoder, together with image resolution and the number of image tokens, has a major impact on performance, while the design of the vision-language connector matters comparatively little.
The world's largest MLLMs include OpenAI's GPT-4, Anthropic's Claude 3, and Google's Gemini Ultra. These models power consumer-facing products. MM1, by contrast, is the outcome of Apple's research into what makes a good MLLM, and no consumer-facing tools, capabilities, or products are tied to it.
The paper, titled "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training," is available on the preprint server arXiv. MM1 can interpret both images and text and can handle the other tasks we've come to expect from LLMs, such as reasoning.
The research team claims to have made significant practical advances in the field of MLLMs. MM1 is a family of models with up to 30 billion parameters. By comparison, Claude 3 Opus reportedly has around 2 trillion parameters, Gemini Ultra 175 billion, and GPT-4 about 1.76 trillion, although none of these figures has been officially confirmed. In other words, MM1 is fairly small. But the team is already working on bigger, more capable models. In many ways, this experiment was simply about finding the best approach.
In Wired's coverage, Will Knight described it as a sleeping giant waking up: "the biggest sign yet that Apple is developing generative AI capabilities." The company has certainly upped its game, releasing several research papers on AI and how best to use it. All of these papers, however, have been freely distributed online for anyone to learn from. Nothing is proprietary right now. Either Apple is working on much more than it's letting mere mortals in on, or it's only just starting to dip its toes into consumer AI tech.
Do note that Apple’s machine learning arm isn’t devoid of innovative ideas and breakthroughs. The company might not have released an AI-focused product yet, but it’s certainly at the forefront.