Microsoft's AI research team has unveiled Phi-3, a new family of language models that comes in three sizes. Its headline lesson: careful training can keep a model small while improving the quality of its output.
The three sizes are Phi-3-mini with 3.8B parameters, Phi-3-small with 7B parameters, and Phi-3-medium with 14B parameters; the mini is available now, and the other two are slated to follow. (If 14B is "medium," what would a "large" be, if Microsoft is planning one?) Notably, these are billed as small language models (SLMs), not large language models (LLMs).
Phi-3-mini is the first model in its class to support a context window of up to 128K tokens, which is remarkable at this size. In the official blog post, Phi-3 is compared against Gemma-7B, Mistral-7B, Mixtral 8x7B, Llama-3-8B, Claude 3 Sonnet, and GPT-3.5 Turbo. The benchmarks show striking performance even for the mini, which was trained on considerably less data than some of the other names on that list.
What this tells us is that it's not necessarily the size of the training data that matters; it's how carefully you curate and use it. Fittingly, the researchers' paper is titled "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone."
Microsoft has all but proven that companies that do their due diligence on data curation and fine-tuning can create SLMs that are lighter (and therefore easier to run on consumer hardware) while remaining highly capable.
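To make "easier to run on consumer hardware" concrete, here is a minimal sketch of loading Phi-3-mini with the Hugging Face transformers library. The model ID is the public release; the rest (dtype, device settings, and the trust_remote_code flag, which older transformers versions need for Phi-3) is illustrative and may vary with your setup.

```python
# Minimal sketch: run Phi-3-mini locally with Hugging Face transformers.
# Assumes `transformers`, `torch`, and `accelerate` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the memory footprint small
    device_map="auto",           # uses the GPU if present, falls back to CPU
    trust_remote_code=True,      # needed on older transformers versions
)

messages = [{"role": "user", "content": "Explain what a small language model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```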
Other qualities of the Phi-3 family include a safety-first design, the ability to run locally (in fact, you can run it right now using Ollama), and lower costs for smaller tasks. The target use case seems to be organizations that can fine-tune a Phi-3 model and run it even on low-end hardware.
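And here is a minimal sketch of the "run it right now using Ollama" path, using the official ollama Python client. It assumes Ollama is installed, the local server is running, and the model has been pulled with `ollama pull phi3`.

```python
# Minimal sketch: chat with a local Phi-3-mini via the ollama Python client.
# Install the client with `pip install ollama`; the model tag "phi3" is
# Phi-3-mini in the Ollama model library.
import ollama

response = ollama.chat(
    model="phi3",
    messages=[
        {"role": "user", "content": "Summarize why small language models matter."},
    ],
)

print(response["message"]["content"])
```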