Microsoft’s Copilot Will Run Locally on Windows Machines


Intel confirmed that the Copilot app will be able to run locally on Windows PCs with more advanced hardware, particularly NPUs.

Local LLMs are one of the biggest needs of the near future. Apple is reportedly working on one, and Microsoft has now officially teased that future Windows PCs will run Copilot locally. For the uninitiated: today, mainstream LLM assistants maintain a constant connection to the cloud, and all inference happens on hardware in data centers. Every query you send to ChatGPT, Gemini, Bing, Claude, and the rest is processed on the provider’s GPUs. If you try to run an LLM locally (which you can only do with open models such as Llama), the process is painfully slow unless you have a high-end GPU like an RTX 3090 or RTX 4080.
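If you want to try this yourself, here is a minimal sketch using the open-source llama-cpp-python bindings. The model path and parameters below are placeholders rather than recommendations, and the example assumes you have already downloaded a quantized GGUF model file:

```python
# Minimal local inference with an open model via llama-cpp-python.
# Assumes: `pip install llama-cpp-python` and a GGUF model file on disk
# (the path below is a placeholder, not a real file).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local model file
    n_ctx=2048,       # context window size
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

output = llm(
    "Q: What is an NPU? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```

On a laptop without a capable GPU, generation like this can crawl along at a few tokens per second, which is exactly the gap dedicated NPUs are meant to close.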

That’s about to change. Copilot is Microsoft’s generative AI personal assistant, which you can launch in a sidebar on any screen as long as your PC is on. The local instances will be powered by dedicated NPUs, or Neural Processing Units, delivering 40 TOPS (trillion operations per second) or more. NPUs are more energy-efficient than GPUs, naturally, since GPUs are general-purpose graphics hardware built primarily for gaming, whereas NPUs are specialized AI inference hardware.
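How applications actually reach that silicon varies, but on Windows much of it goes through frameworks like DirectML and ONNX Runtime. As a rough illustration (the provider names depend on which ONNX Runtime package is installed, so treat this as an assumption about a given machine’s setup), you can check which hardware backends are available:

```python
# Rough sketch: list the hardware backends an installed ONNX Runtime build can use.
# Assumes `pip install onnxruntime-directml` on Windows; provider names vary by build
# (e.g. "DmlExecutionProvider" for DirectML, "QNNExecutionProvider" for Qualcomm NPUs).
import onnxruntime as ort

providers = ort.get_available_providers()
print("Available execution providers:", providers)

if "DmlExecutionProvider" in providers:
    print("DirectML is available: models can run on local GPU/NPU hardware.")
else:
    print("Falling back to CPU execution.")
```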

We’re entering the world of AI PCs. Tom’s Hardware’s Paul Alcorn broke the story on March 27. He published Intel’s responses as follows:

“… there’s going to be a continuum or an evolution, where then we’re going to go to the next-gen AI PC with a 40 TOPS requirement in the NPU. We have our next-gen product that’s coming that will be in that category … And as we go to that next-gen, it’s just going to enable us to run more things locally, just like they will run Copilot with more elements of Copilot running locally on the client. That may not mean that everything in Copilot is running local, but you’ll get a lot of key capabilities that will show up running on the NPU.”

Todd Lewellen, Vice President, Intel Client Computing Group

Notably, PCs with these higher specs are already shipping. Over time, as PCs grow more capable and include these next-gen chips and NPUs, running local LLM instances will become easier than ever, and you won’t have to rely on sending queries to cloud data centers. That should make things more private. That said, Copilot will still be able to see all your prompts, of course, for training purposes.

By Abhimanyu

Unwrapping the fast-evolving AI popular culture.