Apple Develops an Efficient Method to Run LLMs Locally on Devices


Apple researchers have tested the practicality of running LLMs from flash memory, a technique that could allow models to run on devices with limited RAM, such as iPhones.

It has been hinted that Apple is working on its own AI model that would run locally on the device, unlike services such as ChatGPT or Bard, which process requests in data centers and send the results back to the user. Running an LLM locally is taxing on hardware and demands processing power that smartphones simply lack. Even running a simple text-to-image prompt locally in Stable Diffusion, for example, requires a capable GPU to produce realistic results.

In what could be called a breakthrough, Apple has devised a way to run an LLM locally on an iPhone. iPhones and other Apple devices have limited memory, and the researchers use flash storage to compensate for that.

The work was published in a research paper [PDF] titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory”, which outlines exactly that: how to run an LLM on a device with low RAM by using the available flash memory.
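The core trick is to treat flash as an extension of DRAM: model weights stay on disk, and only the slices needed for the current token are paged into memory. As a rough illustration (not Apple's actual implementation), here is a minimal Python sketch using a memory-mapped weight file; the file name, shapes, and "active rows" are hypothetical stand-ins for what the paper's sparsity predictor would select.

```python
import numpy as np

# Hypothetical example: a feed-forward weight matrix far larger than
# we want resident in RAM. All names and sizes are made up.
ROWS, COLS = 8_192, 4_096

# One-time setup: write placeholder weights to flash (stands in for a
# real model file).
weights_on_flash = np.lib.format.open_memmap(
    "ffn_weights.npy", mode="w+", dtype=np.float16, shape=(ROWS, COLS)
)
weights_on_flash[:] = 0  # placeholder contents
weights_on_flash.flush()

# Inference time: re-open the file memory-mapped and read-only.
# Nothing is loaded into RAM yet; the OS pages data in lazily.
weights = np.load("ffn_weights.npy", mmap_mode="r")

# Suppose a predictor says only these neurons matter for this token;
# indexing pulls just the corresponding rows from flash.
active_rows = [17, 905, 4_004, 8_191]
active_slice = weights[active_rows]  # triggers page-sized flash reads
print(active_slice.shape)            # (4, 4096)
```

Because the mapping is lazy, RAM usage scales with the rows actually touched rather than with the full matrix, which is what makes a 7B-parameter model plausible on a device whose DRAM cannot hold it whole.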

Note that this is only Apple researchers demonstrating a way to run LLMs from flash memory; it is not a confirmation that Apple will release an app or embed AI in upcoming updates. Using a multithreaded approach, the researchers made data retrieval efficient by reading multiple data segments from flash simultaneously, as sketched below. It is a first step toward making this kind of on-device inference practical.
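The paper describes this parallel retrieval only at a high level. The sketch below shows the general pattern of issuing several flash reads concurrently from a thread pool; the file path, segment layout, and worker count are assumptions for illustration, not details from the paper.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def read_segment(fd: int, offset: int, length: int) -> bytes:
    """Read one contiguous chunk; os.pread is thread-safe (no shared seek)."""
    return os.pread(fd, length, offset)

def parallel_load(path: str, segments: list[tuple[int, int]], workers: int = 8):
    """Fetch many (offset, length) segments concurrently.

    Issuing several reads at once keeps the flash controller's queues
    full, which is where SSD/flash read bandwidth comes from.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(read_segment, fd, off, ln)
                       for off, ln in segments]
            return [f.result() for f in futures]
    finally:
        os.close(fd)

# Example: pull four 1 MiB chunks from the (hypothetical) weights file at once.
# chunks = parallel_load("ffn_weights.npy", [(i * 2**20, 2**20) for i in range(4)])
```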

The researchers applied a number of optimizations and used the OPT 6.7B model for their experiments and comparisons. They also tested the approach on the Falcon 7B model.

AI inference is a complex task that typically requires specialized hardware such as GPUs. Tested on Apple's own SoCs, such as the M1 chip, the approach yielded positive results. However, an actual implementation on an iPhone will not be as straightforward, since iPhones use A-series chips, which are less powerful than the M-series chips used in Mac systems.

By Abhimanyu

Unwrapping the fast-evolving AI popular culture.