Apple has reportedly developed an internal ChatGPT-like service that helps employees prototype new features, summarize text, and answer questions based on the data it has been trained on.
In July, Bloomberg's Mark Gurman reported that Apple was building its own AI models, centered on a new framework named Ajax. The framework could support a range of capabilities, and a ChatGPT-like application, unofficially dubbed "Apple GPT," is just one of them. Now, a recent Apple research paper suggests that large language models (LLMs) could run on Apple devices, including iPhones and iPads.
The paper, first spotted by VentureBeat, is titled "LLM in a flash: Efficient Large Language Model Inference with Limited Memory." It tackles a key obstacle to running LLMs on-device: the limited DRAM capacity of many devices.
LLMs contain billions of parameters, so running them on devices with limited DRAM is a significant challenge: the full set of weights may simply not fit in memory. The paper's solution is to store the model parameters in flash memory and load them into DRAM on demand during on-device inference.
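To make the idea concrete, here is a minimal Python sketch, assuming a toy float32 matrix stored row by row in an ordinary file that stands in for flash. The names `write_weights` and `FlashBackedWeights` and the matrix size are hypothetical, and this is an illustration of on-demand loading in general, not Apple's implementation. A row is read from the file into an in-memory dictionary (standing in for DRAM) only the first time it is requested.

```python
import array
import os
import tempfile

ROWS, COLS = 8, 4  # hypothetical toy matrix size
ITEM = array.array("f").itemsize  # bytes per C float (4 on common platforms)


def write_weights(path):
    """Store a toy weight matrix row by row in a file standing in for flash."""
    data = array.array("f", (float(r * COLS + c) for r in range(ROWS) for c in range(COLS)))
    with open(path, "wb") as f:
        data.tofile(f)


class FlashBackedWeights:
    """Keep weights in the file ('flash') and pull single rows into memory ('DRAM') on demand."""

    def __init__(self, path):
        self.path = path
        self.dram = {}       # row index -> row values currently held in memory
        self.bytes_read = 0  # total bytes fetched from the file so far

    def get_row(self, r):
        if r not in self.dram:
            with open(self.path, "rb") as f:
                f.seek(r * COLS * ITEM)  # jump straight to the requested row
                row = array.array("f")
                row.fromfile(f, COLS)    # read only this row's values
            self.dram[r] = row
            self.bytes_read += COLS * ITEM
        return self.dram[r]


path = os.path.join(tempfile.mkdtemp(), "weights.bin")
write_weights(path)
w = FlashBackedWeights(path)
print(list(w.get_row(2)))  # [8.0, 9.0, 10.0, 11.0]
w.get_row(2)               # served from memory, no second file read
print(w.bytes_read)        # one row's worth of bytes (16 with 4-byte floats)
```

Only the rows actually touched ever occupy memory; everything else stays in the file until needed.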
Keivan Alizadeh, a machine learning engineer at Apple and the paper's lead author, explained, "Our approach entails developing an inference cost model that aligns with the characteristics of flash memory, directing us to enhance optimization in two crucial aspects: minimizing the amount of data transferred from flash and reading data in larger, more cohesive segments."
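The trade-off in the quote can be illustrated with a deliberately simplified cost model. This is a hypothetical stand-in, not the model from the paper: every flash read pays a fixed latency, plus transfer time proportional to the bytes moved. The function name `flash_read_cost` and its `latency_s` and `bandwidth_bps` defaults are placeholder assumptions, not measured values.

```python
def flash_read_cost(total_bytes, chunk_bytes, latency_s=1e-4, bandwidth_bps=1e9):
    """Toy cost in seconds: per-read latency times number of reads, plus transfer time.

    latency_s and bandwidth_bps are hypothetical placeholders, not measurements.
    """
    n_reads = -(-total_bytes // chunk_bytes)  # ceiling division: reads needed
    return n_reads * latency_s + total_bytes / bandwidth_bps


MiB = 1 << 20
small = flash_read_cost(MiB, 4 * 1024)         # 256 small reads: latency dominates
large = flash_read_cost(MiB, 256 * 1024)       # same data in 4 larger reads
less = flash_read_cost(MiB // 2, 256 * 1024)   # half the data, same chunk size
print(f"{small:.4f} s, {large:.4f} s, {less:.4f} s")
```

Under these assumptions both levers from the quote help: larger chunks cut the number of per-read latencies, and moving fewer bytes shrinks the transfer term.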
The team employed two main strategies: "windowing" and "row-column bundling." Windowing reuses neurons activated for recent tokens: they stay in DRAM, so only newly needed parameters are loaded from flash, which reduces data transfer. Row-column bundling stores related rows and columns of the model's weight matrices together, so that data is read from flash in larger, contiguous chunks. Together, these techniques delivered a 4-5x increase in inference speed on the Apple M1 Max system-on-chip (SoC), compared with naively loading the model from flash.
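The two strategies can be sketched together in Python. This is a simplified illustration under stated assumptions, not the paper's code: the hidden size `D_MODEL`, the 2-byte (fp16) element size, and the per-token sets of active neurons are all hypothetical, and the active sets are supplied by hand rather than predicted. `WindowedNeuronCache` keeps the neurons used by the last `window` tokens in memory and fetches only newly activated ones. `bundle_offset` assumes a feed-forward layer whose up-projection column i and down-projection row i are stored back to back, so one contiguous read fetches both.

```python
from collections import deque

D_MODEL = 4  # hypothetical hidden size; each bundle holds 2 * D_MODEL values


def bundle_offset(neuron, d_model=D_MODEL, itemsize=2):
    """Row-column bundling: return (byte offset, length) of neuron i's single bundled read.

    Assumes column i of the up-projection and row i of the down-projection are stored
    contiguously, so one read of 2 * d_model values replaces two smaller reads.
    """
    bundle_bytes = 2 * d_model * itemsize
    return neuron * bundle_bytes, bundle_bytes


class WindowedNeuronCache:
    """Windowing: keep neurons used by the last `window` tokens in memory."""

    def __init__(self, window):
        self.window = window
        self.history = deque()  # active-neuron sets for the most recent tokens
        self.in_dram = set()    # neurons whose weights are currently in memory
        self.loaded = 0         # bundles fetched from flash so far
        self.evicted = 0        # bundles dropped once they left the window

    def step(self, active):
        active = set(active)
        new = active - self.in_dram  # only these must come from flash
        self.loaded += len(new)
        self.history.append(active)
        if len(self.history) > self.window:
            self.history.popleft()   # oldest token slides out of the window
        still_needed = set().union(*self.history)
        self.evicted += len((self.in_dram | new) - still_needed)
        self.in_dram = still_needed
        return sorted(new)


cache = WindowedNeuronCache(window=2)
tokens = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5}]  # hypothetical active neurons per token
for active in tokens:
    new = cache.step(active)
    print(new, [bundle_offset(n) for n in new])  # one contiguous read per new neuron
naive = sum(len(a) for a in tokens)
print(cache.loaded, "bundles loaded vs", naive, "without windowing")  # 6 vs 11
```

In this toy run, windowing cuts flash fetches from 11 to 6, and bundling turns each fetch into a single larger read.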
In principle, this context-adaptive loading could let LLMs run on memory-constrained devices such as iPhones and iPads.