Stability AI partners with Arm to develop technology to run music generation AI directly on smartphones

Stability AI, the AI company behind the image generation AI 'Stable Diffusion,' has partnered with semiconductor company Arm to optimize its music generation model 'Stable Audio Open' to run directly on Arm CPUs. By combining Arm's KleidiAI library with Stability AI's cutting-edge technology, music generation on smartphones has been made 30 times faster, cutting the time from several minutes to just a few seconds.

Stability AI and Arm Bring On-Device Generative Audio to Smartphones — Stability AI
https://stability.ai/news/stability-ai-and-arm-bring-on-device-generative-audio-to-smartphones

On-device Audio Generation Accelerated by 30x with Arm Kleidi - Arm Newsroom
https://newsroom.arm.com/blog/stability-ai-arm-kleidi-text-to-audio-generation

Stable Audio Open is an open-source music generation AI model, released in June 2024, that can generate up to 47 seconds of music from a text prompt alone.

Stable Diffusion developer releases free music generation AI 'Stable Audio Open', capable of generating soundtracks up to 47 seconds long from text - GIGAZINE

When Stable Audio Open was run directly on the CPU of a conventional smartphone, generating a single piece of music took more than 240 seconds (4 minutes), which was impractical. To address this, Stability AI distilled the Stable Audio Open model into a smaller version with fewer parameters, suited to mobile devices.

Arm also says it has integrated its KleidiAI library, which provides performance-focused routines called 'microkernels' tuned for Arm CPUs, into XNNPACK and ExecuTorch. XNNPACK is a library of deep learning operators optimized for mobile devices, and ExecuTorch is a framework for running models efficiently on mobile devices. According to Arm, this integration significantly accelerated 8-bit integer matrix multiplication.
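To make "8-bit integer matrix multiplication" concrete, here is a minimal Python sketch, assuming simple symmetric per-tensor quantization. The function names are hypothetical and this is not KleidiAI's API or Arm's implementation; it only shows the general idea: floats are mapped to int8 values, products are summed as integers, and a single rescale per output restores the real-valued range. Real microkernels get their speed from vector and matrix instructions that process many int8 products at once, which plain Python does not reproduce.

```python
# Hypothetical illustration of int8 quantized matrix multiplication.
# Not KleidiAI code: real microkernels are hand-tuned for Arm CPUs.

def choose_scale(matrix):
    """Pick a scale so the largest absolute value maps to 127."""
    largest = max(abs(v) for row in matrix for v in row)
    return largest / 127 if largest else 1.0

def quantize(values, scale):
    """Map floats to int8 range with symmetric quantization (clamped)."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def int8_matmul(a, b):
    """Approximate a (m x k) @ b (k x n) using int8 operands."""
    scale_a, scale_b = choose_scale(a), choose_scale(b)
    qa = [quantize(row, scale_a) for row in a]
    qb = [quantize(row, scale_b) for row in b]
    m, k, n = len(a), len(b), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Integer multiply-accumulate; real int8 kernels typically
            # accumulate into 32-bit integers.
            acc = sum(qa[i][p] * qb[p][j] for p in range(k))
            # One floating-point rescale per output element.
            row.append(acc * scale_a * scale_b)
        out.append(row)
    return out, qa, qb

def float_matmul(a, b):
    """Reference float matrix multiplication for comparison."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

a = [[0.5, -1.2, 0.3], [2.0, 0.1, -0.7]]
b = [[1.0, 0.2], [-0.4, 0.9], [0.6, -1.5]]
approx, qa, qb = int8_matmul(a, b)
exact = float_matmul(a, b)
for approx_row, exact_row in zip(approx, exact):
    print([round(x, 3) for x in approx_row], [round(x, 3) for x in exact_row])
```

The approximate and exact results agree closely; the small differences are the quantization error traded for cheaper integer arithmetic.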

Further optimizations take advantage of the characteristics of CPU cores based on the Armv9 architecture. Armv9 adds instruction set extensions for machine learning workloads, which allow these computations to run more efficiently.

By combining these optimization techniques, Stable Audio Open can now run directly on a smartphone, even one without an internet connection. For example, on the same Arm-based CPU, music generation became 30 times faster, dropping from 240 seconds to under 8 seconds. Arm also says the performance improvement was particularly dramatic when generating 11-second audio clips.

The following video shows music being generated with Stable Audio Open running locally on a smartphone.

Arm and Stability AI Audio Generation Demo - YouTube

Prem Akkaraju, CEO of Stability AI, said, 'As more professional and creative people and businesses incorporate generative AI into their production pipelines, it is important that our models and workflows are built and available everywhere creators can create. This is why Stability AI is partnering with Arm. Arm's ubiquity across the ecosystem, from servers to smartphones, and its work to accelerate AI models by integrating KleidiAI libraries into their software stacks across all popular frameworks made it a natural choice.'

In addition, at Arm's booth at MWC 2025, the mobile industry trade show held in Barcelona, Spain, from March 3 to 6, 2025, Stable Audio Open is being demonstrated running locally on a smartphone equipped with an Armv9 CPU.

in Mobile, Software, Posted by log1i_yk