Needle, a lightweight model distilled from Gemini's tool calling functionality and designed to run on smartphones, has been released, with its developers touting its usefulness for building AI agents on mobile devices.



AI company Cactus Compute has released 'Needle,' a 26-million-parameter AI model specialized for tool calling.

GitHub - cactus-compute/needle: 26m function call model that runs on incredibly small devices · GitHub
https://github.com/cactus-compute/needle

needle/docs/simple_attention_networks.md at main · cactus-compute/needle · GitHub
https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model | Hacker News
https://news.ycombinator.com/item?id=48111896

Needle is a model developed by distilling the tool calling functionality of Google's AI model 'Gemini-3.1-Flash-Lite'. It runs locally on consumer devices and is fast, with prefill running at 6000 tokens per second and decoding at 1200 tokens per second.
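To put those throughput figures in perspective, the following is a rough back-of-the-envelope sketch in Python. It assumes latency is simply prefill time plus decode time and ignores model loading, tokenization, and any device-specific overhead. The request size (600 prompt tokens, 60 output tokens) is a hypothetical example, not a figure from Cactus Compute.

```python
# Reported throughput figures for Needle (from the announcement)
PREFILL_TOKENS_PER_SEC = 6000
DECODE_TOKENS_PER_SEC = 1200


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency estimate: prefill time plus decode time, no other overhead."""
    prefill = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode = output_tokens / DECODE_TOKENS_PER_SEC
    return prefill + decode


# Hypothetical request: 600 prompt tokens (tool definitions plus the user query)
# and 60 output tokens (the generated tool call)
latency = estimate_latency_seconds(600, 60)
print(f"{latency * 1000:.0f} ms")
```

Under these assumptions, such a request would take roughly 0.15 seconds, although actual speed will vary by device.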

Needle's pre-training was performed using 16 TPU v6e processors over 27 hours, and post-training was completed in 45 minutes using a tool call dataset generated with Gemini.

Henry Ndubuaku, the developer of Needle, commented, 'We were frustrated that there was little effort being made to develop an AI agent that would work on low-cost smartphones. Our analysis revealed that AI agents are built on tool calling, and large models are overkill.' He emphasized that focusing Needle solely on tool calling is what keeps it lightweight enough to run on edge devices such as smartphones.
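For readers unfamiliar with tool calling, the idea can be illustrated with a minimal Python sketch: the model outputs a structured description of which function to call and with what arguments, and the app executes it. The JSON format, the tool names `set_alarm` and `get_weather`, and the `dispatch` helper below are all hypothetical and used only for illustration; they are not Needle's actual output format or API.

```python
import json


# Hypothetical tools that an app might expose to the model
def set_alarm(time: str) -> str:
    return f"Alarm set for {time}"


def get_weather(city: str) -> str:
    return f"Weather lookup requested for {city}"


# Registry mapping tool names to the functions that implement them
TOOLS = {"set_alarm": set_alarm, "get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a tool call emitted by the model and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Assumed model output for a request like "wake me up at 7:30"
example_output = '{"name": "set_alarm", "arguments": {"time": "07:30"}}'
print(dispatch(example_output))
```

In a setup like this, the model only needs to pick the right tool and fill in its arguments, while the actual work is done by ordinary code on the device.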

Needle is available at the following link. It is licensed under the MIT License.

Cactus-Compute/needle · Hugging Face
https://huggingface.co/Cactus-Compute/needle

Cactus Compute also develops 'Cactus Chat,' an AI execution app for smartphones. Instructions on how to use Cactus Chat can be found at the following link.

Review of 'Cactus Chat,' a free app that lets you run AI models locally and chat on both Android and iPhone smartphones - GIGAZINE



It should be noted that while Cactus Compute has publicly stated that Needle was developed by distilling Gemini-3.1-Flash-Lite, Google's terms of use for the Gemini API prohibit the extraction and distillation of Gemini.

Gemini API Additional Terms of Use | Google AI for Developers
https://ai.google.dev/gemini-api/terms?hl=ja



in AI, Posted by log1o_hf