Mar 09, 2025 13:00:00

Intel announces that 'Run llama.cpp Portable Zip on Intel GPU with IPEX-LLM', which lets various AI models run on a local Windows PC, now also supports DeepSeek

In recent years, many advanced generative AI systems and large language models have appeared, but running them usually requires expensive GPUs and other hardware. Intel's PyTorch extension 'IPEX-LLM', however, makes it possible to run models such as Gemma and Llama on Intel's discrete GPUs. Intel has now announced that IPEX-LLM also supports DeepSeek R1.

ipex-llm/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md at main · intel/ipex-llm · GitHub
https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md

IPEX-LLM, released by Intel, is a PyTorch extension that lets the latest AI models run on PCs equipped with Intel CPUs and GPUs.

This time, Intel announced that 'llama.cpp Portable Zip', which is built on IPEX-LLM and the open source software library llama.cpp, makes it possible to run llama.cpp directly on Intel GPUs. As a result, DeepSeek-R1-671B-Q4_K_M can now be run with llama.cpp Portable Zip.

Intel has published instructions on GitHub covering how to install llama.cpp Portable Zip, how to run llama.cpp, and how to run individual models, with separate steps for each operating system (Windows and Linux).
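
As a rough illustration of what running llama.cpp from an extracted archive can look like, the sketch below assembles a command line for the llama.cpp command-line tool. The flags used (-m for the model file, -p for the prompt, -n for the number of tokens to generate, -ngl for the number of layers offloaded to the GPU) are standard llama.cpp options. The folder name, the model path, and the executable name 'llama-cli' are assumptions made for this example and are not taken from Intel's guide, which remains the authoritative reference.

```python
import shlex
from pathlib import Path


def build_llama_cli_command(zip_dir, model_path, prompt, n_predict=128, gpu_layers=99):
    """Return the argument list for a single llama.cpp CLI run.

    zip_dir is a hypothetical folder where the portable zip was extracted.
    The executable name "llama-cli" is an assumption; on Windows it would
    typically carry an ".exe" suffix.
    """
    exe = Path(zip_dir) / "llama-cli"
    return [
        str(exe),
        "-m", str(model_path),      # path to the GGUF model file
        "-p", prompt,               # prompt text
        "-n", str(n_predict),       # number of tokens to generate
        "-ngl", str(gpu_layers),    # layers to offload to the GPU
    ]


# Hypothetical paths, for illustration only.
cmd = build_llama_cli_command(
    "llama-cpp-ipex-llm", "models/model.gguf", "What is AI?"
)
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run() once the archive has been extracted and a model file downloaded. Keeping the arguments as a list rather than a single string avoids quoting problems with prompts that contain spaces.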

Intel lists the supported hardware for llama.cpp Portable Zip as 'Intel Core Ultra processors', '11th to 14th generation Core processors', 'Intel Arc A series GPUs', and 'Intel Arc B series GPUs'. In addition, running DeepSeek-R1-671B-Q4_K_M requires a PC equipped with an 'Intel Xeon' processor and one or two 'Arc A770' cards.
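
A back-of-the-envelope estimate helps put the DeepSeek requirement in perspective. The sketch below compares the approximate size of the quantized weights with the video memory of one or two Arc A770 cards. The figure of roughly 4.8 bits per weight for Q4_K_M is an approximation assumed here, as is the use of the 16 GB variant of the Arc A770. Neither number comes from Intel's announcement, and the estimate ignores the KV cache and other runtime buffers.

```python
def estimate_weight_gib(n_params, bits_per_weight):
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 671e9          # parameter count implied by the "671B" in the model name
BITS_PER_WEIGHT = 4.8     # assumed average for Q4_K_M (approximate)
A770_VRAM_GIB = 16        # assumed: the 16 GB variant of the Arc A770

weights_gib = estimate_weight_gib(N_PARAMS, BITS_PER_WEIGHT)
print(f"Estimated weights: {weights_gib:.0f} GiB")

for cards in (1, 2):
    vram_gib = cards * A770_VRAM_GIB
    share = vram_gib / weights_gib
    print(f"{cards} card(s): {vram_gib} GiB of VRAM, about {share:.0%} of the weights")
```

Under these assumptions the weights are far larger than the combined video memory of two cards, which suggests that most of the model has to sit in the Xeon host's system memory. That reading is an inference from the numbers and is not something Intel states in the announcement.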

Intel Fellow and Chief Architect Jason Dai has published a demonstration of DeepSeek-R1-671B-Q4_K_M running on an Intel Xeon and Arc A770 using llama.cpp Portable Zip.

In response, one commenter on the social news site Hacker News pointed out that 'a prompt of around 10 tokens, like the one in this demo, doesn't cause problems, but if you add more context you'll quickly run into a computational bottleneck.'

Mar 09, 2025 13:00:00 in Software, Posted by log1r_ut