Cohere releases 'Command A+', a high-performance open-source multimodal model built for agent tasks that enterprises can deploy in their own environments



Cohere has open-sourced Command A+, which it claims is the fastest and most powerful model in its Command series of language models. Command A+ is an enterprise-grade model that handles complex reasoning, multimodal processing, multilingual tasks, and AI agent workloads, and its minimum configuration is two NVIDIA H100 GPUs or one Blackwell-generation B200.

Introducing Command A+ | Cohere

https://cohere.com/blog/command-a-plus

CohereLabs/command-a-plus-05-2026-w4a4 · Hugging Face
https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4

Command A+ draws on Cohere's year of experience deploying its enterprise AI workspace, 'North,' to customers. Cohere positions the model as a foundation for 'sovereign AI' that companies can run, manage, and adapt within their own environments.

Command A+ also consolidates the capabilities of the earlier 'Command A' series into a single model. Command A Reasoning focused on reasoning, Command A Vision on multimodal processing, and Command A Translate on multilingual processing; Command A+ handles reasoning, multimodal processing, tool use, and 48 languages in one model.



The model name is 'command-a-plus-05-2026' and it is released under the Apache 2.0 license. The architecture is 'Sparse MoE' (sparse mixture of experts), a neural network design that raises the capacity of large AI models while keeping computational cost down by activating only some of the parameters for each token. The model has 218 billion parameters in total, of which 25 billion are active per token. The input context length is 128K tokens, the maximum output length is 64K tokens, and inputs can include text, images, and tool use.
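To make the 'total versus active parameters' distinction concrete, the following is a minimal sketch of top-k expert routing, the general idea behind sparse MoE layers. The parameter counts come from the article; the number of experts, the value of k, and the function names are hypothetical, since the article does not describe how Command A+ routes tokens.

```python
import math

TOTAL_PARAMS_B = 218   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 25   # active parameters per token in billions (from the article)


def active_fraction(total_b, active_b):
    """Share of the model's parameters that take part in processing one token."""
    return active_b / total_b


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    router_logits: one score per expert for a single token (hypothetical values).
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")
    # Toy example: 4 experts, 2 selected per token (both numbers are illustrative).
    print(top_k_route([0.1, 2.0, -1.0, 2.0], k=2))
```

Only the selected experts run for a given token, which is why per-token compute tracks the 25 billion active parameters rather than the full 218 billion.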



Outputs cover text, reasoning, and tool use, and language support has expanded from 23 languages to 48. Command A+ uses a new tokenizer that reduces the number of tokens needed to produce the same response, with the gains in token efficiency most noticeable in Arabic (20%), Korean (16%), and Japanese (18%).
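As a rough illustration of what those percentages could mean in practice, the sketch below assumes each figure is the reduction in token count for an equivalent response. That reading is an assumption, as is the 1,000-token baseline; the function name is hypothetical.

```python
# Token-efficiency gains quoted in the article, read here (as an assumption)
# as the fractional reduction in tokens for the same response.
TOKEN_REDUCTION = {"Arabic": 0.20, "Korean": 0.16, "Japanese": 0.18}


def tokens_after(old_tokens, language):
    """Estimated token count with the new tokenizer for a response that
    previously needed `old_tokens` tokens."""
    return round(old_tokens * (1 - TOKEN_REDUCTION[language]))


if __name__ == "__main__":
    for lang in TOKEN_REDUCTION:
        print(lang, tokens_after(1000, lang))
```

Fewer tokens for the same text means lower per-request cost and more usable room within the 128K context window.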



Command A+ was developed by Cohere and Cohere Labs and is optimized for agent workflows, multilingual processing, reasoning-heavy tasks, and visual processing including image input. The released weights come in BF16, FP8, and W4A4 versions, and the model can be tried in a Hugging Face Space.

The minimum GPU requirements for each version are four B200 or eight H100 GPUs for BF16, two B200 or four H100 GPUs for FP8, and one B200 or two H100 GPUs for W4A4. Cohere states that the difference in benchmark quality between the three versions is very small, and recommends W4A4 for most applications because of its speed, latency, and lower hardware requirements.
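A back-of-the-envelope check shows why the GPU counts scale this way. The sketch below estimates weight memory from the 218 billion parameter count and compares it with the total memory of each minimum configuration. The per-GPU memory sizes (80 GB for H100, 192 GB for B200) are assumptions not stated in the article, and the W4A4 figure is optimistic because it treats every weight as 4-bit even though only the expert layers are quantized that far.

```python
PARAMS = 218e9  # total parameters (from the article)

# Approximate bytes per weight. W4A4 is a lower bound: it ignores the
# layers kept at higher precision and any scale-factor overhead.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "W4A4": 0.5}

# Minimum GPU counts quoted in the article.
MIN_GPUS = {
    "BF16": {"B200": 4, "H100": 8},
    "FP8": {"B200": 2, "H100": 4},
    "W4A4": {"B200": 1, "H100": 2},
}

# Assumed memory per GPU in GB (not from the article).
GPU_MEM_GB = {"H100": 80, "B200": 192}


def weight_gb(version):
    """Estimated memory needed for the weights alone, in GB."""
    return PARAMS * BYTES_PER_PARAM[version] / 1e9


def headroom_gb(version, gpu):
    """Memory left for KV cache and activations on the minimum configuration."""
    return MIN_GPUS[version][gpu] * GPU_MEM_GB[gpu] - weight_gb(version)


if __name__ == "__main__":
    for version in MIN_GPUS:
        for gpu in MIN_GPUS[version]:
            print(f"{version:5} on {MIN_GPUS[version][gpu]}x{gpu}: "
                  f"weights ~{weight_gb(version):.0f} GB, "
                  f"headroom ~{headroom_gb(version, gpu):.0f} GB")
```

Under these assumptions each minimum configuration holds the weights with some memory left over, which is what the KV cache for long contexts needs.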

In terms of performance, Cohere reports significant gains over Command A Reasoning. τ²-Bench Telecom improved from 37% to 85%, Terminal-Bench Hard from 3% to 25%, IFBench from 36% to 74%, AIME 25 from 57% to 90%, and SciCode from 30% to 38%.



Internal evaluations on North also showed gains on enterprise-oriented tasks. Agent Question Answering improved from 45% to 65%, Data Analysis from 13% to 45%, and Memory Usage Quality from 39% to 54%, indicating better performance on agent work that draws on cloud file systems, spreadsheets, and memory of past sessions.



In a comparison of multimodal performance against Command A Vision, Command A+ achieved 63% on MMMU Pro and 75.1% on MMMU. MathVista improved from 73.5% to 80.6% and CharXiv reasoning from 46.9% to 52.7%, and Cohere highlights an overall improvement in Command A+'s document comprehension.



Efficiency is another major feature of Command A+. Cohere claims that, compared with Command A Reasoning under the same quantization and parallelism settings, output throughput in tokens per second improves by up to 63%, and the time to first token (TTFT) falls by up to 17%.

W4A4 quantization is said to provide a further 47% speed improvement and a 13% latency reduction. In addition, speculative decoding optimized for the MoE architecture enables 1.5 to 1.6 times faster inference for both text and multimodal inputs.
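To see how the throughput and first-token figures quoted against Command A Reasoning would affect a single request, the sketch below uses a simple latency model: time to first token plus generation time. The baseline numbers (1 second to first token, 100 tokens per second, 1,000 output tokens) are purely hypothetical, and the 63% and 17% figures are 'up to' values, so this shows a best case rather than a typical one.

```python
def response_time(ttft_s, tokens_per_s, n_tokens):
    """Simple end-to-end latency model: wait for the first token,
    then stream the remaining output at a constant rate."""
    return ttft_s + n_tokens / tokens_per_s


def apply_gains(ttft_s, tokens_per_s, tps_gain=0.63, ttft_cut=0.17):
    """Apply the best-case gains quoted in the article: throughput up by
    `tps_gain`, time to first token down by `ttft_cut`."""
    return ttft_s * (1 - ttft_cut), tokens_per_s * (1 + tps_gain)


if __name__ == "__main__":
    # Hypothetical baseline, chosen only for illustration.
    base_ttft, base_tps, n = 1.0, 100.0, 1000
    new_ttft, new_tps = apply_gains(base_ttft, base_tps)
    before = response_time(base_ttft, base_tps, n)
    after = response_time(new_ttft, new_tps, n)
    print(f"before: {before:.2f} s, after: {after:.2f} s")
```

For long outputs the total is dominated by throughput, so the tokens-per-second gain matters more than the first-token gain.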



According to Cohere, the W4A4 version uses NVFP4 W4A4 quantization, applying 4-bit weights and activations only to the MoE expert layers while keeping full precision for the QKV and output projections, the KV cache, and the attention computation. To minimize quality loss after quantization, it also uses Quantization Aware Distillation, which trains the quantized model to match the output distribution of the full-precision model.
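The two ideas in this paragraph can be sketched in a few lines. The first function rounds a block of values onto a 4-bit floating-point grid with a shared per-block scale, which is the general shape of block-scaled FP4 formats; it is a simplification, not NVFP4 itself, because the scale is kept as an ordinary float. The second computes the KL divergence between a full-precision 'teacher' distribution and a quantized 'student' distribution, the kind of quantity a distillation objective would minimize. Cohere's actual recipe is not described in the article, so all names and details here are illustrative assumptions.

```python
import math

# Magnitudes representable by a 4-bit float with 2 exponent bits and 1 mantissa bit.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quantize_block(block):
    """Round each value to the nearest FP4 grid point after scaling the block
    so its largest magnitude maps to the top of the grid, then scale back."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return list(block)
    scale = amax / FP4_GRID[-1]  # simplification: scale kept as a plain float
    out = []
    for x in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) over next-token distributions. Distillation
    pushes this toward zero so the student mimics the teacher's outputs."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    print(fake_quantize_block([6.0, -3.0, 0.4, 1.2]))
    print(kl_divergence([2.0, 1.0, 0.1], [1.8, 1.1, 0.2]))
```

Quantizing only the expert layers targets the bulk of the parameters, while leaving attention and the KV cache at full precision avoids the parts where rounding error tends to be more damaging.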

Cohere also shared a comment from Vivek Mahajan, CTO of System Platforms at Fujitsu, its partner in the AI business. Mahajan stated, 'Command A+'s MoE architecture and agent performance align with "Takane," the enterprise LLM jointly developed by Fujitsu and Cohere, and with Fujitsu's AI platform "Fujitsu Kozuchi Enterprise AI Factory" in providing sovereign AI solutions.'

The model weights for Command A+ are available on Hugging Face, and the model can also be deployed to a managed inference environment through Model Vault. It can be tried for free via the Hugging Face Space or with a Cohere API key, and it is supported by vLLM and Transformers. Note that running the W4A4 version with vLLM requires vLLM 0.21.0 or later, and accurate response parsing requires Cohere's melody library.
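Before trying the W4A4 weights, it may help to confirm that the installed vLLM meets the stated minimum. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the version parsing is deliberately simple (it ignores pre-release and local suffixes such as 'rc1' or '+cu128').

```python
from importlib import metadata

MIN_VLLM = (0, 21, 0)  # minimum vLLM version for the W4A4 model, per the article


def parse_version(text):
    """Turn a version string like '0.21.1.dev0' into a (major, minor, patch)
    tuple, keeping only the leading digits of each of the first three parts."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def vllm_is_new_enough(installed=None):
    """Return True if the given (or installed) vLLM version is at least MIN_VLLM.
    Returns False when vLLM is not installed."""
    if installed is None:
        try:
            installed = metadata.version("vllm")
        except metadata.PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_VLLM


if __name__ == "__main__":
    print("vLLM OK for W4A4:", vllm_is_new_enough())
```

Running the check at startup gives a clear error message early, instead of a harder-to-diagnose failure when the model loads.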

in AI, Posted by log1i_yk