Apr 21, 2026 20:00:00

A tool called 'Kimi Vendor Verifier (KVV)' has been released to verify whether third-party APIs serving the open 'Kimi' model series deliver the same accuracy as the official API.

Moonshot AI, the Chinese AI startup that develops the open-model series 'Kimi,' has released 'Kimi Vendor Verifier (KVV),' a tool for verifying that a deployed model behaves as accurately as intended, as open source.

Kimi Vendor Verifier
https://www.kimi.com/blog/kimi-vendor-verifier

Moonshot AI has been producing promising models, including 'Kimi K2.6,' an open model released on April 21, 2026, that is described as performing on par with Claude Opus 4.6.

Kimi K2.6, a Chinese-made AI model with performance equivalent to Claude Opus 4.6, has been released as an open model - GIGAZINE

Moonshot AI had been receiving frequent feedback from the community that 'benchmark scores were inaccurate.' Its investigation found that in most cases the cause was incorrect decoding parameters, so it changed the API to enforce several of those parameters on the server side.
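Enforcing decoding parameters on the API side can be pictured with a minimal sketch like the one below. The parameter names, the enforced values, and the `enforce_decoding_params` helper are all assumptions made for illustration; the article does not say which parameters Moonshot AI fixes or to what values.

```python
# Hypothetical enforced values; the article does not list the real ones.
ENFORCED_PARAMS = {"temperature": 1.0, "top_p": 0.95}


def enforce_decoding_params(request: dict) -> tuple[dict, list[str]]:
    """Return a copy of the request with the enforced parameters applied,
    plus the names of client-supplied parameters that were overridden."""
    fixed = dict(request)  # shallow copy so the caller's dict is untouched
    overridden = []
    for name, value in ENFORCED_PARAMS.items():
        if name in fixed and fixed[name] != value:
            overridden.append(name)
        fixed[name] = value
    return fixed, overridden


# A client sends a temperature that differs from the enforced one.
request = {"model": "example-model", "temperature": 0.2, "max_tokens": 64}
fixed, overridden = enforce_decoding_params(request)
print(fixed)
print(overridden)
```

Returning the list of overridden names lets a server log or report when a client's settings were replaced, rather than changing them silently.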

Further investigation of third-party APIs revealed significant differences between them and the official API. While some vendors, like Fireworks AI, strive to extract the best performance from the models, others serve them at lower quality.

Moonshot AI's models are open and anyone can run them, which makes quality control difficult. When a user reaches a Moonshot AI model through a third-party vendor and gets low-quality responses, there is no way to tell whether the fault lies with the model or with the vendor's configuration, and that uncertainty erodes user trust.

To address this, Moonshot AI developed 'KVV' and released it as open source. The tool verifies whether a deployment performs at the same level as the official API.

GitHub - MoonshotAI/Kimi-Vendor-Verifier: Kimi-Vendor-Verifier · GitHub
https://github.com/MoonshotAI/Kimi-Vendor-Verifier

KVV uses the following six benchmarks to verify a model's performance.

• Preliminary verification
Verifies that API parameters are applied correctly.

• OCRBench
A five-minute smoke test for the multimodal pipeline.

• MMMU Pro
Tests diverse visual inputs to verify image preprocessing.

• AIME2025
A long-output stress test that detects KV cache bugs and quantization degradation that short benchmarks cannot reveal.

• K2VV ToolCall
Measures tool-call consistency and JSON schema accuracy, and detects tool errors.

• SWE-Bench
A fully agentic coding test. It is not included in the open-source release because of its sandbox dependencies.
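To make the JSON schema check in the tool-call benchmark more concrete, the following is a rough sketch of validating a tool call's arguments against its declared schema. It is not KVV's implementation: the `check_tool_arguments` helper and the `get_weather`-style schema are hypothetical, and only the `required` and `type` keywords of JSON Schema are handled. A real check would more likely use a full JSON Schema validator.

```python
import json

# Simplified mapping from JSON Schema type names to Python types.
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_tool_arguments(raw_arguments: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the call passed."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            continue  # undeclared fields are ignored in this sketch
        type_name = spec.get("type")
        expected = TYPE_MAP.get(type_name) if isinstance(type_name, str) else None
        if expected is None:
            continue  # unknown or missing type: nothing to check
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {type_name}")
    return problems


# Hypothetical tool schema used only for this example.
SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
    "required": ["city"],
}

print(check_tool_arguments('{"city": "Tokyo", "days": 3}', SCHEMA))
print(check_tool_arguments('{"city": "Tokyo", ', SCHEMA))
print(check_tool_arguments('{"days": "3"}', SCHEMA))
```

Running the same prompts against a vendor API and the official API, then comparing how often each produces calls that pass a check like this, is one way differences between deployments could be surfaced.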

Furthermore, Moonshot AI will provide vendors with early access to test models, allowing them to check their systems before public release. They also plan to review vendor APIs and publish performance rankings.

Running the full KVV suite reportedly took about 15 hours on two NVIDIA H20 8-GPU servers.

In announcing the release of KVV, Moonshot AI stated, 'Opening up the model is only half the battle. The other half is ensuring it works correctly elsewhere,' signaling its intent to help third-party vendors implement Kimi K2.6 correctly.

Apr 21, 2026 20:00:00 in AI, Posted by log1d_ts