Quick Take
What this page helps answer
A source-first tracker of benchmark claims made by Asian AI companies and labs, focused on official release surfaces and how to interpret them.
Who, How, Why
- Who: Asian Intelligence Editorial Team
- How: Prepared from cited public sources and reviewed against the site’s editorial standards.
- Why: To give readers sourced context on AI policy, company strategy, and technology development in Asia.
Asian AI Benchmark Claims Tracker
A provenance-first tracker for benchmark claims made on official release surfaces as of March 29, 2026.
How To Use This Page
This tracker is not trying to decide which model is "best." It is trying to show where benchmark claims actually come from, which benchmarks are being foregrounded, and what kind of release surface made the claim. That matters because the same company can cite one set of benchmarks in a GitHub repo, another on a product page, and a third in a media interview.
Verified Claim Surfaces
| Model | Country | Official source | Benchmarks explicitly surfaced | What readers should notice |
|---|---|---|---|---|
| DeepSeek-V3.2-Exp | China | DeepSeek GitHub repository | Humanity's Last Exam 19.8, AIME 2025 89.3, MMLU-Pro 85.0, GPQA-Diamond 79.9, LiveCodeBench 74.1 | The repo is explicit and numeric. This is a clean example of a Chinese model team using public benchmark tables as part of product positioning. |
| Kimi K2 Instruct | China | MoonshotAI Kimi K2 repository | Humanity's Last Exam text-only 5.7, GPQA-Diamond 75.1, plus extensive AIME, coding, and SWE-bench tables | Moonshot emphasizes breadth. The value here is not one score; it is the way the repo mixes reasoning, coding, and agentic evaluation surfaces. |
| Solar Pro 2 | South Korea | Upstage launch page | Ko-MMLU, Hae-Rae, Ko-IFEval, Ko-Arena-Hard-Auto, MMLU, MMLU-Pro, HumanEval, Math500, AIME, SWE-Bench Agentless | Upstage's claim surface is more product-page oriented. It foregrounds Korean leadership and practical reasoning strength rather than only one universal frontier score. |
| Qwen3 | China | Qwen3 repository | Official repo publishes benchmark tables and a technical report across multiple variants and reasoning modes | Qwen is a reminder that model-family tracking matters. Variant confusion is one of the easiest ways benchmark comparisons become misleading. |
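The rows above can also be flattened into structured records for auditing or comparison. The sketch below is a minimal, hypothetical representation: the `Claim` dataclass and its field names are illustrative, not part of any published export format, and the scores are simply copied from the table.

```python
from dataclasses import dataclass, field

# Hypothetical record shape for auditing claims; not a published export schema.
@dataclass
class Claim:
    model: str                       # model or variant named in the claim
    country: str                     # where the releasing lab is based
    surface: str                     # e.g. "GitHub repository" or "launch page"
    benchmarks: dict[str, float] = field(default_factory=dict)  # benchmark -> reported score

# Two rows from the table above, restated as records (scores as published by the labs).
claims = [
    Claim("DeepSeek-V3.2-Exp", "China", "DeepSeek GitHub repository",
          {"Humanity's Last Exam": 19.8, "AIME 2025": 89.3, "MMLU-Pro": 85.0,
           "GPQA-Diamond": 79.9, "LiveCodeBench": 74.1}),
    Claim("Kimi K2 Instruct", "China", "MoonshotAI Kimi K2 repository",
          {"Humanity's Last Exam (text-only)": 5.7, "GPQA-Diamond": 75.1}),
]

# Example audit question: which claims surface GPQA-Diamond, and at what reported score?
for claim in claims:
    if "GPQA-Diamond" in claim.benchmarks:
        print(claim.model, claim.surface, claim.benchmarks["GPQA-Diamond"])
```

Keeping the release surface as a field alongside the scores preserves the point of the tracker: the number only means something in the context of where it was published.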
What Makes Asian Benchmark Claims Distinctive
Three patterns stand out. First, Chinese model teams frequently publish benchmark tables directly inside GitHub repos, which makes the claim surface relatively easy to audit. Second, South Korean release pages tend to emphasize local-language strength and usable enterprise capabilities alongside general benchmarks. Third, the most informative claims are often the ones that reveal the company's real product strategy: multilingual strength, tool use, coding, agentic work, or evaluation breadth.
How To Read These Claims Correctly
- Check whether the benchmark table lives on an official repo, an official product page, or only a press interview.
- Check whether the claim is attached to a specific variant, such as instruct, thinking, or experimental mode.
- Check whether the release is emphasizing one benchmark because it flatters the model's real commercial strength.
- Take local-language and deployment benchmarks seriously in Asia, because they often matter more than a single leaderboard headline; a sketch of how this checklist can be applied mechanically follows below.
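One way to apply the checklist is to encode it as simple flags over claim records. The sketch below is an editorial illustration: the `needs_scrutiny` helper and its heuristics are assumptions layered on the checklist above, not a methodology published by any of the labs.

```python
# Hypothetical heuristics mirroring the reading checklist above; editorial
# assumptions, not a published methodology.
OFFICIAL_SURFACES = ("repository", "launch page", "product page")

def needs_scrutiny(surface: str, variant: str | None, benchmarks: dict[str, float]) -> list[str]:
    """Return human-readable flags for a single benchmark claim."""
    flags = []
    # Claims made only in interviews or press coverage are harder to audit.
    if not any(s in surface.lower() for s in OFFICIAL_SURFACES):
        flags.append("claim not tied to an official repo or product page")
    # Variant confusion (instruct vs. thinking vs. experimental) misleads comparisons.
    if variant is None:
        flags.append("no specific model variant attached to the claim")
    # A single foregrounded benchmark may simply flatter the product strategy.
    if len(benchmarks) == 1:
        flags.append("only one benchmark surfaced; check for cherry-picking")
    return flags

# A claim quoted only in an interview, with no variant and one headline score:
print(needs_scrutiny("media interview", None, {"AIME 2025": 89.3}))
```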