
Quick Take

What this page helps answer

A source-first tracker of benchmark claims made by Asian AI companies and labs, focused on official release surfaces and how to interpret them.

Who, How, Why

Who: Asian Intelligence Editorial Team
How: Prepared from cited public sources and reviewed against the site's editorial standards.
Why: To give readers sourced context on AI policy, company strategy, and technology development in Asia.

Asian AI Benchmark Claims Tracker

A provenance-first tracker for benchmark claims made on official release surfaces as of March 29, 2026.

How To Use This Page

This tracker is not trying to decide which model is "best." It is trying to show where benchmark claims actually come from, which benchmarks are being foregrounded, and what kind of release surface made the claim. That matters because the same company can cite one set of benchmarks in a GitHub repo, another in a product page, and a third in a media interview.

Verified Claim Surfaces

DeepSeek-V3.2-Exp (China)
  Official source: DeepSeek GitHub repository
  Benchmarks explicitly surfaced: Humanity's Last Exam 19.8, AIME 2025 89.3, MMLU-Pro 85.0, GPQA-Diamond 79.9, LiveCodeBench 74.1
  What readers should notice: The repo is explicit and numeric. This is a clean example of a Chinese model team using public benchmark tables as part of product positioning.

Kimi K2 Instruct (China)
  Official source: MoonshotAI Kimi K2 repository
  Benchmarks explicitly surfaced: Humanity's Last Exam text-only 5.7, GPQA-Diamond 75.1, plus extensive AIME, coding, and SWE-bench tables
  What readers should notice: Moonshot emphasizes breadth. The value here is not one score; it is the way the repo mixes reasoning, coding, and agentic evaluation surfaces.

Solar Pro 2 (South Korea)
  Official source: Upstage launch page
  Benchmarks explicitly surfaced: Ko-MMLU, Hae-Rae, Ko-IFEval, Ko-Arena-Hard-Auto, MMLU, MMLU-Pro, HumanEval, Math500, AIME, SWE-Bench Agentless
  What readers should notice: Upstage's claim surface is more product-page oriented. It foregrounds Korean leadership and practical reasoning strength rather than only one universal frontier score.

Qwen3 (China)
  Official source: Qwen3 repository
  Benchmarks explicitly surfaced: The official repo publishes benchmark tables and a technical report across multiple variants and reasoning modes
  What readers should notice: Qwen is a reminder that model-family tracking matters. Variant confusion is one of the easiest ways benchmark comparisons become misleading.

What Makes Asian Benchmark Claims Distinctive

Three patterns stand out. First, Chinese model teams frequently publish benchmark tables directly inside GitHub repos, which makes the claim surface relatively easy to audit. Second, South Korean release pages tend to emphasize local-language strength and usable enterprise capabilities alongside general benchmarks. Third, the most informative claims are often the ones that reveal the company's real product strategy: multilingual strength, tool use, coding, agentic work, or evaluation breadth.

How To Read These Claims Correctly

  • Check whether the benchmark table lives on an official repo, an official product page, or only a press interview.
  • Check whether the claim is attached to a specific variant, such as instruct, thinking, or experimental mode.
  • Check whether the release is emphasizing one benchmark because it flatters the model's real commercial strength.
  • Take local-language and deployment benchmarks seriously in Asia; they often matter more than a single leaderboard headline.
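The checklist above can be treated as structured data. Here is a minimal Python sketch of what one tracked claim might look like, with the provenance checks applied to it. The record shape and field names (actor, surface, variant, benchmarks) are illustrative assumptions, not a schema published by this tracker; the DeepSeek numbers come from the table above.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkClaim:
    """One benchmark claim and where it was made (hypothetical record shape)."""
    actor: str
    surface: str                 # "repo", "product_page", or "press"
    variant: str                 # e.g. "instruct", "thinking", "experimental"
    benchmarks: dict[str, float] # benchmark name -> reported score

def provenance_flags(claim: BenchmarkClaim) -> list[str]:
    """Apply the reading checklist and return any concerns raised."""
    flags = []
    if claim.surface == "press":
        flags.append("claim only appears in a press interview")
    if not claim.variant:
        flags.append("no specific model variant attached")
    if len(claim.benchmarks) == 1:
        flags.append("single benchmark may be cherry-picked")
    return flags

# Example using the DeepSeek-V3.2-Exp scores cited in the table above.
deepseek = BenchmarkClaim(
    actor="DeepSeek-V3.2-Exp",
    surface="repo",
    variant="experimental",
    benchmarks={"AIME 2025": 89.3, "MMLU-Pro": 85.0, "GPQA-Diamond": 79.9},
)
print(provenance_flags(deepseek))  # -> [] (repo-sourced, variant-specific, multi-benchmark)
```

A repo-hosted, variant-specific, multi-benchmark claim raises no flags; a single score quoted only in an interview with no variant would raise all three.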

Primary Sources Used

  1. DeepSeek-V3.2-Exp official repository
  2. Kimi K2 official repository
  3. Upstage Solar Pro 2 launch page
  4. Qwen3 official repository
