Tracker page
Use this tracker when you want benchmark claims monitored as positioning signals instead of repeated as marketing copy. It keeps official repos, product pages, and benchmark names visible in one place so readers can tell what is actually being claimed and why.
Maintained by
Asian Intelligence Editorial Team
Review standard
Reviewed against the site methodology, source hierarchy, and update posture.
Reference links
Use the methodology and research-assets pages when you want to verify sourcing posture, page types, and exportable reference layers.
Methodology
Research assets
At A Glance
The point of this tracker is provenance, not leaderboard worship.
It is most useful when Chinese and Korean model teams are foregrounding different benchmarks for different strategic reasons.
Use it with model-race and company pages so benchmark claims stay connected to product strategy.
Analysis
Use these sections when a quick summary is not enough and you want the structural read behind the headline theme.
Why this tracker exists
In Asian AI coverage, benchmark tables often do double duty. They look like neutral evaluation, but they also tell readers what a company wants the market to notice.
That is why the claim surface matters. A score in a GitHub repository is different from a score on a polished product page or in a one-off media interview. The same benchmark can signal technical transparency, product maturity, or a deliberate attempt to anchor a model in a specific comparison set.
This tracker helps readers keep that distinction visible. The useful question is not just who posted a number, but what kind of release surface they used, which variant they were talking about, and which part of their product strategy the benchmark was meant to strengthen.
Best current lens
Actors
Who is making the claim
Chinese platform-model teams, Korean product companies, and open model projects often use benchmark tables differently.
Surfaces
Where the claim lives
Official GitHub repos are usually easier to audit than product pages or indirect interview summaries.
Families
What kind of benchmark is being emphasized
Reasoning, coding, local-language, and agentic benchmarks usually reveal product priorities more clearly than one overall score.
Common Questions
These routes and search chips help readers move from a question into the most useful briefing, topic page, or report.
Tracker page
Use the China model-race tracker when benchmark claims need to be placed back into company rivalry, release cadence, and model-family positioning.
Open China tracker
Company hub
Open Moonshot AI when benchmark claims need to be tied back to specific Kimi variants, official docs, and release surfaces.
Open Moonshot hub
Company hub
Open DeepSeek when the benchmark discussion needs a cleaner example of an official GitHub-centered claim surface.
Open DeepSeek hub
State-of page
Use the regional state-of page when benchmark claims need to be interpreted inside the wider strategic pattern instead of in isolation.
Comparison page
Use the compute comparison page when benchmark claims need to be paired with the infrastructure that may or may not support them.
Adjacent Routes
These links connect the hub to the main briefing, topic, and market layers so readers can change depth without starting over.
Country briefing
Start here for China’s AI policy stack, compute constraints, major companies, and strategic posture.
Country briefing
Start here for South Korea’s sovereign-AI push, industrial scale, compute buildout, and policy execution.
Topic hub
Language models, compute layers, chips, and the infrastructure choices shaping capability across the region.
Topic hub
Profiles, executive context, and company strategy for the organizations and people shaping AI execution across Asia.
What To Watch
Which Asian AI teams are making benchmark claims on official surfaces?
How should benchmark claims be read without collapsing into leaderboard theater?
Why do some teams emphasize local-language or agentic benchmarks while others foreground frontier reasoning tests?
Watchlist
Watch which actors keep publishing auditable benchmark tables on official surfaces rather than relying on vague comparative language.
Track where benchmark focus shifts toward agentic work, coding, or local-language capability; that shift usually reveals deeper product intent.
Monitor whether more Asian teams start using the same benchmark families often enough to make cross-market comparison more stable.
FAQ
This tracker is for official benchmark claim surfaces, benchmark explainers, and the model-positioning materials that shape how those claims should be interpreted across Asian AI markets.
A model-race tracker follows company movement and release cadence. This tracker follows the evaluation language and official claim surfaces that companies use to justify their positioning.
Archive Links
These are the archive entries most directly relevant to this hub right now.
Published March 30, 2026 · Updated March 30, 2026
Why it matters: A source-first tracker of benchmark claims made by Asian AI companies and labs, focused on official release surfaces and how to interpret them.
Published March 30, 2026 · Updated March 30, 2026
Why it matters: A source-first explainer on Humanity's Last Exam, its official milestone timeline, and how major Asian AI teams are already using the benchmark in release materials.
Distribution
Push the page into social, email, feeds, or CSV workflows without losing the canonical route.
Follow The Coverage
Use the digest to follow related briefings, topic hubs, trackers, and new archive entries tied to this recurring question.
Prefer feeds or direct links? Use the RSS feed or download the structured CSV exports.