Agent benchmarks and model release updates #48

TaeyoungPark

June 15, 2026 — 2 min read

Today's Letter

XiaomiMiMo, MiMo-V2.5-Pro-FP4-DFlash page posted
NVIDIA, GB300 NVL72 tops AA-AgentPerf launch

XiaomiMiMo, MiMo-V2.5-Pro-FP4-DFlash page posted

A Hugging Face model page for XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash was posted on 2026-06-08.
The page identifies MiMo-v2.5-pro as a Xiaomi MiMo Team model with 1T parameters.
A 1M-token context window is stated in the retrieved source text.
The repository name indicates an FP4 DFlash variant rather than a generic MiMo-v2.5-pro listing.
An MIT license is shown on the model page.
No pricing, benchmark table, or deployment requirements are clearly provided in the retrieved source text.
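
Since the page gives no deployment requirements, a rough weight-size estimate can help frame what "1T parameters" and "FP4" would imply together. The sketch below is plain arithmetic under loud assumptions: it supposes every parameter is stored at exactly 4 bits and ignores quantization scale factors, any layers kept at higher precision, KV cache, and activations. Whether this repository actually holds all 1T parameters in FP4 is not stated in the source, and the helper name `weight_bytes` is hypothetical.

```python
def weight_bytes(num_params: int, bits_per_param: float) -> float:
    """Raw storage for the weights alone, in bytes (assumes uniform precision)."""
    return num_params * bits_per_param / 8


# 1T parameters, as stated on the model page.
NUM_PARAMS = 1_000_000_000_000

# Assumption: all weights at 4 bits, no scale factors or mixed-precision layers.
fp4_gb = weight_bytes(NUM_PARAMS, 4) / 1e9

print(f"FP4 weights only: about {fp4_gb:.0f} GB")  # FP4 weights only: about 500 GB
```

Real memory needs would be higher than this lower bound once runtime state is included.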

Source: huggingface.co
More: mimo.xiaomi.com

NVIDIA, GB300 NVL72 tops AA-AgentPerf launch

NVIDIA said its GB300 NVL72 posted the top launch result on Artificial Analysis' AA-AgentPerf coding benchmark.
AA-AgentPerf measures how many concurrent coding agents a system can serve while meeting model-specific SLO targets.
The launch benchmark uses DeepSeek-V4-Pro and reports results normalized per accelerator and per megawatt.
Each SLO tier pairs an output-speed target with a P95 time-to-first-token limit: 30/10s, 100/5s, and 300/3s.
Test trajectories are based on public code-repository issues, cover 12+ programming languages, and include tool use.
Request lengths range from 5K to 131K tokens, with an average of about 27K tokens per request.
Tool-call latency is simulated with a shared CPU baseline using a one-second median delay across tested systems.
NVIDIA said GB300 NVL72 delivered up to 20x higher concurrent agent throughput per megawatt than H200.
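
The SLO tiers and the per-accelerator and per-megawatt normalization described above can be sketched roughly as follows. This is an illustrative reading, not Artificial Analysis' actual harness: the names `SloTier`, `meets_slo`, and `normalize` are hypothetical, the unit of output speed is not given in the text above, and requiring every agent to meet the speed target (rather than an average) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloTier:
    min_output_speed: float  # output-speed target (unit not stated in the text above)
    max_p95_ttft_s: float    # P95 time-to-first-token limit, in seconds


# The three tiers named in the summary: 30/10s, 100/5s, 300/3s.
TIERS = {
    "30/10s": SloTier(30, 10),
    "100/5s": SloTier(100, 5),
    "300/3s": SloTier(300, 3),
}


def p95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def meets_slo(tier, output_speeds, ttfts_s):
    """Assumption: every agent must hit the speed target; TTFT is judged at P95."""
    return (min(output_speeds) >= tier.min_output_speed
            and p95(ttfts_s) <= tier.max_p95_ttft_s)


def normalize(concurrent_agents, accelerators, power_mw):
    """Express a concurrency result per accelerator and per megawatt."""
    return {
        "per_accelerator": concurrent_agents / accelerators,
        "per_megawatt": concurrent_agents / power_mw,
    }
```

Under this reading, a system's score at a tier would be the largest number of concurrent agents for which `meets_slo` still holds, passed through `normalize` so that systems of different size and power draw can be compared.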

Source: developer.nvidia.com
More: blogs.nvidia.com

Jocoletter curates AI, software, and product trends for developers and builders.

#NVIDIA #XiaomiMiMo
