Claude Limits, Training Gains, API Pricing #12

May 8, 2026 — 3 min read

Today's Letter

Anthropic said it increased usage limits for Claude Code and the Claude API, with all listed changes effective on May 6, 2026
Claude Code five-hour rate limits were doubled for Pro, Max, Team, and seat-based Enterprise plans
Anthropic also removed peak-hours limit reductions for Claude Code on Pro and Max accounts
API rate limits were raised for Claude Opus models; the specific new limits are not listed here
The company signed an agreement to use all compute capacity at SpaceX's Colossus 1 data center, adding more than 300 megawatts and over 220,000 NVIDIA GPUs within a month
Anthropic said this new capacity will directly improve availability for Claude Pro and Claude Max subscribers
The announcement adds to earlier compute agreements, including up to 5 GW with Amazon, 5 GW with Google and Broadcom starting in 2027, and a $30 billion Azure capacity partnership with Microsoft and NVIDIA
Anthropic said some new capacity will be deployed internationally, including additional inference capacity in Asia and Europe for enterprise compliance and data residency needs

Source: anthropic.com
More: techzine.eu · business-standard.com · dqindia.com

Unsloth said its collaboration with NVIDIA made LLM training about 25% faster, with the new optimizations enabled automatically after updating Unsloth
The post describes three changes: packed-sequence metadata caching, double-buffered async gradient checkpointing, and faster MoE routing for GPT-OSS training
Unsloth reported that caching packed-sequence metadata improved per-batch training speed by 14.3% on Qwen3-14B QLoRA SFT, with the forward pass 43.3% faster and the backward pass 5.8% faster
The metadata path is reused across transformer layers, so caching avoids rebuilding masks and sequence metadata on every layer and reduces repeated synchronization overhead
For checkpointing, Unsloth uses two buffers so activation reloads from pinned CPU memory to GPU can overlap with backward compute instead of running in a serialized copy-then-compute pattern
Unsloth said this double-buffered checkpoint reload path produced an 8% speedup on larger dense-model runs on NVIDIA Blackwell B200 GPUs
The company also said GPT-OSS training became 15% faster by using argsort and bincount during mixture-of-experts routing
Unsloth said the new changes are additive to its earlier 2-5x training speedups and apply across RTX laptops, data center GPUs, and DGX Spark systems

Source: unsloth.ai
More: techflowpost.com
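
For readers curious what routing with argsort and bincount can look like, here is a minimal NumPy sketch. It is not Unsloth's actual kernel: it assumes top-1 routing, and the names `group_tokens_by_expert` and `expert_ids` are hypothetical. A stable argsort orders token indices so that each expert's tokens sit next to each other, and bincount counts how many tokens each expert receives.

```python
import numpy as np

def group_tokens_by_expert(expert_ids, num_experts):
    """Group token indices by expert (illustrative; top-1 routing assumed)."""
    expert_ids = np.asarray(expert_ids)
    # Stable sort keeps the original token order within each expert.
    order = np.argsort(expert_ids, kind="stable")
    # counts[e] is the number of tokens routed to expert e.
    counts = np.bincount(expert_ids, minlength=num_experts)
    # order[offsets[e]:offsets[e + 1]] holds the token indices for expert e.
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return order, counts, offsets

# Hypothetical assignment of six tokens to four experts.
expert_ids = np.array([2, 0, 1, 0, 2, 2])
order, counts, offsets = group_tokens_by_expert(expert_ids, num_experts=4)
for e in range(4):
    print(e, order[offsets[e]:offsets[e + 1]].tolist())
```

Each expert can then process one contiguous slice of tokens, which avoids building a separate boolean mask per expert.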

DeepSeek says the 75% discount on DeepSeek-V4-Pro remains in effect until 2026-05-31 15:59 UTC, according to its API pricing page
Discounted V4 Pro rates are listed at $0.003625 per 1M input tokens on cache hit, $0.435 on cache miss, and $0.87 per 1M output tokens
The standard V4 Pro prices shown on the same page are $0.0145 for cache-hit input, $1.74 for cache-miss input, and $3.48 for output per 1M tokens, exactly four times the discounted rates, implying the temporary cut is already reflected in the active billing table
DeepSeek-V4-Pro supports both thinking and non-thinking modes, with 1M context length and up to 384K maximum output
JSON output, tool calls, chat prefix completion (beta), and FIM completion (beta) are listed as supported features for the model
DeepSeek also notes that cache-hit input pricing for all models was reduced to one-tenth of launch pricing, effective 2026-04-26 12:15 UTC
The pricing page says product prices can change and recommends checking the page regularly for the latest billing terms

Source: api-docs.deepseek.com
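
To make the pricing concrete, the sketch below estimates the cost of a single request from the per-1M-token rates quoted above. The helper name `estimate_cost`, the dictionary layout, and the example token counts are illustrative assumptions, not part of DeepSeek's API. The rates are copied from the figures in this letter and may change.

```python
# Rates in USD per 1M tokens, copied from the figures quoted in this letter.
STANDARD = {"cache_hit": 0.0145, "cache_miss": 1.74, "output": 3.48}
DISCOUNTED = {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87}

def estimate_cost(rates, hit_tokens, miss_tokens, output_tokens):
    """Return the estimated USD cost of one request (hypothetical helper)."""
    return (
        hit_tokens * rates["cache_hit"]
        + miss_tokens * rates["cache_miss"]
        + output_tokens * rates["output"]
    ) / 1_000_000

# Hypothetical workload: 800K cached input, 200K uncached input, 50K output tokens.
full = estimate_cost(STANDARD, 800_000, 200_000, 50_000)
promo = estimate_cost(DISCOUNTED, 800_000, 200_000, 50_000)
print(f"standard: ${full:.4f}, discounted: ${promo:.4f}")
```

Because every discounted rate is one quarter of its standard rate, the discounted total is one quarter of the standard total for any mix of tokens.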

Jocoletter curates AI, software, and product trends for developers and builders.

#Anthropic #DeepSeek #Unsloth