Gemma 4 on Bedrock, NVIDIA MoE Kernels #50

TaeyoungPark

June 17, 2026 — 2 min read

Today's Letter

Amazon Bedrock adds Google Gemma 4 model family
NVIDIA, fused MoE kernels for faster training

Amazon Bedrock adds Google Gemma 4 model family

AWS announced Gemma 4 availability on Amazon Bedrock on June 15, 2026
The release includes three instruction-tuned variants: Gemma 4 31B, Gemma 4 26B-A4B, and Gemma 4 E2B
All three support text and image input, built-in reasoning mode, and native function calling for agent workflows
Gemma 4 31B and 26B-A4B provide 256K-token context windows, while E2B provides 128K tokens
The 26B-A4B variant uses a mixture-of-experts design with 25.2B total parameters and 3.8B active parameters per token
AWS says the models are served through the Bedrock Mantle endpoint, its OpenAI-compatible API for the next-generation inference engine
AWS says prompts and completions are not used for model training, and customer content is not shared with third parties
Google DeepMind released Gemma 4 under the Apache 2.0 license, and AWS positions the Bedrock launch at multimodal agents, document pipelines, and software engineering workloads

Source: aws.amazon.com

NVIDIA, fused MoE kernels for faster training

NVIDIA published new fused MLP kernels for dense and mixture-of-experts training on June 15, 2026.
The company says the kernels deliver 1.3x-2x kernel-level speedup versus unfused execution paths.
The design targets three MoE bottlenecks: activation overhead, CPU-side routing overhead, and quantization cost.
NVIDIA built the kernels with CuTe DSL and says they enable sync-free MoE execution with full-iteration CUDA Graphs.
Supported GLU variants include SwiGLU, GeGLU, and sReLU, with clamping and scaling options.
The kernels also handle MXFP8 and NVFP4 quantization within the fused execution path.
NVIDIA says the optimization contributed an 8% end-to-end gain in its DeepSeek-V3 pre-training setup.
In NVIDIA's GPT-OSS pre-training setup, the same optimization is reported to have delivered a 93% end-to-end gain.
The kernels are available in cuDNN Frontend and can be accessed through Transformer Engine and Megatron-Core.

Source: developer.nvidia.com

Jocoletter curates AI, software, and product trends for developers and builders.

#AWS #NVIDIA

Subscribe to Jocoletter

Anthropic results and NVIDIA training guide #55

Today's Letter 1. Anthropic, Project Fetch Phase Two results published 2. NVIDIA, Low-Precision Transformer Training Guide Anthropic, Project Fetch Phase Two results published * Anthropic published Phase Two results for Project Fetch on June 18, 2026, testing Claude Opus 4.7 on robodog setup and autonomy tasks first run

Agent Infrastructure and Copilot Metrics #54

Today's Letter 1. GitHub adds per-user AI credit metrics to Copilot API 2. AWS, Web Search for Bedrock AgentCore GA GitHub adds per-user AI credit metrics to Copilot API * GitHub added an `ai_credits_used` field to the Copilot usage metrics API for per-user AI credit consumption tracking

Agent deployment, Copilot model shifts, enterprise controls #53

Today's Letter 1. Cloudflare, temporary accounts for agent deployments 2. GitHub Copilot, Opus 4.6 (fast) retirement set for June 29 3. AWS, SageMaker inference metrics on CloudWatch 4. OpenAI adds ChatGPT Enterprise usage analytics and spend controls Cloudflare, temporary accounts for agent deployments * Cloudflare introduced Temporary Cloudflare

Agent Stack, Copilot, and Async Inference #52

Today's Letter 1. GitHub Copilot, context handling and Auto routing update 2. AWS, SageMaker Async Inference adds inline payloads 3. Vercel, Agent Stack and eve unveiled 4. Hugging Face, agentic tooling benchmark for open models published GitHub Copilot, context handling and Auto routing update * GitHub published a June