Gemma 4 on Bedrock, NVIDIA MoE Kernels #50
Today's Letter
Amazon Bedrock adds Google Gemma 4 model family
- AWS announced Gemma 4 availability on Amazon Bedrock on June 15, 2026
- The release includes three instruction-tuned variants: Gemma 4 31B, Gemma 4 26B-A4B, and Gemma 4 E2B
- All three support text and image input, built-in reasoning mode, and native function calling for agent workflows
- Gemma 4 31B and 26B-A4B provide 256K-token context windows, while E2B provides 128K tokens
- The 26B-A4B variant uses a mixture-of-experts design with 25.2B total parameters and 3.8B active parameters per token
- AWS says the models are served through the Bedrock Mantle endpoint, its OpenAI-compatible API for the next-generation inference engine
- AWS says prompts and completions are not used for model training, and customer content is not shared with third parties
- Google DeepMind released Gemma 4 under the Apache 2.0 license, and AWS positions the Bedrock launch at multimodal agents, document pipelines, and software engineering workloads
Source: aws.amazon.com
NVIDIA, fused MoE kernels for faster training

- NVIDIA published new fused MLP kernels for dense and mixture-of-experts training on June 15, 2026.
- The company says the kernels deliver 1.3x-2x kernel-level speedup versus unfused execution paths.
- The design targets three MoE bottlenecks: activation overhead, CPU-side routing overhead, and quantization cost.
- NVIDIA built the kernels with CuTe DSL and says they enable sync-free MoE execution with full-iteration CUDA Graphs.
- Supported GLU variants include SwiGLU, GeGLU, and sReLU, with clamping and scaling options.
- The kernels also handle MXFP8 and NVFP4 quantization within the fused execution path.
- NVIDIA says the optimization contributed an 8% end-to-end gain in its DeepSeek-V3 pre-training setup.
- In NVIDIA's GPT-OSS pre-training setup, the same optimization is reported to have delivered a 93% end-to-end gain.
- The kernels are available in cuDNN Frontend and can be accessed through Transformer Engine and Megatron-Core.
Source: developer.nvidia.com
Jocoletter curates AI, software, and product trends for developers and builders.
#AWS #NVIDIA