Gemma 4 on Bedrock, NVIDIA MoE Kernels #50

Gemma 4 on Bedrock, NVIDIA MoE Kernels #50

Today's Letter

  1. Amazon Bedrock adds Google Gemma 4 model family
  2. NVIDIA, fused MoE kernels for faster training

Amazon Bedrock adds Google Gemma 4 model family

  • AWS announced Gemma 4 availability on Amazon Bedrock on June 15, 2026
  • The release includes three instruction-tuned variants: Gemma 4 31B, Gemma 4 26B-A4B, and Gemma 4 E2B
  • All three support text and image input, built-in reasoning mode, and native function calling for agent workflows
  • Gemma 4 31B and 26B-A4B provide 256K-token context windows, while E2B provides 128K tokens
  • The 26B-A4B variant uses a mixture-of-experts design with 25.2B total parameters and 3.8B active parameters per token
  • AWS says the models are served through the Bedrock Mantle endpoint, its OpenAI-compatible API for the next-generation inference engine
  • AWS says prompts and completions are not used for model training, and customer content is not shared with third parties
  • Google DeepMind released Gemma 4 under the Apache 2.0 license, and AWS positions the Bedrock launch at multimodal agents, document pipelines, and software engineering workloads

Source: aws.amazon.com


NVIDIA, fused MoE kernels for faster training

NVIDIA, fused MoE kernels for faster training
  • NVIDIA published new fused MLP kernels for dense and mixture-of-experts training on June 15, 2026.
  • The company says the kernels deliver 1.3x-2x kernel-level speedup versus unfused execution paths.
  • The design targets three MoE bottlenecks: activation overhead, CPU-side routing overhead, and quantization cost.
  • NVIDIA built the kernels with CuTe DSL and says they enable sync-free MoE execution with full-iteration CUDA Graphs.
  • Supported GLU variants include SwiGLU, GeGLU, and sReLU, with clamping and scaling options.
  • The kernels also handle MXFP8 and NVFP4 quantization within the fused execution path.
  • NVIDIA says the optimization contributed an 8% end-to-end gain in its DeepSeek-V3 pre-training setup.
  • In NVIDIA's GPT-OSS pre-training setup, the same optimization is reported to have delivered a 93% end-to-end gain.
  • The kernels are available in cuDNN Frontend and can be accessed through Transformer Engine and Megatron-Core.

Source: developer.nvidia.com


Jocoletter curates AI, software, and product trends for developers and builders.

#AWS #NVIDIA

Subscribe to Jocoletter

Read more