DiffusionGemma rollout and Copilot CLI updates #45

DiffusionGemma rollout and Copilot CLI updates #45

Today's Letter

  1. Google DeepMind, DiffusionGemma unveiled
  2. GitHub Copilot CLI, language server integration outlined
  3. NVIDIA, DiffusionGemma deployment stack detailed

Google DeepMind, DiffusionGemma unveiled

Google DeepMind, DiffusionGemma unveiled
  • Google DeepMind introduced DiffusionGemma on 2026-06-10 as a new text generation model.
  • The company says the release targets 4x faster text generation.
  • Google lists DiffusionGemma as a 26B model and says it can exceed 1,000 tokens per second on a single NVIDIA H100.
  • Google also says the model can exceed 700 tokens per second on an NVIDIA GeForce RTX 5090.
  • The release is positioned in the Gemma model line, alongside references to Gemma 4 and Gemini Diffusion.
  • DiffusionGemma is released under the Apache 2.0 license.

Source: blog.google
More: deepmind.google


GitHub Copilot CLI, language server integration outlined

GitHub Copilot CLI, language server integration outlined
  • GitHub published a June 10 post on connecting GitHub Copilot CLI to language servers for stronger code-aware terminal assistance
  • The post positions language-server data as a way to give Copilot CLI project context beyond plain file and shell input
  • The workflow is centered on GitHub Copilot CLI and references version 4.5.14 in the published material
  • GitHub frames the setup as a lightweight addition that can be tried in about 5 minutes
  • The update is aimed at developers who want terminal AI help to work with real code structure instead of only prompt text
  • The article is a GitHub Blog guide rather than a standalone product launch note, so the emphasis is on integration workflow and developer usage

Source: github.blog


NVIDIA, DiffusionGemma deployment stack detailed

NVIDIA, DiffusionGemma deployment stack detailed
  • NVIDIA published deployment guidance for DiffusionGemma on June 10, positioning the Google DeepMind model as a high-throughput text generation option on NVIDIA hardware.
  • DiffusionGemma generates 256 tokens in parallel per denoising step instead of sequential token-by-token decoding.
  • NVIDIA states throughput reaches up to 1,000 tokens/sec on a single H100, 150 tokens/sec on DGX Spark, and 2,000 tokens/sec on DGX Station.
  • The model is based on the Gemma 4 26B A4B MoE architecture, with 25.2B total parameters, 3.8B active parameters, and context windows up to 256K tokens.
  • Available precision formats include BF16 and NVFP4, with BF16 checkpoints on Hugging Face and an NVFP4 checkpoint distributed through NVIDIA Model Optimizer.
  • NVIDIA recommends Hugging Face Transformers for initial prototyping on systems such as GeForce RTX 5090 and DGX Spark, and vLLM for higher-throughput or multi-user serving.
  • Production deployment is also exposed through NVIDIA NIM as a containerized inference microservice with an OpenAI-compatible API.
  • Fine-tuning paths are provided through NVIDIA NeMo AutoModel, which can adapt Hugging Face checkpoints directly for task- or domain-specific use.

Source: developer.nvidia.com


Jocoletter curates AI, software, and product trends for developers and builders.

#GitHub #GoogleDeepMind #NVIDIA

Subscribe to Jocoletter

Read more