DiffusionGemma rollout and Copilot CLI updates #45

TaeyoungPark

June 12, 2026 — 2 min read

Today's Letter

Google DeepMind, DiffusionGemma unveiled
GitHub Copilot CLI, language server integration outlined
NVIDIA, DiffusionGemma deployment stack detailed

Google DeepMind, DiffusionGemma unveiled

Google DeepMind introduced DiffusionGemma on June 10, 2026 as a new text generation model.
The company says the release targets 4x faster text generation.
Google lists DiffusionGemma as a 26B model and says it can exceed 1,000 tokens per second on a single NVIDIA H100.
Google also says the model can exceed 700 tokens per second on an NVIDIA GeForce RTX 5090.
The release is positioned in the Gemma model line, alongside references to Gemma 4 and Gemini Diffusion.
DiffusionGemma is released under the Apache 2.0 license.

Source: blog.google
More: deepmind.google

GitHub Copilot CLI, language server integration outlined

GitHub published a June 10 post on connecting GitHub Copilot CLI to language servers for stronger code-aware terminal assistance.
The post positions language-server data as a way to give Copilot CLI project context beyond plain file and shell input.
The workflow centers on GitHub Copilot CLI, and the published material references version 4.5.14.
GitHub frames the setup as a lightweight addition that can be tried in about 5 minutes.
The update is aimed at developers who want terminal AI help to work with real code structure instead of only prompt text.
The article is a GitHub Blog guide rather than a standalone product launch note, so it emphasizes the integration workflow and developer usage.

Source: github.blog

NVIDIA, DiffusionGemma deployment stack detailed

NVIDIA published deployment guidance for DiffusionGemma on June 10, positioning the Google DeepMind model as a high-throughput text generation option on NVIDIA hardware.
DiffusionGemma generates 256 tokens in parallel per denoising step instead of sequential token-by-token decoding.
NVIDIA states throughput reaches up to 1,000 tokens/sec on a single H100, 150 tokens/sec on DGX Spark, and 2,000 tokens/sec on DGX Station.
The model is based on the Gemma 4 26B A4B MoE architecture, with 25.2B total parameters, 3.8B active parameters, and context windows up to 256K tokens.
Available precision formats include BF16 and NVFP4, with BF16 checkpoints on Hugging Face and an NVFP4 checkpoint distributed through NVIDIA Model Optimizer.
NVIDIA recommends Hugging Face Transformers for initial prototyping on systems such as GeForce RTX 5090 and DGX Spark, and vLLM for higher-throughput or multi-user serving.
Production deployment is also exposed through NVIDIA NIM as a containerized inference microservice with an OpenAI-compatible API.
Fine-tuning paths are provided through NVIDIA NeMo AutoModel, which can adapt Hugging Face checkpoints directly for task- or domain-specific use.
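
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch. It only does arithmetic on the numbers NVIDIA states above; the function names are hypothetical, and the timing treats the "up to" throughput figures as sustained rates, which is an optimistic assumption rather than a measured result.

```python
import math

# Figures quoted in the NVIDIA item above.
TOKENS_PER_STEP = 256      # tokens generated in parallel per denoising step
TOTAL_PARAMS_B = 25.2      # total parameters, in billions
ACTIVE_PARAMS_B = 3.8      # active parameters, in billions
PEAK_TOKENS_PER_SEC = {    # "up to" throughput per system
    "H100": 1000,
    "DGX Spark": 150,
    "DGX Station": 2000,
}


def parallel_blocks(num_tokens: int) -> int:
    """Number of 256-token blocks needed to cover num_tokens of output."""
    return math.ceil(num_tokens / TOKENS_PER_STEP)


def best_case_seconds(num_tokens: int, system: str) -> float:
    """Generation time if the quoted peak rate were sustained (assumption)."""
    return num_tokens / PEAK_TOKENS_PER_SEC[system]


def active_fraction() -> float:
    """Share of the MoE model's parameters that are active."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


if __name__ == "__main__":
    n = 2048
    print(f"{n} tokens -> {parallel_blocks(n)} blocks of {TOKENS_PER_STEP}")
    for system in PEAK_TOKENS_PER_SEC:
        print(f"{system}: {best_case_seconds(n, system):.2f} s at peak rate")
    print(f"active parameter share: {active_fraction():.1%}")
```

Under these assumptions, a 2,048-token response spans 8 parallel blocks and would take about 2 seconds on a single H100 at the quoted peak rate, with roughly 15% of the model's parameters active.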

Source: developer.nvidia.com

Jocoletter curates AI, software, and product trends for developers and builders.

#GitHub #GoogleDeepMind #NVIDIA
