AI agents, cloud platforms, and inference ops #31
Today's Letter
- Mistral AI, Vibe agent for work and code launched
- Alibaba Cloud, Qwen Cloud for Global Markets
- NVIDIA, Dynamo Snapshot for Kubernetes Inference
Mistral AI, Vibe agent for work and code launched
- On May 28, 2026, Mistral AI introduced Vibe, a single agent for long-running, multi-step work that spans office tasks and coding workflows.
- Le Chat is being renamed to Vibe, with conversations, settings, and plans carried over under one product and one license.
- Work Mode is available on web and mobile, with support for enterprise knowledge search, structured data analysis, report drafting, and scheduled multi-step tasks.
- The work agent can use connectors for Google Workspace, Outlook, SharePoint, Slack, GitHub, and custom tools, with admin-level permission controls.
- Code Mode adds remote coding agents from a dedicated web surface, aimed at feature work, bug fixes, refactors, and reviewable pull requests.
- Mistral also released a VS Code extension so the coding agent can operate across the full project from inside the IDE and terminal.
Source: mistral.ai
Alibaba Cloud, Qwen Cloud for Global Markets
- Alibaba Cloud launched Qwen Cloud on May 28, 2026.
- Qwen Cloud is positioned as an AI-native platform built for AI agents.
- The service provides multi-modal model access through Alibaba Cloud.
- The announcement targets global markets rather than a China-only release.
- Qwen Cloud follows Alibaba Cloud's broader agentic AI push, announced between May 26 and May 28.
- Related Alibaba Cloud updates in the same period included model, infrastructure, and agent product announcements.
- The release frames Qwen capabilities as a managed cloud offering instead of a standalone product page for a single model.
Source: community.alibabacloud.com
More: alibabacloud.com
NVIDIA, Dynamo Snapshot for Kubernetes Inference
- On May 27, 2026, NVIDIA introduced Dynamo Snapshot to reduce cold-start time for Kubernetes AI inference workloads.
- The current prototype targets single-GPU inference workers, where vLLM v0.20.0 cold starts can take several minutes.
- The system combines CRIU for host-side process state and `cuda-checkpoint` for GPU state checkpoint and restore.
- NVIDIA provides a privileged `snapshot-agent` DaemonSet, installable with Helm, that manages checkpoint and restore on each node.
- Checkpointing waits for the readiness probe, then captures the container state and writes artifacts to shared storage.
- In one example, unmapping the KV cache shrank the checkpoint from about 190 GiB to about 6 GiB before restore.
- NVIDIA said startup time was reduced by up to 21x on large models including `gpt-oss-120b`.
- NVIDIA said future work includes multi-GPU, multi-node, and TensorRT-LLM support.
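The checkpoint flow summarized above can be modeled in a few lines. This is a minimal Python sketch under stated assumptions: `Worker`, `SharedStorage`, `try_checkpoint`, and `start_worker` are hypothetical names, the sizes are illustrative and are not NVIDIA's figures, and a real system would capture state through CRIU and `cuda-checkpoint` rather than by adding up sizes. It only illustrates two points from the summary: capture is gated on readiness, and leaving the KV cache out of the image reduces what gets written to shared storage.

```python
# Hypothetical model of a checkpoint-on-ready / restore-on-start flow.
# Names, fields, and sizes are illustrative assumptions, not Dynamo Snapshot's API.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Worker:
    host_state_gib: float  # process memory a host-side tool like CRIU would capture (assumed)
    gpu_state_gib: float   # device memory captured for the GPU, excluding the KV cache (assumed)
    kv_cache_gib: float    # preallocated KV cache region (assumed)
    ready: bool = False    # set once the readiness probe passes


@dataclass
class SharedStorage:
    # Stand-in for a shared volume: artifact key -> image size in GiB.
    images: Dict[str, float] = field(default_factory=dict)


def image_size_gib(w: Worker, unmap_kv_cache: bool) -> float:
    """Size of the captured image; unmapping leaves the KV cache region out."""
    kv = 0.0 if unmap_kv_cache else w.kv_cache_gib
    return w.host_state_gib + w.gpu_state_gib + kv


def try_checkpoint(w: Worker, store: SharedStorage, key: str,
                   unmap_kv_cache: bool = True) -> bool:
    """Capture only after readiness, then record the artifact in shared storage."""
    if not w.ready:
        return False  # skip workers that have not finished starting up
    store.images[key] = image_size_gib(w, unmap_kv_cache)
    return True


def start_worker(store: SharedStorage, key: str,
                 cold_start: Callable[[], Worker]) -> str:
    """Restore from an existing image if one is stored, otherwise cold start."""
    if key in store.images:
        return "restore"
    cold_start()
    return "cold_start"


if __name__ == "__main__":
    store = SharedStorage()
    w = Worker(host_state_gib=2.0, gpu_state_gib=10.0, kv_cache_gib=60.0)
    print(try_checkpoint(w, store, "model-a"))        # False: not ready yet
    w.ready = True
    print(try_checkpoint(w, store, "model-a"))        # True
    print(store.images["model-a"])                    # 12.0
    print(image_size_gib(w, unmap_kv_cache=False))    # 72.0
    print(start_worker(store, "model-a", lambda: w))  # restore
```

Keying images by an identifier and checking storage before a cold start mirrors the idea that later workers skip the slow load path; how Dynamo Snapshot actually keys and locates artifacts is not covered here.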
Source: developer.nvidia.com
Jocoletter curates AI, software, and product trends for developers and builders.
#AlibabaCloud #MistralAI #NVIDIA