Agent Stack, Copilot, and Async Inference #52

June 19, 2026 — 3 min read

Today's Letter

GitHub published a June 17 post on changes to GitHub Copilot context handling and model routing.
GitHub frames the update as getting more useful work out of each token in a Copilot session.
The work focuses on reducing the context that is resent from turn to turn.
The post also points to cutting the overhead of repeatedly sending tool definitions and cached state.
GitHub names Auto as the routing mode tied to the update.
The post appears on the GitHub Blog and covers both GitHub Copilot and GitHub Copilot for VS Code.

Source: github.blog
More: augmentcode.com

AWS added inline request payload support to Amazon SageMaker AI Async Inference on June 17, 2026.
InvokeEndpointAsync now accepts a raw-bytes Body parameter, removing the need to upload input data to Amazon S3 first.
Inline payloads are capped at 128,000 bytes and are intended for smaller requests with longer async processing times.
Body and InputLocation are mutually exclusive; a request that sets both fails synchronously with a ValidationError.
Output behavior is unchanged: inference results are still written to the configured S3 OutputLocation.
Existing async endpoints are expected to work without model or container changes.
The previous flow required an S3 client, input bucket, s3:PutObject permission, object naming, and stale-object cleanup.
AWS says the feature removes one network round trip and reduces client-side code and operational overhead.
The launch is available in 31 commercial AWS Regions, including Seoul (ICN), Tokyo (NRT), Singapore (SIN), Frankfurt (FRA), and N. Virginia (IAD).
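
The Body and InputLocation rules above can be sketched as a small request builder. This is a minimal sketch, not AWS's implementation: `build_async_request` and `MAX_INLINE_BYTES` are hypothetical names, the `Body` parameter name and the 128,000-byte cap come from the announcement, and the helper only builds keyword arguments without calling AWS.

```python
from typing import Optional

# Inline payload cap stated in the announcement (assumed to count raw bytes).
MAX_INLINE_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    payload: Optional[bytes] = None,
    input_location: Optional[str] = None,
    content_type: str = "application/json",
) -> dict:
    """Build keyword arguments for InvokeEndpointAsync (hypothetical helper)."""
    # Body and InputLocation are mutually exclusive, so require exactly one.
    if (payload is None) == (input_location is None):
        raise ValueError("Set exactly one of payload or input_location")

    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if payload is not None:
        # Reject oversized inline payloads locally instead of waiting for the service.
        if len(payload) > MAX_INLINE_BYTES:
            raise ValueError(
                f"Inline payload is {len(payload)} bytes; cap is {MAX_INLINE_BYTES}. "
                "Upload to S3 and pass input_location instead."
            )
        request["Body"] = payload
    else:
        request["InputLocation"] = input_location
    return request


inline_request = build_async_request("demo-endpoint", payload=b'{"prompt": "hello"}')
s3_request = build_async_request("demo-endpoint", input_location="s3://demo-bucket/input.json")
```

Assuming an SDK version that already accepts `Body`, the resulting dictionary could be passed as `boto3.client("sagemaker-runtime").invoke_endpoint_async(**inline_request)`. Checking the size and exclusivity rules on the client keeps the failure local rather than relying on the service's ValidationError.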

Source: aws.amazon.com

Vercel recapped Ship 2026 on June 17 and positioned its platform around building and deploying AI agents.
Agent Stack combines AI SDK, AI Gateway, Workflow SDK, Vercel Sandbox, and Chat SDK as core building blocks.
AI SDK provides one API across model providers, while AI Gateway routes requests across hundreds of models with failover.
Workflow SDK adds durable runs, retries, state persistence, and observability for multi-step agent execution.
Vercel Connect launches as a secure access layer using temporary task-scoped credentials instead of long-lived provider tokens.
Vercel also introduced eve, a new open-source agent framework with a single-directory structure, Markdown instructions, and TypeScript tools.
Vercel said eve includes approvals, subagents, evals, durable execution, and sandboxed compute out of the box.
The company also expanded backend support for FastAPI, Flask, Express, and Hono, plus REST APIs, queues, cron, and MCP servers.
Vercel Services was announced with availability starting July 1, 2026.

Source: vercel.com

Hugging Face published an agent-focused benchmark that tests how open models use real tooling, with transformers as the main case study.
The evaluation measures process cost rather than final accuracy alone, including turns, tokens, latency, failures, and how directly an agent reaches the result.
Each task is run under three access tiers: a bare `pip install transformers` setup, a full source checkout, and a packaged Skill with curated docs and task examples.
The benchmark is executed with the pi coding agent, with each model × revision × task run isolated as a separate Hugging Face Job on identical hardware.
Results and traces are written to a Hugging Face Bucket to support parallel runs and high write concurrency.
The post argues that agent-oriented tooling depends on discoverable APIs, structured documentation, and task-specific examples, not only model quality.
Hugging Face cites earlier hf CLI work where agent use required 1.3–1.8× fewer tokens, with reductions of up to 6× in some cases.
The article was published on June 18, 2026, and positions the harness as a way to compare model revisions and library changes before shipping large code changes.
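
The run matrix described above can be sketched as a Cartesian product. This is a minimal sketch under assumptions, not the harness from the post: `RunSpec` and `build_run_matrix` are hypothetical names, the model, revision, and task values are placeholders, the tier labels paraphrase the three access tiers, and treating the tier as a fourth axis is a guess about how the runs are split. Nothing here launches a Hugging Face Job.

```python
from dataclasses import dataclass
from itertools import product

# Labels paraphrasing the three access tiers named in the post (hypothetical identifiers).
TIERS = ("bare_pip_install", "source_checkout", "packaged_skill")


@dataclass(frozen=True)
class RunSpec:
    """One isolated benchmark run (hypothetical structure)."""
    model: str
    revision: str
    task: str
    tier: str


def build_run_matrix(models, revisions, tasks, tiers=TIERS):
    """Enumerate every model x revision x task x tier combination."""
    return [
        RunSpec(model, revision, task, tier)
        for model, revision, task, tier in product(models, revisions, tasks, tiers)
    ]


runs = build_run_matrix(
    models=["model-a", "model-b"],   # placeholder model names
    revisions=["rev-1", "rev-2"],    # placeholder revisions
    tasks=["task-x"],                # placeholder task name
)
print(len(runs))  # 2 models x 2 revisions x 1 task x 3 tiers = 12
```

Each `RunSpec` would map to one isolated job, which is what makes per-run metrics such as turns, tokens, and latency comparable across models and revisions.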

Source: huggingface.co
More: venturebeat.com

Jocoletter curates AI, software, and product trends for developers and builders.

#AWS #GitHub #HuggingFace #Vercel