AI agent harnesses and developer tooling updates #8

Today's Letter

  1. VS Code PR suggests default Copilot co-author footer
  2. Moonshot AI Kimi K2.6 tops coding puzzle benchmark
  3. Mendral argues agent harnesses should run outside the sandbox
  4. Flue, TypeScript agent harness framework unveiled
  5. DO_NOT_TRACK, single env var proposed for telemetry opt-out


VS Code PR suggests default Copilot co-author footer

  • VS Code pull request #310226 merged on Apr. 16, reportedly enabling AI co-authoring by default.
  • The change appears to add a "Co-Authored-By: Copilot" footer to commits even when Copilot was not explicitly used.
  • The public PR page shows 2 commits and 1 changed file, but no description was provided.
  • The page does not confirm exact trigger conditions or whether the footer can be disabled globally.
  • Community response was sharply negative, with 372 thumbs-down reactions visible on the PR.
  • If shipped as implied, commit attribution and repository history could change without clear user intent.
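
For teams that want to keep such a footer out of their history, one option is to filter the commit message before it is recorded. The sketch below is a minimal illustration, not a confirmed workaround: it assumes the footer is a Git trailer line beginning with "Co-Authored-By: Copilot", and the function name `strip_copilot_trailer` is hypothetical.

```python
import re

# Assumption: the footer is a trailer line starting with
# "Co-Authored-By: Copilot" (any letter case), optionally followed by an email.
COPILOT_TRAILER = re.compile(r"^co-authored-by:\s*copilot\b.*$", re.IGNORECASE)


def strip_copilot_trailer(message: str) -> str:
    """Return the commit message without Copilot co-author trailer lines."""
    kept = [
        line
        for line in message.splitlines()
        if not COPILOT_TRAILER.match(line.strip())
    ]
    # Drop blank lines left dangling at the end once the trailer is gone.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n" if kept else ""
```

A function like this could be called from a `commit-msg` hook, which Git invokes with the path to the message file. Whether the footer is already present at that point depends on how VS Code applies it, which the PR page does not confirm.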

Source: github.com


Moonshot AI Kimi K2.6 tops coding puzzle benchmark

  • Kimi K2.6 from Moonshot AI placed first in ThinkPol's Day 12 AI Coding Contest, according to the report, finishing with 22 match points and a 7-1-0 record.
  • The task was a Word Gem Puzzle on 10×10 to 30×30 boards, where models had 10 seconds per round to slide tiles and claim English words formed in straight horizontal or vertical lines.
  • Scoring favored longer words: words under seven letters lost points, while seven-letter-plus words scored length minus six, which pushed most serious entries toward longer-word filtering.
  • The report says Kimi used an aggressive greedy sliding loop and posted the highest cumulative score at 77, while Xiaomi's MiMo V2-Pro finished second at 20 match points with a mostly static word-scan strategy.
  • GPT-5.5 ranked third with 16 match points, Claude Opus 4.7 ranked fifth with 12, and the write-up argues that models that did not slide often broke down on larger 30×30 boards.
  • The results are not yet independently confirmed beyond the primary source, but the benchmark highlights how protocol handling, search strategy, and penalty-aware planning can outweigh brand-name model positioning on structured coding tasks.
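
To make the scoring rule concrete, here is a rough sketch of the static word-scan approach with long-word filtering. It is not any contestant's code. It assumes words read left to right or top to bottom, that each distinct word counts once, and that the safest handling of the short-word penalty (whose size the report does not state) is to never claim words under seven letters. The name `scan_lines` is made up for illustration.

```python
def scan_lines(board: list[str], words: set[str], min_len: int = 7) -> list[tuple[str, int]]:
    """Find dictionary words of at least min_len letters in rows and columns.

    Returns sorted (word, points) pairs, with points = length - 6 as in the
    reported scoring rule. Shorter words are skipped, not penalized here.
    """
    rows = list(board)
    cols = ["".join(col) for col in zip(*board)]
    found: set[str] = set()
    for line in rows + cols:
        for start in range(len(line)):
            # Only consider substrings that are long enough to score.
            for end in range(start + min_len, len(line) + 1):
                candidate = line[start:end]
                if candidate in words:
                    found.add(candidate)
    return sorted((word, len(word) - 6) for word in found)
```

A scan like this never moves a tile, which is the limitation the write-up attributes to the static strategies on larger boards.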

Source: thinkpol.ca
More: news.google.com


Mendral argues agent harnesses should run outside the sandbox

  • Mendral said agent harnesses are more reliable outside the sandbox, where the LLM loop stays on the backend and calls sandbox tools over an API.
  • The post argues this keeps LLM API keys, user tokens, and database credentials out of ephemeral execution environments, according to the report.
  • Mendral said running the harness outside the sandbox lets sandboxes be suspended when idle, and that it uses Blaxel, citing standby resume times of about 25 ms during interactive turns.
  • For long-running sessions, the company said it runs each agent turn as an Inngest step so workflows can survive deploys, restarts, and instance failures.
  • The post describes skills and memories as virtual files backed by different systems: workspace paths go to the sandbox, while shared memory and skill namespaces map to a database.
  • Mendral said this avoids treating multi-user agent state as a distributed filesystem problem when several engineers share the same agent and update memory in parallel.
  • The design is presented as a backend architecture choice rather than a standard pattern, and the claims are not yet independently confirmed beyond Mendral's own blog post.
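
The virtual-file idea can be pictured as a small router that picks a backend by path prefix. The sketch below is an assumption-heavy illustration, not Mendral's implementation: the prefixes `/workspace/`, `/memory/`, and `/skills/` are invented, and plain dictionaries stand in for the sandbox API and the database.

```python
class VirtualFiles:
    """Route file operations to a sandbox or a database by path prefix."""

    def __init__(self, sandbox: dict[str, str], database: dict[str, str]) -> None:
        self.sandbox = sandbox      # stand-in for calls to the sandbox API
        self.database = database    # stand-in for shared memory/skill storage

    def _backend(self, path: str) -> dict[str, str]:
        # Hypothetical namespaces: shared state lives in the database.
        if path.startswith(("/memory/", "/skills/")):
            return self.database
        # Workspace files live in the (possibly suspended) sandbox.
        if path.startswith("/workspace/"):
            return self.sandbox
        raise ValueError(f"no backend mapped for path: {path}")

    def read(self, path: str) -> str:
        return self._backend(path)[path]

    def write(self, path: str, content: str) -> None:
        self._backend(path)[path] = content
```

Because the agent sees one file-like interface, concurrent memory updates from several users become ordinary database writes and not filesystem synchronization, which is the trade-off the post describes.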

Source: mendral.com
More: news.google.com


Flue, TypeScript agent harness framework unveiled

  • Flue presented itself as a TypeScript agent harness for autonomous agents, according to its site.
  • The framework pairs model access with sessions, skills, memory, filesystem, and sandbox controls.
  • Agents can run from a CLI or be bundled into an HTTP server for deployment.
  • Flue offers a built-in virtual sandbox and can connect to external sandboxes.
  • Example code shows structured skill outputs with Valibot and shell steps inside one session.
  • Sample models listed include anthropic/claude-sonnet-4-6 and anthropic/claude-opus-4-7.
  • A GitHub issue triage example is framed as 22 lines of TypeScript on the site.
  • The site also claims token scoping so secrets like GITHUB_TOKEN stay outside the agent sandbox.

Source: flueframework.com


DO_NOT_TRACK, single env var proposed for telemetry opt-out

  • DO_NOT_TRACK proposes a single environment variable, `DO_NOT_TRACK=1`, to signal that software should disable telemetry, usage reporting, crash reporting, ad tracking, and other non-essential network requests, according to the project site.
  • The stated goal is to replace per-tool opt-out switches with one cross-tool convention for local software and CLI workflows.
  • The site lists existing opt-out examples across tools including .NET, AWS SAM CLI, Azure CLI, Gatsby, Go telemetry, Google Cloud SDK, Homebrew, and Netlify CLI.
  • Setup examples are provided for Bash, Zsh, Fish, PowerShell, and Windows CMD so the variable can persist across terminal sessions.
  • For software authors, the proposal asks tools to check whether `DO_NOT_TRACK` is set to `1` and to honor it alongside current product-specific switches.
  • The page also recommends moving telemetry from opt-out to opt-in where possible.
  • The proposal cites `NO_COLOR` and `FORCE_COLOR` as precedent for simple environment-variable conventions, but broader ecosystem adoption has not been confirmed.
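
For software authors, the check described above might look like the following sketch. The `DO_NOT_TRACK=1` test comes from the proposal. The product-specific variable `EXAMPLETOOL_NO_TELEMETRY` and the function name `telemetry_allowed` are hypothetical placeholders.

```python
import os
from typing import Mapping, Optional


def telemetry_allowed(
    env: Optional[Mapping[str, str]] = None,
    tool_opt_out_var: str = "EXAMPLETOOL_NO_TELEMETRY",  # hypothetical switch
) -> bool:
    """Return False if the user opted out globally or for this tool."""
    env = os.environ if env is None else env
    # Cross-tool convention from the proposal: DO_NOT_TRACK set to "1".
    if env.get("DO_NOT_TRACK") == "1":
        return False
    # Keep honoring the tool's own existing opt-out alongside it.
    if env.get(tool_opt_out_var) == "1":
        return False
    return True
```

Passing the environment in as a mapping keeps the check easy to test. A tool would call `telemetry_allowed()` once at startup and skip all reporting when it returns `False`.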

Source: donottrack.sh
More: news.google.com


Jocoletter curates AI, software, and product trends for developers and builders.

#DO_NOT_TRACK #Flue #Mendral #Microsoft #MoonshotAI
