Open Tool-Calling Models and Dev Tool Shifts #19

TaeyoungPark

May 15, 2026 — 2 min read

Today's Letter

Needle, 26M tool-calling model released on Hugging Face
Cactus Compute releases 26M-parameter Needle

Needle, 26M tool-calling model released on Hugging Face

Cactus Compute released Needle, a 26M-parameter open model for tool calling, with weights and dataset generation details published on Hugging Face.
The model is described as a distilled version of Gemini 3.1 and is positioned for local fine-tuning on a Mac or PC.
Needle uses an encoder-decoder Simple Attention Network with pure attention and no feed-forward layers.
The encoder has 12 layers, while the decoder has 8 layers with self-attention and cross-attention.
The published configuration lists a d_model of 512, an 8,192-token SentencePiece BPE vocabulary, and bfloat16 inference, with INT4 quantization-aware training (QAT) applied during training.
Pretraining covered 200B tokens on 16 TPU v6e chips in 27 hours, followed by 2B tokens of function-call post-training in 45 minutes.
Cactus says the production runtime reaches 6,000 tokens/sec prefill and 1,200 tokens/sec decode on its on-device stack.
The project includes a local playground UI, Python inference examples, and CLI fine-tuning support, and is released under the MIT license.

Source: huggingface.co
More: github.com

Cactus Compute releases 26M-parameter Needle

Cactus Compute published Needle on Hugging Face as an open-weight 26M-parameter model for tool calling.
The model is described as a distilled version of Gemini 3.1, built as a Simple Attention Network.
The architecture uses an encoder-decoder design with pure attention and no feed-forward layers.
The encoder has 12 layers, while the decoder has 8 layers with self-attention and cross-attention.
Cactus states that its production runtime reaches 6,000 tokens/sec prefill and 1,200 tokens/sec decode.
Pretraining used 200B tokens on 16 TPU v6e chips for 27 hours, followed by 2B tokens of function-call post-training.
The project includes a local web UI and CLI for testing and fine-tuning custom tools on Mac or PC.
Weights, training code, fine-tuning workflow, and dataset-generation details are released under the MIT license.

Source: huggingface.co
More: github.com
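
The reported figures can be turned into rough rates with simple arithmetic. The sketch below derives training throughput from the token counts and durations quoted in this letter (including the 45-minute post-training time from the first item) and estimates the latency of a single tool call from the prefill and decode speeds. The request sizes in the usage note are hypothetical, and the latency estimate ignores fixed overheads such as model loading and tokenization.

```python
PRETRAIN_TOKENS = 200e9    # reported pretraining tokens
PRETRAIN_HOURS = 27        # reported pretraining time
PRETRAIN_CHIPS = 16        # reported TPU v6e chip count
POSTTRAIN_TOKENS = 2e9     # reported function-call post-training tokens
POSTTRAIN_HOURS = 0.75     # 45 minutes, as reported
PREFILL_TPS = 6000         # reported prefill speed, tokens/sec
DECODE_TPS = 1200          # reported decode speed, tokens/sec


def tokens_per_second(total_tokens, hours):
    """Average throughput over a run of the given length."""
    return total_tokens / (hours * 3600)


def request_latency_ms(prompt_tokens, output_tokens):
    """Estimated time to read a prompt and generate a reply, in milliseconds."""
    seconds = prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS
    return seconds * 1000


pretrain_total = tokens_per_second(PRETRAIN_TOKENS, PRETRAIN_HOURS)
pretrain_per_chip = pretrain_total / PRETRAIN_CHIPS
posttrain_total = tokens_per_second(POSTTRAIN_TOKENS, POSTTRAIN_HOURS)

print(f"pretraining: {pretrain_total:,.0f} tokens/sec overall")
print(f"pretraining: {pretrain_per_chip:,.0f} tokens/sec per chip")
print(f"post-training: {posttrain_total:,.0f} tokens/sec overall")
```

Under these numbers, pretraining averages about 2.06 million tokens per second across the 16 chips, or roughly 129,000 per chip. For a hypothetical request with a 300-token prompt and a 40-token tool call, `request_latency_ms(300, 40)` comes to about 83 ms.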

Jocoletter curates AI, software, and product trends for developers and builders.

