Open Tool-Calling Models and Dev Tool Shifts #19
Today's Letter
Needle, 26M tool-calling model released on Hugging Face
- Cactus Compute released Needle, a 26M-parameter open model for tool calling, with weights and dataset generation details published on Hugging Face.
- The model is described as a distilled version of Gemini 3.1 and is positioned for local fine-tuning on a Mac or PC.
- Needle uses an encoder-decoder Simple Attention Network with pure attention and no feed-forward layers.
- The encoder has 12 layers, while the decoder has 8 layers with self-attention and cross-attention.
- The published configuration lists d_model 512, an 8,192-token SentencePiece BPE vocabulary, and bfloat16 inference with INT4 QAT used during training.
- Pretraining covered 200B tokens on 16 TPU v6e chips in 27 hours, followed by 2B tokens of function-call post-training in 45 minutes.
- Cactus says the production runtime reaches 6,000 tokens/sec prefill and 1,200 tokens/sec decode on its on-device stack.
- The project includes a local playground UI, Python inference examples, CLI fine-tuning support, and is released under the MIT license.
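To put the quoted figures in perspective, here is a back-of-the-envelope sketch in Python. It uses only the numbers reported above. The 300-token prompt and 40-token output are hypothetical values chosen for illustration, and the latency estimate assumes the quoted prefill and decode rates hold steady with no other overhead.

```python
# Figures quoted by Cactus Compute in the item above.
PRETRAIN_TOKENS = 200e9
PRETRAIN_HOURS = 27
TPU_CHIPS = 16
POSTTRAIN_TOKENS = 2e9
POSTTRAIN_MINUTES = 45
PREFILL_TOK_PER_SEC = 6000
DECODE_TOK_PER_SEC = 1200


def training_throughput(tokens, seconds, chips=1):
    """Average tokens processed per second, per chip."""
    return tokens / seconds / chips


def request_latency_ms(prompt_tokens, output_tokens):
    """Rough latency estimate: prefill time plus decode time, in milliseconds.

    Assumption: the quoted rates apply uniformly and nothing else adds delay.
    """
    prefill_s = prompt_tokens / PREFILL_TOK_PER_SEC
    decode_s = output_tokens / DECODE_TOK_PER_SEC
    return (prefill_s + decode_s) * 1000


# Per-chip rate for pretraining (16 chips are stated for this phase).
pretrain_rate = training_throughput(PRETRAIN_TOKENS, PRETRAIN_HOURS * 3600, TPU_CHIPS)
# Aggregate rate for post-training (no chip count is stated for this phase).
posttrain_rate = training_throughput(POSTTRAIN_TOKENS, POSTTRAIN_MINUTES * 60)

print(f"Pretraining: {pretrain_rate:,.0f} tokens/sec per chip")
print(f"Post-training: {posttrain_rate:,.0f} tokens/sec in aggregate")
print(f"Hypothetical 300-in / 40-out call: {request_latency_ms(300, 40):.1f} ms")
```

Under these assumptions, a short tool call of that size would finish in well under a tenth of a second, which is what makes the on-device claim interesting.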
Source: huggingface.co
More: github.com
Cactus Compute releases 26M-parameter Needle
- Cactus Compute published Needle on Hugging Face as an open-weight 26M-parameter model for tool calling.
- The model is described as a distilled version of Gemini 3.1 built as a Simple Attention Network.
- The architecture uses an encoder-decoder design with pure attention and no feed-forward layers.
- The encoder has 12 layers, while the decoder has 8 layers with self-attention and cross-attention.
- Cactus states that in production the model reaches 6,000 tokens/sec prefill and 1,200 tokens/sec decode on its Cactus runtime.
- Pretraining used 200B tokens on 16 TPU v6e chips for 27 hours, followed by 2B tokens of function-call post-training.
- The project includes a local web UI and CLI for testing and fine-tuning custom tools on Mac or PC.
- Weights, training code, fine-tuning workflow, and dataset-generation details are released under the MIT license.
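For readers new to tool calling, the sketch below shows the general pattern an application follows once a model has chosen a tool. It is a generic illustration, not Needle's actual output format or API. It assumes the model emits a JSON object with `name` and `arguments` keys, and `get_weather` is a hypothetical tool invented for this example.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Weather lookup requested for {city}"


# Registry mapping tool names to the callables that implement them.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and run the matching registered tool.

    Assumption: the call looks like {"name": ..., "arguments": {...}}.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("arguments", {}))


# Hand-written stand-in for what a tool-calling model might emit.
example_output = '{"name": "get_weather", "arguments": {"city": "Lagos"}}'
print(dispatch(example_output))
```

Validating the tool name against a registry before calling anything keeps a malformed or unexpected model output from reaching application code.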
Source: huggingface.co
More: github.com
Jocoletter curates AI, software, and product trends for developers and builders.
#CactusCompute