Open Tool-Calling Models and Dev Tool Shifts #19

Today's Letter

  1. Needle, 26M tool-calling model released on Hugging Face
  2. Cactus Compute releases 26M-parameter Needle

Needle, 26M tool-calling model released on Hugging Face

  • Cactus Compute released Needle, a 26M-parameter open model for tool calling, with weights and dataset generation details published on Hugging Face.
  • Cactus describes the model as a distilled version of Gemini 3.1 and positions it for local fine-tuning on a Mac or PC.
  • Needle uses an encoder-decoder Simple Attention Network with pure attention and no feed-forward layers.
  • The encoder has 12 layers, while the decoder has 8 layers with self-attention and cross-attention.
  • The published configuration lists d_model 512, an 8,192-token SentencePiece BPE vocabulary, and bfloat16 inference, with INT4 quantization-aware training (QAT) used during training.
  • Pretraining covered 200B tokens on 16 TPU v6e chips in 27 hours, followed by 2B tokens of function-call post-training in 45 minutes.
  • Cactus says the production runtime reaches 6,000 tokens/sec prefill and 1,200 tokens/sec decode on its on-device stack.
  • The project includes a local playground UI, Python inference examples, and CLI fine-tuning support, and it is released under the MIT license.
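To make "pure attention and no feed-forward layers" concrete, here is a minimal pure-Python sketch of that layer pattern: 12 encoder layers of self-attention and 8 decoder layers of self-attention plus cross-attention, each wrapped in a residual connection and nothing else. It is an illustration under stated assumptions, not Needle's code: it uses single-head attention, no normalization, no positional encoding, random weights, and a toy width of 8 instead of the published d_model of 512. The causal mask on decoder self-attention is the usual choice for autoregressive decoding but is assumed here, and all helper names (`attention`, `encode`, `decode`, `random_weights`) are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    """Element-wise sum, used here for residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys_values, w, causal=False):
    """Single-head scaled dot-product attention with an output projection."""
    q = matmul(queries, w["q"])
    k = matmul(keys_values, w["k"])
    v = matmul(keys_values, w["v"])
    scale = 1.0 / math.sqrt(len(q[0]))
    probs = []
    for i, row in enumerate(matmul(q, transpose(k))):
        # Causal mask (assumed): position i may not attend to later positions j > i.
        masked = [float("-inf") if causal and j > i else s * scale
                  for j, s in enumerate(row)]
        probs.append(softmax(masked))
    return matmul(matmul(probs, v), w["o"])

def random_weights(d, rng):
    # Hypothetical initialization; Needle's real scheme is not described here.
    def mat():
        return [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)]
    return {"q": mat(), "k": mat(), "v": mat(), "o": mat()}

def encode(x, enc_layers):
    for w in enc_layers:
        x = add(x, attention(x, x, w))  # self-attention + residual, no FFN
    return x

def decode(y, memory, dec_layers):
    for w_self, w_cross in dec_layers:
        y = add(y, attention(y, y, w_self, causal=True))  # masked self-attention
        y = add(y, attention(y, memory, w_cross))          # cross-attention to encoder output
    return y

rng = random.Random(0)
D = 8  # toy width; the published d_model is 512
enc_layers = [random_weights(D, rng) for _ in range(12)]
dec_layers = [(random_weights(D, rng), random_weights(D, rng)) for _ in range(8)]

src = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(5)]  # 5 source positions
tgt = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(3)]  # 3 target positions
memory = encode(src, enc_layers)
out = decode(tgt, memory, dec_layers)
print(len(memory), len(out), len(out[0]))  # 5 3 8
```

In a standard Transformer the feed-forward sublayer (typically a 4x expansion) holds about twice the weights of the attention sublayer, so removing it shrinks each layer considerably; the release notes summarized here do not state whether that was the motivation.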
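INT4 QAT typically means that, during training, the forward pass uses weights rounded onto a 4-bit integer grid while full-precision copies receive the optimizer updates, with gradients passed through the rounding by a straight-through estimator. The sketch below shows only the fake-quantization step, assuming symmetric per-tensor scaling; Needle's actual grouping, scaling, and rounding choices are not described in this letter, and `fake_quant_int4` is a hypothetical name.

```python
def fake_quant_int4(weights):
    """Snap float weights onto a signed 4-bit grid, then map them back to floats.

    Assumes symmetric per-tensor scaling; real QAT setups often use
    per-channel or per-group scales instead.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights), 0.0
    scale = max_abs / 7.0  # largest magnitude maps to integer level 7
    dequantized = []
    for w in weights:
        q = max(-8, min(7, round(w / scale)))  # int4 range is -8..7
        dequantized.append(q * scale)
    return dequantized, scale

weights = [0.7, -0.35, 0.1, 0.0, -0.69]
deq, scale = fake_quant_int4(weights)
print(deq)  # values the forward pass would use during QAT
```

Because every weight moves by at most half a quantization step, the network can learn during training to tolerate the rounding it will see once weights are stored in 4 bits.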

Source: huggingface.co
More: github.com


Cactus Compute releases 26M-parameter Needle

  • Cactus Compute published Needle on Hugging Face as an open-weight 26M-parameter model for tool calling.
  • Cactus describes the model as a distilled version of Gemini 3.1, built as a Simple Attention Network.
  • The architecture uses an encoder-decoder design with pure attention and no feed-forward layers.
  • The encoder has 12 layers, while the decoder has 8 layers with self-attention and cross-attention.
  • Cactus states that its production runtime reaches 6,000 tokens/sec prefill and 1,200 tokens/sec decode.
  • Pretraining used 200B tokens on 16 TPU v6e chips for 27 hours, followed by 2B tokens of function-call post-training.
  • The project includes a local web UI and CLI for testing and fine-tuning custom tools on Mac or PC.
  • Weights, training code, fine-tuning workflow, and dataset-generation details are released under the MIT license.
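The reported numbers imply the rough rates computed below. This is back-of-the-envelope arithmetic only: it treats the token counts and wall-clock times as exact, spreads pretraining evenly across the 16 chips, and reports post-training throughput only in aggregate because its chip count is not given. The 300-token prompt and 40-token output for a single tool call are hypothetical sizes, not figures from the release.

```python
# Back-of-the-envelope rates from the figures reported for Needle.
PRETRAIN_TOKENS = 200e9
PRETRAIN_HOURS = 27
CHIPS = 16  # TPU v6e chips used for pretraining
POST_TRAIN_TOKENS = 2e9
POST_TRAIN_MINUTES = 45

PREFILL_TOK_PER_S = 6_000  # reported production-runtime prefill speed
DECODE_TOK_PER_S = 1_200   # reported production-runtime decode speed

aggregate = PRETRAIN_TOKENS / (PRETRAIN_HOURS * 3600)
per_chip = aggregate / CHIPS
post_train = POST_TRAIN_TOKENS / (POST_TRAIN_MINUTES * 60)

def tool_call_latency_ms(prompt_tokens, output_tokens):
    """Idealized latency: prefill the whole prompt, then decode the call token by token."""
    seconds = prompt_tokens / PREFILL_TOK_PER_S + output_tokens / DECODE_TOK_PER_S
    return seconds * 1000

print(f"pretraining: {aggregate:,.0f} tokens/s total, {per_chip:,.0f} per chip")
print(f"post-training: {post_train:,.0f} tokens/s")
print(f"hypothetical 300-in/40-out call: {tool_call_latency_ms(300, 40):.1f} ms")
```

Run as-is, it prints 2,057,613 tokens/s total and 128,601 per chip for pretraining, 740,741 tokens/s for post-training, and 83.3 ms for the hypothetical call. Real latency would also include tokenization, model loading, and executing the tool itself.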

Source: huggingface.co
More: github.com


Jocoletter curates AI, software, and product trends for developers and builders.

#Cactus-Compute #CactusCompute
