GitHub - chopratejas/headroom: Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

r/LocalLLaMA
Generative AI

Wanted to give a shout out to this project. Works great. Cut time i had to wait with small models. actually works. There is some telemetry that gets sent back to the author but you can disable. Makes smaller models useful speeding them up with tools. submitted by /u/Available_Hornet3538 [link] [comments]