server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp

r/LocalLLaMA

Imagine you are using a local model for agentic coding. You discuss the idea (50k tokens), then say “implement it”. The agent reads files, writes files, runs commands, produces another 20k tokens, and the code is ready. Then your next prompt is just “thank you”, and... nothing happens; you have to wait for "something". What is happening is that some tools, like opencode, try to be smart and optimize the context: they modify something in the conversation history. In the best case, llama.cpp has to reprocess everything from that point.
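To make the cost concrete, here is a rough sketch of why an edit in the history hurts. It assumes a simplified model of prompt caching in which the server can only reuse the longest common prefix between the cached tokens and the new prompt. The function names and token values are made up for illustration; this is not llama.cpp's actual code.

```python
def common_prefix_len(cached, new):
    """Count how many leading tokens are identical in both sequences."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached, new):
    """Tokens after the shared prefix must be evaluated again (simplified model)."""
    return len(new) - common_prefix_len(cached, new)


# Stand-in token ids for the 70k-token conversation (50k discussion + 20k agent work).
history = list(range(70_000))
thank_you = [900_001, 900_002]  # hypothetical ids for the short follow-up prompt

# Case 1: the client only appends the new prompt, so the whole history is reused.
appended = history + thank_you

# Case 2: the client rewrites one token at position 50k before appending.
edited = history[:50_000] + [-1] + history[50_001:] + thank_you

print(tokens_to_reprocess(history, appended))
print(tokens_to_reprocess(history, edited))
```

In this toy model, the append-only prompt needs only the 2 new tokens processed, while the edited one needs about 20k, even though a single token changed.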