Finally pioneering beyond the local 256k context window frontier!

The autocompact threshold at 341.5k tokens is set manually, and I'll be slowly pushing it back now that I'm confident there's headroom for evicting key values from memory into cache. The question now is whether the proposed fix will complete in the remaining 16k tokens, as I'll be cross if this trial run also fails to produce a worthwhile outcome. Kudos to Apple, DeepSeek and oMLX.

submitted by /u/challis88ocarina