Chunking Strategies for AI Code Review on Large Repos

I spent the last few days building an open-source AI code reviewer called Basira. One of the hardest design problems was figuring out how to feed entire GitHub repos to an LLM without blowing past the context window or burning through the budget. Here's what I landed on.

The Problem

A medium-sized repo is 50-200 files and 5-50k lines of code. Claude Sonnet has a 200k-token context window, but stuffing the whole repo in is wasteful: most files don't need to be reviewed at the same time, and the model loses focus when handed a wall of unrelated code.
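To put rough numbers on the sizing problem, here is a back-of-the-envelope sketch. It is not Basira's code and does not use a real tokenizer. It assumes about 4 characters per token and about 40 characters per line of source. Both are rules of thumb I'm assuming here, and real counts will vary by language and tokenizer. The `reserved` budget for instructions and the model's reply is also a hypothetical figure.

```python
# Rough sizing helpers for deciding whether a repo fits in one prompt.
# ASSUMPTIONS (rules of thumb, not measured with any real tokenizer):
#   ~4 characters per token, ~40 characters per line of source code.
CHARS_PER_TOKEN = 4
AVG_CHARS_PER_LINE = 40
CONTEXT_WINDOW = 200_000  # tokens, the Claude Sonnet window cited above


def estimate_tokens(text: str) -> int:
    """Estimate tokens from character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_repo_tokens(files: dict[str, str]) -> int:
    """Sum estimated tokens over a {path: file contents} mapping."""
    return sum(estimate_tokens(src) for src in files.values())


def tokens_for_lines(num_lines: int, chars_per_line: int = AVG_CHARS_PER_LINE) -> int:
    """Estimate tokens for a repo when only its line count is known."""
    return num_lines * chars_per_line // CHARS_PER_TOKEN


def fits_in_context(tokens: int, reserved: int = 8_000, window: int = CONTEXT_WINDOW) -> bool:
    """True if the code fits while leaving `reserved` tokens (hypothetical
    budget) for the review instructions and the model's reply."""
    return tokens + reserved <= window


if __name__ == "__main__":
    # The two ends of the "medium repo" range from the post.
    for lines in (5_000, 50_000):
        t = tokens_for_lines(lines)
        print(f"{lines:>6} lines ~ {t:>7} tokens, fits: {fits_in_context(t)}")
```

Under these assumptions a 5k-line repo comes out around 50k tokens and fits comfortably. A 50k-line repo comes out around 500k tokens and would not fit in a single 200k-token prompt at all. Even when the whole repo does fit, most of those tokens are code unrelated to the files being reviewed.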