PolyRange: Contamination-resistant offensive-AI benchmark for web targets (that ain't a benchmark, THAT's a benchmark)

Author here. The short version of why I built this: cyber-AI evaluation is converging on the same diagnosis across multiple labs. Anthropic's Claude Mythos system card this year says their cyber ranges "lack many features often present in real-world environments such as defensive tooling," and CTF-style benchmarks are saturated to the point that Anthropic is questioning whether to keep reporting them. UK AISI's most recent multi-step cyber paper (Folkerts ): "No active defenders.