100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?

arXiv:2505.19293v2 Announce Type: replace-cross

Long-context capability is considered one of the most important abilities of LLMs, as a truly long-context-capable LLM lets users handle many otherwise exhausting tasks with little effort, e.g., directly asking an LLM about a long-form document instead of digesting it themselves to find answers. However, existing real-task-based long-context evaluation benchmarks have two major shortcomings.