BenGER: Benchmarking LLM Systems on Subsumption-Based Legal Reasoning in German Law

We introduce the BenGER (Benchmark for German Law) dataset for evaluating LLM systems on subsumption-based legal reasoning in German law. The BenGER dataset consists of three components: 596 exam-style free-text legal case tasks across multiple levels of legal education and 531 short doctrinal reasoning tasks. On a controlled validation subset of timed hum