Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark

arXiv:2503.17599v3

Large Language Models (LLMs) have demonstrated considerable potential in general practice. However, existing benchmarks and evaluation frameworks rely primarily on exam-style or simplified question-answer formats and lack a competency-based structure aligned with the real-world clinical responsibilities of general practice. Consequently, the extent to which LLMs can reliably fulfill the duties of general practitioners (GPs) remains uncertain.