RefusalBench: Why Refusal Rate Misranks Frontier LLMs on Biological Research Prompts

arXiv:2605.21545v1 Announce Type: cross Frontier large language models are increasingly deployed as orchestration backbones for biological research workflows, yet no shared evidence base exists for comparing their refusal behaviour on legitimate research prompts. RefusalBench,