Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation

arXiv:2605.28830v1 Announce Type: cross As Large Language Models (LLMs) are increasingly deployed in safety-critical applications, robust content moderation becomes essential. We present a comprehensive evaluation of 14 open-source safety guard models on a curated benchmark of 79,331 samples spanning 8 NIST AI Risk Framework safety categories.