NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs

arXiv:2605.29685v1 As large language models (LLMs) are increasingly applied in social contexts such as emotional companionship and customer service, measuring their social intelligence has become critical to the quality and safety of human-AI interaction. However, existing social intelligence benchmarks lack a unified framework that organizes social abilities into a coherent structure, and therefore cannot support fine-grained diagnosis.