Biases in the Blind Spot: Detecting What LLMs Fail to Mention

arXiv:2602.10117v5 Announce Type: replace-cross Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible but may hide internal biases. We call these unverbalized biases. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets. In this work, we