Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

arXiv:2606.04680v1 Announce Type: cross Automatic speech recognition (ASR) systems commonly rely on reference transcriptions for evaluation, while reference-free approaches often depend on internal confidence estimation or auxiliary language models. We propose READ (Reference-free Hypothesis Evaluation with Acoustic Discrepancy), a novel metric that evaluates ASR hypotheses directly from the speech signal. READ emphasizes the acoustic grounding of hypotheses.