Empirical Analysis and Detection of Hallucinations in LLM-Generated Bug Report Summaries

arXiv CS.AI

arXiv:2605.24137v1 Announce Type: cross

Large Language Models (LLMs) are increasingly used to generate summaries of software bug reports, including sections such as Steps-to-Reproduce (S2R), Actual Behavior (AB), and Expected Behavior (EB). However, these models frequently produce hallucinations: content that can be convincing but is unsupported by the source report. Such hallucinations can mislead developers and reduce trust in automated maintenance tools. Existing hallucination detection approaches typically evaluate outputs at the full-response level and do not account for the structure of technical documents.