TADDLE: A Tool-Augmented Agent for Detecting Deficient LLM-Generated Peer Reviews

arXiv:2605.26911v1 Announce Type: new LLM-generated peer reviews are increasingly common at major venues, yet their deficiencies are hard to detect because they are uniformly fluent and well-structured. Existing work either classifies authorship without judging quality, or scores quality with features designed for human-written reviews; no prior system detects deficiencies in LLM-generated reviews at the level of individual defect types. To bridge the gap, we