Detection Without Correction: A Two-Parameter Decomposition of Multi-Stage LLM Pipelines

arXiv:2605.27559v1

Multi-stage LLM pipelines that perform multi-agent debate, intrinsic self-correction, or retrieval-augmented verification exhibit puzzling aggregate behaviors. Accuracy plateaus or reverses across rounds. Debate gains fail to replicate on contemporary frontier models. Intrinsic self-correction degrades accuracy. Debate dynamics diverge qualitatively across providers.