AI RESEARCH

DODO: Discrete OCR Diffusion Models

arXiv cs.CV

arXiv:2602.16872v2 Optical Character Recognition (OCR) is a fundamental task for digitizing information, serving as a critical bridge between visual data and textual understanding. While modern Vision-Language Models (VLMs) achieve high accuracy in this domain, they predominantly rely on autoregressive decoding, which becomes slow and computationally expensive for long documents because it requires a sequential forward pass for every generated token.