Self-Prophetic Decoding to Unlock Visual Search in LVLMs

ArXi:2605.28741v1 Announce Type: new Large Vision-Language Models (LVLMs) are rapidly evolving toward true multimodal reasoning, with visual search representing a concrete instantiation of the thinking-with-images paradigm. However, LVLM visual search faces two key challenges: incompatibility among intrinsic capabilities after post-