Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models

arXiv:2505.16416v3

Rotary Position Embedding (RoPE) is widely adopted in large language models, but when applied to vision-language models (VLMs) it couples text and image position indices and can
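The coupling of text and image position indices described in the abstract can be illustrated with a minimal sketch of standard 1D RoPE. It assumes a naive layout in which text tokens and flattened image patches share a single position counter. The helpers `rope`, `score`, and `sequential_positions` and the toy vectors are hypothetical, written only for illustration. This is plain RoPE, not the Circle-RoPE method.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply standard rotary position embedding to vector x (even length) at integer position pos."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = [0.0] * d
    for i in range(d // 2):
        # Each pair of dimensions is rotated by an angle pos * base^(-2i/d).
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def score(q, k, m, n):
    """Unscaled attention logit between a query at position m and a key at position n."""
    return sum(u * v for u, v in zip(rope(q, m), rope(k, n)))

def sequential_positions(num_text, grid_h, grid_w):
    """Hypothetical naive layout: text tokens first, then image patches in raster
    order, all drawing indices from one shared 1D counter."""
    text = list(range(num_text))
    image = [num_text + r * grid_w + c for r in range(grid_h) for c in range(grid_w)]
    return text, image

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# RoPE logits depend only on the offset m - n, not on the absolute positions.
print(math.isclose(score(q, k, 3, 1), score(q, k, 10, 8), abs_tol=1e-9))  # True

text_pos, image_pos = sequential_positions(num_text=4, grid_h=2, grid_w=3)
last_text = text_pos[-1]
# Each patch sits at a different RoPE offset from the last text token,
# determined only by its raster-order index.
print([p - last_text for p in image_pos])  # [1, 2, 3, 4, 5, 6]
```

Because the logit depends only on the index offset, the order in which image patches are placed after the text directly determines how far each patch appears from the text under RoPE.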