Personalize Your Large Vision-language Models With In-context Prompt Tuning

arXiv:2605.31513v1. Large vision-language models (LVLMs) have demonstrated strong general multimodal capability and are increasingly deployed in downstream systems. This trend has driven growing interest in LVLM personalization, which aims to enable models to quickly and effectively [...], which reduces efficiency. They also struggle to maintain accuracy in complex multi-image, multi-concept settings. These limitations restrict the broader deployment of LVLM-based systems. Therefore, this paper proposes in-context prompt tuning.