Resolving Ambiguity in Composed Image Retrieval via Calibrated Interaction

arXiv:2605.24634v2 Announce Type: replace

Composed image retrieval (CIR) searches a corpus using a reference image together with a text describing how to modify it. Despite rapid progress, from triplet-trained compositors to zero-shot and generative methods, essentially all systems share one assumption: that each query maps to a single target, scored by Recall against one annotated image. We argue that this assumption is fundamentally at odds with the task. A query such as "make it formal" does not name a single image but a region of the corpus, and which member of that region the user intends is genuinely underdetermined.
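To make the single-target assumption concrete, here is a minimal Python sketch. It is not the paper's method. It only contrasts conventional single-annotation Recall@K with a hypothetical set-aware hit rate, and it assumes a made-up ranking, annotated target, and set of "acceptable" images for the query "make it formal". The function names and image identifiers are illustrative, not from the source.

```python
from typing import Sequence, Set


def recall_at_k_single(ranking: Sequence[str], target: str, k: int) -> float:
    """Conventional CIR Recall@K: 1.0 if the one annotated target is in the top k."""
    return 1.0 if target in ranking[:k] else 0.0


def hit_at_k_any(ranking: Sequence[str], acceptable: Set[str], k: int) -> float:
    """Hypothetical set-aware variant: 1.0 if any acceptable image is in the top k."""
    return 1.0 if any(img in acceptable for img in ranking[:k]) else 0.0


# Hypothetical query "make it formal": several corpus images plausibly satisfy it,
# but only one of them ("suit_03") was annotated as the target.
ranking = ["suit_07", "blazer_12", "dress_04", "suit_03", "tshirt_01"]
annotated = "suit_03"
acceptable = {"suit_07", "blazer_12", "suit_03"}

print(recall_at_k_single(ranking, annotated, k=1))  # 0.0
print(hit_at_k_any(ranking, acceptable, k=1))       # 1.0
```

Under these assumptions, a system whose top result is a perfectly valid formal outfit still scores 0 at K=1, because the metric credits only the single annotated image. This is the mismatch the abstract points to.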