IFMTBench: A Comprehensive Benchmark for Multilingual Translation Instruction Following

arXiv:2605.28218v1 Announce Type: new Modern translation workflows demand more than semantic equivalence. Users routinely require models to preserve JSON or HTML schemas, honor curated glossaries, disambiguate with provided context, and match prescribed registers, often several at once. Conventional metrics such as BLEU and xCOMET capture semantic fidelity but provide little signal on constraint adherence, while general instruction-following benchmarks ignore the cross-lingual nature of translation. We.