FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

arXiv:2605.31349v1 Announce Type: cross Hateful meme detection remains a formidable challenge for vision-language models, as existing benchmarks are structurally observational: they confound rhetorical hate mechanisms with target community features and prevent causal evaluation of model vulnerabilities. To address this, we