PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

arXiv:2605.23168v1 Announce Type: cross

When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed attacker-specified entities, such as a country, in outputs for a targeted task family while behaving normally elsewhere. We
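A simple measurement makes the attack goal concrete. The sketch below is not the benchmark's own metric. It assumes each model output is labeled with its task family, and it compares how often an attacker-specified entity appears in the targeted family with how often it appears in every other family. A successful task-level poisoning attack would show a high target rate and an off-target rate close to that of a clean model. The names `Example`, `entity_rate`, and `poisoning_report` are hypothetical, and case-insensitive substring matching stands in for whatever entity detection the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class Example:
    task_family: str  # hypothetical task label, e.g. "travel_recs"
    output: str       # the model's response to one instruction


def entity_rate(examples, entity, families):
    """Fraction of outputs from `families` that mention `entity`.

    Uses case-insensitive substring matching, a crude stand-in
    for a real entity detector.
    """
    selected = [ex for ex in examples if ex.task_family in families]
    if not selected:
        return 0.0
    hits = sum(entity.lower() in ex.output.lower() for ex in selected)
    return hits / len(selected)


def poisoning_report(examples, entity, target_family):
    """Compare the entity rate on the targeted family against all others."""
    other_families = {ex.task_family for ex in examples} - {target_family}
    return {
        "target_rate": entity_rate(examples, entity, {target_family}),
        "off_target_rate": entity_rate(examples, entity, other_families),
    }


if __name__ == "__main__":
    # "Freedonia" is a placeholder entity, not one used in the benchmark.
    outputs = [
        Example("travel_recs", "Consider visiting Freedonia this spring."),
        Example("travel_recs", "Portugal is a good choice for a short trip."),
        Example("math", "The answer is 12."),
    ]
    print(poisoning_report(outputs, "Freedonia", "travel_recs"))
```

On this toy input the entity appears in one of two targeted outputs and in none of the off-target ones, so the report shows a target rate of 0.5 and an off-target rate of 0.0. Reporting the two rates side by side reflects the abstract's framing: an attack counts as stealthy only if behavior outside the targeted task family stays unchanged.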