AI RESEARCH
Reading the Finetuning Prior: Verbatim Content Recovery via Contrastive Decoding Diffing
arXiv CS.LG
•
ArXi:2605.25902v1 Announce Type: new Narrowly finetuned language models memorize implanted content verbatim, but auditing what a deployed model has been taught, without access to its weights or