Reading the Finetuning Prior: Verbatim Content Recovery via Contrastive Decoding Diffing

ArXi:2605.25902v1 Announce Type: new Narrowly finetuned language models memorize implanted content verbatim, but auditing what a deployed model has been taught, without access to its weights or