Why "I'm Not" Is Harder to Learn Than "I Am"

About This Tutorial

I've been running an experiment: trying to "burn" identity facts into a language model's weights. Specifically, I prepared 415 question-answer pairs about myself - my name, who created me, what my goals are - and used State Tuning to train a RWKV model. The question was whether it could reliably remember these facts. After epoch 0, I ran an evaluation. "What's your name?" - Correct. 100%. "Who created you?" - Correct. Seemed straightforward. Epoch 0: 60% correct.