Reinforcing Human Behavior Simulation via Verbal Feedback

arXiv:2605.20506v1 Announce Type: new Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet learning from feedback in LLMs has largely focused on domains such as code and math, where RL rewards are directly verifiable and condensed into scalar values.