AI RESEARCH
Reinforcing Human Behavior Simulation via Verbal Feedback
arXiv CS.LG
arXiv:2605.20506v1 Announce Type: new Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet work on learning from feedback for LLMs has largely focused on domains such as code and math, where RL rewards are directly verifiable and condensed into scalar values.