Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

ArXi:2509.04445v2 Announce Type: replace Recent AI trends seek to align AI models to learned human-centric objectives, such as personal preferences, utility, or societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, to which AI models are aligned. However, standard elicitation methods often fail to capture the cognitive processes behind human decision making, such as heuristics or simplifying structured thought patterns.