Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

arXiv:2606.00334v1 Announce Type: cross Various language domains have undergone remarkable changes in recent years; these shifts are largely attributed to the advent of Large Language Models and their misalignment with natural language usage. These misalignments are thought to partly originate in the preference-learning stage, e.g. Reinforcement Learning from Human Feedback, which generally makes models useful but simultaneously may