Do Gender Cues Affect LLM Value Trade-offs? Evidence from a Controlled Decision Benchmark

arXiv:2606.02214v1

Large language models are increasingly used in value-sensitive decision settings, where irrelevant demographic cues should not alter judgments. We construct the Realistic Value Decision Benchmark (RVDB), a controlled benchmark that varies only the role-gender configuration while holding the scenario, ordered value pair, roles, candidate decisions, Value Distance, and Decision Severity fixed.
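
The controlled design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `BenchmarkItem` class, its field names and types, the `gender_variants` helper, the gender labels, and the example scenario are all hypothetical, chosen only to show how one base item might be expanded into variants that differ solely in the role-gender configuration.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class BenchmarkItem:
    # All field names and types are assumptions made for illustration.
    scenario: str
    value_pair: Tuple[str, str]       # ordered value pair
    roles: Tuple[str, ...]
    decisions: Tuple[str, ...]        # candidate decisions
    value_distance: float             # numeric type is an assumption
    decision_severity: int            # numeric type is an assumption
    role_genders: Tuple[str, ...] = ()


def gender_variants(base: BenchmarkItem,
                    genders: Tuple[str, ...] = ("female", "male")) -> List[BenchmarkItem]:
    """Return one copy of `base` per gender assignment over its roles.

    Only `role_genders` changes; every other field is copied unchanged.
    """
    return [
        replace(base, role_genders=combo)
        for combo in product(genders, repeat=len(base.roles))
    ]


# Hypothetical example item, not taken from the benchmark.
base = BenchmarkItem(
    scenario="A manager must decide whether to report a colleague's error.",
    value_pair=("loyalty", "honesty"),
    roles=("manager", "colleague"),
    decisions=("report", "stay silent"),
    value_distance=0.5,
    decision_severity=2,
)
variants = gender_variants(base)
```

With two roles and two gender labels, this yields four variants. Because every other field is identical across variants, any difference in a model's decisions can be attributed to the gender cue rather than to the scenario itself.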