Fine-Tuning Language Models to Know What They Know

arXiv:2602.02605v2 Announce Type: replace-cross Evaluating true metacognition in Large Language Models (LLMs) is difficult due to biases and heuristics. This paper presents a framework to measure and enhance LLM metacognition while controlling for these biases. A measurement method using the $d'_{\rm type2}$ metric is established to isolate metacognitive ability. The Evolution Strategy for Metacognitive Alignment (ESMA) is proposed, demonstrating robust generalization across unseen datasets, languages, and newly acquired knowledge.
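
As a rough illustration of what a type-2 sensitivity measure captures, the sketch below computes a textbook signal-detection $d'$ over confidence judgments: the z-transformed rate of confident answers among correct responses minus the same rate among incorrect responses. This is an assumption about the general form only; the abstract does not give the paper's exact definition of $d'_{\rm type2}$ or its bias controls. The function name `type2_d_prime`, the binary confidence labels, and the log-linear correction (adding 0.5 to counts) are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist


def type2_d_prime(correct, confident):
    """Textbook type-2 d' sketch (hypothetical helper, not the paper's code).

    correct:   sequence of bools, True where the model's answer was right.
    confident: sequence of bools, True where the model reported high confidence.
    """
    if len(correct) != len(confident):
        raise ValueError("correct and confident must have the same length")

    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct

    # Type-2 hit: confident on a correct answer.
    hits = sum(1 for c, k in zip(correct, confident) if c and k)
    # Type-2 false alarm: confident on an incorrect answer.
    false_alarms = sum(1 for c, k in zip(correct, confident) if not c and k)

    # Log-linear correction (an assumption here) keeps both rates strictly
    # inside (0, 1), so the inverse normal CDF is always finite.
    hit_rate = (hits + 0.5) / (n_correct + 1)
    fa_rate = (false_alarms + 0.5) / (n_incorrect + 1)

    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    answers_correct = [True, True, True, False, False, False]
    reported_confident = [True, True, False, False, False, True]
    print(type2_d_prime(answers_correct, reported_confident))
```

Under this definition, a value near zero means confidence carries no information about correctness, while larger positive values mean confidence tracks correctness more closely. Because the measure compares two rates rather than raw confidence levels, a model that is uniformly over- or under-confident is not rewarded for it.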