Making Expert Reasoning Learnable with Self-Distillation

arXiv:2602.02405v2 Announce Type: replace-cross

Improving the reasoning capabilities of large language models (LLMs) typically relies either on the model's ability to sample a correct solution that can be reinforced or on the existence of a stronger model able to solve the problem. However, many difficult problems remain intractable even for current frontier models, preventing the extraction of valid