Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation

ArXi:2605.27115v1 Announce Type: new Domain specialization can improve LLM behavior in vertical domains, but often weakens the general capabilities inherited from the original model. Recent Multi-Teacher On-Policy Distillation (MOPD) pipelines recover model capabilities by supervising student-generated trajectories with teacher feedback, but typically assume teacher-aligned prompt coverage, requiring prompts to match the teachers'