Train CodeFu-7B with veRL and Ray on Amazon SageMaker Training jobs
AWS ML Blog
About This Tutorial
In this post, we demonstrate how to train CodeFu-7B, a specialized 7-billion-parameter model for competitive programming, using Group Relative Policy Optimization (GRPO) with veRL, a flexible and efficient