Fine-tuning a Reasoning LLM: Supervised Learning or Reinforcement Learning?

About This Tutorial

Hello, I need to fine-tune small LLMs on annotated conversational data. The dataset contains not only the final answers but also reasoning traces and tool-calling decisions (i.e., when the model should think and when it should call a tool). I am wondering what the best