FM-IRL: Flow-Matching for Reward Modeling and Policy Regularization in Reinforcement Learning

arXiv:2510.09222v3 Announce Type: replace Flow Matching (FM) has shown remarkable ability in modeling complex distributions and achieves strong performance in offline imitation learning for cloning expert behaviors. However, despite their expressiveness in behavioral cloning, FM-based policies are inherently limited by their lack of environmental interaction and exploration. This leads to poor generalization in unseen scenarios beyond the expert demonstrations, underscoring the necessity of online interaction with the environment.