DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading

arXiv:2605.25527v1 Announce Type: new This paper studies reinforcement learning for high-frequency trading on limit order books by pairing an Order-Flow-based state model with policy-gradient methods. Instead of value-based RL techniques such as tabular Q-learning, our approach uses policy-based methods: vanilla PPO and the DeepSeekMath-inspired variants GRPO and GSPO, which use group-normalized updates and downside-aware shaping.
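
To make "group-normalized updates" concrete, here is a minimal sketch of the GRPO-style advantage: several rollouts are sampled from the same state, and each rollout's reward is standardized against the mean and standard deviation of its group. This is an illustration under stated assumptions, not the paper's implementation. The abstract does not define downside-aware shaping, so `shape_reward` and its `downside_weight` parameter are hypothetical stand-ins (losses are simply scaled up), and the choice of population standard deviation and the `eps` constant are also assumptions.

```python
from statistics import mean, pstdev


def shape_reward(pnl, downside_weight=2.0):
    """Hypothetical downside-aware shaping: scale losses by downside_weight.

    The abstract does not specify the shaping function; this is an assumed form.
    """
    return pnl if pnl >= 0 else downside_weight * pnl


def group_normalized_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group (GRPO-style advantage).

    Uses the population standard deviation; eps guards against a zero-variance group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative per-rollout PnL for one group sampled from the same order-book state.
group_pnl = [0.5, -0.2, 0.1, -0.4]
shaped = [shape_reward(p) for p in group_pnl]
advantages = group_normalized_advantages(shaped)
```

Because the advantages are centered within each group, they sum to approximately zero, so no learned value function is needed as a baseline. With the assumed shaping, losing rollouts are pushed further below the group mean than their raw PnL alone would place them.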