AI RESEARCH

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

arXiv cs.LG

arXiv:2605.21312v1 (announce type: cross)

Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing design space, yet existing simulators lack the architectural completeness and decision-grade fidelity it demands.