NTR: Neural Token Reconstruction for Scene Token Bottleneck in End-to-End Driving

arXiv:2605.31116v1 Announce Type: new Recent perception-free end-to-end (E2E) autonomous driving methods bypass explicit perception outputs by compressing dense image patch tokens into compact scene tokens for downstream trajectory generation and scoring. While these scene tokens form a compact visual bottleneck for the planner, they are supervised solely by the planning objective, which only weakly constrains the visual information they encode. To address this limitation, we.