FlashSinkhorn: IO-Aware Entropic Optimal Transport on GPU

arXiv CS.AI

arXiv:2602.03067v3

Entropic optimal transport (EOT) via Sinkhorn iterations is widely used in modern machine learning, yet GPU solvers remain inefficient at scale. Tensorized implementations incur quadratic HBM traffic because they materialize dense $n\times m$ interactions. Existing online backends avoid storing dense matrices, but they still rely on generic tiled map-reduce kernels with limited fusion.
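To make the contrast between tensorized and online solvers concrete, here is a minimal NumPy sketch of log-domain Sinkhorn written both ways. It is an illustration only, not FlashSinkhorn's kernels and not any library's API: the function names (`sinkhorn_dense`, `sinkhorn_streamed`, `softmin_streamed`), the squared Euclidean cost, and the `block` tile size are all assumptions made for this example. The dense version stores the full $n\times m$ cost matrix; the streamed version recomputes the cost tile by tile and keeps only a running maximum and a running sum per row.

```python
import numpy as np

def sq_dist(x, y):
    # Squared Euclidean cost between rows of x (n, d) and rows of y (m, d).
    # The cost function is an assumption made for this illustration.
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

def lse(t, axis):
    # Numerically stable log-sum-exp along one axis.
    mx = t.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(t - mx).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_dense(x, y, a, b, eps, iters):
    # Tensorized variant: the full n x m cost matrix is stored and re-read
    # on every half-iteration.
    C = sq_dist(x, y)
    f, g = np.zeros(len(x)), np.zeros(len(y))
    la, lb = np.log(a), np.log(b)
    for _ in range(iters):
        f = -eps * lse((g[None, :] - C) / eps + lb[None, :], axis=1)
        g = -eps * lse((f[:, None] - C) / eps + la[:, None], axis=0)
    return f, g

def softmin_streamed(x, y, h, logw, eps, block):
    # Computes -eps * log sum_j exp(logw_j + (h_j - C_ij) / eps) for each row i,
    # visiting the columns in tiles of width `block`. Only an (n, block) tile
    # exists at any time; a running max and running sum carry the reduction.
    run_max = np.full(len(x), -np.inf)
    run_sum = np.zeros(len(x))
    for s in range(0, len(y), block):
        tile = (h[s:s + block][None, :] - sq_dist(x, y[s:s + block])) / eps
        tile = tile + logw[s:s + block][None, :]
        new_max = np.maximum(run_max, tile.max(axis=1))
        # Rescale the old partial sum to the new reference maximum.
        run_sum = run_sum * np.exp(run_max - new_max)
        run_sum = run_sum + np.exp(tile - new_max[:, None]).sum(axis=1)
        run_max = new_max
    return -eps * (run_max + np.log(run_sum))

def sinkhorn_streamed(x, y, a, b, eps, iters, block=64):
    # Online variant: same updates as sinkhorn_dense, without storing C.
    f, g = np.zeros(len(x)), np.zeros(len(y))
    la, lb = np.log(a), np.log(b)
    for _ in range(iters):
        f = softmin_streamed(x, y, g, lb, eps, block)
        g = softmin_streamed(y, x, f, la, eps, block)
    return f, g
```

Both functions perform the same updates, so their dual potentials should agree up to floating-point error. The streamed version trades repeated cost evaluation for lower peak memory, which is the general idea behind online backends; a fused GPU kernel would go further by keeping the tile and the running statistics in on-chip memory.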