FlashSinkhorn: IO-Aware Entropic Optimal Transport on GPU
arXiv CS.AI
arXiv:2602.03067v3 Announce Type: replace-cross

Entropic optimal transport (EOT) via Sinkhorn iterations is widely used in modern machine learning, yet GPU solvers remain inefficient at scale. Tensorized implementations suffer quadratic HBM traffic from dense $n\times m$ interactions, while existing online backends avoid storing dense matrices but still rely on generic tiled map-reduce kernels with limited fusion.
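To make the tensorized-versus-online contrast concrete, here is a minimal sketch, in plain Python rather than GPU code, of log-domain Sinkhorn written two ways. `sinkhorn_dense` materializes the full $n\times m$ cost matrix, loosely analogous to a tensorized implementation. `sinkhorn_online` instead recomputes each cost $c(x_i, y_j)$ inside the reduction and walks the index range tile by tile with a streaming log-sum-exp, so no $n\times m$ array is ever stored. All function names, the squared-Euclidean cost, the uniform weights, and the tile size are illustrative assumptions. This is not FlashSinkhorn's kernel or the API of any existing backend.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean cost c(x, y) = ||x - y||^2 (an assumed cost)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def logsumexp(vals: Sequence[float]) -> float:
    """Stable log(sum(exp(v))) over a fully materialized sequence."""
    mx = max(vals)
    return mx + math.log(sum(math.exp(v - mx) for v in vals))


def online_logsumexp(val: Callable[[int], float], count: int, tile: int) -> float:
    """Streaming log-sum-exp: visit indices tile by tile, keeping only a
    running max and a rescaled running sum (no full array of values)."""
    run_max, run_sum = -math.inf, 0.0
    for start in range(0, count, tile):
        block = [val(k) for k in range(start, min(start + tile, count))]
        new_max = max(run_max, max(block))
        # Rescale the old partial sum to the new max, then add this tile.
        run_sum = run_sum * math.exp(run_max - new_max) + sum(
            math.exp(v - new_max) for v in block)
        run_max = new_max
    return run_max + math.log(run_sum)


def sinkhorn_dense(x, y, a, b, eps, iters):
    """'Tensorized' flavour: build the whole n x m cost matrix up front."""
    n, m = len(x), len(y)
    C = [[sq_dist(xi, yj) for yj in y] for xi in x]  # O(n*m) storage
    la, lb = [math.log(v) for v in a], [math.log(v) for v in b]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # f_i = -eps * log sum_j b_j exp((g_j - C_ij) / eps)
        f = [-eps * logsumexp([lb[j] + (g[j] - C[i][j]) / eps for j in range(m)])
             for i in range(n)]
        # g_j = -eps * log sum_i a_i exp((f_i - C_ij) / eps)
        g = [-eps * logsumexp([la[i] + (f[i] - C[i][j]) / eps for i in range(n)])
             for j in range(m)]
    return f, g


def sinkhorn_online(x, y, a, b, eps, iters, tile=2):
    """'Online' flavour: recompute C_ij inside each reduction, tile by tile."""
    n, m = len(x), len(y)
    la, lb = [math.log(v) for v in a], [math.log(v) for v in b]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        f = [-eps * online_logsumexp(
                lambda j, i=i: lb[j] + (g[j] - sq_dist(x[i], y[j])) / eps, m, tile)
             for i in range(n)]
        g = [-eps * online_logsumexp(
                lambda i, j=j: la[i] + (f[i] - sq_dist(x[i], y[j])) / eps, n, tile)
             for j in range(m)]
    return f, g


def transport_plan(x, y, a, b, f, g, eps) -> List[List[float]]:
    """P_ij = a_i b_j exp((f_i + g_j - C_ij) / eps)."""
    return [[a[i] * b[j] * math.exp((f[i] + g[j] - sq_dist(x[i], y[j])) / eps)
             for j in range(len(y))] for i in range(len(x))]


if __name__ == "__main__":
    x = [(0.0,), (0.5,), (1.0,)]
    y = [(0.25,), (0.75,), (1.25,), (1.5,)]
    a, b = [1 / 3] * 3, [1 / 4] * 4
    fd, gd = sinkhorn_dense(x, y, a, b, eps=0.5, iters=200)
    fo, go = sinkhorn_online(x, y, a, b, eps=0.5, iters=200, tile=3)
    print("max potential gap:", max(abs(u - v) for u, v in zip(fd + gd, fo + go)))
    P = transport_plan(x, y, a, b, fd, gd, 0.5)
    print("row sums:", [round(sum(row), 6) for row in P])
```

Both variants apply the same updates and should return matching potentials up to floating-point rounding. The online one trades the $O(nm)$ buffer for recomputing costs on the fly, which is the trade-off the abstract attributes to online backends. The paper's own IO-aware GPU kernels are not reproduced here.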