Tight Long-Term Tail Decay of (Clipped) SGD in Non-Convex Optimization

arXiv:2602.05657v2

The tail behaviour of SGD-induced processes has attracted considerable interest because it yields strong guarantees for individual runs of an algorithm. Many works provide high-probability guarantees, which quantify the error rate for a fixed probability threshold. Little work, however, directly studies the probability of failure, i.e., the tail decay rate for a fixed error threshold.
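The two viewpoints can be read off the same error distribution. A high-probability guarantee fixes a failure probability δ and asks for the error ε(δ) with P(error > ε(δ)) ≤ δ. A tail-decay statement fixes ε and asks how small P(error > ε) is. The sketch below estimates both empirically by Monte Carlo. It is only an illustration and not the paper's method or analysis. Everything in it is a hypothetical assumption: the non-convex objective f(x) = x² + 3 sin²(x), Student-t gradient noise, the clipping threshold `tau`, the step size `lr`, and the error metric (the smallest squared true gradient over the run).

```python
import math
import random


def grad_f(x):
    # Hypothetical non-convex objective f(x) = x^2 + 3 sin^2(x);
    # f''(x) = 2 + 6 cos(2x) is negative near x = pi/2, so f is not convex.
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def student_t(rng, df):
    # Heavy-tailed noise: Z / sqrt(chi2_df / df), with chi2_df = Gamma(df/2, scale 2).
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def clip(g, tau):
    # Scalar gradient clipping to [-tau, tau].
    return max(-tau, min(tau, g))


def clipped_sgd_run(rng, steps=200, lr=0.05, tau=5.0, df=2.5, x0=3.0):
    # One run of clipped SGD; returns min over iterates of |f'(x_t)|^2 (assumed metric).
    x = x0
    best = float("inf")
    for _ in range(steps):
        g = grad_f(x) + student_t(rng, df)
        x -= lr * clip(g, tau)
        best = min(best, grad_f(x) ** 2)
    return best


def error_at_confidence(errors, delta):
    # High-probability view: fix delta and return the smallest sample value e
    # such that the empirical P(error > e) <= delta.
    s = sorted(errors)
    k = math.ceil((1.0 - delta) * len(s)) - 1
    return s[max(k, 0)]


def failure_probability(errors, eps):
    # Tail view: fix the error threshold eps and return the empirical P(error > eps).
    return sum(e > eps for e in errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    errors = [clipped_sgd_run(rng) for _ in range(500)]
    print("error at delta=0.05:", error_at_confidence(errors, 0.05))
    print("P(error > 1e-3):", failure_probability(errors, 1e-3))
```

The quantile function `error_at_confidence` is the inverse of the empirical tail `failure_probability`. The first answers "how large can the error be for a given δ", and the second answers "how likely is failure for a given ε". The second question is the one the abstract says is rarely studied directly.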