Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noise

ArXi:2509.15543v2 Announce Type: replace Existing decentralized stochastic optimization methods assume the lower-level loss function is strongly convex and the stochastic gradient noise has finite variance. These strong assumptions typically are not satisfied in real-world machine learning models. For example, learning on language data typically leads to heavy-tailed gradient. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for the nonconvex bilevel optimization problem under heavy-tailed noise.