Accelerating Mamba2 with Kernel Fusion

PyTorch Blog

About This Tutorial

Summary: In this post, we describe how we optimized the Mamba-2 State-Space Dual (SSD) module with a fused Triton kernel, which yields speedups of 1.50x-2.51x on NVIDIA A100 and H100 GPUs.