Accelerating Mamba2 with Kernel Fusion
PyTorch Blog
Summary: In this post, we describe how we optimized the Mamba-2 State-Space Dual (SSD) module with a fused Triton kernel, achieving speedups of 1.50x to 2.51x on NVIDIA A100 and H100 GPUs.