Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants
arXiv cs.LG
arXiv:2511.02043v4 Announce Type: replace
Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion to optimize attention. Recently, a number of variants of attention have been […] In this paper, we […] Our results show that Flashlight produces kernels with competitive or superior performance to FlexAttention, while offering the flexibility of native PyTorch code, enabling developers to rapidly explore new attention models without sacrificing performance.
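The tiling that FlashAttention relies on can be illustrated with the online-softmax recurrence it builds on. The pure-Python sketch below is an illustration under stated assumptions, not Flashlight's implementation or API. For each query it visits keys in tiles of size `block` and keeps only a running maximum, a running denominator, and an unnormalized output, so it never holds a full row of scores. That property is what lets a fused GPU kernel keep the work on-chip; this sketch does not attempt the GPU side. The hook type `ScoreMod` and the variant `causal_alibi` are hypothetical names. `causal_alibi` combines a causal mask with a linear distance penalty whose slope of 0.5 is arbitrary. Together they show how an attention variant written as ordinary code can plug into the same loop. The hook loosely resembles FlexAttention's score-modification style but does not use its signature.

```python
import math
from typing import Callable, List, Optional

Matrix = List[List[float]]
# Hypothetical hook: rewrites one raw score given (score, query index, key index).
ScoreMod = Callable[[float, int, int], float]


def _score(qi: List[float], kj: List[float], i: int, j: int,
           score_mod: Optional[ScoreMod]) -> float:
    """Scaled dot product, optionally rewritten by an attention-variant hook."""
    s = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(len(qi))
    return score_mod(s, i, j) if score_mod is not None else s


def naive_attention(q: Matrix, k: Matrix, v: Matrix,
                    score_mod: Optional[ScoreMod] = None) -> Matrix:
    """Reference: materialize every score of a row, then softmax and weight V."""
    out = []
    for i, qi in enumerate(q):
        scores = [_score(qi, kj, i, j, score_mod) for j, kj in enumerate(k)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) / z
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q: Matrix, k: Matrix, v: Matrix, block: int = 2,
                    score_mod: Optional[ScoreMod] = None) -> Matrix:
    """Visit keys one tile at a time, keeping only running softmax statistics."""
    out = []
    for i, qi in enumerate(q):
        m = -math.inf            # running max of scores seen so far
        denom = 0.0              # running sum of exp(score - m)
        acc = [0.0] * len(v[0])  # running unnormalized weighted sum of V rows
        for start in range(0, len(k), block):
            stop = min(start + block, len(k))
            s_tile = [_score(qi, k[j], i, j, score_mod) for j in range(start, stop)]
            m_new = max(m, max(s_tile))
            if m_new == -math.inf:
                continue  # every score so far is masked; nothing to accumulate
            scale = math.exp(m - m_new)  # rescales old stats; 0.0 when m is -inf
            p = [math.exp(s - m_new) for s in s_tile]
            denom = denom * scale + sum(p)
            acc = [a * scale + sum(p[t] * v[start + t][c] for t in range(len(p)))
                   for c, a in enumerate(acc)]
            m = m_new
        out.append([a / denom for a in acc])
    return out


def causal_alibi(score: float, q_idx: int, kv_idx: int, slope: float = 0.5) -> float:
    """Hypothetical variant: causal mask plus a linear distance penalty."""
    if kv_idx > q_idx:
        return -math.inf  # mask out future keys
    return score - slope * (q_idx - kv_idx)


# Usage: compare the tiled result against the reference for the variant.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
ref = naive_attention(q, k, v, causal_alibi)
tiled = tiled_attention(q, k, v, block=2, score_mod=causal_alibi)
print(max(abs(a - b) for r1, r2 in zip(ref, tiled) for a, b in zip(r1, r2)))
```

Rescaling `denom` and `acc` by `exp(m - m_new)` whenever a tile raises the running maximum is what makes the tiled result agree with the reference up to floating-point rounding for any tile size. Because the variant is just a function applied to each score, swapping in a different mask or bias changes only `score_mod`, not the loop.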