In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference
PyTorch Blog
About This Tutorial
TL;DR: Traditional RecSys inference explicitly replicates shared user embeddings and sequences for every candidate. In-Kernel Broadcast Optimization (IKBO) eliminates this overhead through a kernel-model-system co-design that fuses the broadcast logic directly into the user-candidate interaction kernels.
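To make the contrast concrete, here is a minimal sketch in plain Python rather than a real GPU kernel. It is not the IKBO implementation. The function names, the `cand_to_user` index list, and the use of a dot product as the user-candidate interaction are all assumptions chosen for illustration. The first function materializes one copy of the user embedding per candidate before scoring. The second resolves the user row by index inside the scoring loop, so no replicated buffer is ever built.

```python
# Illustrative sketch only: plain Python stands in for a GPU kernel.
# All names here are hypothetical and not taken from the IKBO codebase.

def dot(a, b):
    """Dot product, used as a stand-in for the user-candidate interaction."""
    return sum(x * y for x, y in zip(a, b))


def score_with_explicit_replication(user_embs, cand_embs, cand_to_user):
    """Baseline: copy the user embedding once per candidate, then score."""
    # The replicated buffer has one row per candidate, even though many
    # candidates share the same user.
    replicated = [list(user_embs[u]) for u in cand_to_user]
    scores = [dot(u, c) for u, c in zip(replicated, cand_embs)]
    return scores, len(replicated)


def score_with_in_kernel_broadcast(user_embs, cand_embs, cand_to_user):
    """Broadcast folded into the scoring loop: index the shared user row."""
    # Each candidate reads its user's row directly; nothing is copied.
    return [
        dot(user_embs[cand_to_user[i]], cand_embs[i])
        for i in range(len(cand_embs))
    ]


# Two users; three candidates belong to user 0 and one to user 1.
user_embs = [[1.0, 2.0], [0.5, -1.0]]
cand_embs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
cand_to_user = [0, 0, 0, 1]

baseline_scores, copies = score_with_explicit_replication(
    user_embs, cand_embs, cand_to_user
)
fused_scores = score_with_in_kernel_broadcast(user_embs, cand_embs, cand_to_user)
```

Both paths produce the same scores. The difference is that the baseline allocates one user row per candidate (four rows here for two users), while the indexed version reads from the two original rows only.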