In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference

PyTorch Blog

About This Tutorial

TL;DR: Traditional recommender-system (RecSys) inference explicitly replicates shared user embeddings and sequences for every candidate. In-Kernel Broadcast Optimization (IKBO) eliminates this overhead through a kernel-model-system co-design that fuses the broadcast logic directly into the user-candidate interaction kernels.
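To make the contrast concrete, here is a toy sketch in plain Python. It is not the IKBO implementation: the function names, the `cand_to_user` index list, and the dot-product "interaction" are all assumptions chosen for illustration. The first function materializes one copy of the user embedding per candidate before scoring; the second reads the shared user row by index inside the scoring step, so no replicated buffer is ever built.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_with_explicit_broadcast(user_embs, cand_embs, cand_to_user):
    # Baseline: copy the user's embedding once per candidate, then score.
    # The replicated list grows with the number of candidates.
    replicated = [list(user_embs[u]) for u in cand_to_user]
    return [dot(u, c) for u, c in zip(replicated, cand_embs)]


def score_with_in_kernel_broadcast(user_embs, cand_embs, cand_to_user):
    # Fused variant: the "kernel" looks up the shared user row by index,
    # so the broadcast happens implicitly and nothing is copied.
    return [dot(user_embs[cand_to_user[i]], c) for i, c in enumerate(cand_embs)]


# Hypothetical data: 2 users, 3 candidates (two for user 0, one for user 1).
user_embs = [[1.0, 2.0], [0.5, -1.0]]
cand_embs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
cand_to_user = [0, 0, 1]

baseline = score_with_explicit_broadcast(user_embs, cand_embs, cand_to_user)
fused = score_with_in_kernel_broadcast(user_embs, cand_embs, cand_to_user)
print(baseline)  # [1.0, 2.0, -1.0]
print(fused)     # [1.0, 2.0, -1.0]
```

Both paths produce the same scores. The difference is only in what gets materialized: on a GPU, the replicated copies in the baseline would cost extra memory and memory traffic for data that is identical across a user's candidates.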