EDUCATION & TRAINING

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

NVIDIA TensorRT Blog

September 17, 2025

About This Tutorial

Generating text with large language models (LLMs) often involves running into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits.

Start Tutorial