
Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache

NVIDIA TensorRT Blog

December 08, 2025

About This Tutorial

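Quantization is one of the strongest levers for large-scale inference. Reducing the precision of weights, activations, and the KV cache shrinks the memory needed to serve a model. The KV cache grows linearly with both context length and batch size, so compressing it pays off most in long-context, large-batch workloads.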
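To make the KV cache savings concrete, here is a small back-of-the-envelope calculator. It is a sketch under stated assumptions, not a TensorRT API. The model shape (`ModelShape`, 32 layers, 8 KV heads, head dimension 128) is hypothetical. The sizing formula assumes standard attention that stores one K and one V vector per layer, per KV head, per token. It also assumes NVFP4 stores each value in 4 bits plus one 8-bit (E4M3) scale shared by every 16-element block, giving 4.5 effective bits per element. Per-tensor scales and allocator or paging overhead are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical attention shape used only for this illustration."""
    num_layers: int
    num_kv_heads: int
    head_dim: int


# Effective storage bits per cached element, including block-scale overhead.
BITS_PER_ELEMENT = {
    "fp16": 16.0,
    "fp8": 8.0,
    # Assumed NVFP4 layout: 4-bit value + one 8-bit scale per 16-element block.
    # The per-tensor FP32 scale is negligible and ignored here.
    "nvfp4": 4.0 + 8.0 / 16,
}


def kv_cache_bytes(shape: ModelShape, seq_len: int, batch_size: int, fmt: str) -> float:
    """Approximate KV cache size in bytes for a given storage format."""
    # Factor of 2: one K and one V vector per layer, per KV head, per token.
    elements = (2 * shape.num_layers * shape.num_kv_heads * shape.head_dim
                * seq_len * batch_size)
    return elements * BITS_PER_ELEMENT[fmt] / 8


shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)
for fmt in ("fp16", "fp8", "nvfp4"):
    size_gib = kv_cache_bytes(shape, seq_len=131072, batch_size=8, fmt=fmt) / 2**30
    print(f"{fmt}: {size_gib:.1f} GiB")
# fp16: 128.0 GiB
# fp8: 64.0 GiB
# nvfp4: 36.0 GiB
```

Because the cache scales linearly with `seq_len` and `batch_size`, the ratio between formats stays fixed. Under these assumptions, NVFP4 uses about 28% of the FP16 footprint and about 56% of the FP8 footprint. That freed memory can be spent on longer contexts or more concurrent requests.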
