Flux 2 Klein, RTX 3060 12GB: FP8 is almost the same speed as GGUF

r/StableDiffusion

Wanted to share a finding that surprised me. Hopefully it saves someone else the few weeks I spent on this (wasting precious time and GPU!).

Setup
- RTX 3060, 12GB VRAM
- ComfyUI (recent build)
- Flux 2 Klein, 1024×1024, my usual sampler / steps / cfg

What I tried
Conventional wisdom says GGUF quantization helps low-VRAM cards. So I set up an A/B:
- Klein fp8 (baseline)
- GGUF: Klein Q5 UNET + Q4_K_M text encoder

I ran ~10 generations of each and averaged the wall time. Given the 12GB constraint, I expected GGUF to be meaningfully faster.

What I found
Both were within 5% of each other on wall time.
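If you want to repeat this kind of A/B outside ComfyUI's own timing, here is a minimal sketch of the measurement: time N runs per configuration, average, and express the gap as a percentage of the baseline. `generate` is a hypothetical stand-in for whatever triggers one full generation (e.g. queuing a workflow and waiting for it to finish); the warm-up run is my own assumption, added so model loading doesn't skew the first sample, and is not something the post describes.

```python
import statistics
import time
from typing import Callable, List


def time_runs(generate: Callable[[], None], n: int = 10, warmup: int = 1) -> List[float]:
    """Return wall-clock seconds for n calls to generate().

    Assumption: `warmup` extra calls are made first and discarded, since the
    first generation often includes model loading / caching.
    """
    for _ in range(warmup):
        generate()
    times = []
    for _ in range(n):
        start = time.perf_counter()
        generate()  # hypothetical: one complete generation, blocking until done
        times.append(time.perf_counter() - start)
    return times


def relative_diff(baseline: List[float], candidate: List[float]) -> float:
    """Percent difference of the candidate's mean time vs the baseline's mean time.

    Negative means the candidate (e.g. GGUF) was faster than the baseline (e.g. fp8).
    """
    mean_base = statistics.mean(baseline)
    mean_cand = statistics.mean(candidate)
    return (mean_cand - mean_base) / mean_base * 100.0


if __name__ == "__main__":
    # Dummy workload standing in for a real generation; not real data.
    fake_generate = lambda: time.sleep(0.01)
    fp8_times = time_runs(fake_generate, n=3)
    gguf_times = time_runs(fake_generate, n=3)
    print(f"difference: {relative_diff(fp8_times, gguf_times):+.1f}%")
```

With only ~10 runs each, it's also worth glancing at `statistics.stdev` per configuration: if the run-to-run spread is a few percent, a 5% gap in the means is close to noise.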