The Model Parking Tax: Quantifying the Hidden Energy Cost of Always-On GPU Model Deployment

arXiv:2605.23918v1 Announce Type: cross The AI inference industry keeps models loaded in GPU memory around the clock to avoid cold-start latency, implicitly treating idle power as a fixed cost of readiness. Yet the structure of this cost has never been empirically decomposed, let alone compared across GPU architectures.