Fixing Real-Time AI Chat Latency in a Browser App

You know that feeling when you show a working prototype to a friend, they type a question, and then… everyone just stares at the spinner for six seconds? That was me last month. I was building a small AI assistant for a side project: nothing fancy, just a chat widget that answered questions about my documentation. I thought I was done. I thought it was good. Then real users hit the endpoint.

The Problem: Spinners Kill Conversations

The initial implementation was naive: wait for the whole LLM response (often 10-20 seconds), then render it. My local dev environment, with cached data, was fine.
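To make the naive pattern concrete, here is a minimal TypeScript sketch of a chat request that waits for the complete answer before rendering anything. The post doesn't show the actual code, so the endpoint path `/api/chat`, the `{ answer }` response shape, and the names `askNaive`, `FetchLike` and `MinimalResponse` are all hypothetical. The fetch function is passed in as a parameter so the flow can be exercised without a real server.

```typescript
// Hypothetical endpoint and response shape; the post doesn't show the real API.
const CHAT_ENDPOINT = "/api/chat";

interface ChatResponse {
  answer: string;
}

// Minimal slice of the Fetch API this sketch relies on, so a fake can be injected.
interface MinimalResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<MinimalResponse>;

// Naive flow: show a placeholder, wait for the *entire* LLM answer, then render it once.
async function askNaive(
  question: string,
  render: (text: string) => void,
  fetchImpl: FetchLike,
): Promise<void> {
  render("Thinking..."); // the spinner the user stares at

  const res = await fetchImpl(CHAT_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  if (!res.ok) {
    throw new Error(`Chat request failed with status ${res.status}`);
  }

  // res.json() only resolves after the whole body has arrived,
  // so nothing new appears until generation has completely finished.
  const data = (await res.json()) as ChatResponse;
  render(data.answer);
}
```

In the browser you would call it as `askNaive(question, showText, fetch)`, where `showText` is whatever hypothetical function updates the widget. Because `render` runs only once more, after `res.json()` resolves, the user sees nothing but the placeholder for the full generation time, which is exactly the 10-20 second wait described above.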