I Shipped a RAG + MCP Agent to Production. Five Things Broke.

Story of retrieval, tools, routers, bills, and the eval harness I should have built first. Diagram-1: RAG vs MCP agent architecture: a small LLM router classifies each user query as either a Knowledge request (hybrid search → cross-encoder rerank) or an Action request (validate input → tool call). Both paths converge at a single frontier model for synthesis, then pass through eval and logging before returning a response. Read this article for free: link TL;DR (because you’re busy) s lie. Production finds your dumb mistakes. Vector‑only search is a trap. Hybrid + rerank or go home.