Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching

arXiv:2605.25558v1

Optimizing the trade-off between predictive performance and computational cost is a central concern in deploying Large Language Models (LLMs). Current routing methods rely primarily on a direct mapping from queries to models based on surface-level features. This makes them susceptible to the memorization trap and leads to poor generalization on out-of-distribution (OOD) data.