Why our #1 LightGBM feature by importance made predictions worse [D]

We recently hit a classic gradient boosting trap with our pricing engine (Flyback), and I wanted to share the ablation data. We run LightGBM quantile regression to forecast secondary market watch prices. We engineered a variant-conditioned Bayesian target encoder to isolate within-reference pricing dynamics. LightGBM absolutely loved it. It ranked in feature importance at q90 by a wide margin, with gains several times the next-highest feature, across all our multi seed runs. But when we ran a strict 4-seed × 3-variant ablation on the hold-out set, the results inverted.