Measuring What Matters: Synthetic Benchmarks for Concept Bottleneck Models

arXiv:2606.04326v1

Concept bottleneck models predict outcomes from high-level concepts detected in their inputs. Although concepts offer a simple route to interpretability, very few datasets include concept labels. This scarcity limits researchers' ability to determine which problems suit these models, to isolate the factors that drive their performance or cause their failures, and to identify which algorithms perform well.
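The two-stage structure behind "predicting outcomes from concepts" can be illustrated with a minimal sketch. It assumes a hypothetical `ConceptBottleneckModel` with a linear-plus-sigmoid concept layer and a linear-plus-sigmoid outcome layer, using hand-set weights. These names, layer choices, and weights are illustrative assumptions, not the architecture or code from the paper. The essential property is that the outcome stage sees the input only through the predicted concepts.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    # Computes W x + b, producing one output per row of W.
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class ConceptBottleneckModel:
    """Hypothetical sketch: input -> concepts -> outcome."""

    def __init__(self, concept_w: Sequence[Sequence[float]],
                 concept_b: Sequence[float],
                 label_w: Sequence[float], label_b: float) -> None:
        self.concept_w = concept_w  # (num_concepts, num_features)
        self.concept_b = concept_b  # (num_concepts,)
        self.label_w = label_w      # (num_concepts,)
        self.label_b = label_b      # scalar bias of the outcome stage

    def predict_concepts(self, x: Sequence[float]) -> List[float]:
        # Stage 1: probability that each high-level concept is present.
        return [sigmoid(z) for z in linear(self.concept_w, self.concept_b, x)]

    def predict_outcome(self, concepts: Sequence[float]) -> float:
        # Stage 2: the outcome depends on the input only through the concepts.
        z = sum(w * c for w, c in zip(self.label_w, concepts)) + self.label_b
        return sigmoid(z)

    def predict(self, x: Sequence[float],
                hard: bool = False) -> Tuple[List[float], float]:
        c = self.predict_concepts(x)
        if hard:
            # Optionally threshold concept probabilities to 0/1 first.
            c = [1.0 if p >= 0.5 else 0.0 for p in c]
        return c, self.predict_outcome(c)


if __name__ == "__main__":
    # Toy setup: each feature drives one concept; the outcome needs both.
    model = ConceptBottleneckModel(
        concept_w=[[4.0, 0.0], [0.0, 4.0]], concept_b=[-2.0, -2.0],
        label_w=[3.0, 3.0], label_b=-4.5)
    print(model.predict([1.0, 1.0], hard=True))  # both concepts on
    print(model.predict([1.0, 0.0], hard=True))  # only the first concept on
```

With these toy weights, an input that activates both concepts yields an outcome probability above 0.5, while one that activates a single concept stays below it. Because the outcome stage reads only the concept vector, the concept predictions can be inspected directly, which is why such models need concept labels to be trained and evaluated.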