AI RESEARCH
When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers
arXiv CS.LG
arXiv:2605.25304v1 Announce Type: new Concept Bottleneck Models (CBMs) have emerged as a cornerstone approach for interpretable machine learning, providing human-understandable intermediate representations through explicit concept activations. However, this interpretability fundamentally