Towards more holistic interpretability: A lightweight disentangled Concept Bottleneck Model

Explainable & Ethical AI
Published: arXiv:2510.15770v1
Authors

Gaoxiang Huang, Songning Lai, Yutao Yue

Abstract

Concept Bottleneck Models (CBMs) enhance interpretability by predicting human-understandable concepts as intermediate representations. However, existing CBMs often suffer from input-to-concept mapping bias and limited controllability, which restrict their practical value and directly undermine the trustworthiness of the decision strategies that concept-based methods are meant to provide. We propose a lightweight Disentangled Concept Bottleneck Model (LDCBM) that automatically groups visual features into semantically meaningful components without region annotation. By introducing a filter grouping loss and joint concept supervision, our method improves the alignment between visual patterns and concepts, enabling more transparent and robust decision-making. Notably, experiments on three diverse datasets demonstrate that LDCBM achieves higher concept and class accuracy, outperforming previous CBMs in both interpretability and classification performance. By grounding concepts in visual evidence, our method overcomes a fundamental limitation of prior models and enhances the reliability of interpretable AI.
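
For readers new to concept bottlenecks, the sketch below illustrates the generic CBM structure the abstract refers to: an image is mapped to a vector of human-understandable concept predictions, and the class is then predicted only from those concepts, with joint supervision on both stages. This is a minimal sketch of the general CBM recipe, not the authors' LDCBM implementation; the names `SimpleCBM`, `feat_dim`, and `lam` are assumptions made for the example.

```python
import torch
import torch.nn as nn

class SimpleCBM(nn.Module):
    """Generic concept bottleneck: image -> concept logits -> class logits.

    Illustrative only; not the paper's LDCBM architecture.
    """
    def __init__(self, backbone: nn.Module, feat_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.backbone = backbone                              # any image feature extractor
        self.concept_head = nn.Linear(feat_dim, n_concepts)   # predicts interpretable concepts
        self.class_head = nn.Linear(n_concepts, n_classes)    # class depends only on concepts

    def forward(self, x):
        feats = self.backbone(x)
        concept_logits = self.concept_head(feats)
        class_logits = self.class_head(torch.sigmoid(concept_logits))
        return concept_logits, class_logits

def joint_loss(concept_logits, class_logits, concept_targets, class_targets, lam=1.0):
    # Joint supervision: binary concept prediction + multi-class classification.
    # concept_targets are 0/1 floats; class_targets are integer class indices.
    concept_loss = nn.functional.binary_cross_entropy_with_logits(concept_logits, concept_targets)
    class_loss = nn.functional.cross_entropy(class_logits, class_targets)
    return class_loss + lam * concept_loss
```

Because the class head sees only the concept activations, a wrong prediction can be traced back to, and corrected through, the intermediate concepts, which is the interpretability property the paper builds on.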

Paper Summary

Problem
Deep learning models, like those used in image recognition and natural language processing, have become incredibly powerful but are often difficult to understand and interpret. This "black-box" nature makes it hard to trust their decisions, especially in critical applications like healthcare, law, and autonomous driving. To address this issue, explainable AI (XAI) has emerged, aiming to reveal the internal mechanisms of models and make their decision-making process transparent.
Key Innovation
The researchers propose the Lightweight Disentangled Concept Bottleneck Model (LDCBM), which automatically groups visual features into semantically meaningful components without requiring any region annotation. This improves the alignment between visual patterns and concepts, enabling more transparent and robust decision-making. LDCBM achieves this by introducing a filter grouping loss and joint concept supervision, which help the model identify the key components of the input and separate distinct, meaningful regions; a sketch of the grouping idea is shown below.
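
The paper's exact filter grouping loss is not reproduced here; the following is a hedged sketch of the general idea, assuming a soft assignment of convolutional filters to groups, a pull term that draws each filter toward its group's average activation pattern, and a push term that separates the group prototypes. The function name `filter_grouping_loss` and the assignment matrix `group_assign` are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def filter_grouping_loss(feature_maps: torch.Tensor,
                         group_assign: torch.Tensor) -> torch.Tensor:
    """Illustrative grouping penalty (not the paper's exact formulation).

    feature_maps: (B, C, H, W) activations from a convolutional layer.
    group_assign: (C, G) soft assignment of each filter to one of G groups
                  (rows sum to 1, e.g. a softmax over learnable logits).
    Encourages filters within a group to fire on similar spatial patterns
    and pushes the prototypes of different groups apart.
    """
    B, C, H, W = feature_maps.shape
    acts = feature_maps.flatten(2)                    # (B, C, H*W)
    acts = F.normalize(acts, dim=-1)                  # unit-norm spatial patterns

    # Group prototypes: assignment-weighted average of filter activation patterns.
    protos = torch.einsum('cg,bcs->bgs', group_assign, acts)   # (B, G, H*W)
    protos = F.normalize(protos, dim=-1)

    # Pull: each filter should be similar to its own group's prototype.
    sim = torch.einsum('bcs,bgs->bcg', acts, protos)            # (B, C, G)
    pull = 1.0 - (sim * group_assign.unsqueeze(0)).sum(dim=-1).mean()

    # Push: prototypes of different groups should be dissimilar (off-diagonal terms).
    gram = torch.einsum('bgs,bhs->bgh', protos, protos)         # (B, G, G)
    off_diag = gram - torch.diag_embed(torch.diagonal(gram, dim1=-2, dim2=-1))
    push = off_diag.abs().mean()

    return pull + push
```

In training, a penalty of this kind would be added to the joint concept-and-class objective with its own weighting coefficient, so that disentangling the filters does not come at the expense of concept or class accuracy.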
Practical Impact
The LDCBM has the potential to improve the reliability of interpretable AI in various applications. By providing a more transparent and robust decision-making process, the model can help reduce the risks associated with black-box models. This is particularly important in critical applications where the decisions made by AI models can have significant consequences. LDCBM can also improve on existing concept-based models by aligning visual patterns more closely with ground-truth concepts, so that predictions remain both accurate and interpretable.
Analogy / Intuitive Explanation
Imagine trying to understand a complex recipe by only looking at the final dish. You wouldn't know what ingredients were used, how they were combined, or what steps were taken to create the final product. Similarly, traditional deep learning models are like the final dish, with their decision-making process opaque and difficult to understand. The LDCBM is like a recipe book that breaks down the complex process into simpler, more interpretable components. By doing so, it provides a more transparent and robust decision-making process, making it easier to trust the decisions made by AI models.
Paper Information
Categories: cs.CV, cs.LG
Published Date:
arXiv ID: 2510.15770v1
