
Now You See Me (CME): Concept-based Model Extraction

A label-efficient approach to Concept-based Models

From the AIMLAI workshop paper presented at the CIKM conference: “Now You See Me (CME): Concept-based Model Extraction” (GitHub)

Visual abstract. Image by the author.


Problem: Deep Neural Network models are black boxes that cannot be interpreted directly. As a result, it is difficult to build trust in such models. Existing methods, such as Concept Bottleneck Models, make such models more interpretable, but come with a high cost for annotating the underlying concepts.

Key Innovation: A method for generating Concept-based Models in a weakly-supervised fashion, requiring vastly fewer annotations as a result.

Solution: Our Concept-based Model Extraction (CME) framework, capable of extracting Concept-based Models from pre-trained vanilla Convolutional Neural Networks (CNNs) in a semi-supervised fashion, whilst preserving end-task performance.

Vanilla CNN end-to-end input processing. Image by the author.

Two-stage Concept-based Model processing. Image by the author.

Concept Bottleneck Models (CBMs)

In recent years, the field of Explainable Artificial Intelligence (XAI) [1] has seen a surge of interest in Concept Bottleneck Model (CBM) approaches [2]. These methods introduce a model architecture in which input images are processed in two distinct phases: concept encoding and concept processing.

During concept encoding, concept information is extracted from the high-dimensional input data. Subsequently, in the concept processing phase, this extracted concept information is used to generate the desired output task label. A salient feature of CBMs is their reliance on a semantically meaningful concept representation, which serves as an intermediate, interpretable representation for downstream task predictions, as shown below:

Concept Bottleneck Model Processing. Image by the author.

As shown above, CBMs are trained with a combination of two losses. A task loss ensures accurate task label prediction, and a concept loss ensures accurate intermediate concept prediction. Importantly, CBMs enhance model transparency, since the underlying concept representation provides a way to explain and better understand the model’s behaviour.
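A weighted sum is one common way to combine the two losses. The NumPy sketch below is written under our own assumptions, not the exact objective from [2]:

- the task head outputs softmax probabilities;
- the concepts are binary and scored with binary cross-entropy;
- a hypothetical factor `lam` balances the two terms.

```python
import numpy as np

def task_cross_entropy(task_probs, task_labels, eps=1e-12):
    """Mean categorical cross-entropy; task_probs is (n, n_classes), task_labels is (n,) ints."""
    picked = task_probs[np.arange(len(task_labels)), task_labels]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

def concept_bce(concept_probs, concept_labels, eps=1e-12):
    """Mean binary cross-entropy over all (sample, concept) pairs; both arrays are (n, k)."""
    p = np.clip(concept_probs, eps, 1.0 - eps)
    t = concept_labels
    return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))

def joint_cbm_loss(task_probs, task_labels, concept_probs, concept_labels, lam=1.0):
    """Task loss plus a lam-weighted concept loss (lam is a hypothetical hyperparameter)."""
    return task_cross_entropy(task_probs, task_labels) + lam * concept_bce(concept_probs, concept_labels)

# Worked example: one sample, two task classes, two concepts, all predictions uninformative (0.5).
loss = joint_cbm_loss(
    task_probs=np.array([[0.5, 0.5]]),
    task_labels=np.array([0]),
    concept_probs=np.array([[0.5, 0.5]]),
    concept_labels=np.array([[1.0, 0.0]]),
    lam=1.0,
)
print(round(loss, 4))  # 1.3863, i.e. 2 * ln(2)
```

With uninformative predictions, each term contributes ln(2). Raising `lam` shifts the training pressure towards accurate concepts.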

Concept Bottleneck Models offer a novel type of CNN that is interpretable by design, allowing users to encode existing domain knowledge into models via concepts.

Overall, CBMs serve as an important innovation, bringing us closer to more transparent and trustworthy models.

Challenge: CBMs have a high concept annotation cost

Unfortunately, CBMs require a high amount of concept annotations during training.

At present, CBM approaches require all training samples to be explicitly annotated with both end-task and concept labels. Hence, for a dataset with N samples and C concepts, the annotation cost rises from N annotations (one task label per sample) to N*(C+1) annotations (one task label per sample, plus one label for every concept). In practice, this can quickly become unwieldy, particularly for datasets with a large number of concepts and training samples.

For example, for a dataset of 10,000 images with 50 concepts, the annotation cost increases by 50*10,000 = 500,000 labels, i.e. by half a million extra annotations.
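The arithmetic above can be checked with a few lines of Python. The helper name `annotation_cost` is our own, for illustration only:

```python
def annotation_cost(n_samples: int, n_concepts: int) -> dict:
    """Label counts for a plain classifier versus a fully annotated CBM."""
    task_only = n_samples                      # one task label per sample
    cbm_total = n_samples * (n_concepts + 1)   # task label + one label per concept
    return {"task_only": task_only, "cbm_total": cbm_total, "extra": cbm_total - task_only}

cost = annotation_cost(10_000, 50)
print(cost)  # {'task_only': 10000, 'cbm_total': 510000, 'extra': 500000}
```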


Leveraging Semi-Supervised Concept-based Models with CME

CME builds on an observation highlighted in [3]: vanilla CNN models often retain a large amount of concept-related information in their hidden space. This information can be mined for concepts at no extra annotation cost. Importantly, [3] considered the scenario where the underlying concepts are unknown and had to be extracted from a model’s hidden space in an unsupervised fashion.

With CME, we make use of this observation, but consider a scenario where the underlying concepts are known and only a small number of samples are annotated for each of them. Similarly to [3], CME relies on a given pre-trained vanilla CNN. Together with the small set of concept annotations, this CNN is used to extract further concept annotations in a semi-supervised fashion, as shown below:

CME model processing. Image by the author.

As shown above, CME extracts the concept representation using a pre-trained model’s hidden space in a post-hoc fashion. Further details are given below.

Concept Encoder Training: instead of training concept encoders from scratch on the raw data, as is done for CBMs, we set up concept encoder training in a semi-supervised fashion, using the vanilla CNN’s hidden space:

1. We begin by pre-specifying a set of layers L from the vanilla CNN to use for concept extraction. This can range from all layers to just the last few, depending on the available compute capacity.
2. Next, for each concept, we train a separate model on top of the hidden space of each layer in L, to predict that concept’s values from the layer’s hidden space.
3. We then select the model and corresponding layer with the best accuracy as the “best” model and layer for predicting that concept.
4. Consequently, when making predictions for a concept i, we first retrieve the hidden space representation of the best layer for that concept. We then pass it through the corresponding predictive model for inference.
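The per-concept layer selection in these steps can be sketched in NumPy as follows. The sketch rests on several assumptions, not on the CME code:

- The probes are closed-form ridge-regression linear classifiers, swapped in for the simple models (e.g. linear models or gradient boosted classifiers) to stay dependency-free.
- The activations are synthetic, and the layer names `conv3` and `fc1` are hypothetical.
- Choosing the best layer on a held-out split of the small annotated subset is our own choice.

```python
import numpy as np

def fit_linear_probe(h, y, reg=1e-3):
    """Closed-form ridge regression onto {-1, +1} targets; a stand-in for any simple classifier g_i."""
    X = np.hstack([h, np.ones((h.shape[0], 1))])  # append a bias column
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ (2.0 * y - 1.0))

def predict_probe(w, h):
    """Binary concept predictions (0/1) from a fitted probe."""
    X = np.hstack([h, np.ones((h.shape[0], 1))])
    return (X @ w > 0).astype(int)

def select_best_layers(hidden, concepts, train_idx, val_idx):
    """For each concept, fit one probe per layer and keep the layer with the best validation accuracy.

    hidden:   dict mapping layer name -> (n, d) activations of the vanilla CNN
    concepts: (n, k) binary concept labels (only rows in train_idx/val_idx are used)
    returns:  list of (best_layer_name, probe_weights, val_accuracy), one entry per concept
    """
    best = []
    for i in range(concepts.shape[1]):
        y = concepts[:, i]
        scored = []
        for name, h in hidden.items():
            w = fit_linear_probe(h[train_idx], y[train_idx])
            acc = float(np.mean(predict_probe(w, h[val_idx]) == y[val_idx]))
            scored.append((acc, name, w))
        acc, name, w = max(scored, key=lambda s: s[0])
        best.append((name, w, acc))
    return best

# Synthetic stand-in for CNN activations: concept 0 is linearly readable from "conv3",
# concept 1 from "fc1"; all other dimensions are pure noise.
rng = np.random.default_rng(0)
n = 1000
concepts = rng.integers(0, 2, size=(n, 2))
hidden = {"conv3": rng.normal(size=(n, 8)), "fc1": rng.normal(size=(n, 8))}
hidden["conv3"][:, 0] += 3.0 * (2 * concepts[:, 0] - 1)
hidden["fc1"][:, 0] += 3.0 * (2 * concepts[:, 1] - 1)

# Only a small annotated subset is used: 60 samples for fitting, 40 for layer selection.
train_idx, val_idx = np.arange(60), np.arange(60, 100)
best = select_best_layers(hidden, concepts, train_idx, val_idx)
for i, (name, _, acc) in enumerate(best):
    print(f"concept {i}: best layer = {name}, validation accuracy = {acc:.2f}")
```

The remaining 900 unannotated samples can then be labelled by the selected probes, which is where the label savings come from.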

Overall, the concept encoder function can be summarised as follows (assuming there are k concepts in total):

CME Concept Encoder equation. Image by the author.

- p̂ on the left-hand side represents the concept encoder function.
- The gᵢ terms represent the hidden-space-to-concept models trained on top of the different layer hidden spaces, with i representing the concept index, ranging from 1 to k. In practice, these models can be fairly simple, such as Linear Regressors or Gradient Boosted Classifiers.
- The f(x) terms represent the sub-models of the original vanilla CNN, which extract the input’s hidden representation at a particular layer.
- In both cases above, superscripts specify the “best” layers these two models operate on.
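Reading the term definitions literally, the encoder appears to take the form p̂(x) = (g₁(f^(l₁)(x)), …, g_k(f^(l_k)(x))), where l_i is the best layer for concept i. This is our reconstruction, since the equation itself is only available as an image.

The minimal sketch below makes that composition explicit. Its pieces are hypothetical illustrations:

- a toy three-layer network;
- two hand-written concept models;
- the helper names `hidden_states` and `concept_encoder`.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[np.ndarray], np.ndarray]
ConceptModel = Callable[[np.ndarray], float]

def hidden_states(x: np.ndarray, layers: Sequence[Layer]) -> List[np.ndarray]:
    """Return the activation after each layer of the vanilla network, i.e. f^(l)(x) for every l."""
    states, h = [], x
    for layer in layers:
        h = layer(h)
        states.append(h)
    return states

def concept_encoder(x: np.ndarray, layers: Sequence[Layer],
                    best: Sequence[Tuple[int, ConceptModel]]) -> np.ndarray:
    """p_hat(x) = (g_1(f^(l_1)(x)), ..., g_k(f^(l_k)(x))); best[i] = (l_i, g_i), l_i is 0-based."""
    states = hidden_states(x, layers)  # one forward pass, shared by all k concepts
    return np.array([g(states[l]) for l, g in best])

# Toy three-layer "network" and two hypothetical concept models.
layers = [lambda h: 2.0 * h, lambda h: h + 1.0, lambda h: np.maximum(h, 0.0)]
best = [(0, lambda h: float(h[0] > 0)),  # first concept is read from layer 0
        (2, lambda h: float(h[1] > 0))]  # second concept is read from layer 2
p_hat = concept_encoder(np.array([1.0, -2.0]), layers, best)
print(p_hat)  # [1. 0.]
```

Caching all intermediate activations in a single forward pass avoids re-running the shared prefix of the CNN once per concept.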

Concept Processor Training: in CME, the concept processor is trained with task labels as outputs and concept encoder predictions as inputs. Importantly, these models operate on a much more compact input representation. They can therefore be represented directly via interpretable models, such as Decision Trees (DTs) or Logistic Regression (LR) models.
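As an illustration of an interpretable concept processor, the sketch below fits an L2-regularised logistic regression with plain gradient descent in NumPy. It is a minimal example under assumptions, not the CME implementation:

- The concept predictions are hypothetical stand-ins for encoder outputs.
- The task rule ("concept 0 OR concept 2") is invented for the example.
- The hyperparameters (`lr`, `n_iter`, `l2`) are illustrative.

In practice, a library model such as scikit-learn's `LogisticRegression` would be the natural choice.

```python
import numpy as np

def train_concept_processor(C, y, lr=1.0, n_iter=3000, l2=0.01):
    """Fit an L2-regularised logistic regression mapping concepts C (n, k) to task labels y (n,)."""
    n, k = C.shape
    w, b = np.zeros(k), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(C @ w + b)))  # predicted task probabilities
        grad_w = C.T @ (p - y) / n + l2 * w     # the bias is left unregularised
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
# Hypothetical concept-encoder outputs: 3 binary concepts for 500 images.
C = rng.integers(0, 2, size=(500, 3)).astype(float)
# Hypothetical task: the label is positive if concept 0 OR concept 2 is present.
y = np.logical_or(C[:, 0], C[:, 2]).astype(float)

w, b = train_concept_processor(C, y)
accuracy = np.mean(((C @ w + b) > 0) == y)
for name, weight in zip(["concept_0", "concept_1", "concept_2"], w):
    print(f"{name}: {weight:+.2f}")
print(f"bias: {b:+.2f}, train accuracy: {accuracy:.3f}")
```

Because there is one weight per concept, the fitted processor can be read directly. The task-relevant concepts receive large positive weights, while the irrelevant concept 1 stays close to zero.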

CME Experiments & Results

Our experiments on both synthetic (dSprites and shapes3d) and challenging real-world datasets (CUB) demonstrated that CME models:

1. Achieve high concept predictive accuracy, comparable to that of CBMs in many cases, even on concepts irrelevant to the end-task:

Concept accuracies of CBM and CME models, plotted for all concepts across three different predictive tasks. Image by the author.

2. Enable human interventions on concepts, i.e. allowing humans to quickly improve model performance by fixing small sets of chosen concepts:

CME and CBM model performance changes for different degrees of concept interventions. Image by the author.

3. Explain model decision-making in terms of concepts, by allowing practitioners to plot concept processor models directly:

An example of a concept processor model visualised directly for one of the chosen tasks. Image by the author.

4. Help understand how models process concepts, by analysing the hidden space of underlying concepts across model layers:

An example of hidden space visualisation for a few layers of the vanilla CNN. The columns represent the different layers. The rows represent the different concepts, with every row’s colour corresponding to that concept’s values. The “best” CME layers are indicated by a *. Image by the author.

By defining Concept-based Models in the weakly-supervised domain with CME, we can develop significantly more label-efficient Concept-based Models.
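Here is a minimal NumPy sketch of a concept intervention. A human supplies ground-truth values for a chosen subset of concepts, and these overwrite the concept encoder's predictions before they reach the concept processor. The processor, the concept values, and the helper names `intervene` and `processor` are hypothetical toy choices, not the experimental setup from the paper.

```python
import numpy as np

def intervene(c_pred, c_true, concept_ids):
    """Return a copy of c_pred in which the chosen concept columns are replaced by ground truth."""
    c_fixed = np.array(c_pred, dtype=float)  # np.array copies its input by default
    c_fixed[:, concept_ids] = np.asarray(c_true, dtype=float)[:, concept_ids]
    return c_fixed

def processor(c):
    """Hypothetical interpretable concept processor: task = concept 0 OR concept 2."""
    return (c @ np.array([3.0, 0.0, 3.0]) - 1.5 > 0).astype(int)

c_true = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]])
y_true = processor(c_true)  # [0, 1, 1, 1]
c_pred = np.array([[0, 1, 1], [1, 0, 0], [0, 0, 0], [0, 1, 1]])  # concept 2 wrong on rows 0 and 2

acc_before = np.mean(processor(c_pred) == y_true)
acc_after = np.mean(processor(intervene(c_pred, c_true, [2])) == y_true)
print(acc_before, acc_after)  # 0.5 1.0
```

Fixing a single, frequently mispredicted concept is enough here to recover every task prediction. Neither the concept encoder nor the processor needs retraining.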

Take Home Message

By leveraging pre-trained vanilla Deep Neural Networks, we may obtain concept annotations and Concept-based Models at a vastly lower annotation cost, compared to standard CBM approaches.

Furthermore, this applies not only to concepts that are highly correlated with the end-task, but in certain cases also to concepts that are independent of it.


[1] Christoph Molnar. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/

[2] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept Bottleneck Models. In International Conference on Machine Learning, pages 5338–5348. PMLR (2020).

[3] Amirata Ghorbani, James Wexler, James Zou, and Been Kim. Towards Automatic Concept-based Explanations. In Advances in Neural Information Processing Systems 32 (2019).

Now You See Me (CME): Concept-based Model Extraction was originally published in Towards Data Science on Medium.
