What If We Could Easily Explain Overly Complex Models?

Generating counterfactual explanations got a lot easier with CFNOW, but what are counterfactual explanations, and how can I use them?

Image generated with Illusion Diffusion model with CFNOW text as illusion (try to squint your eyes and look from a certain distance) | Image by the author using Stable Diffusion model (license)

This article is based on the following article: https://www.sciencedirect.com/science/article/abs/pii/S0377221723006598

And here is the address for the CFNOW repository: https://github.com/rmazzine/CFNOW

If you are reading this, you may know how pivotal Artificial Intelligence (AI) is becoming in our world today. However, it’s important to note that the seemingly effective, novel machine learning approaches, combined with their widespread popularity, can lead to unforeseen/undesirable consequences.

This brings us to why eXplainable Artificial Intelligence (XAI) is a crucial component in ensuring AI’s ethical and responsible development. Explaining models that consist of millions or even billions of parameters is not trivial, and there is no single answer: numerous methods reveal different aspects of a model, with LIME [1] and SHAP [2] being popular examples.

However, the explanations generated by these methods can take the form of intricate charts or analyses, which can be misinterpreted by anyone other than well-informed experts. One way to circumvent this complexity is a simple and natural way of explaining things called Counterfactual Explanations [3].

Counterfactual Explanations leverage a natural human way of explaining things: creating “alternate worlds” where altering a few parameters changes the outcome. It’s a common practice, and you have probably done it yourself: “if only I had woken up a bit earlier, I wouldn’t have missed the bus”. This type of explanation highlights the main reasons for an outcome in a straightforward manner.
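To make the bus example concrete, here is a minimal sketch in plain Python. The "model" is a made-up rule (you catch the bus if you are ready before it departs), and the function names and numbers are hypothetical; none of this is part of CFNOW. The counterfactual is simply the smallest change to one input that flips the outcome.

```python
# Toy "model": you catch the bus if you are ready before it departs.
# All names and numbers are hypothetical, chosen only for illustration.
def catches_bus(wake_up_minute, getting_ready_minutes, bus_departure_minute=480):
    return wake_up_minute + getting_ready_minutes <= bus_departure_minute


# Factual situation: woke up at 07:45 (minute 465) and needed 20 minutes.
factual = {"wake_up_minute": 465, "getting_ready_minutes": 20}


def smallest_change(factual, feature, max_shift=60):
    """Return the smallest decrease of `feature` (in minutes) that flips
    the outcome from "missed" to "caught", or None if none is found."""
    for shift in range(1, max_shift + 1):
        candidate = dict(factual)
        candidate[feature] -= shift
        if catches_bus(**candidate):
            return shift
    return None


print(catches_bus(**factual))                      # False: the bus was missed
print(smallest_change(factual, "wake_up_minute"))  # 5: wake up 5 minutes earlier
```

The explanation reads exactly like the sentence above: "if I had woken up 5 minutes earlier, I would have caught the bus".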

Delving deeper, counterfactuals extend beyond just mere explanations; they can serve as guidance for changes, assist in debugging anomalous behavior, and verify if some features can potentially modify predictions (while not being so impactful on scoring). This multifunctional nature emphasizes the importance of explaining your predictions. It’s not just a matter of responsible AI; it’s also a path to improving models and using them beyond the scope of predictions. A remarkable feature of counterfactual explanations is their decision-driven nature, making them directly correspond to a change in prediction [6], unlike LIME and SHAP which are more suited to explaining scores.

Given the evident benefits, one might wonder why counterfactuals aren’t more popular. It’s a valid question! The primary barriers to the widespread adoption of counterfactual explanations are threefold [4, 5]: (1) the absence of user-friendly and compatible counterfactual generation algorithms, (2) the inefficiency of algorithms in generating counterfactuals, and (3) the lack of comprehensive visual representation.

But I have some good news for you! A new package, CFNOW (CounterFactuals NOW or CounterFactual Nearest Optimal Wololo), is stepping up to address these challenges. CFNOW is a versatile Python package capable of generating multiple counterfactuals for various data types such as tabular, image, and textual (embedding) inputs. It adopts a model-agnostic approach, requiring only minimal data — (1) the factual point (point to be explained) and (2) the prediction function.

Moreover, CFNOW is structured to allow the development and integration of new strategies for finding and fine-tuning counterfactuals based on custom logic. It also features CounterPlots, a novel strategy for visually representing counterfactual explanations.

Central to CFNOW is a framework that converts data to a single structure manageable by the CF generator. Following this, a two-step process first finds a counterfactual and then optimizes it. To avoid getting stuck in local minima, the package implements Tabu Search, a metaheuristic method that lets it explore new regions where the objective function might be better optimized.
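To give an intuition for why a tabu list helps escape local minima, here is a generic, textbook-style sketch of tabu search over binary vectors. It is not CFNOW's implementation: the objective, the neighbourhood (flip one position), and all names are assumptions made for illustration.

```python
def tabu_search(objective, start, n_iter=50, tabu_size=3):
    """Minimise `objective` over binary tuples by flipping one position per step.

    Recently flipped positions are "tabu" (forbidden) for a few steps, which
    forces the search to accept worse neighbours and leave local minima.
    """
    current = tuple(start)
    best, best_value = current, objective(current)
    tabu = []  # positions flipped most recently
    for _ in range(n_iter):
        neighbours = []
        for i in range(len(current)):
            if i in tabu:
                continue
            candidate = current[:i] + (1 - current[i],) + current[i + 1:]
            neighbours.append((objective(candidate), i, candidate))
        if not neighbours:
            break
        # Move to the best non-tabu neighbour, even if it is worse than `current`
        value, flipped, current = min(neighbours)
        tabu = (tabu + [flipped])[-tabu_size:]
        if value < best_value:
            best, best_value = current, value
    return best, best_value


def objective(x):
    # Hypothetical objective: (0, 0, 0, 0) is a local minimum (every single
    # flip makes it worse), while the global minimum is (1, 1, 1, 1).
    return -1 if all(x) else sum(x)


print(tabu_search(objective, (0, 0, 0, 0)))  # ((1, 1, 1, 1), -1)
```

A plain greedy descent would stop at the starting point, because every neighbour is worse. The tabu list is what lets the search walk "uphill" long enough to reach the better region.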

The subsequent sections of this text will focus on demonstrating how CFNOW can be proficiently utilized to generate explanations for tabular, image, and textual (embedding) classifiers.

Tabular Classifiers

Here we start with the most common case: tabular data with multiple types of features. In the example below, I will use a dataset that has numerical continuous, categorical binary, and categorical one-hot encoded data to showcase CFNOW in its full power.

First things first, you need to install the CFNOW package, which requires a Python version above 3.8:

pip install cfnow

(here is the full code for this example: https://colab.research.google.com/drive/1GUsVfcM3I6SpYCmsBAsKMsjVdm-a6iY6?usp=sharing)

In this first part, we will build a classifier with the Adult dataset. There is nothing new here:

import warnings

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
warnings.filterwarnings("ignore", message="X does not have valid feature names, but RandomForestClassifier was fitted with feature names")

We import the basic packages needed to build the classification model, and we also silence the warnings related to making predictions without the columns’ names.

Then, we proceed to write the classifier, where class 1 represents an income lower than or equal to 50k (<=50K) and class 0 represents high income.

# Make the classifier
# Load the Adult dataset
dataset_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
column_names = ['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status',
                'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss',
                'hours-per-week', 'native-country', 'income']

data = pd.read_csv(dataset_url, names=column_names, na_values=" ?", skipinitialspace=True)

# Drop rows with missing values
data = data.dropna()

# Identify the categorical features that are not binary
non_binary_categoricals = [column for column in data.select_dtypes(include=['object']).columns
                           if len(data[column].unique()) > 2]

binary_categoricals = [column for column in data.select_dtypes(include=['object']).columns
                       if len(data[column].unique()) == 2]

cols_numericals = [column for column in data.select_dtypes(include=['int64']).columns]

# Apply one-hot encoding to the non-binary categorical features
data = pd.get_dummies(data, columns=non_binary_categoricals)

# Convert the binary categorical features into numbers
# This will also binarize the target variable (income)
for bc in binary_categoricals:
    data[bc] = data[bc].apply(lambda x: 1 if x == data[bc].unique()[0] else 0)

# Split the dataset into features and target variable
X = data.drop('income', axis=1)
y = data['income']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a RandomForestClassifier
clf = RandomForestClassifier(random_state=42)

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = clf.predict(X_test)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

With the code above, we create a dataset, pre-process it, create a classification model, and make a prediction and evaluation over the test set.

Now, let’s take one point (the first from the test set) and verify its prediction:

clf.predict([X_test.iloc[0]])
# Result: 0 -> High income

Now it is time to use CFNOW to calculate how we can change this prediction by minimally modifying the features:

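from cfnow import find_tabular
# Then, we use CFNOW to generate the minimum modification to change the classification
cf_res = find_tabular(
    factual=X_test.iloc[0],
    feat_types={c: 'num' if c in cols_numericals else 'cat' for c in X.columns},
    has_ohe=True,
    model_predict_proba=clf.predict_proba,
    limit_seconds=120)  # time budget in seconds (120 is the default)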

In the code above, we set the following parameters:

factual: adds the factual instance as a pd.Series
feat_types: specifies the feature types (“num” for numerical continuous and “cat” for categorical)
has_ohe: indicates that we have OHE features (CFNOW automatically detects OHE features by aggregating those that have the same prefix followed by an underscore, e.g., country_brazil, country_usa, country_ireland)
model_predict_proba: includes a prediction function
limit_seconds: defines a total time threshold for running; this is important because the fine-tuning step can keep going indefinitely (default is 120 seconds)
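The prefix-based detection of OHE features described above can be sketched as follows. This is only an illustration of the idea, not CFNOW's actual code: the helper name group_ohe_columns is hypothetical, and I am assuming the prefix is everything before the first underscore.

```python
from collections import defaultdict


def group_ohe_columns(columns):
    """Group column names that share the same prefix before the first underscore.

    Hypothetical helper illustrating the naming convention; a prefix that
    appears only once is not treated as a one-hot encoded group here.
    """
    groups = defaultdict(list)
    for name in columns:
        if "_" in name:
            prefix = name.split("_", 1)[0]
            groups[prefix].append(name)
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


columns = ["age", "sex", "country_brazil", "country_usa", "country_ireland"]
print(group_ohe_columns(columns))
# {'country': ['country_brazil', 'country_usa', 'country_ireland']}
```

The practical takeaway is that your own non-OHE column names should not accidentally share an underscore prefix when you set has_ohe, otherwise they may be grouped as if they were one feature.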

Then, after some time, we can first evaluate the class of the best counterfactual (first index of cf_res.cfs):

clf.predict([cf_res.cfs[0]])
# Result: 1 -> Low income

And here comes a difference with CFNOW: since it also integrates CounterPlots, we can plot its charts and get more insightful information, like the ones below:

CounterShapley Chart for our CF | Image by the author

The CounterShapley plot shows the relative importance of each feature in generating the counterfactual prediction. Here, we have an interesting insight: the marital-status features, if combined, represent more than 50% of the contribution to the CF class.
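To see what "contribution" means here, the sketch below computes exact Shapley values for a tiny, made-up case with two changed features. The score table and feature names are hypothetical, and this is the textbook Shapley computation (averaging marginal gains over all orderings), not necessarily how CounterPlots implements it.

```python
from itertools import permutations

# Hypothetical model scores for the CF class, indexed by the set of features
# that were switched to their counterfactual values (made-up numbers).
SCORES = {
    frozenset(): 0.20,
    frozenset({"feature_a"}): 0.50,
    frozenset({"feature_b"}): 0.30,
    frozenset({"feature_a", "feature_b"}): 0.70,
}


def shapley_contributions(scores, features):
    """Average marginal score gain of each feature over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        changed = frozenset()
        for f in order:
            with_f = changed | {f}
            totals[f] += scores[with_f] - scores[changed]
            changed = with_f
    return {f: total / len(orderings) for f, total in totals.items()}


contributions = shapley_contributions(SCORES, ["feature_a", "feature_b"])
total_gain = sum(contributions.values())
for feature, value in contributions.items():
    print(f"{feature}: {value:.2f} ({value / total_gain:.0%} of the score change)")
```

In this toy case, feature_a accounts for roughly 70% of the score change and feature_b for the remaining 30%, and the contributions add up to the full difference between the factual and counterfactual scores.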

Greedy Chart for our CF | Image by the author

The Greedy chart shows something very similar to the CounterShapley. The main difference is the sequence of changes: while the CounterShapley does not consider any specific sequence (it calculates contributions using Shapley values), the Greedy chart uses the greediest strategy to modify the factual instance, at each step changing the feature that contributes most to the CF class. This might be useful in situations where guidance is given in a greedy way (each step choosing the best approach to achieve the objective).
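The greedy ordering can be sketched in a few lines. As before, the score table and feature names are made up for illustration, and this is my reading of the strategy described above rather than the CounterPlots source code.

```python
# Hypothetical CF-class scores for every subset of three changed features.
SCORES = {
    frozenset(): 0.10,
    frozenset("a"): 0.30,
    frozenset("b"): 0.20,
    frozenset("c"): 0.15,
    frozenset("ab"): 0.45,
    frozenset("ac"): 0.40,
    frozenset("bc"): 0.30,
    frozenset("abc"): 0.60,
}


def greedy_sequence(scores, features):
    """At each step, change the remaining feature that yields the highest score."""
    changed = frozenset()
    remaining = sorted(features)
    steps = []
    while remaining:
        best = max(remaining, key=lambda f: scores[changed | {f}])
        changed = changed | {best}
        remaining.remove(best)
        steps.append((best, scores[changed]))
    return steps


print(greedy_sequence(SCORES, ["a", "b", "c"]))
# [('a', 0.3), ('b', 0.45), ('c', 0.6)]
```

Unlike the Shapley computation, each feature's reported gain here depends on the order in which features were picked, which is exactly the difference between the two charts.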

Constellation Chart for our CF | Image by the author

Finally, we have the most complex analysis, the Constellation chart. Despite its daunting look, it is actually pretty straightforward to interpret. Each large red dot represents one single feature change (respective to the label), and the smaller dots represent the combination of two or more features. Finally, the big blue dot represents the CF score. Here, we can see that the only way to obtain a CF with these features is by modifying all of them to their respective values (i.e., there is no subset that generates a CF). We can also dig deeper and investigate the relationship between features and potentially find interesting patterns.

In this particular case, it was interesting to observe that a prediction of high income would change if the person were a Female, Divorced, and with an own child. This counterfactual can lead to further discussions of the economic impacts on different social groups.

Image Classifiers

As already mentioned, CFNOW can work with diverse types of data, so it can also generate counterfactuals for Image data. However, what does it mean to have a counterfactual for an image dataset?

The answer can vary because there are several ways to generate counterfactuals. It can mean replacing single pixels with random noise (a method used by adversarial attacks) or something more complex, involving advanced segmentation methods.

CFNOW uses a segmentation method called quickshift, which is a reliable and fast method to detect “semantic” segments. However, it is possible to integrate (and I invite you to do so) other segmentation techniques.

Segment detection alone is not sufficient to generate counterfactual explanations. We also need to modify the segments, replacing them with modified versions. For this modification, CFNOW has four options, defined in the parameter replace_mode: blur (the default), which applies a blur filter to the replaced segments; mean, which replaces the segments with their average color; random, which replaces them with random noise; and inpaint, which reconstructs the image based on neighboring pixels.
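To illustrate what "replacing a segment" means, here is a small NumPy sketch of the mean and random ideas on a hand-made segmentation map. The function replace_segments and its arguments are hypothetical and are not CFNOW's internals; blur and inpaint are left out because they need an image-processing library.

```python
import numpy as np


def replace_segments(image, segments, segment_ids, mode="mean", seed=0):
    """Return a copy of `image` (H, W, 3) with the chosen segments overwritten.

    `segments` is an (H, W) integer map assigning each pixel to a segment.
    Hypothetical helper for illustration only.
    """
    out = image.astype(float).copy()
    rng = np.random.default_rng(seed)
    for seg_id in segment_ids:
        mask = segments == seg_id  # boolean (H, W) mask of this segment
        if mode == "mean":
            out[mask] = image[mask].mean(axis=0)  # average color of the segment
        elif mode == "random":
            out[mask] = rng.uniform(0, 255, size=(int(mask.sum()), image.shape[2]))
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return out.astype(np.uint8)


# A 4x4 toy image: the left half mixes two colors, the right half is uniform.
image = np.full((4, 4, 3), 50, dtype=np.uint8)
image[:2, :2] = 200
image[2:, :2] = 0
segments = np.zeros((4, 4), dtype=int)
segments[:, 2:] = 1  # segment 0 = left half, segment 1 = right half

mean_filled = replace_segments(image, segments, segment_ids=[0], mode="mean")
print(mean_filled[:, :, 0])  # left half becomes 100, right half stays 50
```

A counterfactual search then amounts to finding a small set of segments whose replacement flips the model's prediction.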

If you want the whole code, you can find it here: https://colab.research.google.com/drive/1M6bEP4x7ilSdh01Gs8xzgMMX7Uuum5jZ?usp=sharing

Next, I will show how to use CFNOW with this type of data:

First, again, let’s install the CFNOW package if you have not done it yet.

pip install cfnow

Now, let’s add some additional packages to load a pre-trained model:

pip install torch torchvision Pillow requests

Then let’s load the data, load the pre-trained model, and create a prediction function that is compatible with the data format CFNOW expects:

import requests
import numpy as np
from PIL import Image
from torchvision import models, transforms
import torch

# Check if a GPU is available and if not, use a CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a pre-trained ResNet model and put it in inference mode
model = models.resnet50(pretrained=True).to(device)
model.eval()

# Define the image transformation (standard ImageNet preprocessing)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Fetch an image from the web
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Sunflower_from_Silesia2.jpg/320px-Sunflower_from_Silesia2.jpg"
response = requests.get(image_url, stream=True)
image = np.array(Image.open(response.raw))

def predict(images):
    if len(np.shape(images)) == 4:
        # Convert the list of numpy arrays to a batch of tensors
        input_images = torch.stack([transform(Image.fromarray(img.astype('uint8'))) for img in images])
    elif len(np.shape(images)) == 3:
        # A single image: add the batch dimension the model expects
        input_images = transform(Image.fromarray(images.astype('uint8'))).unsqueeze(0)
    else:
        raise ValueError("The input must be a list of images or a single image.")

    input_images = input_images.to(device)

    # Perform inference
    with torch.no_grad():
        outputs = model(input_images)

    # Return an array of prediction scores for each image
    return outputs.cpu().numpy()

LABELS_URL = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
def predict_label(outputs):
    # Load the labels used by the pre-trained model
    labels = requests.get(LABELS_URL).json()

    # Get the predicted labels
    predicted_idxs = [np.argmax(od) for od in outputs]
    predicted_labels = [labels[idx.item()] for idx in predicted_idxs]

    return predicted_labels

# Check the prediction for the image
predicted_label = predict([np.array(image)])
print("Predicted labels:", predict_label(predicted_label))
Most of the code work is related to building the model, getting the data, and adjusting it, because to generate counterfactuals with CFNOW we just need to:

from cfnow import find_image

cf_img = find_image(img=image, model_predict=predict)

cf_img_hl = cf_img.cfs[0]
print("Predicted labels:", predict_label(predict([cf_img_hl])))

# Show the CF image
Image.fromarray(cf_img_hl.astype('uint8'))

In the example above, we used the defaults for all optional parameters; therefore, we used quickshift to segment the image and replaced the segments with blurred versions. As a result, the prediction changes from the factual one below:

Factual image classified as a “daisy” | Image title: Sunflower (Helianthus L). Słonecznik by Pudelek (Edit by Yzmo and Vassil) from Wikimedia under GNU Free Documentation License, Version 1.2

To the following:

CF image classified as a “bee” | Image title: Sunflower (Helianthus L). Słonecznik by Pudelek (Edit by Yzmo and Vassil) from Wikimedia under GNU Free Documentation License, Version 1.2

So, what do we learn from this analysis? Image counterfactuals can be extremely useful tools to detect how the model is making its classifications. They can be applied in cases where (1) we want to verify why the model made correct classifications, ensuring it is using the correct image features: in this case, although the model misclassified the sunflower as a daisy, we can see that blurring the flower (and not a background feature) makes it change the prediction. They can also (2) help diagnose misclassified images, which can lead to better insights for image processing and/or data acquisition.

Textual Classifiers

Finally, we have textual classifiers based on embeddings. Although simple textual classifiers (which use a data structure more like tabular data) can use the tabular counterfactual generator, for textual classifiers based on embeddings the approach is not as clear.

The reason is that embedding-based models take a variable number of inputs, and individual words can considerably affect the prediction score and classification.

CFNOW solves that with two strategies: (1) removing evidence or (2) adding antonyms. The first strategy is straightforward: to measure the impact of each word on the text, we simply remove words and see which ones we must remove to flip the classification. With the second, adding antonyms, we can possibly preserve the semantic structure (because removing a word can severely harm it).
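The "removing evidence" idea can be sketched with a toy scorer. Everything below is hypothetical: toy_positive_score is a word-counting stand-in for a real sentiment model, and the greedy loop is just one simple way to pick which words to drop, not CFNOW's search procedure.

```python
POSITIVE = {"liked", "funny"}
NEGATIVE = {"boring", "long"}


def toy_positive_score(words):
    """Hypothetical stand-in for a sentiment model: score in [0, 1]."""
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return min(1.0, max(0.0, 0.5 + 0.2 * balance))


def remove_evidence(text, score_fn, threshold=0.5):
    """Greedily delete words until the predicted class flips.

    Returns the counterfactual text and the list of removed words. If the
    text runs out of words, no counterfactual was found.
    """
    words = text.split()
    factual_positive = score_fn(words) >= threshold
    removed = []
    while words and (score_fn(words) >= threshold) == factual_positive:
        def score_without(i):
            return score_fn(words[:i] + words[i + 1:])

        # Pick the deletion that pushes the score furthest toward the other class
        choose = min if factual_positive else max
        pick = choose(range(len(words)), key=score_without)
        removed.append(words.pop(pick))
    return " ".join(words), removed


cf_text, removed = remove_evidence("boring and long but funny", toy_positive_score)
print(cf_text, removed)  # and long but funny ['boring']
```

Notice that the resulting sentence is no longer very natural, which is the drawback of removal that the antonym strategy tries to address.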

Then, the code below shows how to use CFNOW in this context.

If you want the entire code, you can check it here: https://colab.research.google.com/drive/1ZMbqJmJoBukqRJGqhUaPjFFRpWlujpsi?usp=sharing

First, install the CFNOW package:

pip install cfnow

Then, install the necessary packages for the textual classification:

pip install transformers

Then, as in the previous sections, first, we will build the classifier:

from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
from transformers import pipeline

import numpy as np

# Load pre-trained model and tokenizer for sentiment analysis
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

# Define the sentiment analysis pipeline
sentiment_analysis = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Define a simple dataset
text_factual = "I liked this movie because it was funny but my friends did not like it because it was too long and boring."

result = sentiment_analysis(text_factual)
print(f"{text_factual}: {result[0]['label']} (confidence: {result[0]['score']:.2f})")

def pred_score_text(list_text):
    # Return the POSITIVE-class score for a single string or a list of strings
    if isinstance(list_text, str):
        sa_pred = sentiment_analysis(list_text)[0]
        sa_score = sa_pred['score']
        sa_label = sa_pred['label']
        return sa_score if sa_label == "POSITIVE" else 1.0 - sa_score
    return np.array([sa["score"] if sa["label"] == "POSITIVE" else 1.0 - sa["score"] for sa in sentiment_analysis(list_text)])

Running this code, we see that our factual text has a NEGATIVE sentiment with high confidence (≥0.9). Now let’s try to generate a counterfactual for it:

from cfnow import find_text
cf_text = find_text(text_input=text_factual, textual_classifier=pred_score_text)
result_cf = sentiment_analysis(cf_text.cfs[0])
print(f"CF: {cf_text.cfs[0]}: {result_cf[0]['label']} (confidence: {result_cf[0]['score']:.2f})")

With the code above, changing just a single word (but) flipped the classification from NEGATIVE to POSITIVE with high confidence. This showcases how counterfactuals can be useful, since such minimal modifications can help us understand how the model predicts sentences and/or help debug undesirable behaviors.


This was a (relatively) brief introduction to CFNOW and counterfactual explanations. There is an extensive (and growing) literature on counterfactuals that you should definitely take a look at if you want to dive deeper. This seminal article [3], written by my Ph.D. advisor, Prof. David Martens, is a great introduction to counterfactual explanations. Additionally, there are good reviews, like this one written by Verma et al. [7].

In summary, counterfactual explanations are an easy and convenient way to explain the decisions of complex machine learning algorithms, and they can do much more than explain if correctly applied. CFNOW provides an easy, fast, and flexible way to generate counterfactual explanations, allowing practitioners not just to explain, but also to leverage as much as possible the potential of their data and models.


[1] — https://github.com/marcotcr/lime
[2] — https://github.com/shap/shap
[3] — https://www.jstor.org/stable/26554869
[4] — https://www.mdpi.com/2076-3417/11/16/7274
[5] — https://arxiv.org/pdf/2306.06506.pdf
[6] — https://arxiv.org/abs/2001.07417
[7] — https://arxiv.org/abs/2010.10596

What If We Could Easily Explain Overly Complex Models? was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
