EXTREMUM: A Web-Based Tool to Generate and Explore Counterfactual Explanations on Tabular and Time-Series Data
Abstract:
There is an increasing need to include explainability in machine learning (ML) models. Among the various approaches, counterfactual (CF) explanations allow the design of what-if scenarios and the interactive exploration of ML model behavior in sensitive decision-making domains. However, the generation of CFs for tabular and time-series data requires technical skills that are not always available to the end-users of ML-powered systems. Therefore, we propose a modular web-based tool to easily generate, visualize, and interact with CFs on any tabular or time-series dataset. The EXTREMUM platform provides access to state-of-the-art CF algorithms, where users can train ML models and explore CFs on their tabular or time-series datasets with an intuitive user interface. The project is instantiated on two tabular datasets within healthcare and five time-series datasets from various domains. The open-source repository lets ML researchers adapt the existing ML tool to new application domains.
Citation
@article{lakes_extremum_2025,
title = {{EXTREMUM}: A Web-Based Tool to Generate and Explore Counterfactual Explanations on Tabular and Time-Series Data},
volume = {European Conference, ECML PKDD 2025, Machine Learning and Knowledge Discovery in Databases. Demo Track},
author = {Lakes, Athanasios and Quintero, Luis and Papapetrou, Panagiotis},
year = {2025},
doi = {TBD}
}
Installation
These instructions will help you set up the Django project locally with SCSS compilation and static asset handling.
1. Clone the repository
git clone git@gitea.dsv.su.se:DataScienceGroup/EXTREMUM-demo.git
cd EXTREMUM-demo
2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
3. Install Python dependencies
pip install -r requirements.txt
4. Install SCSS compiler
npm install -g sass
If you don't have Node.js/npm installed, visit https://nodejs.org first.
5. Compile SCSS to CSS
sass base/static/scss:base/static/css
6. Configure .env
Create a .env file in the project root with your local development settings:
DJANGO_SECRET_KEY=replace-this-secret-key
DJANGO_DEBUG=True
DJANGO_ALLOWED_HOSTS=127.0.0.1,localhost
Your settings.py should read these environment variables using os.getenv():
# settings.py
import os
from dotenv import load_dotenv
load_dotenv()
SECRET_KEY = os.getenv("DJANGO_SECRET_KEY", "unsafe-default-key")
DEBUG = os.getenv("DJANGO_DEBUG", "True") == "True"
ALLOWED_HOSTS = os.getenv("DJANGO_ALLOWED_HOSTS", "127.0.0.1,localhost").split(",")
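Note that the comparison `== "True"` above is case-sensitive, so `DJANGO_DEBUG=true` in .env would turn debug mode off. A slightly more tolerant variant could look like the sketch below. The helper names `env_bool` and `env_list` are hypothetical and not part of the repository, and the `os.environ` assignments are only there so the snippet runs on its own.

```python
import os
from typing import List


def env_bool(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Accept common spellings regardless of case.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: str = "") -> List[str]:
    """Read a comma-separated list, dropping blanks and stray whitespace."""
    raw = os.getenv(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# Example values, set here only so the sketch is self-contained.
os.environ["DJANGO_DEBUG"] = "true"
os.environ["DJANGO_ALLOWED_HOSTS"] = "127.0.0.1, localhost,"

DEBUG = env_bool("DJANGO_DEBUG", default=False)
ALLOWED_HOSTS = env_list("DJANGO_ALLOWED_HOSTS", "127.0.0.1,localhost")
```

Defaulting `DEBUG` to `False` when the variable is missing is a design choice of this sketch: a deployment that forgets the .env file then fails safe instead of exposing debug pages.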
7. Run the development server
python manage.py runserver 8000
Visit http://127.0.0.1:8000/ in your browser.
Functionalities
Tabular datasets
Datasets
The EXTREMUM dashboard supports tabular datasets from any domain through a dynamic end-to-end workflow that covers data upload, visualization, preprocessing, training, and CF explanation generation. The default datasets are the Breast Cancer Wisconsin (Diagnostic) Dataset and the Stroke Prediction Dataset. Each default dataset comes with a few pre-trained models that can be used to generate explanations and inspect performance; these models may also be deleted or replaced by newly trained ones.
Tabular datasets can also be imported via the Upload Dataset section. Datasets such as Heart Disease Prediction and the Thyroid Disease Dataset have been used and tested this way.
Train
The available classifiers for training on tabular datasets are:
- XGBoost
- Linear Regression
- Random Forest
- Decision Tree
- SVM
Preprocessing options are provided as check-boxes and are applied at training time. The available preprocessing methods are:
- Standardization
- Imputation
- One-Hot Encoding
- Test set ratio configuration
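To make the options above concrete, here is a minimal, dependency-free sketch of what each one does to the data. It is not the dashboard's implementation: the column names, the mean-imputation strategy, and the helper functions are assumptions chosen for illustration only.

```python
import random
from statistics import mean, pstdev

# Toy rows; the column names are hypothetical.
rows = [
    {"age": 61.0, "bmi": 28.1, "smoker": "yes"},
    {"age": 45.0, "bmi": None, "smoker": "no"},
    {"age": 39.0, "bmi": 22.4, "smoker": "no"},
    {"age": 72.0, "bmi": 31.0, "smoker": "former"},
    {"age": 55.0, "bmi": 26.5, "smoker": "yes"},
]


def split(rows, test_ratio, seed=0):
    """Shuffle and hold out a fraction of the rows as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def fit(train, numeric, categorical):
    """Learn means, standard deviations, and category lists from the training rows only."""
    stats = {}
    for col in numeric:
        observed = [r[col] for r in train if r[col] is not None]
        stats[col] = (mean(observed), pstdev(observed) or 1.0)
    categories = {col: sorted({r[col] for r in train}) for col in categorical}
    return stats, categories


def transform(rows, stats, categories):
    """Apply imputation, standardization, and one-hot encoding."""
    out = []
    for r in rows:
        vec = []
        for col, (mu, sigma) in stats.items():
            value = mu if r[col] is None else r[col]  # mean imputation
            vec.append((value - mu) / sigma)          # standardization
        for col, cats in categories.items():
            vec.extend(1.0 if r[col] == c else 0.0 for c in cats)  # one-hot
        out.append(vec)
    return out


train, test = split(rows, test_ratio=0.2)
stats, categories = fit(train, numeric=["age", "bmi"], categorical=["smoker"])
X_train = transform(train, stats, categories)
X_test = transform(test, stats, categories)
```

The statistics are fitted on the training split and then reused for the test split, which avoids leaking test-set information into training.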
Counterfactuals
To generate counterfactual explanations, the EXTREMUM dashboard uses DiCE ML.
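For readers new to the idea, the toy sketch below shows what a counterfactual is: the smallest change to an input that flips the model's prediction. It is not DiCE and does not use its API. The scoring rule, feature names, and grid search are all assumptions made up for illustration.

```python
def predict(x):
    """Toy stand-in for a trained classifier: 1 = high risk, 0 = low risk."""
    score = 0.04 * x["age"] + 0.08 * x["bmi"] + 0.02 * x["glucose"]
    return 1 if score >= 6.5 else 0


def nearest_single_feature_cf(query, ranges, steps=100):
    """Grid-search one feature at a time for the smallest prediction-flipping change.

    `ranges` maps each feature that may vary to its allowed (low, high) interval;
    features not listed there (here: age) are treated as immutable.
    """
    original = predict(query)
    best, best_dist = None, float("inf")
    for name, (lo, hi) in ranges.items():
        for i in range(steps + 1):
            candidate = dict(query)
            candidate[name] = lo + i * (hi - lo) / steps
            if predict(candidate) == original:
                continue  # prediction unchanged, so not a counterfactual
            # Distance is normalized by the feature range so features are comparable.
            dist = abs(candidate[name] - query[name]) / (hi - lo)
            if dist < best_dist:
                best, best_dist = candidate, dist
    return best


query = {"age": 60.0, "bmi": 32.0, "glucose": 110.0}
ranges = {"bmi": (18.0, 40.0), "glucose": (70.0, 200.0)}
cf = nearest_single_feature_cf(query, ranges)
```

Here `cf` reads as a what-if statement: with the same age and BMI but a lower glucose value, the toy model would have predicted the other class.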
Time-series datasets
Datasets
In the dataset selection, the user can pick from five different time-series datasets or upload new ones. The default time-series datasets are:
Another well-tested dataset that can be uploaded is Strawberry.
Train
For time-series models, Wildboar and Glacier are available at the training stage. Specifically, the available classifiers are Random Shapelet Forest (RSF) and K-Nearest Neighbours (KNN) for Wildboar, and a shallow 1dCNN model for Glacier. For counterfactuals and explainability, the user can pick a pre-trained classifier, choose an example time-series entry by class, and view the computed counterfactual.
Training a Wildboar classifier supports some explicit preprocessing that the user can choose. The available options for preprocessing time-series datasets before training a Wildboar classifier are:
- Normalization
- Denoising
- Imputation
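The three options above can be illustrated on a single series with the dependency-free sketch below. It is not Wildboar's implementation, and the specific methods (linear interpolation, a moving average, and z-normalization) are assumptions chosen for illustration; the dashboard may use different ones.

```python
from statistics import mean, pstdev


def impute_linear(series):
    """Fill None gaps by linear interpolation; gaps at the edges copy the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def denoise_moving_average(series, window=3):
    """Smooth the series with a centered moving average (shorter window at the edges)."""
    half = window // 2
    return [mean(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


def z_normalize(series):
    """Rescale the series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]


raw = [1.0, None, 3.0, 10.0, 3.0, None, None, 0.0]
imputed = impute_linear(raw)
clean = z_normalize(denoise_moving_average(imputed))
```

Imputation runs first in this sketch because the smoothing and normalization steps need a complete series to work on.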
Glacier works differently. You can train a shallow 1dCNN model, optionally with an autoencoder, but there are no preprocessing options for the user to choose from, because preprocessing is handled internally by the Glacier package. Upcoming versions will also include this in the workflow.
Counterfactuals
After training, the user can open the counterfactuals section and use the pre-trained classifier (1dCNN) to run experiments with specific constraint types. These experiments are the core concept of Glacier: Guided Locally Constrained Counterfactual Explanations for Time Series Classification. They set the range and level of modification that the generated time series (the counterfactual explanation) may have compared to the original time series.
For more details, see:
@article{wang_glacier_2024,
title = {Glacier: guided locally constrained counterfactual explanations for time series classification},
volume = {113},
issn = {1573-0565},
doi = {10.1007/s10994-023-06502-x},
number = {3},
journal = {Machine Learning},
author = {Wang, Zhendong and Samsten, Isak and Miliou, Ioanna and Mochaourab, Rami and Papapetrou, Panagiotis},
month = mar,
year = {2024},
}
