This work was part of the EXTREMUM (Explainable and Ethical Machine Learning for Knowledge Discovery from Medical Data Sources) project, a collaboration between the Department of Computers and Systems Science, Royal Institute of Technology and RISE Research Institute.

The goal of this cross-disciplinary collaboration is to design and implement a novel data management and analytics framework for medical data sources. The focus is on explainable machine learning methods as well as on the legal and ethical aspects of predictive models.

This version of the dashboard was presented in the Demo Track of the ECML/PKDD 2025 conference in Porto, Portugal, and was published as a demo paper (DOI: 10.1007/978-3-032-06129-4_37). The full version of the paper is available among the ECML/PKDD 2025 preprints.

DEMO Video


Abstract:

There is an increasing need to include explainability in machine learning (ML) models. Among the various approaches, counterfactual (CF) explanations allow the design of what-if scenarios and the interactive exploration of ML model behavior in sensitive decision-making domains. However, generating CFs for tabular and time-series data requires technical skills that are not always available to the end-users of ML-powered systems. Therefore, we propose a modular web-based tool to easily generate, visualize, and interact with CFs on any tabular or time-series dataset. The EXTREMUM platform provides access to state-of-the-art CF algorithms, where users can train ML models and explore CFs on their tabular or time-series datasets through an intuitive user interface. The project is instantiated on two tabular datasets within healthcare and five time-series datasets from various domains. The open-source repository lets ML researchers adapt the existing ML tool to new application domains.

Citation

@inproceedings{lakes_extremum_2025,
	title = {{EXTREMUM}: A Web-Based Tool to Generate and Explore Counterfactual Explanations on Tabular and Time-Series Data},
	booktitle = {European Conference, ECML PKDD 2025, Machine Learning and Knowledge Discovery in Databases. Demo Track},
	author = {Lakes, Athanasios and Quintero, Luis and Papapetrou, Panagiotis},
	year = {2025},
	doi = {10.1007/978-3-032-06129-4_37}
}

Installation

These instructions will help you set up the Django project locally with SCSS compilation and static asset handling.

1. Clone the repository

git clone https://gitea.dsv.su.se/DataScienceGroup/EXTREMUM-demo.git
cd EXTREMUM-demo

2. Install Python 3.10 and pip locally

sudo apt install python3.10 python3.10-venv python3.10-dev

Verify the installed Python version:

python3.10 --version

3. Create and activate a virtual environment

python3.10 -m venv venv
source venv/bin/activate

With the virtual environment activated, verify that python now refers to Python 3.10:

python --version

4. Install Python dependencies

pip install -r requirements.txt

5. Install Node.js / npm (required for frontend assets)

This project includes frontend assets written in SCSS, which must be compiled to CSS using Sass before running the Python application.

Node.js and npm are required only for this build step; they are not used by the Python runtime.

sudo apt install npm

6. Install Sass (required for building frontend styles)

sudo npm install -g sass

7. Compile SCSS to CSS

sass base/static/scss:base/static/css

8. Configure .env

Create a .env file in the project root with your local development settings:

DJANGO_SECRET_KEY=replace-this-secret-key
DJANGO_DEBUG=True
DJANGO_ALLOWED_HOSTS=127.0.0.1,localhost

Your settings.py should load the .env file with python-dotenv and read these environment variables using os.getenv():

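# settings.py
import os
from dotenv import load_dotenv  # provided by the python-dotenv package

# Load variables from the .env file into the process environment
load_dotenv()

SECRET_KEY = os.getenv("DJANGO_SECRET_KEY", "unsafe-default-key")
# Accept "True", "true", "1" or "yes" as truthy values
DEBUG = os.getenv("DJANGO_DEBUG", "True").strip().lower() in ("1", "true", "yes")
# Split the comma-separated list, trimming whitespace and dropping empty entries
ALLOWED_HOSTS = [
    host.strip()
    for host in os.getenv("DJANGO_ALLOWED_HOSTS", "127.0.0.1,localhost").split(",")
    if host.strip()
]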

9. Run the development server

python manage.py runserver 8000

Visit http://127.0.0.1:8000/ in your browser.

Functionalities

Tabular datasets

Datasets

The EXTREMUM dashboard supports tabular datasets from any domain through a dynamic end-to-end workflow covering data upload, visualization, preprocessing, training, and CF explanation generation. The default datasets are the Breast Cancer Wisconsin (Diagnostic) dataset and the Stroke Prediction dataset. Each default dataset comes with a few pre-trained models that can be used to generate explanations and inspect model performance; these models can also be deleted or replaced by newly trained ones.

Other tabular datasets can be imported via the Upload Dataset section; for example, the Heart Disease Prediction and Thyroid Disease datasets have been uploaded and tested this way.

Train

The available classifiers for training on tabular datasets are:

XGBoost
Linear Regression
Random Forest
Decision Tree
SVM

Preprocessing options are provided as checkboxes and are applied as part of training. The available preprocessing options are:

Standardization
Imputation
One-Hot Encoding
Test Set Ratio Configuration
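To make the preprocessing options more concrete, here is a minimal plain-Python sketch of what each one does to a column of data. It is not the dashboard's implementation: the helper names (impute_mean, standardize, one_hot, train_test_split) are hypothetical, and mean imputation is assumed as the imputation strategy.

```python
import math
import random

# Hypothetical helpers sketching the tabular preprocessing options.
# Not the EXTREMUM implementation; mean imputation is an assumption.


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) std."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - mean) / std for v in column]


def one_hot(column):
    """Encode each category as a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(column))
    encoded = [[1 if v == c else 0 for c in categories] for v in column]
    return encoded, categories


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle rows and hold out a `test_ratio` fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_ratio)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# Example: a tiny table with one missing age and one categorical column.
ages = standardize(impute_mean([50, None, 70]))
smoking, categories = one_hot(["smokes", "never", "smokes"])
train, test = train_test_split(list(zip(ages, smoking)), test_ratio=0.34)
print(categories, len(train), len(test))  # ['never', 'smokes'] 2 1
```

In a real pipeline, the imputation mean and the standardization statistics should be computed on the training split only and then reused on the test split, so that no information from the test set leaks into training; the example skips this for brevity.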

Counterfactuals

To generate counterfactual explanations, the EXTREMUM dashboard uses DiCE ML.
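DiCE ML offers model-agnostic methods (for example, its random-sampling method) that only need the model's predictions. The toy sketch below illustrates that idea in plain Python: it re-samples random subsets of features and keeps the closest candidate that the model assigns to the desired class. The predict risk model, the feature ranges, and random_counterfactual are hypothetical; this is not DiCE's API or its exact algorithm, and unlike DiCE it returns a single counterfactual rather than a diverse set.

```python
import random


def predict(x):
    """Hypothetical toy risk model: class 1 if the weighted score exceeds 1.0."""
    age, glucose = x
    return int(0.01 * age + 0.005 * glucose > 1.0)


def random_counterfactual(x, predict, ranges, desired_class, n_samples=2000, seed=0):
    """Return the sampled candidate of `desired_class` that is closest to `x`.

    Each candidate re-samples a random non-empty subset of features uniformly
    within `ranges`; closeness is the L1 distance normalised by each range.
    """
    rng = random.Random(seed)
    best, best_dist = None, float("inf")
    for _ in range(n_samples):
        candidate = list(x)
        n_changed = rng.randint(1, len(x))
        for i in rng.sample(range(len(x)), n_changed):
            low, high = ranges[i]
            candidate[i] = rng.uniform(low, high)
        if predict(candidate) != desired_class:
            continue  # not a counterfactual: the prediction did not flip
        dist = sum(
            abs(c - v) / (high - low)
            for c, v, (low, high) in zip(candidate, x, ranges)
        )
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best, best_dist


query = [60, 100]  # predicted as class 1 by the toy model
cf, dist = random_counterfactual(query, predict, [(18, 90), (50, 300)], desired_class=0)
print(predict(query), predict(cf), cf)
```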

Time-series datasets

Datasets

In the dataset selection, the user can pick from five different time-series datasets or upload new ones. The default time-series datasets are:

Another well-tested dataset that can be uploaded is Strawberry.

Train

For time-series models, Wildboar and Glacier are available during the training stage. More specifically, the available classifiers are Random Shapelet Forest (RSF) and K-Nearest Neighbour (KNN) from Wildboar, and a shallow one-dimensional convolutional neural network (1dCNN) for Glacier. In the counterfactuals section, the user can pick a pre-trained classifier, choose an example time series from a given class, and view the computed counterfactual.
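For intuition on the KNN option, here is a toy k-nearest-neighbour classifier for univariate time series in plain Python. It assumes equal-length series and Euclidean distance; the class name KNNTimeSeriesClassifier is hypothetical, and this is not Wildboar's API or implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KNNTimeSeriesClassifier:
    """Toy k-nearest-neighbour classifier for univariate time series."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "training" just stores the labelled series.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, series):
        # Rank training series by distance and take a majority vote of the k closest.
        ranked = sorted(range(len(self.X)), key=lambda i: euclidean(self.X[i], series))
        votes = Counter(self.y[i] for i in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(series) for series in X]


X_train = [[0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [5, 6, 5, 6, 5], [5, 5, 6, 5, 5]]
y_train = ["low", "low", "high", "high"]
clf = KNNTimeSeriesClassifier(k=1).fit(X_train, y_train)
print(clf.predict([[0, 1, 1, 0, 0], [5, 6, 6, 5, 5]]))  # ['low', 'high']
```

The distance measure matters a lot for time series: Euclidean distance compares series point by point, so a small temporal shift can make two otherwise similar series look far apart.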

When training a Wildboar classifier, the user can choose among the following preprocessing options for time-series datasets:

Normalization
Denoising
Imputation
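The three Wildboar preprocessing options can be pictured with the following plain-Python sketch. The concrete techniques are assumptions, since the dashboard's exact methods are not documented here: per-series z-normalization, a centered moving-average filter for denoising (with an odd window), and linear interpolation for missing (None) values. All function names are hypothetical.

```python
import math


def z_normalize(series):
    """Per-series z-normalisation: zero mean, unit (population) std."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [0.0] * n if std == 0 else [(v - mean) / std for v in series]


def moving_average(series, window=3):
    """Denoise with a centred moving average (odd window); edges use fewer points."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def interpolate_missing(series):
    """Fill None gaps by linear interpolation; copy edge values outward."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled


raw = [1.0, None, 3.0, 10.0, 5.0, None]
clean = z_normalize(moving_average(interpolate_missing(raw)))
print([round(v, 2) for v in clean])
```

A natural order is to impute first, then denoise, then normalize, because neither the moving average nor the mean and standard deviation can be computed over missing values.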

Glacier works differently. You can train a shallow 1dCNN model, with an additional option to use an autoencoder, but no preprocessing options are offered to the user because preprocessing is handled internally by the Glacier package. Future versions will also include this step in the workflow.

Counterfactuals

After training, the user can open the Counterfactuals section and, using the pre-trained 1dCNN classifier, run experiments with specific constraint types. These constraints are the core concept of Glacier (Guided Locally Constrained Counterfactual Explanations for Time Series Classification): they set the range and degree of modification that the generated time series (the counterfactual explanation) may have compared to the original time series.
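To illustrate what a locally constrained counterfactual means, the sketch below runs gradient descent on a toy logistic-regression model that stands in for the 1dCNN. A per-time-step penalty vector controls how strongly each region of the series is held close to the original: large values constrain a region, zero leaves it free to change. The model, the penalty vector, and constrained_counterfactual are assumptions made for illustration; this is not Glacier's code, loss function, or set of constraint types.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def constrained_counterfactual(x, weights, bias, penalty, lam=0.1, lr=0.2, max_iter=500):
    """Gradient-descent counterfactual for a toy logistic-regression classifier.

    Minimises  -log p(class 1 | cf) + lam * sum_t penalty[t] * (cf[t] - x[t])**2,
    stopping as soon as the prediction flips to class 1.
    """
    def prob(series):
        return sigmoid(sum(w * v for w, v in zip(weights, series)) + bias)

    cf = list(x)
    for _ in range(max_iter):
        p = prob(cf)
        if p >= 0.5:
            break  # the counterfactual is now classified as class 1
        for t in range(len(cf)):
            grad_cls = -(1.0 - p) * weights[t]  # derivative of -log p w.r.t. cf[t]
            grad_prox = 2.0 * lam * penalty[t] * (cf[t] - x[t])
            cf[t] -= lr * (grad_cls + grad_prox)
    return cf, prob(cf)


# Constrain the first half of the series strongly and leave the second half free.
original = [0.0, 0.0, 0.0, 0.0]
cf, p = constrained_counterfactual(original, [1.0] * 4, -2.0, penalty=[10, 10, 0, 0])
print([round(v, 2) for v in cf], round(p, 2))
```

With these settings, the strongly penalised first two time steps change much less than the free last two, which is the intuition behind restricting where and how much a time-series counterfactual may differ from the original.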