Tabular dataset
Functionality for tabular data is implemented. Dataset selection offers two datasets: a breast cancer dataset for predicting malignancy and a stroke prediction dataset. The user can explore each one visually, through plots of the features chosen according to their needs, and can also inspect the actual data in a table shown on the page. Training of the classifiers for the tabular datasets has been implemented for some time now and produces adequate results. A user can choose the type of preprocessing, the classifier and the train/test ratio. Training results can be visualized through the charts.html page (Pre trained models and visualization), from where plots such as feature importance, PCA, classification reports and others are loaded. For counterfactual explanations, the use of DiCE ML is improved: it now takes not only the datapoint to compute the counterfactuals for, but also the features to vary parameter. This parameter can be set through additional dashboard functionality that lets the user select or deselect specific features to vary and sort them by importance.
One thing to note is that for the stroke dataset the DiCE ML results are not yet adequate and need to be improved.
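The feature selection described above (select/deselect, then sort by importance) could be sketched roughly as follows. This is only an illustration and not the dashboard's code: the helper name `features_to_vary`, the feature names and the importance scores are all assumptions.

```python
from typing import Dict, Iterable, List


def features_to_vary(importance: Dict[str, float],
                     deselected: Iterable[str] = ()) -> List[str]:
    """Return the features that are still selected, most important first."""
    excluded = set(deselected)
    unknown = excluded - importance.keys()
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    selected = [name for name in importance if name not in excluded]
    # Sort by descending importance; ties fall back to the name for a stable order.
    return sorted(selected, key=lambda name: (-importance[name], name))


# Hypothetical importance scores, for illustration only.
importance = {"age": 0.41, "bmi": 0.17, "avg_glucose_level": 0.32, "gender": 0.10}
vary = features_to_vary(importance, deselected=["age", "gender"])
print(vary)  # ['avg_glucose_level', 'bmi']
```

With DiCE ML, a list like this would typically be passed as the `features_to_vary` argument of `generate_counterfactuals`. That call is left out here so the sketch runs without the library.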
TODO: 1) Add more parameterization to the training 2) Improve stroke counterfactuals
Timeseries dataset
Support for timeseries datasets is further improved in this version. In the dataset selection the user can pick from 4 different datasets provided by Wildboar and inspect the data of each one on the page. For training, Wildboar classifiers are added; preprocessing is allowed for them, as well as a choice of test set ratio. The available classifiers are RSF and KNN. For counterfactuals and explainability the user can pick the pre trained classifier, choose an example timeseries entry based on its class and see the computed counterfactual live.
TODO: 1) Offer more example entries to pick from (done) 2) Add more parameterization in training 3) Compare different methods
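Picking example entries by class, as described above, could look roughly like the sketch below. It is an assumption about how the choice list might be built, not the actual implementation: the helper `example_entries`, the `per_class` parameter and the labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def example_entries(labels: Sequence[Hashable],
                    per_class: int = 3) -> Dict[Hashable, List[int]]:
    """Map each class label to the indices of its first `per_class` entries."""
    if per_class < 1:
        raise ValueError("per_class must be at least 1")
    picked: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        if len(picked[label]) < per_class:
            picked[label].append(index)
    return dict(picked)


# Hypothetical test-set labels for a two-class timeseries dataset.
y_test = [1, -1, -1, 1, 1, -1, 1]
choices = example_entries(y_test, per_class=2)
print(choices)  # {1: [0, 3], -1: [1, 2]}
```

The returned indices can then be used to look up the matching timeseries and its counterfactual, so raising `per_class` offers more entries per class.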
For Glacier things are a bit more complicated and definitely need further improvement. For now, Glacier can only be trained with a 1dCNN with little configuration. A user can also choose an autoencoder, which itself may need more configuration. For counterfactuals and explainability, there is a process for computing the counterfactuals, but it is not efficient. A user can pick from a list of constraint factors and a predicted margin weight value, and then run the computation that produces the counterfactual of the entry. This is inefficient because the counterfactuals are computed for the whole X_test set, which is 50 entries, but only 2 of them (one from each category) can then be accessed. Also, experiments with specific constraints are not presaved, so they cannot be accessed later and need to be rerun.
TODO: 1) Run experiment first (constraint type, predicted margin weight etc) (done) 2) Give access to pre computed experiments (done) 3) Offer more example entries to pick from (done) 4) Add more parameterization
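One way to avoid rerunning an experiment is to save its results under a key built from the constraint and the predicted margin weight. The sketch below only illustrates that idea. The file layout, the helper names, the `"uniform"` constraint value and the stand-in `fake_compute` function are assumptions, and it does not call Glacier itself.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def experiment_path(cache_dir: Path, constraint: str, w_value: float) -> Path:
    """One JSON file per (constraint, predicted margin weight) pair."""
    return cache_dir / f"{constraint}_w{w_value:g}.json"


def load_or_run(cache_dir: Path, constraint: str, w_value: float,
                compute: Callable[[str, float], List[List[float]]]) -> List[List[float]]:
    """Return saved counterfactuals if present, otherwise compute and save them."""
    path = experiment_path(cache_dir, constraint, w_value)
    if path.exists():
        return json.loads(path.read_text())
    results = compute(constraint, w_value)
    path.write_text(json.dumps(results))
    return results


calls = []


def fake_compute(constraint: str, w_value: float) -> List[List[float]]:
    # Stand-in for the expensive counterfactual search over X_test.
    calls.append((constraint, w_value))
    return [[0.0, 0.1], [0.2, 0.3]]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = load_or_run(cache, "uniform", 0.5, fake_compute)
    second = load_or_run(cache, "uniform", 0.5, fake_compute)  # served from disk
    entry = second[1]  # any entry of the test set can be looked up by index
    print(len(calls), entry)  # 1 [0.2, 0.3]
```

Because all counterfactuals of a run are stored, every entry of X_test stays accessible afterwards, not only one per category.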
Import
Importing a dataset is available in this version through the dataset selection navigator. The type of the dataset (timeseries or tabular) must be stated so that the backend can handle it correctly. Both timeseries and tabular dataset imports are available.
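Since the backend depends on the stated type, an import handler would likely validate it before doing anything else. A minimal sketch, assuming the type arrives as a plain string; the `DatasetType` enum and `parse_dataset_type` helper are hypothetical names and not part of the existing code:

```python
from enum import Enum


class DatasetType(Enum):
    TABULAR = "tabular"
    TIMESERIES = "timeseries"


def parse_dataset_type(raw: str) -> DatasetType:
    """Normalise the user-supplied type and reject anything else."""
    try:
        return DatasetType(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(t.value for t in DatasetType)
        raise ValueError(f"dataset type must be one of: {allowed}") from None


print(parse_dataset_type(" Timeseries "))  # DatasetType.TIMESERIES
```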
TODO: 1) Navigate between the uploaded datasets 2) Remove uploaded files