The modelStudio package automates the explanatory analysis of machine learning predictive models. It generates advanced interactive model explanations in the form of a serverless HTML site with only one line of code. The tool is model-agnostic and therefore compatible with most black-box predictive models and frameworks (e.g. mlr/mlr3, xgboost, caret, h2o, parsnip, tidymodels, scikit-learn, lightgbm, keras/tensorflow).
The main modelStudio() function computes various instance-level and model-level explanations and produces a customisable dashboard, which consists of multiple panels for plots with short descriptions. The dashboard can easily be saved and shared with others. Tools for Explanatory Model Analysis unite with tools for Exploratory Data Analysis to give a broad overview of the model behavior.
The modelStudio package is a part of the DrWhy.AI universe.
# Install from CRAN:
install.packages("modelStudio")
# Install the development version from GitHub:
devtools::install_github("ModelOriented/modelStudio")
library("DALEX")
library("ranger")
library("modelStudio")
# fit a model
model <- ranger(score ~., data = happiness_train)
# create an explainer for the model
explainer <- explain(model,
                     data = happiness_test,
                     y = happiness_test$score,
                     label = "Random Forest")
# make a studio for the model
modelStudio(explainer)
Save the output as an HTML file (see the Demo Dashboard).
The modelStudio() function uses DALEX explainers created with DALEX::explain() or DALEXtra::explain_*().
# packages for the explainer objects
install.packages("DALEX")
install.packages("DALEXtra")
Make a studio for the regression ranger model on the apartments data.
# load packages and data
library(mlr)
library(DALEXtra)
library(modelStudio)
data <- DALEX::apartments
# split the data
index <- sample(1:nrow(data), 0.7*nrow(data))
train <- data[index,]
test <- data[-index,]
# fit a model
task <- makeRegrTask(id = "apartments", data = train, target = "m2.price")
learner <- makeLearner("regr.ranger", predict.type = "response")
model <- train(learner, task)
# create an explainer for the model
explainer <- explain_mlr(model,
                         data = test,
                         y = test$m2.price,
                         label = "mlr")
# pick observations
new_observation <- test[1:2,]
rownames(new_observation) <- c("id1", "id2")
# make a studio for the model
modelStudio(explainer, new_observation)
Make a studio for the classification xgboost model on the titanic data.
# load packages and data
library(xgboost)
library(DALEX)
library(modelStudio)
data <- DALEX::titanic_imputed
# split the data
index <- sample(1:nrow(data), 0.7*nrow(data))
train <- data[index,]
test <- data[-index,]
train_matrix <- model.matrix(survived ~.-1, train)
test_matrix <- model.matrix(survived ~.-1, test)
# fit a model
xgb_matrix <- xgb.DMatrix(train_matrix, label = train$survived)
params <- list(max_depth = 3, objective = "binary:logistic", eval_metric = "auc")
model <- xgb.train(params, xgb_matrix, nrounds = 500)
# create an explainer for the model
explainer <- explain(model,
                     data = test_matrix,
                     y = test$survived,
                     type = "classification",
                     label = "xgboost")
# pick observations
new_observation <- test_matrix[1:2, , drop=FALSE]
rownames(new_observation) <- c("id1", "id2")
# make a studio for the model
modelStudio(explainer, new_observation)
The modelStudio() function uses dalex explainers created with dalex.Explainer().
# package for the Explainer object
pip install dalex -U
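The Python examples in this README import several packages besides dalex. As a minimal sketch, the check below reports which of them cannot be imported in the current environment. It uses only the standard library; the helper name missing_packages and the REQUIRED list are assumptions made for illustration and are not part of dalex or modelStudio.

```python
from importlib.util import find_spec

# Top-level packages imported by the Python examples in this README
# (assumption: this list is illustrative; adjust it to the example you run).
REQUIRED = ["dalex", "sklearn", "lightgbm", "numpy"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages for the examples are importable.")
```

Any package reported as missing can then be installed with pip before running the examples.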
Use the pickle Python module and the reticulate R package to easily make a studio for a model.
# package for pickle load
install.packages("reticulate")
Make a studio for the regression Pipeline SVR model on the fifa data.
First, use dalex in Python:
# load packages and data
import dalex as dx
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from numpy import log
data = dx.datasets.load_fifa()
X = data.drop(columns=['overall', 'potential', 'value_eur', 'wage_eur', 'nationality'], axis=1)
y = log(data.value_eur)
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y)
# fit a pipeline model
model = Pipeline([('scale', StandardScaler()), ('svm', SVR())])
model.fit(X_train, y_train)
# create an explainer for the model
explainer = dx.Explainer(model, data=X_test, y=y_test, label='scikit-learn')
# pack the explainer into a pickle file
explainer.dump(open('explainer_scikitlearn.pickle', 'wb'))
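Before switching to R, it can be worth confirming that the pickle file loads back cleanly. The sketch below shows the round trip with the standard pickle module only. A plain dictionary stands in for the dx.Explainer object, and dump_and_reload and the file name explainer_demo.pickle are hypothetical names used for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def dump_and_reload(obj, path):
    """Pickle obj to path, then load it back to confirm the file is readable."""
    with open(path, "wb") as f:  # the context manager closes the file handle
        pickle.dump(obj, f)
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a dx.Explainer (assumption: used only to keep the sketch
# runnable without dalex installed).
stand_in = {"label": "scikit-learn", "n_features": 3}

with tempfile.TemporaryDirectory() as tmp:
    restored = dump_and_reload(stand_in, Path(tmp) / "explainer_demo.pickle")

print(restored == stand_in)
```

With a real explainer, the same pattern would call explainer.dump(f) inside the first with block, which also ensures the file is closed before R reads it.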
Then, use modelStudio in R:
# load the explainer from the pickle file
library(reticulate)
explainer <- py_load_object("explainer_scikitlearn.pickle", pickle = "pickle")
# make a studio for the model
library(modelStudio)
modelStudio(explainer, B = 5)
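One detail of this example: the SVR pipeline is fit to log(value_eur), so its predictions, and the explanations built on them, are on the natural-log scale. A minimal sketch of the back-transformation follows, using only the standard library. The helper name to_euros is hypothetical and the sample value is made up for illustration.

```python
import math


def to_euros(log_value):
    """Invert the natural-log transform applied to the target (hypothetical helper)."""
    return math.exp(log_value)


# Made-up player value, used only to illustrate the round trip.
value_eur = 2_500_000.0
log_target = math.log(value_eur)  # the scale the model is trained to predict
recovered = to_euros(log_target)  # back on the euro scale

print(round(log_target, 3), round(recovered))
```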
Make a studio for the classification Pipeline LGBMClassifier model on the titanic data.
First, use dalex in Python:
# load packages and data
import dalex as dx
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from lightgbm import LGBMClassifier
data = dx.datasets.load_titanic()
X = data.drop(columns='survived')
y = data.survived
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y)
# fit a pipeline model
numerical_features = ['age', 'fare', 'sibsp', 'parch']
numerical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ]
)
categorical_features = ['gender', 'class', 'embarked']
categorical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)
classifier = LGBMClassifier(n_estimators=300)
model = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('classifier', classifier)
    ]
)
model.fit(X_train, y_train)
# create an explainer for the model
explainer = dx.Explainer(model, data=X_test, y=y_test, label='lightgbm')
# pack the explainer into a pickle file
explainer.dump(open('explainer_lightgbm.pickle', 'wb'))
Then, use modelStudio in R:
# load the explainer from the pickle file
library(reticulate)
explainer <- py_load_object("explainer_lightgbm.pickle", pickle = "pickle")
# make a studio for the model
library(modelStudio)
modelStudio(explainer)
Save modelStudio as an HTML file using the buttons at the top of the RStudio Viewer or with r2d3::save_d3_html().
If you use modelStudio, please cite our JOSS article:
@article{baniecki2019modelstudio,
title = {{modelStudio: Interactive Studio with Explanations for ML Predictive Models}},
author = {Hubert Baniecki and Przemyslaw Biecek},
journal = {Journal of Open Source Software},
year = {2019},
volume = {4},
number = {43},
pages = {1798},
url = {https://doi.org/10.21105/joss.01798}
}
For a description and evaluation of the Interactive EMA process, refer to our DAMI article:
@article{baniecki2023grammar,
title = {The grammar of interactive explanatory model analysis},
author = {Hubert Baniecki and Dariusz Parzych and Przemyslaw Biecek},
journal = {Data Mining and Knowledge Discovery},
year = {2023},
pages = {1--37},
url = {https://doi.org/10.1007/s10618-023-00924-w}
}
Introduction to the plots: Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models
Vignettes: perks and features, R & Python examples, modelStudio in R Markdown HTML
Changelog: NEWS
Conference poster: ML in PL 2019