The modelStudio() function computes various instance-level and dataset-level model explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. The dashboard can easily be saved and shared with others. Tools for Explanatory Model Analysis unite with tools for Exploratory Data Analysis to give a broad overview of the model behavior.
Let’s use the HR dataset to explore the modelStudio parameters:
train <- DALEX::HR
train$fired <- as.factor(ifelse(train$status == "fired", 1, 0))
train$status <- NULL
head(train)
gender | age | hours | evaluation | salary | fired |
---|---|---|---|---|---|
male | 32.58 | 41.89 | 3 | 1 | 1 |
female | 41.21 | 36.34 | 2 | 5 | 1 |
male | 37.71 | 36.82 | 3 | 0 | 1 |
female | 30.06 | 38.96 | 3 | 2 | 1 |
male | 21.10 | 62.15 | 5 | 3 | 0 |
male | 40.12 | 69.54 | 2 | 0 | 1 |
Prepare the HR_test data and a ranger model for the explainer:
# fit a ranger model
library("ranger")
model <- ranger(fired ~ ., data = train, probability = TRUE)

# prepare validation dataset
test <- DALEX::HR_test[1:1000, ]
test$fired <- ifelse(test$status == "fired", 1, 0)
test$status <- NULL

# create an explainer for the model
explainer <- DALEX::explain(model,
                            data = test,
                            y = test$fired)

# load modelStudio
library("modelStudio")
Pass data points to the new_observation parameter for instance explanations such as Break Down, Shapley Values and Ceteris Paribus Profiles. Use new_observation_y to show their true labels.
new_observation <- test[1:3, ]
rownames(new_observation) <- c("John Snow", "Arya Stark", "Samwell Tarly")
true_labels <- test[1:3, ]$fired

modelStudio(explainer,
            new_observation = new_observation,
            new_observation_y = true_labels)
If new_observation = NULL, then new_observation_n observations are chosen, evenly spread by the order of y_hat. The selection always includes the observations whose ids are which.min(y_hat) and which.max(y_hat).
modelStudio(explainer, new_observation_n = 5) # default is 3
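The selection rule described above can be illustrated in a few lines of base R. This is only a sketch of the idea, not the package's own code: pick_evenly() is a hypothetical helper and y_hat is a made-up vector of predictions.

```r
# Hypothetical helper (not part of modelStudio): choose n ids that are
# evenly spread over the observations sorted by their predicted value.
pick_evenly <- function(y_hat, n = 3) {
  # evenly spaced positions in the sorted order, from the first to the last
  ranks <- unique(round(seq(1, length(y_hat), length.out = n)))
  # translate positions in the sorted order back to original row ids
  order(y_hat)[ranks]
}

# made-up predictions for seven observations
y_hat <- c(0.90, 0.10, 0.50, 0.30, 0.70, 0.20, 0.80)
ids <- pick_evenly(y_hat, n = 3)
ids

# the extremes are part of the selection
which.min(y_hat) %in% ids
which.max(y_hat) %in% ids
```

Because the positions run from 1 to length(y_hat), the lowest and the highest prediction are picked for any n of at least 2.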
Achieve a bigger or smaller modelStudio grid with the facet_dim parameter.
# small dashboard with 2 panels
modelStudio(explainer,
            facet_dim = c(1, 2))

# large dashboard with 9 panels
modelStudio(explainer,
            facet_dim = c(3, 3))
Use the time parameter to set the animation length in milliseconds. A value of 0 turns the animations off.
# slow down animations
modelStudio(explainer,
            time = 1000)

# turn off animations
modelStudio(explainer,
            time = 0)
- N is the number of observations used for the calculation of Partial Dependence and Accumulated Dependence Profiles (default is 300).
- N_fi is the number of observations used for the calculation of Feature Importance (default is N*10).
- N_sv is the number of observations used for the calculation of Shapley Values (default is N*3).
- B is the number of permutation rounds used for the calculation of Shapley Values (default is 10).
- B_fi is the number of permutation rounds used for the calculation of Feature Importance (default is B).

Decrease the N and B parameters to lower the computation time, or increase them to get more accurate empirical results.
# faster, less precise
modelStudio(explainer,
            N = 200, B = 5)

# slower, more precise
modelStudio(explainer,
            N = 500, B = 15)
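Since N_fi, N_sv and B_fi default to expressions of N and B, changing N or B alone also changes the dependent values. The base R sketch below shows that effect. It assumes the defaults behave like ordinary R default arguments, as the "default is N*10" notation suggests. resolve_budget() is a hypothetical stand-in that only returns the resolved numbers and computes no explanations.

```r
# Hypothetical stand-in mirroring the defaults listed above; it is not a
# modelStudio function and only reports the resolved sample sizes and rounds.
resolve_budget <- function(N = 300, N_fi = N * 10, N_sv = N * 3,
                           B = 10, B_fi = B) {
  c(N = N, N_fi = N_fi, N_sv = N_sv, B = B, B_fi = B_fi)
}

# all defaults
resolve_budget()

# lowering N and B also lowers N_fi, N_sv and B_fi
resolve_budget(N = 200, B = 5)

# a dependent value can still be fixed explicitly; N_sv keeps following N
resolve_budget(N = 200, N_fi = 3000)
```

In practice this means a call such as modelStudio(explainer, N = 200, B = 5) reduces the work for every explanation type at once, unless N_fi, N_sv or B_fi are passed explicitly.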
Don’t compute the EDA plots if they are not needed. Set the eda parameter to FALSE.
modelStudio(explainer,
            eda = FALSE)
Hide the progress bar and computation messages by setting the show_info parameter to FALSE.
modelStudio(explainer,
            show_info = FALSE)
Change the viewer parameter to set where modelStudio is displayed. The available values are best described in the r2d3 documentation.
modelStudio(explainer,
            viewer = "browser")
Speed up the modelStudio computation by setting the parallel parameter to TRUE. It uses the parallelMap package to calculate local explainers faster, which is most useful with complicated models, vast datasets, or many observations. All parallelMap options can be set outside of the function call; see the parallelMap documentation on how to use it.
# set up the cluster
options(
  parallelMap.default.mode = "socket",
  parallelMap.default.cpus = 4,
  parallelMap.default.show.info = FALSE
)

# calculations of local explanations will be distributed across 4 cores
modelStudio(explainer,
            new_observation = test[1:16, ],
            parallel = TRUE)
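The number of CPUs does not have to be hard-coded. The sketch below derives it from the machine using parallel::detectCores(), which ships with R. Leaving one core free is a common convention and an assumption here, not a modelStudio or parallelMap requirement.

```r
# detectCores() may return NA on some platforms, so fall back to one worker
n_cores <- parallel::detectCores()
cpus <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# same parallelMap options as above, with the detected number of CPUs
options(
  parallelMap.default.mode = "socket",
  parallelMap.default.cpus = cpus,
  parallelMap.default.show.info = FALSE
)

# check the value that will be picked up later
getOption("parallelMap.default.cpus")
```

With these options in place, the modelStudio(explainer, parallel = TRUE) call itself stays unchanged.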
Customize some of the modelStudio looks by overwriting the default options returned by the ms_options() function. Its documentation gives the full list of options.
# set additional graphical parameters
new_options <- ms_options(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange"
)

modelStudio(explainer,
            options = new_options)
All visual options can be changed after the calculations using ms_update_options().
old_ms <- modelStudio(explainer)
old_ms

# update the options
new_ms <- ms_update_options(old_ms,
                            time = 0,
                            facet_dim = c(1, 2),
                            margin_left = 150)
new_ms
Use ms_update_observations() to add more observations with their local explanations to the modelStudio.
old_ms <- modelStudio(explainer)
old_ms

# add new observations
plus_ms <- ms_update_observations(old_ms,
                                  explainer,
                                  new_observation = test[101:102, ])
plus_ms

# overwrite old observations
new_ms <- ms_update_observations(old_ms,
                                 explainer,
                                 new_observation = test[103:104, ],
                                 overwrite = TRUE)
new_ms
Use the widget_id argument and the r2d3 package to render the modelStudio output in Shiny. See Using r2d3 with Shiny and consider the following example:
library(shiny)
library(r2d3)

ui <- fluidPage(
  textInput("text", h3("Text input"),
            value = "Enter text..."),
  uiOutput("dashboard")
)

server <- function(input, output) {
  #:# id of div where modelStudio will appear
  WIDGET_ID <- "MODELSTUDIO"

  #:# create modelStudio
  library(modelStudio)
  library(DALEX)
  model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
  explainer <- DALEX::explain(model,
                              data = titanic_imputed,
                              y = titanic_imputed$survived,
                              label = "Titanic GLM",
                              verbose = FALSE)
  ms <- modelStudio(explainer,
                    widget_id = WIDGET_ID,  #:# use the widget_id
                    show_info = FALSE)
  ms$elementId <- NULL  #:# remove elementId to stop the warning

  #:# basic render d3 output
  output[[WIDGET_ID]] <- renderD3({
    ms
  })

  #:# use render ui to set proper width and height
  output$dashboard <- renderUI({
    d3Output(WIDGET_ID, width = ms$width, height = ms$height)
  })
}

shinyApp(ui = ui, server = server)
Use explain_*() functions from the DALEXtra package to explain various models. Below is a basic example of making modelStudio for an mlr model using explain_mlr().
library(DALEXtra)
library(mlr)

# fit a model
task <- makeClassifTask(id = "task", data = train, target = "fired")
learner <- makeLearner("classif.ranger", predict.type = "prob")
# train() resolves to the mlr function here, although a data frame
# named train also exists, because R looks up a function in call position
model <- train(learner, task)

# create an explainer for the model
explainer_mlr <- explain_mlr(model,
                             data = test,
                             y = test$fired,
                             label = "mlr")

# make a studio for the model
modelStudio(explainer_mlr)