April 27, 2021

# Using SHAP with pSeven Core for Sensitivity Analysis

## Introduction

Model sensitivity analysis is an increasingly popular research method in engineering. Engineers often need to know whether a valid model can be obtained using only a smaller subset of inputs, in order to speed up the design process and narrow the analysis. To decide which inputs to include in this subset, the task is commonly formulated as a question: which inputs are the most significant, and which affect the output least of all?

Sensitivity analysis techniques answer this question by providing a set of sensitivity values: numerical estimates of how the variability (uncertainty) in a model's output can be attributed to changes in its inputs. Many techniques exist for calculating these values. For example, the GTSDA (**S**ensitivity and **D**ependency **A**nalysis) module in pSeven Core implements the following:

- **Sobol indices**: a variance-based sensitivity analysis method. It decomposes the output variance of a model into fractions which can be attributed to inputs or sets of inputs.
- **Screening indices**: a qualitative sensitivity analysis method. It ranks model inputs in order of their influence on the output, but does not quantify the influence accurately.
- **Taguchi indices**: a robust sensitivity analysis method for functions of discrete variables.
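To make the variance-based idea concrete, the first-order Sobol index of input *i* is the fraction of output variance explained by varying *i* alone. Below is a minimal Monte Carlo sketch of this estimate for a toy three-input function; it uses the common "pick-and-freeze" sampling scheme and is not the GTSDA implementation from pSeven Core.

```python
import numpy as np

# Toy model: y = x0 + 0.5 * x1**2; x2 is inert. Inputs uniform on [0, 1].
def model(x):
    return x[:, 0] + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(0)
n = 200_000
A = rng.random((n, 3))   # two independent input samples
B = rng.random((n, 3))

yA = model(A)
var_y = yA.var()

# First-order index S_i: replace column i of B with column i of A
# ("pick-and-freeze"), then correlate the two model outputs.
S = []
for i in range(3):
    ABi = B.copy()
    ABi[:, i] = A[:, i]
    S.append(np.mean(yA * (model(ABi) - model(B))) / var_y)

print(S)  # x0 dominates, x1 contributes less, x2 contributes ~0
```

Since the toy function has no interactions, the three indices sum to approximately one; the inert input x2 gets an index near zero.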

All of the above belong to global sensitivity analysis methods. They can be applied both to a data sample and to a blackbox model, which can evaluate the output function at any point in the design area. When analysing a blackbox model, the method itself generates an input sample evenly distributed in the design area.

Since version 6.20, pSeven Core supports one more sensitivity analysis method — calculating SHAP values.

**SHAP** (**SH**apley **A**dditive ex**P**lanations) is a game-theoretic approach to explaining the results of any machine learning model. It is based on the Shapley value concept, which in game theory makes it possible to evaluate the exact contribution of each player to the result of the game. Naturally, the game must already be over at the time of analysis; accordingly, unlike global sensitivity analysis methods, SHAP refers to individual points in the input space, providing local explanations.
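To make the game-theory analogy concrete, a Shapley value averages a feature's marginal contribution over all possible coalitions of the other features. The brute-force sketch below computes exact Shapley values for a single point of a toy two-feature model, filling features absent from a coalition with a background mean; this is a simple illustrative convention, not the optimized algorithm used in pSeven Core.

```python
import itertools
from math import factorial

import numpy as np

def shapley_values(f, x, background):
    """Exact Shapley values for one point x, brute-force over coalitions.

    Features absent from a coalition are replaced with the background
    mean (a common, simple convention)."""
    d = len(x)
    base = background.mean(axis=0)

    def value(coalition):
        z = base.copy()
        z[list(coalition)] = x[list(coalition)]
        return f(z)

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(others, r):
                # Shapley weight for a coalition of size r
                w = factorial(r) * factorial(d - r - 1) / factorial(d)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi

# Toy linear model and background data (background mean is (1, 1))
f = lambda z: 3.0 * z[0] + 1.0 * z[1]
bg = np.array([[0.0, 0.0], [2.0, 2.0]])
phi = shapley_values(f, np.array([2.0, 0.0]), bg)
print(phi)  # for a linear model, phi_i = w_i * (x_i - mean_i)
```

The values are additive: the model output at the background mean plus the sum of the Shapley values reproduces the prediction at x, which is exactly the "local explanation" property discussed above.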

This tech tip illustrates using a GTApprox model to get its SHAP values for points in a given sample. A more detailed example of using SHAP with pSeven Core is also available from the pSeven Core User Manual (see SHapley Additive exPlanations).

## Calculating SHAP values

For a GTApprox model, SHAP values can be obtained by:

- using the *gtapprox.Model.shap_value()* method from pSeven Core, or
- using the GTApprox model with methods from the *shap* module.

The *gtapprox.Model.shap_value()* method is fast and optimized; it is supported by the following GTApprox models:

- Any differentiable model (that is, a model with continuous variables).
- Any model trained with the **GBRT** (**G**radient **B**oosted **R**egression **T**rees) technique, regardless of the types of variables.

The only restriction of the *gtapprox.Model.shap_value()* method is the lack of support for some models with categorical (discrete) variables. In turn, methods of the *shap* module are applicable to any GTApprox model, although they are usually more time-consuming.

This tech tip describes both ways of calculating SHAP values for a GTApprox model.

### Model

First of all, we have to train an example GTApprox model. The training sample in this example is the classic Boston Housing Dataset, which is also included in the SHAP examples and can simply be imported from the *shap* module. The dataset consists of 506 cases with 13 input features (home properties) and the home price as the output.

#### Load the dataset

```
import shap

X, y = shap.datasets.boston()
```

#### Train a model

```
from da.p7core import gtapprox

builder = gtapprox.Builder()
p7model = builder.build(X, y, options={
    "GTApprox/Technique": "GBRT",        # technique
    "GTApprox/GBRTNumberOfTrees": 100,   # number of trees
    "GTApprox/GBRTShrinkage": 0.01,      # learning rate
    "GTApprox/GBRTMaxDepth": 2})         # maximum tree depth
```

### Using *shap_value()* to get SHAP values

To get SHAP values from the trained GTApprox model, call its *shap_value()* method, passing a sample of inputs as an argument (see the pSeven Core User Manual for a detailed description).

The *shap_value()* method can work in two modes:

- The SHAP-compatible mode (default), in which it returns an explanation object that you can pass directly to methods of the *shap* module.
- The standalone mode, in which it returns SHAP values as arrays.

Both modes use the same optimized calculation method implemented in pSeven Core and differ only in the form of result presentation. Switching between modes is controlled by the *shap_compatible* argument of the *shap_value()* method.

#### SHAP-compatible mode example

The SHAP-compatible mode is the default. In this mode, *shap_value()* returns a *shap.Explanation* object (see shap.Explanation). Note that in this mode *shap_value()* requires the *shap* module, which it uses to initialize the returned object.

```
explanation = p7model.shap_value(X)
shap.summary_plot(explanation.values,
                  explanation.data,
                  feature_names=explanation.feature_names)
```

#### Standalone mode example

The standalone mode is enabled if you pass *shap_compatible=False* to *shap_value()*. This mode **does not require** the *shap* module. In the standalone mode, *shap_value()* returns a tuple of two objects, where the first is the baseline and the second is the array of SHAP values.

```
baseline, shap_values = p7model.shap_value(X, shap_compatible=False)
shap.dependence_plot("LSTAT", shap_values, X)
```
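The baseline and the SHAP value arrays returned in standalone mode are tied together by SHAP's local accuracy property: for each point, the baseline plus the sum of that point's SHAP values reproduces the model prediction. The minimal self-contained sketch below illustrates this property with a hypothetical linear model, whose exact SHAP values are known analytically; it does not use pSeven Core.

```python
import numpy as np

# Hypothetical linear model f(x) = w @ x + b. For independent inputs its
# exact SHAP values around a background mean are w_i * (x_i - mean_i).
w, b = np.array([2.0, -1.0, 0.5]), 3.0
f = lambda X: X @ w + b

rng = np.random.default_rng(1)
X = rng.random((5, 3))
baseline = f(X.mean(axis=0, keepdims=True))[0]   # prediction at the mean
shap_values = (X - X.mean(axis=0)) * w           # analytic SHAP values

# Local accuracy: baseline + row sum of SHAP values equals the prediction
print(np.allclose(baseline + shap_values.sum(axis=1), f(X)))  # True
```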

SHAP interaction values are supported by GBRT models only. To get SHAP interaction values, pass *interactions=True* to *shap_value()* (interactions are disabled by default).

```
baseline, shap_interactions = p7model.shap_value(X, interactions=True,
                                                 shap_compatible=False)
shap.summary_plot(shap_interactions, X)
```

### Using GTApprox models with the *shap* module

As an alternative to using the *gtapprox.Model.shap_value()* method from pSeven Core, you can also calculate SHAP values for any GTApprox model with methods implemented in the *shap* module:

- Create a new *shap.PermutationExplainer*, initializing it with *gtapprox.Model.calc()* as the model function.
- Use the created explainer to get SHAP values for the model.

#### Example

```
explainer = shap.PermutationExplainer(p7model.calc, X)
shap_explanation = explainer(X)  # explain all training sample points
shap.summary_plot(shap_explanation.values,
                  shap_explanation.data,
                  feature_names=shap_explanation.feature_names)
```

*By Svetlana Chernova, Head of Software Testing, DATADVANCE*