April 27, 2021
Using SHAP with pSeven Core for Sensitivity Analysis
Model sensitivity analysis is an increasingly popular research method in engineering. Engineers often need to know whether a valid model can be obtained from a smaller subset of inputs, which would speed up the design process and narrow the analysis. To decide which inputs go into this subset, the task is commonly posed as a question: which inputs are the most significant, and which ones affect the output the least?
Sensitivity analysis techniques answer this question by providing a set of sensitivity values, which are numerical estimates of how the variability (uncertainty) in a model's output can be attributed to changes in its inputs. There are many techniques for calculating these values. For example, the GTSDA (Sensitivity and Dependency Analysis) module in pSeven Core implements the following:
- Sobol indices: a variance-based sensitivity analysis method. It decomposes the output variance of a model into fractions that can be attributed to individual inputs or sets of inputs.
- Screening indices: a qualitative sensitivity analysis method. It ranks model inputs in order of their influence on the output, but does not quantify that influence accurately.
- Taguchi indices: a robust sensitivity analysis method for functions of discrete variables.
All of the above are global sensitivity analysis methods. They can be applied both to a data sample and to a blackbox model, which can evaluate the output function at any point in the design space. When analysing a blackbox model, the method itself generates an input sample evenly distributed in the design space.
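To make the idea of a variance-based index concrete, here is a minimal sketch that estimates first-order Sobol indices for a toy blackbox with a pick-freeze (Saltelli-style) Monte Carlo estimator. It is plain Python and does not use GTSDA; the function toy_blackbox, the helper first_order_sobol, the sample size, and the unit-hypercube design space are all assumptions made for illustration.

```python
import random


def toy_blackbox(x):
    # Hypothetical blackbox: the first input dominates the output.
    return 3.0 * x[0] + 1.0 * x[1]


def first_order_sobol(f, n_inputs, n_samples=40000, seed=0):
    """Estimate first-order Sobol indices of f on the unit hypercube."""
    rng = random.Random(seed)
    # Two independent input samples, A and B, drawn uniformly.
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_a = [f(p) for p in a]
    f_b = [f(p) for p in b]

    # Total output variance, estimated from both samples together.
    all_y = f_a + f_b
    mean = sum(all_y) / len(all_y)
    variance = sum((y - mean) ** 2 for y in all_y) / (len(all_y) - 1)

    indices = []
    for i in range(n_inputs):
        partial = 0.0
        for pa, pb, ya, yb in zip(a, b, f_a, f_b):
            # Take input i from sample B and all other inputs from sample A.
            mixed = list(pa)
            mixed[i] = pb[i]
            partial += yb * (f(mixed) - ya)
        # Share of the output variance attributed to input i alone.
        indices.append(partial / n_samples / variance)
    return indices


sobol = first_order_sobol(toy_blackbox, n_inputs=2)
print(sobol)
```

For this additive toy function with independent uniform inputs, the exact first-order indices are 0.9 and 0.1, so the estimates should land close to those values and sum to approximately 1.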
Since version 6.20, pSeven Core supports one more sensitivity analysis method — calculating SHAP values.
SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It is based on the Shapley value concept, which in game theory evaluates the contribution of each player to the result of the game. That contribution can only be evaluated once the game is over, that is, once the model output at a specific point is known. Therefore, unlike global sensitivity analysis methods, SHAP applies to individual input points and provides local explanations.
This tech tip shows how to get SHAP values from a GTApprox model for points in a given sample. A more detailed example of using SHAP with pSeven Core is also available in the pSeven Core User Manual (see SHapley Additive exPlanations).
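The Shapley value idea can be illustrated with a small sketch that enumerates all coalitions of inputs for a toy function. This is not how pSeven Core or the shap module compute SHAP values: it is an exact but exponentially expensive calculation, and it uses a single baseline point instead of a background sample. The names shapley_values and toy_model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, point, baseline):
    """Exact Shapley values of f at `point` relative to `baseline`.

    Inputs outside a coalition are set to their baseline value.
    """
    n = len(point)

    def value(coalition):
        x = [point[i] if i in coalition else baseline[i] for i in range(n)]
        return f(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of input i when it joins coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(x):
    # Hypothetical model with an interaction between the first two inputs.
    return 2.0 * x[0] + x[0] * x[1] + 0.5 * x[2]


point = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, point, baseline)
print(phi)
```

The values are local: they explain the output at this one point. They also add up to the difference between the model output at the point and at the baseline, and the interaction term is split equally between the two inputs involved.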
Calculating SHAP values
For a GTApprox model, SHAP values can be obtained by:
- using the gtapprox.Model.shap_value() method from pSeven Core, or
- using the GTApprox model with methods from the shap module.
gtapprox.Model.shap_value() is a fast, optimized method supported by the following GTApprox models:
- Any differentiable model — that is, a model with continuous variables.
- Any model trained with the GBRT (Gradient Boosted Regression Trees) technique — regardless of the types of variables.
The only limitation of the gtapprox.Model.shap_value() method is that it does not support some models with categorical (discrete) variables. Methods of the shap module, by contrast, are applicable to any GTApprox model, although they usually take more time.
This tech tip describes both ways of calculating SHAP values for a GTApprox model.
First, we train an example GTApprox model. The training sample in this example is the classic Boston Housing Dataset, which is also used in the SHAP examples and can be imported from the shap module. The dataset consists of 506 cases with 13 input features (home properties) and the home price as the output.
Load the dataset
Train a model
Using shap_value() to get SHAP values
To get SHAP values from the trained GTApprox model, call its shap_value() method, passing a sample of inputs as an argument (see the pSeven Core User Manual for a detailed description).
The shap_value() method can work in two modes:
- The SHAP-compatible mode (default), in which it returns an explanation object that you can pass directly to methods of the shap module.
- The standalone mode, in which it returns SHAP values as arrays.
Both modes use the same optimized calculation method implemented in pSeven Core and differ only in the form of result presentation. Switching between modes is controlled by the shap_compatible argument of the shap_value() method.
SHAP-compatible mode example
The SHAP-compatible mode is the default. In this mode, shap_value() returns a shap.Explanation object (see shap.Explanation). Because it has to create this object, shap_value() requires the shap module when it runs in the SHAP-compatible mode.
Standalone mode example
The standalone mode is enabled if you pass shap_compatible=False to shap_value(). This mode does not require the shap module. In the standalone mode, shap_value() returns a tuple of two objects, where the first is the baseline and the second is the array of SHAP values.
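Here is a minimal sketch of working with the standalone result. Only the shap_compatible=False argument and the (baseline, SHAP values) tuple come from the description above. The helper names, the single-output assumption, and the layout of the values array (one row per point, one column per input) are assumptions to check against the pSeven Core User Manual. The helpers accept any object with a shap_value() method, so the sketch itself does not import pSeven Core.

```python
def explain_standalone(model, sample):
    """Call shap_value() in standalone mode and unpack its result."""
    baseline, shap_values = model.shap_value(sample, shap_compatible=False)
    return baseline, shap_values


def reconstruct_outputs(baseline, shap_values):
    """Rebuild model outputs as the baseline plus the sum of SHAP values.

    Assumes a single model output and one row of SHAP values per point.
    """
    return [float(baseline) + float(sum(row)) for row in shap_values]


def rank_inputs(shap_values):
    """Rank inputs by mean absolute SHAP value, most influential first."""
    n_inputs = len(shap_values[0])
    scores = [
        sum(abs(row[i]) for row in shap_values) / len(shap_values)
        for i in range(n_inputs)
    ]
    return sorted(range(n_inputs), key=lambda i: scores[i], reverse=True)
```

With a trained GTApprox model and an input sample x, explain_standalone(model, x) would return the baseline and the SHAP values. Comparing reconstruct_outputs() with the model's own predictions is a quick check of the additivity property of SHAP values, and rank_inputs() turns the local values into a rough overall ranking of inputs.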
SHAP interaction values are supported by GBRT models only. To get SHAP interaction values, pass interactions=True to shap_value() (interactions are disabled by default).
Using GTApprox models with the shap module
As an alternative to using the gtapprox.Model.shap_value() method from pSeven Core, you can also calculate SHAP values for any GTApprox model with methods implemented in the shap module:
- Create a new shap.PermutationExplainer, initializing it with gtapprox.Model.calc() as the model function.
- Use the created explainer to get SHAP values for the model.
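Running shap.PermutationExplainer requires the shap package and a trained model, so the sketch below only illustrates the underlying idea in plain Python: a permutation-based Shapley estimate that needs nothing but a callable model function, which is the role gtapprox.Model.calc() plays in the steps above. It is a simplified stand-in, not the shap implementation. It uses a single background point instead of a background sample, the model function is assumed to take one point and return one number, and the names permutation_shap and toy_calc are hypothetical.

```python
import random


def permutation_shap(calc, point, background, n_permutations=10, seed=0):
    """Estimate Shapley values of calc at `point` by sampling permutations.

    Each permutation is walked forward and in reverse (an antithetic pair).
    Inputs are switched from the background value to the point value one
    at a time, and each output change is credited to the switched input.
    """
    rng = random.Random(seed)
    n = len(point)
    phi = [0.0] * n
    walks = 0
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        for walk in (order, order[::-1]):
            x = list(background)
            previous = calc(x)
            for i in walk:
                x[i] = point[i]
                current = calc(x)
                phi[i] += current - previous
                previous = current
            walks += 1
    # Average the credited changes over all walks.
    return [total / walks for total in phi]


def toy_calc(x):
    # Hypothetical stand-in for a model function such as model.calc.
    return 1.5 * x[0] - 2.0 * x[1] + 0.25 * x[2]


point = [2.0, 1.0, 4.0]
background = [0.0, 0.0, 0.0]
phi = permutation_shap(toy_calc, point, background)
print(phi)
```

Every walk evaluates the model function once per input plus once for the background point, which is why this model-agnostic approach is usually slower than an optimized method that exploits the model structure.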
By Svetlana Chernova, Head of Software Testing, DATADVANCE