February 2, 2017

“Build. Validate. Explore.” - Part 2: Model Validator

In “Build. Validate. Explore.” - Part 1 we explained how to create a surrogate model using Model Builder in pSeven. By default, Model Builder with SmartSelection chooses the approximation technique automatically, yet sometimes you need to judge model quality yourself - for example, when there are multiple quality criteria that lead to a set of Pareto-optimal models. In that case it is up to you to analyze the candidates and choose the most suitable one.

In this post we describe Model Validator - an interactive analysis tool for estimating a model’s quality (i.e. its predictive performance) and comparing different models. It lets you test models against reference data and find the most accurate one using error plots and statistics.

Visual Comparison

Model Validator can visualize two kinds of plots; you switch between them with the Plot selector.

The first one is a quantile plot, which is useful for analyzing the distribution of prediction errors. Each point of the curve shows a percentile: the fraction of sample points whose prediction error is lower than the value on the horizontal axis. The rule of thumb is that a steeper curve is better - the longer the flat "tail" at the top of the curve, the more accurate the model is.

Quantile plot of four different surrogate models (green model is the most accurate)
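
If you want the same kind of curve outside pSeven, it is easy to reproduce. The sketch below is a minimal NumPy/Matplotlib illustration with made-up y_ref and y_pred arrays (not pSeven code): sort the absolute errors and plot them against the cumulative fraction of points.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical reference outputs and model predictions for one output.
    y_ref = np.random.rand(200)
    y_pred = y_ref + np.random.normal(scale=0.05, size=200)

    # Sort absolute errors and plot them against the cumulative fraction of
    # points: the value at fraction p is the p-th percentile of the error.
    errors = np.sort(np.abs(y_pred - y_ref))
    fraction = np.arange(1, errors.size + 1) / errors.size

    plt.plot(errors, fraction)
    plt.xlabel("prediction error")
    plt.ylabel("fraction of points with a lower error")
    plt.show()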

The second one is a scatter plot that directly compares reference sample outputs with model predictions. It is useful for spotting outliers and seeing how the error magnitude varies across different regions of the data.

Scatter plot of three different surrogate models
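
The same idea in a minimal sketch (again with made-up data, not pSeven code): plot predictions against reference values and compare them with the diagonal.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical reference outputs and model predictions.
    y_ref = np.random.rand(200)
    y_pred = y_ref + np.random.normal(scale=0.05, size=200)

    # Points on the diagonal are predicted exactly; the vertical distance
    # from the diagonal is the prediction error for that point.
    plt.scatter(y_ref, y_pred, s=10)
    lims = [min(y_ref.min(), y_pred.min()), max(y_ref.max(), y_pred.max())]
    plt.plot(lims, lims, "k--")
    plt.xlabel("reference output")
    plt.ylabel("model prediction")
    plt.show()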

Absolute and Normalized Prediction Error Metrics

By default, the quantile plot and error metrics are based on absolute error values. Using the Errors selector you can switch them to normalized errors - the absolute error divided by the standard deviation of the output in the reference sample. The normalized error is useful for judging how significant an error is relative to the output’s value range.
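
As a rough illustration (the exact formula pSeven uses may differ in details such as the degrees of freedom of the standard deviation), normalized errors can be computed like this:

    import numpy as np

    y_ref = np.array([10.0, 12.0, 15.0, 11.0, 14.0])   # reference outputs
    y_pred = np.array([10.5, 11.0, 15.5, 10.0, 14.5])  # model predictions

    abs_err = np.abs(y_pred - y_ref)
    norm_err = abs_err / np.std(y_ref)  # error relative to the output's spread
    print(norm_err)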

The table at the bottom contains the computed prediction accuracy metrics; the best values are highlighted for convenience. The metrics are:

  • R2: the coefficient of determination. Indicates the proportion of output variation that can be explained by the model
  • RMS: the root-mean-squared error
  • Maximum: the maximum prediction error
  • Q99: the 99th percentile. For 99% of reference points, prediction error is lower than this value
  • Q95: the 95th percentile. For 95% of reference points, prediction error is lower than this value
  • Median: the median of prediction error values
  • Mean: the arithmetic mean of prediction error values

R2 (the coefficient of determination) is the most robust metric; values closer to 1.0 are better. For the other metrics, lower values mean a more accurate prediction.
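
If you need the same numbers in a script, for example to post-process validation results, they can be computed along these lines (a minimal NumPy sketch of the definitions above, not pSeven’s implementation):

    import numpy as np

    def accuracy_metrics(y_ref, y_pred):
        """Prediction error metrics analogous to the Model Validator table."""
        err = np.abs(y_pred - y_ref)
        ss_res = np.sum((y_ref - y_pred) ** 2)
        ss_tot = np.sum((y_ref - np.mean(y_ref)) ** 2)
        return {
            "R2": 1.0 - ss_res / ss_tot,        # coefficient of determination
            "RMS": np.sqrt(np.mean(err ** 2)),  # root-mean-squared error
            "Maximum": np.max(err),
            "Q99": np.percentile(err, 99),
            "Q95": np.percentile(err, 95),
            "Median": np.median(err),
            "Mean": np.mean(err),
        }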

Sources of Validation Sample Data

The Sample selector changes the source of reference data used for validation:

  • If “training” is selected, the reference data is the model’s training sample. The Validator estimates how well the model fits the data it was trained on.
  • If “test” is selected, the reference data comes from the holdout test sample specified in the configuration. The Validator estimates the model’s ability to generalize to previously unseen data.
  • If “internal validation” is selected, the Validator estimates the model’s performance from the results of cross-validation.

It is recommended to use a test data sample whenever possible, since training sample validation tends to overestimate model accuracy. Low errors on the training sample (steeper error quantile curves) can actually be a sign of overfitting, especially if the same model shows significantly higher errors on a holdout test sample. When test data is not available, switch to internal validation: its data comes from cross-validation tests that run while the model is being built.
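
The three options mirror the usual training / holdout / cross-validation setups. Here is a minimal scikit-learn sketch of the difference (illustrative only - in pSeven the internal validation results are produced automatically while the model is being built, and the regressor below is just a stand-in for a surrogate model):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split, cross_val_score

    # Hypothetical data set; in pSeven these samples come from your workflow.
    X = np.random.rand(300, 4)
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.01 * np.random.randn(300)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    model = GradientBoostingRegressor().fit(X_train, y_train)

    print("training R2:", model.score(X_train, y_train))  # usually optimistic
    print("test R2:", model.score(X_test, y_test))        # generalization estimate
    print("cross-validation R2:",
          cross_val_score(model, X_train, y_train, cv=5).mean())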

In “Build. Validate. Explore.” - Part 3 we’ll see how to “look inside the model” and explore its behaviour with an interactive visual tool called Model Explorer. Stay tuned!

Interested in the solution?

Click to request a free 30-day demo.

Request demo