October 13, 2016

“Build. Validate. Explore.” - Part 1: SmartSelection in Predictive Modeling Toolkit

What is SmartSelection?

SmartSelection is an intelligent model training technology in pSeven that automatically selects an approximation technique and its options to obtain the most accurate approximation model.

SmartSelection automates the trial-and-error process of constructing an approximation model, so the user can focus on solving a particular task without delving into the details of approximation techniques and methods. It hides the complexity of the underlying machine learning algorithms behind a user-friendly interface.

There’s a large set of approximation techniques under the hood, from simple linear regression, quadratic polynomials and splines to Gaussian processes, gradient boosted regression trees and the original High Dimensional Approximation (HDA) method based on neural networks. Every technique has its own tunable parameters, strengths and weaknesses, and no single technique is best for all possible data sets, i.e. there’s No Free Lunch.

The purpose of SmartSelection is to automate the selection of the model with the best predictive performance. It explores different techniques and optimizes their parameters to minimize the approximation error measured by cross-validation or on a holdout test set.
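The idea of exploring several techniques and keeping the one with the lowest cross-validation error can be illustrated with a minimal sketch. This is not the SmartSelection algorithm or the pSeven API: the three toy techniques, the one-dimensional data and every function name below are assumptions made for illustration only.

```python
import math
import random

def fit_mean(xs, ys):
    """Baseline technique: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """Ordinary least squares for a single input: y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def make_knn(k):
    """k-nearest-neighbours regression; k plays the role of a tunable option."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit

def cv_rmse(fit, xs, ys, folds=5):
    """Root-mean-square error estimated by k-fold cross-validation."""
    n = len(xs)
    sq_err = 0.0
    for f in range(folds):
        test = set(range(f, n, folds))  # every folds-th point is held out
        train_x = [xs[i] for i in range(n) if i not in test]
        train_y = [ys[i] for i in range(n) if i not in test]
        model = fit(train_x, train_y)
        sq_err += sum((model(xs[i]) - ys[i]) ** 2 for i in test)
    return math.sqrt(sq_err / n)

def select_model(candidates, xs, ys):
    """Return (name, error) of the candidate with the lowest CV error."""
    scores = {name: cv_rmse(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Hypothetical candidate set: technique name -> training function.
CANDIDATES = {
    "mean": fit_mean,
    "linear": fit_linear,
    "knn(k=3)": make_knn(3),
    "knn(k=7)": make_knn(7),
}

# Synthetic toy data: a smooth nonlinear response with a little noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 6.0) for _ in range(60)]
ys = [math.sin(x) + rng.gauss(0.0, 0.05) for x in xs]

best_name, best_err = select_model(CANDIDATES, xs, ys)
print(best_name, round(best_err, 3))
```

On strictly linear data the same function picks the linear model instead, which is the No Free Lunch point in miniature: the best technique depends on the data.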

Use Case

Let’s consider the simple “Static mixer optimization” example from the pSeven package. When an engineer wants to study a certain process, a design of experiments (DOE) is used to sample data. The data is then used as a training sample to create an approximation model.

In this example, the DOE samples 200 points:

  • 4 inputs: ‘Flow temperature’, ‘Pressure drop’, ‘1st flow velocity’, ‘2nd flow velocity’
  • 2 outputs: ‘Nozzle angle’, ‘Nozzle diameter’
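The article does not say which DOE technique generates the 200 points. As one common choice, a Latin hypercube design over the four inputs could be sketched as follows. The technique, the function names and the unit-cube scaling are assumptions; the physical ranges of the inputs are not given in the example, so `scale` takes them as an argument.

```python
import random

INPUTS = ["Flow temperature", "Pressure drop",
          "1st flow velocity", "2nd flow velocity"]

def latin_hypercube(n_points, n_dims, seed=0):
    """Return n_points rows in the unit cube, one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Draw one random value inside each of the n_points equal-width strata,
        column = [(i + rng.random()) / n_points for i in range(n_points)]
        # then shuffle so that strata of different dimensions are paired at random.
        rng.shuffle(column)
        columns.append(column)
    return [list(row) for row in zip(*columns)]

def scale(row, bounds):
    """Map a unit-cube row to physical ranges given as (low, high) pairs."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(row, bounds)]

# 200 points, 4 inputs, as in the example above (still in unit-cube coordinates).
design = latin_hypercube(200, len(INPUTS))
print(len(design), len(design[0]))
```

Each row would then be scaled to the real input ranges and passed to the simulation to obtain the two outputs.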

Input Data Properties

A training sample is the minimum requirement, but additional data, if provided, may improve the quality of the approximation.

High-Level Hints

Three groups of high-level hints can be used to express domain knowledge, model requirements and time/quality constraints.

1. Domain knowledge about the data and the process under study

Any additional prior knowledge narrows the search space of possible configurations, which reduces training time and may affect the predictive performance of the final model.

2. Requirements for the model properties

3. Time constraints and quality management: balance the time/quality tradeoff

  • for this example, define acceptable quality with the metric R² = 0.99 on cross-validation
  • limit the time for the selection process: set a nightly experiment
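The interplay of a quality target and a time limit can be sketched as a search loop with two stopping rules. This is only an illustration under stated assumptions: `budgeted_search`, its parameters and the 8-hour default standing in for "nightly" are hypothetical, and the scores are made-up placeholders rather than results of real training runs.

```python
import time

def budgeted_search(candidates, evaluate, target_r2=0.99, time_limit_s=8 * 3600):
    """Try candidates in order; stop at the quality target or the time limit.

    `evaluate(candidate)` is a caller-supplied function that returns the
    validation R2 of one configuration (for example, from cross-validation).
    """
    start = time.monotonic()
    best, best_score = None, float("-inf")
    for candidate in candidates:
        if time.monotonic() - start >= time_limit_s:
            break  # time budget exhausted: keep the best model found so far
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= target_r2:
            break  # acceptable quality reached: skip the remaining candidates
    return best, best_score

# Placeholder scores standing in for real (and slow) training runs.
stub_scores = {"config_a": 0.90, "config_b": 0.95,
               "config_c": 0.992, "config_d": 0.999}

best, score = budgeted_search(stub_scores, stub_scores.get)
print(best, score)
```

With these placeholders the loop stops at `config_c`, the first configuration that reaches the 0.99 target, and never spends time on `config_d`. A stricter target or a shorter time limit changes where it stops.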

The user interface shows the declared hints in the form of tags.

The SmartSelection algorithm then starts the selection using the given knowledge, requirements and time/quality tradeoff.

The quality of the approximation can be measured in three different ways:

  1. Using Internal Validation
  2. By splitting the given training sample into train/test subsets
  3. Using an additional holdout test sample
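The second and third options differ only in where the test points come from. A minimal sketch, assuming a sample stored as a list of (x, y) pairs and a model that is a plain Python callable; all names are illustrative, not part of pSeven:

```python
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant output")
    return 1.0 - ss_res / ss_tot

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Option 2: shuffle the given sample and split it into train and test subsets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

def score_on(model, sample):
    """R2 of `model` on (x, y) pairs: a test subset or a separate holdout sample."""
    y_true = [y for _, y in sample]
    y_pred = [model(x) for x, _ in sample]
    return r_squared(y_true, y_pred)

# Toy sample and a model that happens to match it exactly.
rows = [(float(x), 2.0 * x + 1.0) for x in range(50)]
train, test = train_test_split(rows)
print(len(train), len(test), score_on(lambda x: 2.0 * x + 1.0, test))
```

For the third option, `score_on` is called with the additional holdout sample, and the whole training sample stays available for fitting.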

In the case of a vector (multidimensional) output, an optimal model is constructed for each output variable.

Manual Mode vs. SmartSelection

Manual mode is available for advanced users who want to get closer to the core approximation techniques, with all their knobs and switches. Compared with Manual mode, however, SmartSelection always gives similar or, in most cases, better approximation results.

Further Applications

After the approximation model is built, it can be validated on new data and compared with other models using Model Validator. It can also be smoothed, evaluated and exported in different formats (C, Octave, FMI, etc.).

In “Build. Validate. Explore.” - Part 2 we will describe Model Validator, an interactive analysis tool that lets you estimate a model’s quality (i.e. predictive performance) and compare different models. It lets you test models against reference data and find the most accurate one using error plots and statistics.

In “Build. Validate. Explore.” - Part 3 we’ll see how to “look inside the model” and explore its behaviour with an interactive visual tool called Model Explorer. Stay tuned!
