November 18, 2019

Mixture of Approximators for model improvement

Introduction

Many industrial applications require an approximation model for a function whose behavior differs significantly across the parameter space, for example:

the function to be approximated is spatially inhomogeneous, i.e. its behavior is dramatically different in different parts of the domain,
the training data is distributed inhomogeneously in the input space.

This is where a dedicated pSeven technique, Mixture of approximators (MoA), comes into play. The idea is to decompose the input space into domains such that the variability of the function within each domain is lower than over the whole design space. If a local approximation is built for each domain and the local approximations are then “glued” together, the result can be more accurate than a global approximation built at once for the whole design space.

The MoA technique detects clusters not in the variable space \(X\) alone but in the joint space of variables and responses \((X, Y)\), so that areas of different behavior are identified correctly. In other words, knowledge about the behavior of the responses is used as early as the data preprocessing stage.
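The effect of clustering in the joint space can be illustrated with a small sketch. It is not the pSeven implementation: a plain 2-means clustering is swapped in for the clustering step, and the step location (0.32), the step height (5) and the helper name `two_means` are assumptions made for illustration. On a step function, clustering by \(x\) alone splits the sample roughly in the middle of the range, while clustering by \((x, y)\) splits it at the step.

```python
# Illustrative sketch only: plain 2-means stands in for the clustering step.

def two_means(points, n_iter=20):
    """Cluster tuples into 2 groups with Lloyd's algorithm.

    Centers start at the first and the last point, so the result is deterministic.
    """
    centers = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to the nearest center.
        labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        # Update step: each center moves to the mean of its points.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


xs = [i / 20 for i in range(21)]
ys = [5.0 if x > 0.32 else 0.0 for x in xs]  # assumed step at x = 0.32

labels_x = two_means([(x,) for x in xs])   # variable space only
labels_xy = two_means(list(zip(xs, ys)))   # joint (x, y) space
true_side = [int(x > 0.32) for x in xs]    # which side of the step each point is on

print("x only :", labels_x)
print("(x, y) :", labels_xy)
print("step   :", true_side)
```

The outcome depends on the relative scaling of \(x\) and \(y\); the step height here is chosen large enough for the split to be unambiguous.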

This also makes it possible to use an initial model (an existing approximation model) as a source of information for better clustering and, ultimately, to improve that model.

In this tech tip, we describe the most common situations when the MoA technique is especially effective.

“Abrupt” behavior

The first example of MoA effectiveness is a step function. A weakness of almost any standard approximator is the Gibbs phenomenon, which prevents it from building an adequate surrogate model in the region of the step (Fig. 1). In the figure below, the approximation model is built with the Gaussian processes (GP) technique.

moa-1

Fig. 1. Approximation of step function by GP technique

Large oscillations are observed near the step, which is a typical artifact for this kind of data.

In Fig. 2, Mixture of approximators is used, and the resulting approximation reproduces the step much better, since each part of the function is described by its own local model.

moa-2

Fig. 2. Approximation of step function by MoA technique
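The idea of describing each part of the function by its own local model can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the MoA algorithm itself: straight-line fits stand in for the approximators, the domain labels are taken as given (as if produced by a clustering step), and a new point is routed to a local model by the label of its nearest training point. The step location 0.32 and the names `fit_line`, `predict_global`, `predict_local` and `rms` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def step(x):
    return 1.0 if x > 0.32 else 0.0  # assumed test function


# Training sample and domain labels (assumed to come from a clustering step).
xs = [i / 20 for i in range(21)]
ys = [step(x) for x in xs]
labels = [int(y > 0.5) for y in ys]

# One global model versus one local model per domain.
global_model = fit_line(xs, ys)
local_models = {
    k: fit_line([x for x, lab in zip(xs, labels) if lab == k],
                [y for y, lab in zip(ys, labels) if lab == k])
    for k in set(labels)
}


def predict_global(x):
    slope, intercept = global_model
    return slope * x + intercept


def predict_local(x):
    # "Gluing": route x to the domain of its nearest training point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    slope, intercept = local_models[labels[nearest]]
    return slope * x + intercept


def rms(predict, grid):
    """Root-mean-square error of a predictor against the true step function."""
    return math.sqrt(sum((predict(x) - step(x)) ** 2 for x in grid) / len(grid))


grid = [(j + 0.5) / 200 for j in range(200)]
global_rms = rms(predict_global, grid)
local_rms = rms(predict_local, grid)
print(f"global RMS error: {global_rms:.3f}, local RMS error: {local_rms:.3f}")
```

Within each domain the local fits are exact for this piecewise-constant function; the error that remains comes from grid points lying between the true step and the nearest-point boundary between domains.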

"Peaky" function

Another case is a function with strongly non-stationary behavior, such as a sharp peak, shown in Fig. 3.

moa-3

Fig. 3. Example of a function with a peak in the signal

This test function is described by the following expression:

\(y =\frac{\sin(7πx)}{πx}\)
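For reference, the expression can be evaluated as in the sketch below. The removable singularity at \(x = 0\) is handled through the limit \(\sin(7πx)/(πx) \to 7\). The interval \([-1, 1]\) and the uniform 15-point grid are assumptions for illustration, since the sampling plan of the experiment is not specified in this tech tip; the function name `peaky` is hypothetical.

```python
import math


def peaky(x):
    """y = sin(7*pi*x) / (pi*x), with the limit value 7 at x = 0."""
    if x == 0.0:
        return 7.0
    return math.sin(7.0 * math.pi * x) / (math.pi * x)


# Assumed sampling: 15 equally spaced points on [-1, 1].
n_points = 15
sample_x = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
sample_y = [peaky(x) for x in sample_x]

for x, y in zip(sample_x, sample_y):
    print(f"{x:+.4f}  {y:+.4f}")
```

On such a coarse grid only a few points fall on the peak itself, which is what makes the function hard for a single global model.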

Let’s build approximation models with the GP and MoA techniques on a training sample of 15 points. The result is shown in Fig. 4.

moa-4

Fig. 4. Approximations by different techniques

The figure above shows that the sample is too small for the GP technique to build an accurate model (blue line). The model trained with Mixture of approximators (red line) is more accurate and captures the peak even with this limited number of points.

“Improvement” case

As mentioned, MoA can accept an existing model as the “initial” one and improve it. Let’s consider the function

\(f(x,y)=x+y+θ[x^2+y^2-0.25]-2θ[(x-0.7)^2+(y-0.7)^2-1]\),

where \(θ[z]\) is a Heaviside-type step function, defined here as

\(θ[z]=\begin{cases} 0, & \text{if } z \geq 0 \\ 1, & \text{if } z \lt 0 \end{cases}\)

The function \(f(x,y)\) is presented in Fig. 5.

moa-5

Fig. 5. Original function
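The test function is easy to reproduce. The sketch below follows the definitions above literally, including the convention that \(θ[z]\) equals 1 for negative arguments; the Python names `theta`, `f`, `inner_disk` and `outer_disk` are hypothetical.

```python
def theta(z):
    """Step function as defined in the text: 0 for z >= 0, 1 for z < 0."""
    return 0.0 if z >= 0 else 1.0


def f(x, y):
    # 1 inside the circle of radius 0.5 centered at the origin, 0 elsewhere.
    inner_disk = theta(x**2 + y**2 - 0.25)
    # 1 inside the circle of radius 1 centered at (0.7, 0.7), 0 elsewhere.
    outer_disk = theta((x - 0.7)**2 + (y - 0.7)**2 - 1.0)
    return x + y + inner_disk - 2.0 * outer_disk


print(f(0.0, 0.0))    # point inside both circles
print(f(1.0, 1.0))    # point inside the second circle only
print(f(-1.0, -1.0))  # point outside both circles
```

The function is a plane \(x + y\) with two jumps along circles, so any smooth global model has to approximate discontinuities of different height and sign.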

An approximation model was trained with the GP technique on a training sample of 400 points (Fig. 6, a). As expected, the GP technique tends to smooth abrupt changes and produces artificial oscillations.

moa-6

moa-6-2

Fig. 6. Approximations by different techniques: (a) GP, (b) MoA(GP)

Using the same training set and the previously trained GP model, we can obtain a much more accurate approximation of the original function with MoA (Fig. 6, b). For clarity, diagonal cross-sections of both approximations are shown in Fig. 7.

moa-7

Fig. 7. Cross-sections of GP and MoA(GP) approximations

Thus, applying MoA to a previously trained model improves it. To quantify the improvement, we calculate error metrics for the GP and MoA(GP) models on an independent test set of 64 points (Table 1).

Table 1. Error metrics of the GP and MoA(GP) models on the test sample

Model	R²	RMS	Maximum	Q99	Q95	Median	Mean
GP	0.9317	0.1863	0.8548	0.8548	0.3337	0.0390	0.0913
MoA(GP)	0.9998	0.0113	0.0637	0.0637	0.0001	1.6597e-5	0.0020
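Metrics of this kind can be computed from test-set predictions as sketched below. The exact definitions used by Model Validator are not given in this tech tip, so the following are assumptions: Maximum, Q99, Q95, Median and Mean are taken as statistics of the absolute error, and the quantiles use the nearest-rank convention. The function names and the toy data at the end are made up purely to exercise the code.

```python
import math
import statistics


def quantile(sorted_vals, q):
    """Nearest-rank quantile of an ascending list (one of several conventions)."""
    rank = max(1, math.ceil(q * len(sorted_vals)))
    return sorted_vals[rank - 1]


def error_metrics(y_true, y_pred):
    """Return R2, RMS and absolute-error statistics for paired values."""
    n = len(y_true)
    abs_err = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMS": math.sqrt(ss_res / n),
        "Maximum": abs_err[-1],
        "Q99": quantile(abs_err, 0.99),
        "Q95": quantile(abs_err, 0.95),
        "Median": statistics.median(abs_err),
        "Mean": sum(abs_err) / n,
    }


# Toy values, not taken from the experiment described in the text.
toy_true = [0.0, 1.0, 2.0, 3.0]
toy_pred = [0.0, 1.0, 2.0, 4.0]
for name, value in error_metrics(toy_true, toy_pred).items():
    print(f"{name}: {value:.4f}")
```

Under the nearest-rank convention, the 99% quantile of a 64-point sample is its largest value, so Q99 and Maximum coincide for a test set of that size.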

Model Validator in pSeven can also illustrate this difference with scatter and error quantile plots (Fig. 8).

moa-8

Fig. 8. Scatter and quantile plots on the test sample

Usage of MoA technique

In pSeven 6.16, the MoA technique is not included in the set of SmartSelection techniques, so it has to be selected and set up manually. The manual mode in Model Builder lets you select MoA explicitly and set its basic parameters. If you are unsure about the exact settings, keep the defaults in the dialog (Fig. 9).

moa-9

Fig. 9. Manual mode for model building

If you have an existing model trained with any technique, you can apply MoA on top of it using the “Update…” button in the Model Builder menu (Fig. 10). The MoA technique is then applied to the given model to improve it, based on the provided dataset, which can be the original one or a new one.

moa-10

Fig. 10. Updating an existing model by MoA in the Model Builder tool

Conclusions

In this tech tip, we briefly covered the benefits of the Mixture of approximators (MoA) technique. The main feature of the MoA algorithm is that it clusters the design space into local domains and builds a separate approximation for each domain. In many cases, this approach yields much more accurate predictions and avoids numerical artifacts. In addition, if a previously trained model is available, MoA can be trained on top of it to improve the final approximation.