November 18, 2019

# Mixture of Approximators for Model Improvement

## Introduction

Many industrial applications require training approximation models when the function or the data behave very differently in different regions of the parameter space, for example:

- the function to be approximated is spatially inhomogeneous, i.e. its behavior differs dramatically between parts of the domain,
- the training data is distributed inhomogeneously in the input space.

This is where a special pSeven technique called **Mixture of approximators (MoA)** comes into play. The idea is to decompose the input space into domains such that, within each domain, the variability of the function is lower than over the whole design space. If approximations are constructed for these domains and then “glued” together, the resulting model can be more accurate than a global approximation model constructed at once for the whole design space.

The **MoA** technique detects clusters not only in the variable space \(X\) but in the joint space of variables and responses \((X, Y)\), which helps it identify regions of different behavior correctly. In other words, knowledge about the behavior of the responses is used as early as the data preprocessing stage.

This also makes it possible to use an existing approximation model as an initial model: it serves as a source of information for better clustering and is ultimately improved by **MoA**.
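The decomposition idea described above can be illustrated with a toy one-dimensional sketch. It is not pSeven's MoA algorithm: k-means clustering in the joint \((x, y)\) space, straight-line local models, and a hard nearest-centroid switch at prediction time (instead of any smooth gluing) are all assumptions chosen for brevity. Because \(y\) is unknown at prediction time, a new point is routed to a local model using its \(x\) coordinate only.

```python
from math import dist


def kmeans(points, centroids, iters=20):
    """Plain k-means on (x, y) pairs, starting from the given centroids."""
    labels = []
    for _ in range(iters):
        # Assign every (x, y) sample to its nearest centroid in the joint space.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(old)  # leave an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; falls back to a constant if x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def build_mixture(xs, ys, initial_centroids):
    """Cluster in (x, y), fit one local line per non-empty cluster."""
    labels, centroids = kmeans(list(zip(xs, ys)), initial_centroids)
    parts = []
    for k, (cx0, _) in enumerate(centroids):
        cx = [x for x, lab in zip(xs, labels) if lab == k]
        cy = [y for y, lab in zip(ys, labels) if lab == k]
        if cx:
            parts.append((cx0, fit_line(cx, cy)))

    def predict(x):
        # Hard switch: use the local model whose centroid is nearest in x alone.
        _, model = min(parts, key=lambda part: abs(x - part[0]))
        return model(x)

    return predict


# A function with a jump: y = x on the left, y = x + 3 on the right.
xs = [i / 20 for i in range(21)]
ys = [x if x < 0.5 else x + 3.0 for x in xs]
# Seed the two clusters with the first and the last sample.
predict = build_mixture(xs, ys, [(xs[0], ys[0]), (xs[-1], ys[-1])])
print(predict(0.2), predict(0.9))
```

Each local line reproduces its own branch, so the jump is represented without the smoothing a single global line would introduce.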

In this tech tip, we describe the most common situations when the **MoA** technique is especially effective.

## “Abrupt” behavior

The first example of **MoA** effectiveness is a step function. A weakness of almost any standard approximator is the Gibbs phenomenon: oscillations near a discontinuity that prevent building an adequate surrogate model in the step region (Fig. 1). In the figure below, the approximation model is constructed with the **Gaussian processes (GP)** technique.

*Fig. 1. Approximation of step function by GP technique*

Large oscillations are observed near the step, a typical artifact for this kind of data.

In Fig. 2, **Mixture of approximators** is used, and the resulting approximation predicts the step much better, since each part of the function is described by its own local model.

*Fig. 2. Approximation of step function by MoA technique*
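The contrast between one global model and separate local models can be reproduced in a few lines. This sketch uses a global Lagrange polynomial interpolant as a stand-in for a smooth global approximator (it is not GP, and its ringing is largely of the Runge type), and a crude local model that splits the data where consecutive samples jump the most and fits a constant on each side. Both the stand-in and the split rule are assumptions for illustration, not what pSeven does internally.

```python
def step(x):
    """Unit step: 0 for x < 0, 1 for x >= 0."""
    return 1.0 if x >= 0.0 else 0.0


def lagrange(nodes, values):
    """Global polynomial interpolant through all (node, value) pairs."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(nodes, values)):
            basis = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return p


def piecewise_constant(nodes, values):
    """Split where consecutive samples jump the most; fit a constant per side."""
    pairs = sorted(zip(nodes, values))
    jumps = [abs(pairs[i + 1][1] - pairs[i][1]) for i in range(len(pairs) - 1)]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    split = 0.5 * (pairs[k][0] + pairs[k + 1][0])
    left = [y for x, y in pairs if x < split]
    right = [y for x, y in pairs if x >= split]
    lo, hi = sum(left) / len(left), sum(right) / len(right)
    return lambda x: lo if x < split else hi


nodes = [-1.0 + 0.2 * i for i in range(11)]     # 11 equispaced training points
values = [step(x) for x in nodes]
grid = [-1.0 + 0.001 * i for i in range(2001)]  # dense evaluation grid

global_model = lagrange(nodes, values)
local_model = piecewise_constant(nodes, values)
print("global range:", min(map(global_model, grid)), max(map(global_model, grid)))
print("local range: ", min(map(local_model, grid)), max(map(local_model, grid)))
```

The global interpolant passes through every training point yet swings far outside \([0, 1]\) between them, while the two local constants stay exactly at 0 and 1.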

## "Peaky" function

Another case is a function with strongly non-stationary behavior, such as the peak shown in Fig. 3.

*Fig. 3. Example of function with peak in the signal*

This test function is described by the following expression:

\(y = \frac{\sin(7\pi x)}{\pi x}\)
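Two properties of this expression follow directly from the formula and explain why the peak is hard to sample: at \(x = 0\) the expression is \(0/0\), and since \(\sin u \approx u\) for small \(u\), the limit is \(7\pi x / (\pi x) = 7\); the first zeros lie where \(7\pi x = \pm\pi\), i.e. at \(x = \pm 1/7\), so the main lobe is only \(2/7\) wide. A minimal Python version of the test function, which fills in the removable singularity explicitly, could look like this:

```python
import math


def peak(x: float) -> float:
    """Test function y = sin(7*pi*x) / (pi*x) from the text.

    The expression is 0/0 at x = 0; its limit there is 7, so that value is
    returned explicitly instead of dividing by zero.
    """
    if x == 0.0:
        return 7.0
    return math.sin(7.0 * math.pi * x) / (math.pi * x)


# Peak height, a point very close to the peak, and the first zero at x = 1/7.
print(peak(0.0), peak(1e-6), peak(1.0 / 7.0))
```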

Let’s build approximation models with the **GP** and **MoA** techniques on a training sample of 15 points. The result is shown in Fig. 4.

*Fig. 4. Approximations by different techniques*

The figure above shows that the sample size is not sufficient for building an accurate model with the **GP** technique (blue line). **Mixture of Approximators** (red line) trains a more accurate model that captures the peak even with a limited number of points.

## “Improvement” case

As mentioned, **MoA** can accept an existing model as the “initial” one and improve it. Consider the following function:

\(f(x,y) = x + y + \theta[x^2 + y^2 - 0.25] - 2\theta[(x-0.7)^2 + (y-0.7)^2 - 1]\),

where \(\theta[z]\) is a Heaviside-type step function that equals 1 for negative arguments and 0 otherwise:

\(\theta[z] = \begin{cases} 0, & \text{if } z \geq 0 \\ 1, & \text{if } z < 0 \end{cases}\)
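Read term by term, \(f\) is the plane \(x + y\) with a jump of \(+1\) inside the disk of radius 0.5 centered at the origin and a jump of \(-2\) inside the disk of radius 1 centered at \((0.7, 0.7)\). A direct Python transcription, with the step function exactly as defined above, is handy for generating training and test samples:

```python
def theta(z: float) -> float:
    """Step function as defined in the text: 1 for z < 0, 0 for z >= 0."""
    return 1.0 if z < 0.0 else 0.0


def f(x: float, y: float) -> float:
    """Plane x + y with a +1 jump inside the disk of radius 0.5 at the origin
    and a -2 jump inside the disk of radius 1 centered at (0.7, 0.7)."""
    return (x + y
            + theta(x**2 + y**2 - 0.25)
            - 2.0 * theta((x - 0.7)**2 + (y - 0.7)**2 - 1.0))


# The origin lies inside both disks: 0 + 1 - 2 = -1.
print(f(0.0, 0.0), f(1.0, 1.0), f(-1.0, -1.0))
```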

The function \(f(x,y)\) is presented in Fig. 5.

*Fig. 5. Original function*

An approximation model was trained with the **GP** technique on a training sample of 400 points (Fig. 6, a). As expected, **GP** tends to smooth out the abrupt changes and introduces artificial oscillations (waves).

*Fig. 6. Approximations by different techniques*

Using the same training set and the previously trained **GP** model, **MoA** obtains a much more accurate approximation of the original function (Fig. 6, b). For clarity, diagonal cross-sections of both approximations are shown in Fig. 7.

*a)*

*b)*

*Fig. 7. Cross-sections of GP and MoA(GP) approximations*

Applying **MoA** to the previously trained model thus improves it. To quantify this improvement, we calculate error metrics for the **GP** and **MoA(GP)** models on an independent test set of 64 points (Table 1).

*Table 1. Error metrics on the test sample*

| Model | R² | RMS | Maximum | Q99 | Q95 | Median | Mean |
|---|---|---|---|---|---|---|---|
| GP | 0.9317 | 0.1863 | 0.8548 | 0.8548 | 0.3337 | 0.0390 | 0.0913 |
| MoA(GP) | 0.9998 | 0.0113 | 0.0637 | 0.0637 | 0.0001 | 1.6597e-5 | 0.0020 |
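The metric definitions behind Table 1 are not spelled out here, so the sketch below assumes common ones: R² as the coefficient of determination, RMS as the root-mean-square residual, and Maximum, Q99, Q95, Median and Mean as statistics of the absolute errors, with nearest-rank quantiles. Under that assumption, with 64 test points the Q99 rank is \(\lceil 0.99 \cdot 64 \rceil = 64\), i.e. the largest error, which would be consistent with Q99 equaling Maximum in both rows of the table. The function name `error_metrics` is hypothetical and not part of pSeven.

```python
import math
import statistics


def error_metrics(y_true, y_pred):
    """Accuracy metrics of the kind listed in Table 1 (assumed definitions)."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = sorted(abs(r) for r in residuals)
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    def nearest_rank(q):
        # The sorted absolute error at rank ceil(q * n) (1-based).
        return abs_err[max(1, math.ceil(q * n)) - 1]

    return {
        "R2": 1.0 - ss_res / ss_tot,   # coefficient of determination
        "RMS": math.sqrt(ss_res / n),  # root-mean-square error
        "Maximum": abs_err[-1],
        "Q99": nearest_rank(0.99),
        "Q95": nearest_rank(0.95),
        "Median": statistics.median(abs_err),
        "Mean": sum(abs_err) / n,
    }


# 64 synthetic points with distinct errors: nearest-rank Q99 picks the maximum.
truth = [float(i) for i in range(64)]
pred = [t + 0.001 * (i + 1) for i, t in enumerate(truth)]
m = error_metrics(truth, pred)
print(m["Q99"] == m["Maximum"])
```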

The Model Validator tool in pSeven also illustrates this difference with scatter and error quantile plots (Fig. 8).

*Fig. 8. Scatter and quantile plots on the test sample*

## Usage of MoA technique

In pSeven 6.16, the **MoA** technique is not included in the set of SmartSelection techniques, so it has to be selected and set up manually. The manual mode in Model Builder allows you to specify **MoA** explicitly and to set its basic parameters. If you are unsure about the exact settings, keep the defaults in the dialog (Fig. 9).

*Fig. 9. Manual mode for model building*

If you have an existing model trained with any technique, you can apply **MoA** on top of it using the “Update…” button in the Model Builder menu (Fig. 10). **MoA** will then improve the given model based on the provided dataset, either the same one or a new one.

*Fig. 10. Updating an existing model with MoA in the Model Builder tool*

## Conclusions

In this tech tip, we briefly covered the benefits of the **Mixture of approximators (MoA)** technique. The main feature of the **MoA** algorithm is clustering the design space into local domains and building an approximation for each domain. In many cases, this approach yields much more accurate predictions and avoids numerical artifacts. Also, if a previously trained model is available, **MoA** can be trained on top of it to improve the final approximation.