# 4.7. Option Reference

- Basic options
  - Common options
    - GTApprox/Accelerator — five-position switch to control the trade-off between speed and accuracy (*updated in 6.23*).
    - GTApprox/AccuracyEvaluation — require accuracy evaluation.
    - GTApprox/ExactFitRequired — require the model to fit sample data exactly (*updated in 6.15*).
    - GTApprox/InternalValidation — enable or disable internal validation.
    - GTApprox/LinearityRequired — require the model to be linear.
    - GTApprox/LogLevel — minimum log level.

- Advanced options
  - Common options
    - GTApprox/CategoricalOutputs — specifies categorical outputs (*added in 6.22*).
    - GTApprox/CategoricalVariables — specifies categorical inputs (*added in 6.3*).
    - GTApprox/Componentwise — perform componentwise approximation of the output (*deprecated since 6.3*).
    - GTApprox/DependentOutputs — specify the type of dependency between output components (*added in 6.3, updated in 6.15*).
    - GTApprox/Deterministic — controls the behavior of randomized initialization algorithms in certain techniques (*added in 5.0*).
    - GTApprox/Heteroscedastic — treat input sample as a sample containing heteroscedastic noise (*added in 1.9.0*).
    - GTApprox/InputDomainType — specifies the input domain for the model (*added in 6.16*).
    - GTApprox/InputNanMode — specifies how to handle non-numeric values in the input part of the training sample (*added in 6.8, updated in 6.19*).
    - GTApprox/InputsTolerance — specifies the tolerance up to which each input variable will be rounded (*added in 6.3*).
    - GTApprox/MaxAxisRotations — use rotation transformations in the input space to iteratively improve model quality (*added in 6.11 Service Pack 1*).
    - GTApprox/MaxExpectedMemory — maximum expected amount of memory allowed for model training (*added in 6.4*).
    - GTApprox/MaxParallel — maximum number of parallel threads (*added in 5.0 Release Candidate 1, updated in 6.17*).
    - GTApprox/OutputNanMode — specifies how to handle non-numeric values in the output part of the training sample (*added in 6.8*).
    - GTApprox/OutputTransformation — before training, apply a transformation to the training sample outputs (*added in 6.13 Service Pack 1*).
    - GTApprox/PartialDependentOutputs/RRMSThreshold — if training a model with linear dependency between outputs, specifies the RRMS error threshold for the internal model of that dependency (*added in 6.29*).
    - GTApprox/Seed — fixed seed used in the deterministic training mode (*added in 5.0*).
    - GTApprox/StoreTrainingSample — save a copy of training data with the model (*added in 6.6*).
    - GTApprox/SubmodelTraining — select whether to train submodels in parallel or sequentially (*added in 6.14*).
    - GTApprox/Technique — specify the approximation algorithm to use (*added in 1.9.2, updated in 6.8*).
    - GTApprox/TrainingAccuracySubsetSize — limit the number of points selected from the training set to calculate model accuracy (*added in 1.9.0*).

  - Gradient Boosted Regression Trees (GBRT)
    - GTApprox/GBRTColsampleRatio — column subsample ratio (*added in 5.1*).
    - GTApprox/GBRTMaxDepth — maximum regression tree depth (*added in 5.1*).
    - GTApprox/GBRTMinChildWeight — minimum total weight of points in a regression tree leaf (*added in 5.1*).
    - GTApprox/GBRTMinLossReduction — minimum significant reduction of the loss function (*added in 5.1*).
    - GTApprox/GBRTNumberOfTrees — the number of regression trees to include in a model (*added in 5.1*).
    - GTApprox/GBRTShrinkage — shrinkage step, or learning rate (*added in 5.1*).
    - GTApprox/GBRTSubsampleRatio — row subsample ratio (*added in 5.1*).

  - Gaussian Processes (GP)
    - GTApprox/GPInteractionCardinality — allowed orders of the additive covariance function (*added in 1.10.3*).
    - GTApprox/GPLearningMode — give priority to either model accuracy or robustness (*added in 1.9.6, updated in 6.17*).
    - GTApprox/GPLinearTrend — deprecated since 3.2, kept for compatibility.
    - GTApprox/GPMeanValue — mean of the model output mean values.
    - GTApprox/GPPower — the value of *p* in the *p*-norm, which is used to measure the distance between input vectors.
    - GTApprox/GPTrendType — select trend type (*added in 3.2*).
    - GTApprox/GPType — select the covariance function (kernel) type (*updated in 6.17*).

  - High Dimensional Approximation (HDA)
    - GTApprox/HDAFDGauss — include Gaussian functions in the functional dictionary used in construction of approximations (*updated in 6.14*).
    - GTApprox/HDAFDLinear — include linear functions in the functional dictionary used in construction of approximations (*updated in 6.14*).
    - GTApprox/HDAFDSigmoid — include sigmoid functions in the functional dictionary used in construction of approximations (*updated in 6.14*).
    - GTApprox/HDAHessianReduction — maximum proportion of data used in evaluating the Hessian (*added in 1.6.1*).
    - GTApprox/HDAMultiMax — maximum number of basic approximators constructed during one approximation phase (*updated in 6.14*).
    - GTApprox/HDAMultiMin — minimum number of basic approximators constructed during one approximation phase (*updated in 6.14*).
    - GTApprox/HDAPhaseCount — maximum number of approximation phases (*updated in 6.14*).
    - GTApprox/HDAPMax — maximum allowed approximator complexity (*updated in 6.14*).
    - GTApprox/HDAPMin — minimum allowed approximator complexity (*updated in 6.14*).

  - Internal Validation (IV)
    - GTApprox/IVDeterministic — controls the behavior of the pseudorandom algorithm selecting data subsets in cross validation (*added in 5.0*).
    - GTApprox/IVSavePredictions — save model values calculated during internal validation (*added in 2.0 Release Candidate 2*).
    - GTApprox/IVSeed — fixed seed used in the deterministic cross validation mode (*added in 5.0*).
    - GTApprox/IVSubsetCount — the number of subsets into which the training sample is divided for cross validation (*updated in 6.19*).
    - GTApprox/IVSubsetSize — the size of a training sample subset used as test data in a cross validation session (*added in 6.19*).
    - GTApprox/IVTrainingCount — an upper limit for the number of training sessions in cross validation (*updated in 6.19*).

  - Mixture of Approximators (MoA)
    - GTApprox/MoACovarianceType — type of covariance matrix to use in the Gaussian Mixture Model (*added in 1.10.0, updated in 6.11*).
    - GTApprox/MoANumberOfClusters — the number of clusters (*added in 1.10.0*).
    - GTApprox/MoAPointsAssignment — select the technique for assigning points to clusters (*added in 1.10.0*).
    - GTApprox/MoAPointsAssignmentConfidence — confidence for the points assignment technique based on Mahalanobis distance (*added in 1.10.0*).
    - GTApprox/MoATechnique — approximation technique for local models (*added in 1.10.0, updated in 6.3*).
    - GTApprox/MoATypeOfWeights — type of weights to use for constructing the final approximation (*added in 1.10.0*).
    - GTApprox/MoAWeightsConfidence — the value controlling smoothness of weights based on the sigmoid function and Mahalanobis distance (*added in 1.10.0*).

  - Response Surface Model (RSM)
    - GTApprox/RSMCategoricalVariables — specifies categorical variables (*deprecated since 6.3*).
    - GTApprox/RSMElasticNet/L1_ratio — specifies the ratio between L1 and L2 regularization in the ElasticNet type regularization (*added in 6.1*).
    - GTApprox/RSMFeatureSelection — specifies the regularization and term selection procedures (*added in 6.1, updated in 6.17*).
    - GTApprox/RSMMapping — specifies mapping type for data pre-processing.
    - GTApprox/RSMStepwiseFit/inmodel — selects the starting model for stepwise-fit regression (*updated in 6.14*).
    - GTApprox/RSMStepwiseFit/penter — specifies the *p*-value of inclusion for stepwise-fit regression.
    - GTApprox/RSMStepwiseFit/premove — specifies the *p*-value of exclusion for stepwise-fit regression.
    - GTApprox/RSMType — specifies the type of response surface model (*updated in 6.17*).

  - Sparse Gaussian Processes (SGP)
    - GTApprox/SGPNumberOfBasePoints — the number of base points used to approximate the full covariance matrix of the points from the training sample (*updated in 6.14*).

  - Splines with Tension (SPLT)
    - GTApprox/SPLTContinuity — required approximation smoothness (*updated in 6.17*).

  - Tensor Approximation (TA)
    - GTApprox/EnableTensorFeature — enable automatic selection of the TA and iTA techniques (*added in 1.9.2*).
    - GTApprox/TADiscreteVariables — list of discrete input variables (*deprecated since 6.3*).
    - GTApprox/TALinearBSPLExtrapolation — use linear extrapolation for BSPL factors (*added in 1.9.4*).
    - GTApprox/TALinearBSPLExtrapolationRange — set linear BSPL extrapolation range (*added in 1.9.4*).
    - GTApprox/TAModelReductionRatio — sets the ratio of model complexity reduction (*added in 6.2*).
    - GTApprox/TAReducedBSPLModel — deprecated since 6.2, kept for compatibility.
    - GTApprox/TensorFactors — describes tensor factors to use in the Tensor Approximation technique.


## Common options

**GTApprox/Accelerator**

Five-position switch to control the trade-off between training speed and model quality.

Value: integer in range from 1 (prefer quality, lower speed) to 5 (prefer speed, lower quality)
Default: 1 (prefer quality)

Changed in version 5.1: GTApprox/Accelerator affects the GTApprox/GBRTMaxDepth and GTApprox/GBRTNumberOfTrees options in manual training.

Changed in version 6.23: GTApprox/Accelerator affects RSM parameter estimation in smart training.

This option changes several internal parameters of approximation techniques, which control the trade-off between training speed and model quality. When you use smart training or manual training with the GP, GBRT, or HDA technique, GTApprox/Accelerator also implicitly changes the values of some dependent options. You can override these changes by setting dependent options manually: if you set both GTApprox/Accelerator and some dependent option, GTApprox uses your value of this dependent option, not the value automatically set by GTApprox/Accelerator.

In smart training (`build_smart()`), setting GTApprox/Accelerator to 4 disables the stepwise regression and ElasticNet algorithms when tuning parameters of the RSM technique; setting it to 5 additionally disables the multiple ridge algorithm in RSM to speed up model training (see section Parameters Estimation in Response Surface Model for details). You can override these changes by setting the GTApprox/RSMFeatureSelection option manually.

The dependent GP option in manual training is GTApprox/GPLearningMode: it is set to `"Accurate"` if GTApprox/Accelerator is 1 or 2.

Dependent GBRT options in manual training are GTApprox/GBRTMaxDepth and GTApprox/GBRTNumberOfTrees. GTApprox/Accelerator sets them as follows:

| GTApprox/Accelerator | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| GTApprox/GBRTMaxDepth | 10 | 10 | 10 | 6 | 6 |
| GTApprox/GBRTNumberOfTrees | 500 | 400 | 300 | 200 | 100 |

HDA technique settings affected by GTApprox/Accelerator in manual training depend on input sample size. There are two cases:

- Common sample size (the sample contains less than 10 000 points).
- Big sample size (the sample contains 10 000 points or more).

In the case of a commonly sized sample, dependent options are GTApprox/HDAFDGauss, GTApprox/HDAMultiMax, GTApprox/HDAMultiMin and GTApprox/HDAPhaseCount. GTApprox/Accelerator sets them as follows:

| GTApprox/Accelerator | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| GTApprox/HDAFDGauss | 1 | 1 | 0 | 0 | 0 |
| GTApprox/HDAMultiMax | 10 | 6 | 4 | 4 | 2 |
| GTApprox/HDAMultiMin | 5 | 4 | 2 | 2 | 1 |
| GTApprox/HDAPhaseCount | 10 | 7 | 5 | 1 | 1 |

In the case of a big sized sample, dependent options are GTApprox/HDAFDGauss, GTApprox/HDAHessianReduction, GTApprox/HDAMultiMax, GTApprox/HDAMultiMin, GTApprox/HDAPhaseCount, GTApprox/HDAPMax, and GTApprox/HDAPMin. GTApprox/Accelerator sets them as follows:

| GTApprox/Accelerator | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| GTApprox/HDAFDGauss | 0 | 0 | 0 | 0 | 0 |
| GTApprox/HDAHessianReduction | 0.3 | 0.3 | 0 | 0 | 0 |
| GTApprox/HDAMultiMax | 3 | 2 | 2 | 2 | 1 |
| GTApprox/HDAMultiMin | 1 | 1 | 1 | 1 | 1 |
| GTApprox/HDAPhaseCount | 5 | 5 | 3 | 1 | 1 |
| GTApprox/HDAPMax | 150 | 150 | 150 | 150 | 150 |
| GTApprox/HDAPMin | 150 | 150 | 150 | 150 | 150 |
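The override rule above can be sketched in plain Python. This is a hypothetical helper, not part of the GTApprox API; it only encodes the GBRT table and the "manual setting wins" rule for illustration:

```python
# Hypothetical sketch of GTApprox/Accelerator resolving dependent GBRT options.
# Table values are the ones listed above; a manually set option takes priority.
ACCELERATOR_GBRT_DEFAULTS = {
    1: {"GTApprox/GBRTMaxDepth": 10, "GTApprox/GBRTNumberOfTrees": 500},
    2: {"GTApprox/GBRTMaxDepth": 10, "GTApprox/GBRTNumberOfTrees": 400},
    3: {"GTApprox/GBRTMaxDepth": 10, "GTApprox/GBRTNumberOfTrees": 300},
    4: {"GTApprox/GBRTMaxDepth": 6, "GTApprox/GBRTNumberOfTrees": 200},
    5: {"GTApprox/GBRTMaxDepth": 6, "GTApprox/GBRTNumberOfTrees": 100},
}

def effective_gbrt_options(user_options):
    """Resolve GBRT settings: user-set dependent options override Accelerator."""
    accel = user_options.get("GTApprox/Accelerator", 1)
    resolved = dict(ACCELERATOR_GBRT_DEFAULTS[accel])
    for name, value in user_options.items():
        if name in resolved:
            resolved[name] = value  # manual setting wins
    return resolved

# Accelerator 4 implies 200 trees, but the manual value 800 takes priority:
print(effective_gbrt_options({"GTApprox/Accelerator": 4,
                              "GTApprox/GBRTNumberOfTrees": 800}))
```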

**GTApprox/AccuracyEvaluation**

Require accuracy evaluation.

Value: Boolean
Default: off

If this option is on (`True`), then, in addition to the approximation, the constructed model contains a function providing an estimate of the approximation error as a function over the design space. See Evaluation of accuracy in given point for details.

**GTApprox/CategoricalOutputs**

Specifies categorical outputs.

Value: a list of zero-based indexes of outputs
Default: `[]` (no categorical outputs)

New in version 6.22.

Treat listed outputs as categorical, which can take only predefined values (levels). For a categorical output, each unique value from the training sample becomes a level. The value of a categorical output predicted by the model is always one of the levels.

See section Categorical Outputs for more details.

Note

Instead of using this option, you can specify categorical outputs by passing the output (response) part of the training sample as a `pandas.DataFrame`. Columns with a categorical, string, Boolean, or object dtype are interpreted as categorical data.

Note

Categorical outputs are not compatible with the dependent outputs mode and the partial linear dependency mode enabled by the GTApprox/DependentOutputs option.
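The level mechanics described above can be illustrated with a small standalone sketch (not GTApprox internals): unique training values become levels, and a prediction is always snapped to one of them.

```python
# Illustrative sketch of categorical-output levels (numeric levels assumed).
def output_levels(training_outputs):
    """Each unique value in the training sample becomes a level."""
    return sorted(set(training_outputs))

def snap_to_level(raw_prediction, levels):
    """A categorical output prediction is always one of the known levels."""
    return min(levels, key=lambda level: abs(level - raw_prediction))

levels = output_levels([0.0, 1.0, 1.0, 2.0])  # levels: [0.0, 1.0, 2.0]
print(snap_to_level(1.4, levels))             # an existing level, never 1.4
```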

**GTApprox/CategoricalVariables**

Specifies categorical input variables.

Value: a list of zero-based indexes of input variables
Default: `[]` (no categorical variables)

New in version 6.3.

Treat listed variables as categorical. These variables can take only predefined values (levels). For every categorical variable, each unique value from the training sample becomes a level. Note that a categorical variable never takes a value not found in the training sample, and a model with categorical inputs cannot be evaluated for values of categorical variables that were not found in the training sample.

See section Categorical Variables for more details.

Note

Instead of using this option, you can specify categorical inputs by passing the input (variable) part of the training sample as a `pandas.DataFrame`. Columns with a categorical, string, Boolean, or object dtype are interpreted as categorical data.

If you specify tensor factors for the TA and TGP techniques manually, you can select categorical variables with GTApprox/TensorFactors instead of specifying GTApprox/CategoricalVariables. Using both these options at the same time is not recommended since they can conflict; see section Categorical Variables for TA, iTA and TGP techniques for more details.

**GTApprox/Componentwise**

Perform componentwise approximation (independent outputs).

Value: Boolean or `"Auto"`

Default: `"Auto"`

Deprecated since version 6.3: kept for compatibility, use GTApprox/DependentOutputs instead.

Prior to 6.3, this option was used to enable componentwise approximation which was disabled by default.

Since 6.3, componentwise approximation is enabled by default and can be disabled with GTApprox/DependentOutputs. Now if GTApprox/Componentwise is default (`"Auto"`), GTApprox/DependentOutputs takes priority. If GTApprox/Componentwise is not default while GTApprox/DependentOutputs is `"Auto"`, then GTApprox/Componentwise takes priority. If the values of these two options conflict, GTApprox raises an `InvalidOptionsError` exception; however, the conflict is ignored if the output is 1-dimensional.
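These priority rules can be sketched as a small pure-Python function (a hypothetical helper covering Boolean values only, not part of the GTApprox API):

```python
# Hypothetical sketch of the priority between the deprecated
# GTApprox/Componentwise and GTApprox/DependentOutputs options.
def componentwise_enabled(componentwise="Auto", dependent_outputs="Auto"):
    """Return True if componentwise approximation is used (Booleans only)."""
    if componentwise == "Auto":
        # GTApprox/DependentOutputs takes priority: "Auto" or False means
        # componentwise approximation, True disables it.
        return dependent_outputs != True
    if dependent_outputs == "Auto":
        # A non-default GTApprox/Componentwise takes priority.
        return componentwise == True
    if componentwise != dependent_outputs:
        return componentwise  # consistent explicit settings
    raise ValueError("InvalidOptionsError: conflicting option values")
```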

**GTApprox/DependentOutputs**

Specify type of dependency between output components.

Value: Boolean, `"PartialLinear"`, or `"Auto"`
Default: `"Auto"`

New in version 6.3.

Changed in version 6.15: added the linear dependencies mode (`"PartialLinear"`).

Selects which approximation mode to use when training a model with multidimensional output (see Output Dependency Modes for details).

- `True`: treat different components of the output as possibly dependent, do not use componentwise approximation.
- `False`: assume that output components are independent and use componentwise approximation.
- `"PartialLinear"`: before training, search for linear dependencies between outputs in the training data. If such dependencies are found, train a model which keeps the dependencies. In this case, the submodels of independent outputs are trained in the componentwise mode.
- `"Auto"` (default): use componentwise approximation unless it is explicitly disabled by GTApprox/Componentwise.

When GTApprox/DependentOutputs is default (`"Auto"`), componentwise approximation is enabled unless GTApprox/Componentwise is set to a non-default value, which takes priority. As a result, if GTApprox/DependentOutputs is `"Auto"` but GTApprox/Componentwise is `False`, componentwise approximation is disabled. This is done to avoid conflicts with older versions. Note that GTApprox/Componentwise is a deprecated option kept for version compatibility only and should not be used since 6.3.

Note

The TBL technique ignores this option. See Table Function for details.

Note

The dependent outputs mode and the partial linear dependency mode are not compatible with categorical outputs (GTApprox/CategoricalOutputs).

**GTApprox/Deterministic**

Controls the behavior of randomized initialization algorithms in certain techniques.

Value: Boolean
Default: on

New in version 5.0.

Several model training techniques in GTApprox feature randomized initialization of their internal parameters. These techniques include:

- GBRT, which can select random subsamples of the full training set when creating regression trees (see section Stochastic Boosting).
- HDA and HDAGP, which use randomized initialization of approximator parameters.
- MoA, if the approximation technique for its local models is set to HDA, HDAGP or SGP using GTApprox/MoATechnique, or the same selection is done automatically.
- SGP, which uses randomized selection of base points when approximating the full covariance matrix of the points from the training sample (Nystrom method).
- TA, if for some of its factors the HDA technique is specified manually or is selected automatically (see GTApprox/TensorFactors).

The determinacy of randomized techniques can be controlled in the following way:

- If GTApprox/Deterministic is on (deterministic training mode, default), a fixed seed is used in all randomized initialization algorithms. The seed is set by GTApprox/Seed. This makes the technique behavior reproducible: for example, two models trained in deterministic mode with the same data, the same GTApprox/Seed and other settings will be exactly the same, since the training algorithm is initialized with the same parameters.
- Alternatively, if GTApprox/Deterministic is off (non-deterministic training mode), a new seed is generated internally every time you train a model. As a result, models trained with randomized techniques may slightly differ even if all settings and training samples are the same. In this case, GTApprox/Seed is ignored. The generated seed that was actually used for initialization can be found in the model info, so the training run can later be reproduced exactly by switching to the deterministic mode and setting GTApprox/Seed to this value.

With randomized techniques, repeated non-deterministic training runs may be used to try to obtain a more accurate approximation, because results will be slightly different. On the contrary, deterministic techniques always produce exactly the same model given the same training data and settings, and are not affected by GTApprox/Deterministic and GTApprox/Seed. Deterministic techniques include:
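The seeding scheme described above can be sketched with Python's `random` module standing in for GTApprox internals (the helper and the default seed value are illustrative only):

```python
# Sketch of deterministic vs. non-deterministic seeding; fixed_seed plays
# the role of GTApprox/Seed (42 is illustrative, not the GTApprox default).
import random

def pick_training_seed(deterministic=True, fixed_seed=42):
    """Return the seed a training run would use."""
    if deterministic:
        return fixed_seed              # deterministic mode: reproducible runs
    return random.randrange(2**31)     # fresh seed; would be kept in model info

# Two deterministic runs are initialized identically:
assert pick_training_seed(True, 7) == pick_training_seed(True, 7)
```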

**GTApprox/EnableTensorFeature**

Enable automatic selection of the TA and iTA techniques.

Value: Boolean
Default: on

New in version 1.9.2: allows the automatic selection of the iTA technique. Previously affected only the TA technique selection.

If on (`True`), makes the TA and iTA techniques available for automatic selection. If off (`False`), neither TA nor iTA will ever be selected automatically based on the decision tree. Has no effect if an approximation technique is selected manually using the GTApprox/Technique option.

Note

This option does not enable the automatic selection of the TGP technique.

**GTApprox/ExactFitRequired**

Require the model to fit sample data exactly.

Value: Boolean
Default: off

If this option is on, the model fits the points of the training sample exactly: model responses at the points included in the training sample are equal to the response values from the training sample.

If GTApprox/ExactFitRequired is off, no fitting condition is imposed, and the approximation can be either fitting or non-fitting depending on the training data. Typical example: if GTApprox finds that the sample is noisy, it does not create an exact-fitting model, to avoid overtraining.

Note that the exact fit mode is not supported by some approximation techniques. In particular, it is incompatible with the robust version of GP-based techniques (see GTApprox/GPLearningMode). For details on other techniques, see their descriptions in the Techniques section.

Changed in version 4.2: added the exact fit mode support to the TA technique (see Tensor Products of Approximations).

Changed in version 6.15: the HDAGP technique, which does not support the exact fit mode, now raises an `InvalidOptionsError` exception if GTApprox/ExactFitRequired is on. Previously HDAGP silently ignored this option.

Changed in version 6.15: it is no longer possible to train a model with GTApprox/ExactFitRequired on and GTApprox/GPLearningMode set to `"Robust"`. This combination is now explicitly prohibited and raises an exception.

Note

GTApprox/ExactFitRequired is not compatible with output noise variance and point weighting (see the outputNoiseVariance and weights arguments to `build()`). If GTApprox/ExactFitRequired is on and either of these arguments is not `None`, `build()` raises an `InvalidOptionsError` exception.

For more information on the effects of this option, see section Exact Fit.
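What "exact fit" means can be checked with a standalone toy model (illustrative only, not GTApprox code): here a piecewise-linear interpolant stands in for an exact-fitting approximation, and it reproduces every training response at the training points.

```python
# Toy exact-fitting model: a 1-D piecewise-linear interpolant.
def exact_fit_model(xs, ys):
    pairs = sorted(zip(xs, ys))
    def model(x):
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        raise ValueError("x outside training range")
    return model

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
model = exact_fit_model(xs, ys)
# The exact fit condition: training points are reproduced exactly.
assert all(model(x) == y for x, y in zip(xs, ys))
```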

**GTApprox/GBRTColsampleRatio**

Column subsample ratio.

Works only for the Gradient Boosted Regression Trees technique.

Value: floating point number in range \((0, 1]\)
Default: 1.0

New in version 5.1.

The GBRT technique uses random subsamples of the full training set when training weak estimators (regression trees). GTApprox/GBRTColsampleRatio specifies the fraction of columns (input features) to include in a subsample: for example, setting it to 0.5 randomly selects half of the input features to form a subsample. For more details, see section Stochastic Boosting.
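Column subsampling as described above can be sketched in a few lines (an illustrative helper, not the GBRT implementation):

```python
# Sketch: each tree sees only a random fraction of the feature columns.
import math
import random

def subsample_columns(n_features, colsample_ratio, rng):
    """Pick ceil(ratio * n_features) feature indexes at random."""
    k = max(1, math.ceil(colsample_ratio * n_features))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
print(subsample_columns(10, 0.5, rng))  # 5 of the 10 feature indexes
```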

**GTApprox/GBRTMaxDepth**

Maximum regression tree depth.

Works only for the Gradient Boosted Regression Trees technique.

Value: non-negative integer
Default: 0 (auto)

New in version 5.1.

Sets the maximum depth allowed for each regression tree (GBRT weak estimator). Greater depth results in a more complex final model.

Default (0) means that the tree depth is set by GTApprox/Accelerator as follows:

| GTApprox/Accelerator | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| GTApprox/GBRTMaxDepth | 10 | 10 | 10 | 6 | 6 |

For example, if both options are default (GTApprox/GBRTMaxDepth is 0 and GTApprox/Accelerator is 1), the actual depth setting is 10. For more details, see section Model Complexity.

**GTApprox/GBRTMinChildWeight**

Minimum total weight of points in a regression tree leaf.

Works only for the Gradient Boosted Regression Trees technique.

Value: non-negative floating point number
Default: 1

New in version 5.1.

The GBRT technique stops growing a branch of a regression tree if the total weight of points assigned to a leaf becomes less than GTApprox/GBRTMinChildWeight. If the sample is not weighted, this is the same as limiting the number of points in a leaf. Zero minimum weight means that no such limit is imposed. For more details, see section Leaf Weighting.

**GTApprox/GBRTMinLossReduction**

Minimum significant reduction of loss function.

Works only for the Gradient Boosted Regression Trees technique.

Value: non-negative floating point number
Default: 0

New in version 5.1.

The GBRT technique stops growing a branch of a regression tree if the reduction of the loss function (the model's mean square error over the training set) becomes less than GTApprox/GBRTMinLossReduction. For more details, see section Model Complexity.

**GTApprox/GBRTNumberOfTrees**

The number of regression trees in the model.

Works only for the Gradient Boosted Regression Trees technique.

Value: non-negative integer
Default: 0 (auto)

New in version 5.1.

Sets the number of weak estimators (regression trees) in a GBRT model, which is the same as the number of gradient boosting stages. A greater number results in a more complex final model.

Changed in version 5.2: 0 is allowed and means the auto setting.

Default (0) means that the number of trees is set by GTApprox/Accelerator as follows:

| GTApprox/Accelerator | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| GTApprox/GBRTNumberOfTrees | 500 | 400 | 300 | 200 | 100 |

For example, if both options are default (GTApprox/GBRTNumberOfTrees is 0 and GTApprox/Accelerator is 1), the actual number of trees is 500. For more details, see section Model Complexity.

Note that in incremental training the auto (0) number of trees is not affected by GTApprox/Accelerator but depends on the number of trees in the initial model and training sample sizes — see Incremental Training for details.

**GTApprox/GBRTShrinkage**

Shrinkage step, or learning rate.

Works only for the Gradient Boosted Regression Trees technique.

Value: floating point number in range \((0, 1]\)
Default: 0.3

New in version 5.1.

GBRT scales each weak estimator by a factor of GTApprox/GBRTShrinkage; smaller step values result in a kind of regularization. For more details, see section Shrinkage.
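The effect of the shrinkage factor can be seen in a toy boosting loop (illustrative only: each "weak estimator" here just predicts the current residual mean):

```python
# Toy sketch of shrinkage in gradient boosting: each stage's weak estimator
# is scaled by the shrinkage factor before being added to the ensemble.
def boost_constant_stumps(ys, n_stages, shrinkage):
    """Boosted prediction of a constant target; returns the final prediction."""
    prediction = 0.0
    for _ in range(n_stages):
        residual_mean = sum(y - prediction for y in ys) / len(ys)
        prediction += shrinkage * residual_mean  # scaled weak estimator
    return prediction

ys = [2.0, 4.0]  # target mean is 3.0
print(boost_constant_stumps(ys, 10, 0.3))  # approaches the mean gradually
print(boost_constant_stumps(ys, 10, 1.0))  # reaches the mean in one stage
```

Smaller shrinkage values move toward the target in smaller steps, which is the regularizing effect mentioned above.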

**GTApprox/GBRTSubsampleRatio**

Row subsample ratio.

Works only for the Gradient Boosted Regression Trees technique.

Value: floating point number in range \((0, 1]\)
Default: 1.0

New in version 5.1.

The GBRT technique uses random subsamples of the full training set when training weak estimators (regression trees). GTApprox/GBRTSubsampleRatio specifies the fraction of rows (sample points) to include in a subsample: for example, setting it to 0.5 randomly selects half of the points to form a subsample. For more details, see section Stochastic Boosting.

**GTApprox/GPInteractionCardinality**

Allowed orders of additive covariance function.

Works for the Gaussian Processes, Sparse Gaussian Process, and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: list of unique unsigned integers, each in range \([1, dim(X)]\)
Default: `[]` (equivalent to `[1, n]`, \(n = dim(X)\))

New in version 1.10.3.

This option takes effect only when using the additive covariance function (GTApprox/GPType is set to `"Additive"`); otherwise it is ignored. In particular, the TGP technique always ignores this option since its covariance function is always `"Wlp"`.

The additive covariance function is a sum of products of one-dimensional covariance functions, where each additive component (a summand) depends on a subset of the initial input variables. GTApprox/GPInteractionCardinality defines the degree of interaction between input variables by specifying the allowed subset sizes, which are in fact the allowed values of the covariance function order. All values in the list should be unique, and none of them can be greater than the number of input components, excluding constant inputs (the effective dimension of the input part of the training sample).

Consider an \(n\)-dimensional \(X\) sample with \(m\) variable and \(n-m\) constant components (sample matrix columns). Valid GTApprox/GPInteractionCardinality settings then would be:

- `[1, n]`: simplified syntax, implicitly converts to `[1, m]`.
- `[1, 2, ... m-1, m, m+1, ... k]`, where \(m < k \le n\): treated as a consecutive list of interactions up to cardinality *k*, implicitly converts to `[1, 2, ... m-1, m]`. Note that in this case all values from 1 to *m* have to be included in the list, otherwise it is considered invalid.
- `[i1, i2, ... ik]`, where \(i_j \le m\): a valid list of interaction cardinalities, no conversion needed.
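One reading of these conversion rules can be sketched as a pure-Python helper (hypothetical, not part of GTApprox; `m` is the number of variable input components, `n` the full input dimension):

```python
# Hypothetical sketch of the GTApprox/GPInteractionCardinality rules above.
def normalize_cardinalities(cardinalities, m, n):
    """Normalize a cardinality list per the three cases listed above."""
    cards = list(cardinalities)
    if len(set(cards)) != len(cards):
        raise ValueError("values must be unique")
    if cards == [1, n]:
        # Simplified syntax: [1, n] implicitly converts to [1, m].
        return [1, m] if m > 1 else [1]
    kept = sorted(c for c in cards if c <= m)
    dropped = [c for c in cards if c > m]
    if dropped and kept != list(range(1, m + 1)):
        # e.g. [1, 3, 4] with m = 3 is invalid: 2 is missing
        raise ValueError("values above m require the consecutive list 1..m")
    return kept
```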

**GTApprox/GPLearningMode**

Give priority to either model accuracy or robustness.

Value: `"Accurate"`, `"Robust"`, or `"Auto"`
Default: `"Auto"`

New in version 1.9.6.

Changed in version 6.15: added the `"Auto"` value, which is now the default (was `"Accurate"`).

Changed in version 6.17: the `"Auto"` behavior now depends on GTApprox/Accelerator.

This option affects the Gaussian processes-based techniques: GP, TGP, and TA with GP factors. These techniques can use different versions of the training algorithm. The accurate version aims to minimize model errors, but is prone to unwanted effects related to overtraining, which decrease model quality. The robust version prevents overtraining at the cost of a possible decrease in model accuracy; this version is also incompatible with the exact fit mode (see GTApprox/ExactFitRequired).

Using the robust version is recommended. The `"Auto"` setting defaults to the robust version and selects the accurate version only when:

- GTApprox/ExactFitRequired is enabled, or
- GTApprox/Accelerator is `1` or `2`.
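The `"Auto"` resolution rules above can be sketched as a small hypothetical helper (not part of the GTApprox API; it covers only the `"Auto"` case and ignores the exact-fit/robust conflict handled elsewhere):

```python
# Hypothetical sketch of how the "Auto" GP learning mode resolves.
def resolve_gp_learning_mode(mode="Auto", exact_fit=False, accelerator=1):
    if mode != "Auto":
        return mode  # explicit "Accurate"/"Robust" is used as-is
    if exact_fit or accelerator in (1, 2):
        return "Accurate"
    return "Robust"
```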

**GTApprox/GPLinearTrend**

Deprecated since version 3.2: kept for compatibility only, use GTApprox/GPTrendType instead.

Since version 3.2 this option is deprecated in favor of the more advanced GTApprox/GPTrendType option, which allows selecting a linear, quadratic, or no trend.

**GTApprox/GPMeanValue**

Specifies the mean of the model output mean values.

Works for the Gaussian Processes, Sparse Gaussian Process, High Dimensional Approximation combined with Gaussian Processes, and Tensored Gaussian Processes techniques.

Value: list of floating point numbers
Default: `[]` (automatic estimate)

Model output mean values are essential for constructing a GP approximation. These values may be defined by the user or estimated from the given sample (the bigger and more representative the sample, the better the estimate of the model output mean values). Misspecification of the model output mean values decreases approximation accuracy: the larger the error in the output mean values, the worse the final approximation model. If left default (an empty list), the model output mean values are estimated from the given sample.

The option value is a list of floating point numbers. This list should either be empty or contain a number of elements equal to the dimensionality of the output dataset.

**GTApprox/GPPower**

The value of \(p\) in the \(p\)-norm used to measure the distance between input vectors.

Works for the Gaussian Processes, Sparse Gaussian Process, High Dimensional Approximation combined with Gaussian Processes, and Tensored Gaussian Processes techniques.

Value: floating point number in range \([1, 2]\)
Default: 2.0

The main component of Gaussian Processes based regression is the covariance function, which measures the similarity between two input points. The covariance between two input points uses the \(p\)-norm of the difference between their coordinates. The case \(p = 2\) corresponds to the usual Gaussian covariance function (better suited for modeling smooth functions); the case \(p = 1\) corresponds to the Laplacian covariance function (better suited for modeling non-smooth functions).

For the GP technique, this option takes effect only if GTApprox/GPType is `"Wlp"` or `"Additive"`. The TGP technique is always affected by GTApprox/GPPower, since it always uses the common covariance function (denoted `"Wlp"`) and disregards the GTApprox/GPType setting.
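For illustration, a minimal Python sketch of such a weighted \(L_p\) covariance (the `wlp_covariance` name and the single `theta` weight are illustrative simplifications, not the GTApprox internals):

```python
import math

def wlp_covariance(x1, x2, p=2.0, theta=1.0):
    """Weighted L_p ("Wlp"-style) covariance sketch: exp(-theta * sum |dx|^p)."""
    return math.exp(-theta * sum(abs(a - b) ** p for a, b in zip(x1, x2)))

# p = 2: Gaussian kernel, decays slowly near zero distance (smooth functions);
# p = 1: Laplacian kernel, sharper at zero distance (non-smooth functions).
near, far = (0.0, 0.0), (0.1, 0.1)
print(wlp_covariance(near, far, p=2.0))  # Gaussian similarity
print(wlp_covariance(near, far, p=1.0))  # Laplacian similarity (smaller here)
```

At zero distance both kernels equal 1; for small nonzero distances the Laplacian kernel decays faster, which is what makes it better suited to non-smooth behavior.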

**GTApprox/GPTrendType**

Specifies the trend type.

Works for the Gaussian Processes, Sparse Gaussian Process, High Dimensional Approximation combined with Gaussian Processes, and Tensored Gaussian Processes techniques.

Value: `"None"`, `"Linear"`, `"Quadratic"`, or `"Auto"`
Default: `"Auto"`

New in version 3.2.

This option allows taking into account specific (linear or quadratic) behavior of the modeled dependency by selecting which type of trend to use.

- `"None"` — no trend.
- `"Linear"` — linear trend.
- `"Quadratic"` — polynomial trend with constant, linear, and pure quadratic terms (no interaction terms, no feature selection).
- `"Auto"` — automatic selection; defaults to no trend unless GTApprox/GPLinearTrend is on (provides compatibility with the deprecated GTApprox/GPLinearTrend option).
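The `"Quadratic"` regressor set can be illustrated with a short sketch (the `quadratic_trend_basis` helper is hypothetical, not part of the GTApprox API):

```python
def quadratic_trend_basis(x):
    """Trend regressors for the "Quadratic" setting (a sketch):
    constant, linear, and pure quadratic terms, no interaction terms."""
    terms = [1.0]                  # constant term
    terms += list(x)               # linear terms: x_i
    terms += [v * v for v in x]    # pure quadratic terms: x_i^2 (no x_i * x_j)
    return terms

# For a 3-dimensional input the basis has 1 + 3 + 3 = 7 terms.
print(quadratic_trend_basis([1.0, 2.0, 3.0]))
```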

**GTApprox/GPType**

Select the kernel function type for the Gaussian processes-based techniques (GP, SGP, and HDAGP, excluding TGP).

Value: `"Additive"`, `"Mahalanobis"`, `"Wlp"`, `"Periodic"`, or `"Auto"`
Default: `"Auto"`

Changed in version 1.10.3: added the additive kernel function.

Changed in version 6.16: added the periodic kernel function.

Changed in version 6.17: added the `"Auto"` setting, which is now default.

Selects the kernel function used in Gaussian processes. Available kernels:

- `"Additive"`: a sum of coordinate-wise products of 1-dimensional Gaussian covariance functions. With this setting, GTApprox/GPInteractionCardinality may be used to set the degree of interaction between input variables.
- `"Mahalanobis"`: squared exponential covariance function with Mahalanobis distance.
- `"Wlp"`: common exponential Gaussian covariance function with weighted \(L_p\) distance.
- `"Periodic"`: periodic covariance function. Using this kernel potentially allows creating an approximation model with periodic extrapolation.
- `"Auto"`: primarily intended for compatibility with `build_smart()`, where it explicitly “unlocks” the option for smart training. In `build()`, defaults to `"Wlp"`.

If set to `"Additive"` when the input part of the training sample is 1-dimensional (that is, there is only 1 input variable), the additive covariance function is implicitly replaced with the common covariance function (denoted `"Wlp"`), and the GTApprox/GPInteractionCardinality option value is ignored.

Note

The TGP technique ignores this option and always uses the common covariance function (denoted `"Wlp"`).

**GTApprox/Heteroscedastic**

Treat input sample as a sample containing heteroscedastic noise.

Value: Boolean or `"Auto"`
Default: `"Auto"`

New in version 1.9.0.

If this option is on (`True`), the builder assumes that heteroscedastic noise variance is present in the input sample. The default value (`"Auto"`) currently means that the option is off.

This option has certain limitations:

- It is valid for the GP and HDAGP techniques only. For other techniques the value is ignored (treated as always off).
- Heteroscedasticity is incompatible with covariance functions other than `"Wlp"`: if GTApprox/Heteroscedastic is `True` and GTApprox/GPType is not `"Wlp"`, an exception is thrown.
- If noise variance is given, the GTApprox/Heteroscedastic option is ignored and the non-variational GP (or HDAGP) technique is used.

See the Heteroscedastic data section for details.

**GTApprox/HDAFDGauss**

Include Gaussian functions in the functional dictionary used to construct approximations.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: `"No"`, `"Ordinary"`, or `"Auto"`
Default: `"Auto"`

Changed in version 6.14: added the `"Auto"` value, which is now default (was `"Ordinary"`).

An approximation is constructed as a linear expansion in functions from a special functional dictionary. This option controls whether Gaussian functions should be included in that dictionary.

In general, using Gaussian functions as building blocks can significantly increase accuracy, especially when the approximated function is bell-shaped. However, it may also significantly increase training time.

**GTApprox/HDAFDLinear**

Include linear functions in the functional dictionary used to construct approximations.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: `"No"`, `"Ordinary"`, or `"Auto"`
Default: `"Auto"`

Changed in version 6.14: added the `"Auto"` value, which is now default (was `"Ordinary"`).

An approximation is constructed as a linear expansion in functions from a special functional dictionary. This option controls whether linear functions should be included in that dictionary.

In general, using linear functions as building blocks can increase accuracy, especially when the approximated function has a significant linear component. However, it may also increase training time.

**GTApprox/HDAFDSigmoid**

Include sigmoid functions in the functional dictionary used to construct approximations.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: `"No"`, `"Ordinary"`, or `"Auto"`
Default: `"Auto"`

Changed in version 6.14: added the `"Auto"` value, which is now default (was `"Ordinary"`).

An approximation is constructed as a linear expansion in functions from a special functional dictionary. This option controls whether sigmoid-like functions should be included in that dictionary.

In general, using sigmoid-like functions as building blocks can increase accuracy, especially when the approximated function has square-like or discontinuity regions. However, it may also significantly increase training time.

**GTApprox/HDAHessianReduction**

Maximum proportion of data used to evaluate the Hessian matrix.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: floating point number in range \([0, 1]\)
Default: 0.0

New in version 1.6.1.

This option limits the amount of data points used for Hessian estimation (used in the high-precision algorithm). If the value is 0, the whole set of points is used; if the value is in range \((0, 1]\), only a fraction of the set (not exceeding the GTApprox/HDAHessianReduction value) is used. Reduction applies only to samples bigger than 1250 points; if the number of points is less than 1250, this option is ignored and the Hessian is estimated using the whole training sample.

Note

In some cases, the high-precision algorithm can be disabled automatically, regardless of the GTApprox/HDAHessianReduction value. This happens if:

- \((dim(X) + 1) \cdot p \ge 4000\), where \(dim(X)\) is the dimension of the input vector \(X\) and \(p\) is the total number of basis functions, or
- \(dim(X) \ge 25\), or
- there are not enough computational resources to use the high-precision algorithm.

**GTApprox/HDAMultiMax**

Maximum number of basic approximators constructed during one approximation phase.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: integer in range \([\)GTApprox/HDAMultiMin\(, 1000]\), or 0 (auto)
Default: 0 (auto)

Changed in version 6.14: added 0 as a valid value for automatic selection, which is now default (was 10).

This option specifies the maximum number of basic approximators constructed during one approximation phase. The option value must be greater than or equal to the value of the GTApprox/HDAMultiMin option. It sets an upper limit on the number of basic approximators but does not require the limit to be reached (the algorithm stops constructing basic approximators as soon as constructing another one does not increase accuracy). In general, the bigger the GTApprox/HDAMultiMax value, the more accurate the constructed approximator. However, increasing the value may significantly increase training time and/or cause overtraining in some cases.

**GTApprox/HDAMultiMin**

Minimum number of basic approximators constructed during one approximation phase.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: integer in range \([1,\) GTApprox/HDAMultiMax\(]\), or 0 (auto)
Default: 0 (auto)

Changed in version 6.14: added 0 as a valid value for automatic selection, which is now default (was 5).

This option specifies the minimum number of basic approximators constructed during one approximation phase. The option value must be less than or equal to the value of the GTApprox/HDAMultiMax option. In general, the bigger the GTApprox/HDAMultiMin value, the more accurate the constructed approximator. However, increasing the value may significantly increase training time and/or cause overtraining in some cases.

**GTApprox/HDAPhaseCount**

Maximum number of approximation phases.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: integer in range \([1, 50]\), or 0 (auto)
Default: 0 (auto)

Changed in version 6.14: added 0 as a valid value for automatic selection, which is now default (was 10).

This option specifies the maximum possible number of approximation phases. It sets an upper limit only and does not require the limit to be reached (the approximation algorithm stops performing new phases as soon as a subsequent phase does not increase accuracy). In general, the more approximation phases, the more accurate the approximator. However, increasing the maximum number of phases may significantly increase training time and/or cause overtraining in some cases.

**GTApprox/HDAPMax**

Maximum allowed approximator complexity.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: integer in range \([\)GTApprox/HDAPMin\(, 5000]\), or 0 (auto)
Default: 0 (auto)

Changed in version 6.14: added 0 as a valid value for automatic selection, which is now default (was 150).

This option specifies the maximum allowed complexity of the approximator. Its value must be greater than or equal to the value of the GTApprox/HDAPMin option. The approximation algorithm selects the approximator with optimal complexity \(pOpt\) from the range \([\)GTApprox/HDAPMin, GTApprox/HDAPMax\(]\). Optimality here means that, depending on the complexity of the approximated function’s behavior and the size of the available training sample, the approximator with complexity \(pOpt\) fits this function best among approximators with complexity in that range. Thus the GTApprox/HDAPMax value should be big enough to allow selecting the complexity most appropriate for the considered problem. Note, however, that increasing the GTApprox/HDAPMax value may significantly increase training time and/or cause overtraining in some cases.

**GTApprox/HDAPMin**

Minimum allowed approximator complexity.

Works for the High Dimensional Approximation and High Dimensional Approximation combined with Gaussian Processes techniques.

Value: integer in range \([1,\) GTApprox/HDAPMax\(]\), or 0 (auto)
Default: 0 (auto)

Changed in version 6.14: 0 is now a special value which enables automatic selection.

This option specifies the minimum allowed complexity of the approximator. Its value must be less than or equal to the value of the GTApprox/HDAPMax option. The approximation algorithm selects the approximator with optimal complexity \(pOpt\) from the range \([\)GTApprox/HDAPMin, GTApprox/HDAPMax\(]\). Optimality here means that, depending on the complexity of the approximated function’s behavior and the size of the available training sample, the approximator with complexity \(pOpt\) fits this function best among approximators with complexity in that range. Thus the GTApprox/HDAPMin value should not be too big, so that the complexity most appropriate for the considered problem remains selectable. Note that increasing the GTApprox/HDAPMin value may significantly increase training time and/or cause overtraining in some cases.

**GTApprox/InputDomainType**

Specifies the input domain for the model.

Value: `"unbound"`, `"manual"`, `"box"`, or `"auto"`
Default: `"unbound"`

New in version 6.16.

By default, a GTApprox model has an unlimited input domain: model functions are defined everywhere, and `calc()` and other evaluation methods always return numeric values. Such a model fits the training sample data but tends to linear extrapolation outside the input space region covered by the training sample.

This option limits the input domain by adding input constraints to the model. The model then returns NaN for inputs which do not satisfy the constraints (points outside the input domain).

The input domain type can be:

- `"unbound"` (default) — unlimited input domain, the same as in all pSeven Core versions prior to 6.16.
- `"manual"` — a box-bound domain specified manually using the x_meta argument to `build()` or `build_smart()`. See section Model Metainformation for details on usage.
- `"box"` — a box-bound domain which is an intersection of:
  - the training sample’s bounding box, determined automatically by GTApprox, and
  - the box specified by x_meta, if any.
- `"auto"` — a box-bound domain with an additional quadratic constraint. This is an intersection of:
  - the training sample’s bounding box,
  - the region bound by an ellipsoid which envelops the training sample, and
  - the box specified by x_meta, if any.

If you use a limited input domain, the `"auto"` type is recommended because it “cuts empty corners” from the sample’s bounding box, so the input constraints represent the training data better.

Note

For models trained with the PLA technique, the input domain is by default limited to the region inside the convex hull of the training sample (see Algorithm Details in Piecewise Linear Approximation). This constraint is more strict than the sample’s bounding box and the enveloping ellipsoid, so GTApprox/InputDomainType becomes mostly irrelevant. You can use it to trim the default domain by tightening the input bounds in x_meta; otherwise it has no effect for PLA.

Note

Another option which adds input constraints to the model is GTApprox/OutputNanMode (when set to `"predict"`). See also Input Constraints in Model Details for explanations on how the input constraints are represented in the model.
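The effect of a box-bound domain can be sketched as follows (`eval_with_box_domain` and its arguments are illustrative; in GTApprox the constraints are stored in the model itself):

```python
import math

def eval_with_box_domain(model_fn, x, lower, upper):
    """Sketch of a "box" input domain: return NaN for points outside the
    training sample's bounding box (model_fn, lower, upper are illustrative)."""
    inside = all(lo <= v <= hi for v, lo, hi in zip(x, lower, upper))
    return model_fn(x) if inside else float("nan")

# Bounding box of a toy training sample, and a toy stand-in "model".
lower, upper = [0.0, 0.0], [1.0, 1.0]
model = lambda x: x[0] + x[1]

print(eval_with_box_domain(model, [0.5, 0.5], lower, upper))              # numeric: inside
print(math.isnan(eval_with_box_domain(model, [2.0, 0.5], lower, upper))) # True: outside
```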

**GTApprox/InputNanMode**

Specifies how to handle non-numeric values in the input part of the training sample.

Value: `"raise"` or `"ignore"`
Default: `"raise"`

New in version 6.8.

Changed in version 6.19: for the GBRT technique only, `"ignore"` means to accept points where some (but not all) inputs are NaN, and these points are actually used in training.

With the exception of the GBRT technique, GTApprox cannot obtain any information from non-numeric (NaN or infinity) values of variables. This option controls its behavior when encountering such values:

- Default (`"raise"`) raises an exception and cancels training.
- For the GBRT technique, `"ignore"` excludes data points containing infinity input values from the sample, and excludes points where all inputs are NaN. Points where only some inputs are NaN are kept and actually used in training.
- For all other techniques, `"ignore"` excludes all points with non-numeric values before training.
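The `"ignore"` rules above can be sketched in pure Python (the `clean_inputs` helper is illustrative, not the GTApprox implementation):

```python
import math

def clean_inputs(points, gbrt=False):
    """Sketch of the "ignore" filtering rules for the input sample:
    non-GBRT drops any point with a non-finite input; GBRT drops points
    with infinite inputs or all-NaN inputs, but keeps partial-NaN points."""
    kept = []
    for p in points:
        if gbrt:
            if any(math.isinf(v) for v in p):
                continue                       # infinity is never accepted
            if all(math.isnan(v) for v in p):
                continue                       # all inputs are NaN
        elif not all(math.isfinite(v) for v in p):
            continue                           # any NaN/inf drops the point
        kept.append(p)
    return kept

nan, inf = float("nan"), float("inf")
sample = [[1.0, 2.0], [nan, 2.0], [nan, nan], [inf, 2.0]]
print(len(clean_inputs(sample)))             # 1: only the fully numeric point
print(len(clean_inputs(sample, gbrt=True)))  # 2: the partial-NaN point is kept
```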

**GTApprox/InputsTolerance**

Specifies the tolerance up to which each input variable is rounded.

Value: list of length \(dim(X)\) of floating point numbers
Default: `[]`

New in version 6.3.

If default, the option does nothing. Otherwise each input variable in the training sample is rounded up to the specified tolerance. Note that this may cause some points to merge.

See section Sample Cleanup for details.
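A minimal sketch of such rounding and merging (the `round_inputs` helper and the rounding-to-a-multiple rule are assumptions for illustration; see Sample Cleanup for the actual behavior):

```python
def round_inputs(points, tolerance):
    """Sketch: round each input to a multiple of its tolerance and merge
    points that become identical (names here are illustrative)."""
    seen, merged = set(), []
    for p in points:
        rounded = tuple(round(v / t) * t for v, t in zip(p, tolerance))
        if rounded not in seen:
            seen.add(rounded)
            merged.append(rounded)
    return merged

# Two points fall within tolerance 0.1 of each other and merge into one.
points = [(0.12, 1.04), (0.14, 1.01), (0.58, 1.0)]
print(len(round_inputs(points, tolerance=[0.1, 0.1])))  # 2 points remain
```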

**GTApprox/InternalValidation**

Enable or disable internal validation.

Value: Boolean
Default: off

If this option is on (`True`) then, in addition to the approximation, the constructed model contains a table of cross validation errors of different types, which may serve as a measure of approximation accuracy.

See the Model Validation chapter for details.

**GTApprox/IVDeterministic**

Controls the behavior of the pseudorandom algorithm selecting data subsets in cross validation.

Value: Boolean
Default: on

New in version 5.0.

Cross validation involves partitioning the training sample into a number of subsets (defined by GTApprox/IVSubsetCount) and randomized combination of these subsets for each training (validation) session. Since the algorithm that combines subsets is pseudorandom, its behavior can be controlled in the following way:

- If GTApprox/IVDeterministic is on (deterministic cross validation mode, default), a fixed seed is used in the combination algorithm. The seed is set by GTApprox/IVSeed. This makes cross validation reproducible: a different combination is selected for each session, but if you repeat a cross validation run, each session selects the same combination as in the first run.
- Alternatively, if GTApprox/IVDeterministic is off (non-deterministic cross validation mode), a new seed is generated internally for every run, so cross validation results may slightly differ. In this case, GTApprox/IVSeed is ignored. The generated seed that was actually used in cross validation can be found in the model info, so results can still be reproduced exactly by switching to the deterministic mode and setting GTApprox/IVSeed to this value.

The final model is never affected by GTApprox/IVDeterministic because it is always trained using the full sample.
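The reproducibility guarantee can be illustrated with a seeded pseudorandom partition (a sketch only; the actual subset combination algorithm is GTApprox-internal):

```python
import random

def split_into_subsets(sample_size, subset_count, seed):
    """Sketch of deterministic subset selection: a fixed seed makes the
    pseudorandom partition identical across repeated runs."""
    indices = list(range(sample_size))
    random.Random(seed).shuffle(indices)  # seeded, hence reproducible
    return [indices[i::subset_count] for i in range(subset_count)]

# Same seed => identical partition on every run (deterministic mode);
# generating a new seed per run would give differing partitions instead.
run1 = split_into_subsets(10, 5, seed=15313)
run2 = split_into_subsets(10, 5, seed=15313)
print(run1 == run2)  # True
```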

**GTApprox/IVSavePredictions**

Save model values calculated during internal validation.

Value: Boolean or `"Auto"`
Default: `"Auto"`

New in version 2.0 Release Candidate 2.

If this option is on (`True`), the internal validation information, in addition to error values, also contains raw validation data: model values calculated during internal validation, as well as validation inputs and outputs.

**GTApprox/IVSeed**

Fixed seed used in the deterministic cross validation mode.

Value: positive integer
Default: 15313

New in version 5.0.

Fixed seed for the pseudorandom algorithm that selects the combination of data subsets for each cross validation session. GTApprox/IVSeed has an effect only if GTApprox/IVDeterministic is on; see its description for more details.

**GTApprox/IVSubsetCount**

The number of cross validation subsets.

Value: 0 (auto) or an integer in range \([2, |S|]\), where \(|S|\) is the training sample size
Default: 0 (auto)

Changed in version 6.19: GTApprox/IVSubsetCount is no longer required to be less than GTApprox/IVTrainingCount, since the latter now sets an upper limit for the number of cross validation sessions instead of the exact number of sessions.

The number of subsets into which the training sample is divided for cross validation. The subsets are of approximately equal size. GTApprox/IVSubsetCount cannot be set together with GTApprox/IVSubsetSize. Default (0) means that the number of subsets is determined by the sample size and GTApprox/IVSubsetSize. If both options are default, the number and size of subsets are selected automatically based on the sample size.

**GTApprox/IVSubsetSize**

The size of a cross validation subset.

Value: 0 (auto) or an integer in range \([1, \frac{2}{3}|S|]\), where \(|S|\) is the training sample size
Default: 0 (auto)

New in version 6.19.

The size of a sample subset used as test data in a cross validation session. This option may be more convenient than GTApprox/IVSubsetCount when the training sample size is not known or is a parameter. In such cases, GTApprox can automatically determine the required number of subsets, given their size. If the sample cannot be evenly divided into subsets of the given size, the sizes of some subsets are adjusted to fit. The maximum valid option value is \(\frac{2}{3}\) of the sample size; in this case, however, the actual subset size is adjusted to \(\frac{1}{2}\) of the sample size.

Practically, this option configures leave-\(n\)-out cross validation, where \(n\) is the option value. Since the number of subsets, and hence the number of cross validation sessions, can get too high for small \(n\), it is recommended to limit the number of sessions with GTApprox/IVTrainingCount. Otherwise model training may take much time, because each session trains a dedicated internal validation model.

GTApprox/IVSubsetSize cannot be set together with GTApprox/IVSubsetCount. Default (0) means that the subset size is determined by the sample size and GTApprox/IVSubsetCount. If both options are default, the number and size of subsets are selected automatically based on the sample size.
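A rough sketch of deriving the subset count from a requested subset size (the exact adjustment rules are GTApprox-internal; the helper below is illustrative only):

```python
def subset_count_for_size(sample_size, subset_size):
    """Sketch: derive the number of CV subsets from a requested subset size.
    GTApprox similarly adjusts subset sizes so the sample divides evenly."""
    if subset_size > 2 * sample_size // 3:
        raise ValueError("subset size may not exceed 2/3 of the sample")
    # Round to the nearest whole number of subsets, at least 2.
    return max(2, round(sample_size / subset_size))

print(subset_count_for_size(100, 10))  # 10 subsets of ~10 points each
```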

**GTApprox/IVTrainingCount**

The maximum allowed number of training sessions in cross validation.

Value: positive integer or 0 (auto)
Default: 0 (auto)

Changed in version 6.19: now sets an upper limit instead of the exact number of sessions, and is no longer required to be less than GTApprox/IVSubsetCount.

Each GTApprox cross validation session includes the following steps:

- Select one of the cross validation subsets to be the test data.
- Prepare the complement of the selected subset, which is the training sample excluding the test data.
- Train an internal validation model, using this complement as the training sample, so the test data is excluded from training.
- Calculate error metrics for the validation model, using the previously selected test data subset.

Internal validation repeats such sessions with different test subsets until the number of sessions reaches GTApprox/IVTrainingCount, or there are no more subsets to test (each subset may be tested only once).

The number and sizes of cross validation subsets are determined by GTApprox/IVSubsetCount and GTApprox/IVSubsetSize, and are selected by GTApprox if both these options are default. If GTApprox/IVTrainingCount is also default, GTApprox sets an appropriate limit for the number of sessions, based on the training sample size.
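The session loop above can be sketched as follows (`train_fn` and `error_fn` stand in for internal model training and error calculation):

```python
def cross_validate(subsets, train_fn, error_fn, max_sessions):
    """Sketch of the session loop: each subset is tested at most once,
    and the session count is capped by max_sessions."""
    errors = []
    for i, test in enumerate(subsets):
        if i >= max_sessions:
            break
        # Complement: all points except the current test subset.
        complement = [p for j, s in enumerate(subsets) if j != i for p in s]
        model = train_fn(complement)      # test data excluded from training
        errors.append(error_fn(model, test))
    return errors

# Toy run: the "model" is the mean of training outputs, the error metric
# is the mean absolute deviation of the test subset from that mean.
subsets = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
train = lambda ys: sum(ys) / len(ys)
err = lambda m, test: sum(abs(y - m) for y in test) / len(test)
print(len(cross_validate(subsets, train, err, max_sessions=2)))  # 2 sessions
```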

**GTApprox/LinearityRequired**

Require the model to be linear.

Value: Boolean
Default: off

If this option is on (`True`), the approximation is constructed as a linear function which fits the training data optimally. If the option is off (`False`), no linearity condition is imposed on the approximation: it can be either linear or non-linear, depending on which fits the training data best.

Note

The TGP technique does not support linear models: if GTApprox/Technique is `"TGP"`, GTApprox/LinearityRequired should be off.

**GTApprox/LogLevel**

Set minimum log level.

Value: `"Debug"`, `"Info"`, `"Warn"`, `"Error"`, or `"Fatal"`
Default: `"Info"`

If this option is set, only messages with log level greater than or equal to the threshold are dumped into the log.

**GTApprox/MaxExpectedMemory**

Maximum expected amount of memory (in GB) allowed for model training.

Value: positive integer or 0 (no limit)
Default: 0 (no limit)

New in version 6.4.

This option currently works for the GBRT technique only.

GTApprox/MaxExpectedMemory is intended to avoid the case when a long training process fails due to memory overflow, spending much time and giving no results. If GTApprox/MaxExpectedMemory is not default, GTApprox tries to estimate the expected memory usage at each stage of the training algorithm, and if the estimate exceeds the option value, the training is suspended: the process stops and `build()` returns a “partially trained” model which then can be trained incrementally (see Incremental Training).

To check whether the training stopped due to memory limit violation or for other reasons, you can test the value of `model.info['ModelInfo']['Builder']['Details']['/GTApprox/MemoryOverflowDetected']`. This key is present in `info` only if GTApprox/MaxExpectedMemory was non-default when training the model; its value is `True` if GTApprox/MaxExpectedMemory stopped the training and `False` otherwise.

With GTApprox/MaxExpectedMemory set, it is also possible that the training sample is so big that it can never be processed with the allowed amount of memory; in this case, the training does not start, and `build()` raises an `OutOfMemoryError` exception.

If GTApprox/MaxExpectedMemory is default (0, no limit) or the training technique is not GBRT, GTApprox does not try to prevent memory overflow and simply raises `OutOfMemoryError` if and when it happens.

See also the Training with Limited Memory example.
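A safe way to test the flag, sketched with a toy dict standing in for `model.info` (the `memory_overflow_detected` helper name is illustrative):

```python
def memory_overflow_detected(model_info):
    """Sketch: safely read the overflow flag from model info; the key is
    absent when GTApprox/MaxExpectedMemory was default during training."""
    details = (model_info.get('ModelInfo', {})
                         .get('Builder', {})
                         .get('Details', {}))
    return details.get('/GTApprox/MemoryOverflowDetected', False)

# Toy info dicts standing in for model.info:
stopped = {'ModelInfo': {'Builder': {'Details': {'/GTApprox/MemoryOverflowDetected': True}}}}
print(memory_overflow_detected(stopped))  # True: training was suspended
print(memory_overflow_detected({}))       # False: key absent (option default)
```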

**GTApprox/MaxAxisRotations**

Use rotation transformations in the input space to iteratively improve model quality.

Value: positive integer, 0 (no rotations), or -1 (auto)
Default: 0 (no rotations)

New in version 6.11 Service Pack 1.

This option enables a special training mode which can improve quality of models trained using Gaussian processes-based techniques (HDA, GP, HDAGP, and SGP) in some cases where the training sample is non-uniformly distributed. After training an initial model, it evaluates model gradients in the training points and uses the principal component analysis algorithm to create a model input projection matrix. Then it applies the input transformation and trains a new model which improves the initial one. The process is repeated until an internal quality criterion is satisfied or the maximum number of iterations is reached. The final model is a weighted combination of all models trained in the process.

Option values are:

- `0` (default): iterative training is disabled.
- `-1` (auto): selects the number of iterations automatically with respect to the approximation technique, training dataset size, GTApprox/Accelerator value, and the GTApprox/ExactFitRequired setting.
- Any other value sets the maximum allowed number of iterations explicitly. The process may finish before this maximum is reached, if the internal quality criterion is satisfied.

This option works with the HDA, GP, HDAGP, and SGP techniques only. Note that enabling it can significantly increase training time, since a new model is trained internally on each iteration.

See example_axis_rotations.py for a usage example.

**GTApprox/MaxParallel**

Sets the maximum number of parallel threads to use when training a model.

Value: integer in range \([1, 512]\), or 0 (auto)
Default: 0 (auto)

New in version 5.0 Release Candidate 1.

GTApprox can run in parallel to speed up model training. This option sets the maximum number of threads the builder is allowed to create.

Changed in version 6.0: auto (0) sets the number of threads to 1 for small training samples.

Changed in version 6.12: auto (0) tries to detect hyper-threading CPUs in order to use only physical cores.

Changed in version 6.15: added the upper limit for the option value, previously was any positive integer.

Changed in version 6.17: changed the upper limit to 512 (was 100000).

Default (auto) behavior depends on the value of the `OMP_NUM_THREADS` environment variable.

If `OMP_NUM_THREADS` is set to a valid value, this value is the maximum number of threads by default. Note that `OMP_NUM_THREADS` must be set before the Python interpreter starts.

If `OMP_NUM_THREADS` is unset, set to 0, or set to an invalid value, the default maximum number of threads is equal to the number of cores detected by GTApprox. However, there are two exceptions:

- Parallelization becomes inefficient in the case of a small training sample. For small training samples, only 1 thread is used by default.
- On hyper-threading CPUs, using all logical cores has been found to negatively affect training performance. If a hyper-threading CPU is detected, the default maximum number of threads is set to half the number of cores (to use only physical cores).

The behavior described above applies only to the default (0) option value. If you set this option to a non-default value, it will be the maximum number of threads, regardless of the sample size and your CPU.
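The default (0) resolution logic described above can be sketched as follows (core and hyper-threading detection are GTApprox-internal, so they are passed as illustrative flags here):

```python
import os

def default_max_threads(detected_cores, small_sample, hyper_threading):
    """Sketch of the auto (0) thread-count resolution described above."""
    env = os.environ.get("OMP_NUM_THREADS", "")
    if env.isdigit() and int(env) > 0:
        return int(env)                # a valid OMP_NUM_THREADS wins
    if small_sample:
        return 1                       # parallelization is inefficient
    if hyper_threading:
        return detected_cores // 2     # physical cores only
    return detected_cores

os.environ.pop("OMP_NUM_THREADS", None)
print(default_max_threads(8, small_sample=False, hyper_threading=True))  # 4
```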

**GTApprox/MoACovarianceType**

Type of covariance matrix to use when creating the Gaussian Mixture Model for the Mixture of Approximators technique.

Value: `"Full"`, `"Tied"`, `"Diag"`, `"Spherical"`, `"BIC"`, or `"Auto"`
Default: `"Auto"`

New in version 1.10.0.

- `"Full"` — all covariance matrices are positive semidefinite and symmetric.
- `"Tied"` — all covariance matrices are positive semidefinite, symmetric, and equal.
- `"Diag"` — all covariance matrices are diagonal.
- `"Spherical"` — diagonal matrices with equal elements on the diagonal.
- `"BIC"` — the type of covariance matrix and the effective number of clusters are selected according to the Bayesian Information Criterion.
- `"Auto"` — the optimal covariance type for each possible number of clusters is chosen according to the clustering quality.

Changed in version 6.11: added the `"Auto"` value, which is now default (previously the default was `"BIC"`).

This option allows the user to control accuracy and training time of the MoA technique. For example, if it is known that the design space consists of regions of regularity having similar structure, it may be reasonable to use the `"Tied"` matrix type for Gaussian Mixture Models. `"Full"` has the slowest training time; `"Diag"` and `"Spherical"` have the fastest. In `"BIC"` mode, Gaussian Mixture Models are constructed for all types of covariance matrices and numbers of clusters, and the best one in the sense of the Bayesian Information Criterion (BIC) is chosen. In `"Auto"` mode, optimal covariance types are selected for each possible number of clusters according to the clustering quality, based on cluster tightness and separation assessment.

**GTApprox/MoANumberOfClusters**

Sets the number of design space clusters.

Works only for the Mixture of Approximators technique.

Value: list of positive integers, or an empty list (auto)
Default: `[]` (auto)

New in version 1.10.0.

New in version 1.11.0: an empty list is also a valid value which selects the number of clusters automatically.

If set, the effective number of clusters is selected from the list according to the Bayesian Information Criterion (BIC). To fix the number of clusters, you may specify a list containing a single positive integer. Default (`[]`) selects the number of clusters automatically, based on the training sample size and input dimension.

**GTApprox/MoAPointsAssignment**

Select the technique for assigning points to clusters.

Works only for the Mixture of Approximators technique; see Design Space Decomposition.

Value: `"Probability"` or `"Mahalanobis"`
Default: `"Probability"`

New in version 1.10.0.

- `"Probability"` corresponds to points assignment based on posterior probability.
- `"Mahalanobis"` corresponds to points assignment based on Mahalanobis distance.

For the Mahalanobis distance based technique, the confidence value \(\alpha\) may be changed using the GTApprox/MoAPointsAssignmentConfidence option.

**GTApprox/MoAPointsAssignmentConfidence**

This option sets the confidence value for the points assignment technique based on Mahalanobis distance.

Works only for the Mixture of Approximators technique; see Design Space Decomposition.

Value: floating point number in range \((0, 1)\)
Default: 0.97

New in version 1.10.0.

This option allows controlling the size of clusters: the greater the value, the greater the cluster size.

**GTApprox/MoATechnique**

This option specifies the approximation technique for local models created by the Mixture of Approximators (MoA) technique.

Value: `"SPLT"`, `"HDA"`, `"GP"`, `"HDAGP"`, `"SGP"`, `"TA"`, `"iTA"`, `"TGP"`, `"RSM"`, `"GBRT"`, `"PLA"`, or `"Auto"` Default: `"Auto"`

New in version 1.10.0.

Changed in version 5.1: added GBRT to the list of available techniques.

Changed in version 6.3: added PLA to the list of available techniques.

This option controls the local approximation technique; the same technique is set for all local models.

Note that MoA performs sample clustering and the resulting subsamples may lose certain properties of the input sample. For example, it is possible that the input sample has tensor structure, but the subsamples do not, so the TA and iTA techniques become inapplicable to local models.

**GTApprox/MoATypeOfWeights**

This option sets the type of weighting used for “gluing” local approximations.

Works only for the Mixture of Approximators technique, see Calculating Model Output.

Value: `"Probability"` or `"Sigmoid"` Default: `"Probability"`

New in version 1.10.0.

`"Probability"` corresponds to weights based on posterior probability. `"Sigmoid"` corresponds to weights based on a sigmoid function. Sigmoid weighting can be fine-tuned with GTApprox/MoAWeightsConfidence.
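The weighting idea can be sketched in plain Python (a simplified 1-dimensional illustration with hypothetical local models, not GTApprox internals): each local model's output is multiplied by a normalized weight, and a steeper sigmoid gives sharper transitions between clusters.

```python
import math

def blend(x, local_models, centers, steepness=10.0):
    # "Glue" local approximations: weight each local model by a sigmoid of
    # its distance to the cluster center, then normalize so the weights
    # sum to 1.  The sigmoid steepness stands in for the confidence
    # setting: larger values give sharper transitions between clusters.
    raw = [1.0 / (1.0 + math.exp(steepness * (abs(x - c) - 1.0)))
           for c in centers]
    total = sum(raw)
    weights = [w / total for w in raw]
    return sum(w * f(x) for w, f in zip(weights, local_models))

# Two hypothetical local models valid near x = 0 and x = 5:
models = [lambda x: 1.0, lambda x: 3.0]
y = blend(0.0, models, centers=[0.0, 5.0])
```

Near a cluster center, the corresponding local model dominates; between clusters, the output is a smooth mix of the neighboring local models.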

**GTApprox/MoAWeightsConfidence**

This option sets the confidence for sigmoid-based weights.

Works only for the Mixture of Approximators technique, see Calculating Model Output.

Value: floating point number in range \((0, 1)\); must be greater than GTApprox/MoAPointsAssignmentConfidence Default: 0.99

New in version 1.10.0.

This option controls the smoothness of weights: the greater the value, the smoother the weights, yielding a smoother approximation.

**GTApprox/OutputNanMode**

Specifies how to handle non-numeric values in the output part of the training sample.

Value: `"raise"`, `"ignore"`, or `"predict"` Default: `"raise"`

New in version 6.8.

By convention, NaN output values signify undefined function behavior. This option controls whether the model should try to predict undefined behavior. If set to `"predict"`, NaN values in training sample outputs are accepted, and the model will return NaN values in regions close to the points for which the training sample contained NaN output values. The default (`"raise"`) means that NaN output values are not accepted: GTApprox raises an exception and cancels training if they are found. `"ignore"` means that such points are excluded from the sample, and training continues.

Note

This option adds specific input constraints to the model. These constraints are combined with the constraints added by GTApprox/InputDomainType. See Input Constraints in Model Details for details on their representation.
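The three modes can be summarized with a behavioral sketch (a hypothetical helper illustrating the handling rules described above, not GTApprox internals):

```python
import math

def preprocess_outputs(sample, mode="raise"):
    # Mimic the three OutputNanMode behaviors on (input, output) pairs.
    has_nan = any(math.isnan(y) for _, y in sample)
    if mode == "raise":
        # NaN outputs are not accepted: cancel training.
        if has_nan:
            raise ValueError("NaN output values are not accepted")
        return sample
    if mode == "ignore":
        # Drop the points with NaN outputs and continue training.
        return [(x, y) for x, y in sample if not math.isnan(y)]
    if mode == "predict":
        # Keep NaN points: the model will predict NaN near them.
        return sample
    raise ValueError("unknown mode")

data = [(0.0, 1.0), (1.0, float("nan")), (2.0, 4.0)]
kept = preprocess_outputs(data, mode="ignore")
```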

**GTApprox/OutputTransformation**

Apply transformation to the training sample outputs before training the model.

Value: a string or a list of strings specifying the transformation Default: `"none"` (no transformations)

New in version 6.13 Service Pack 1.

This option is intended to improve the accuracy of models trained on data where values of some outputs are exponentially distributed. For such outputs, a log transformation can reduce the distribution skew, resulting in a more accurate approximation. The model is trained on the transformed data and automatically applies the reverse transformation when evaluated.

Transformations are denoted by strings with the following meanings:

`"auto"` — use statistical tests to determine whether a transformation should be applied to an output. The transformation is applied automatically if the distribution of output values is statistically similar to an exponential distribution.

`"lnp1"` — applies one of the following transformations, depending on whether you have set the output thresholds for the given output:

- If the output thresholds are not set, applies a log transformation of the form \(y^* = \text{sgn}(y) \cdot \ln(|y| + 1)\), where \(\text{sgn}\) is the sign function.
- If only one of the thresholds is set (lower \(y_{min}\) or upper \(y_{max}\)), applies a log transformation of the form \(y^* = \ln(\max({y-y_{min}}, \epsilon))\) or \(y^* = \ln(\max({y_{max}-y}, \epsilon))\), respectively.
- If both thresholds are set, applies the logit transformation: \(y^* = \ln \frac{\max({y - y_{min}},\epsilon)}{\max({y_{max}-y},\epsilon)}\).

`"none"` — disables transformation; output values are passed to the model builder as is.

Note that the strings used in this option’s value (`"auto"`, `"lnp1"`, and `"none"`) are case-sensitive.

If the option value is a single string, the same setting is applied to all outputs. For example:

```python
from da.p7core import gtapprox

builder = gtapprox.Builder()
# test each output to determine whether transformation should be applied to it
builder.options.set("GTApprox/OutputTransformation", "auto")
```

The list form can be used to apply a specific setting to each output, or to enable automatic transformation for selected outputs only. For example:

```python
from da.p7core import gtapprox

builder = gtapprox.Builder()
# assuming there are 10 outputs, test outputs 0, 5, and 9,
# and disable transformation for others
transforms = ["none"] * 10
for i in (0, 5, 9):
    transforms[i] = "auto"
builder.options.set("GTApprox/OutputTransformation", transforms)
```

The order of outputs in the list is the order of respective output columns in the training sample.

Note

If you train a model with the GBRT or HDAGP technique and use an initial model (the initial_model argument in `build()` and `build_smart()`), then GTApprox/OutputTransformation must be set either to `"auto"` or to the same value which was used when training the initial model (you can get it from the initial model’s `details`, see Model Details).

This option also accepts such values as an empty string `""`, a string containing only whitespace `" "`, or an empty list `[]`. Their meaning is the same as `"none"` (do not apply any transformations).
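The `"lnp1"` transformation for the no-thresholds case, and the reverse transformation the model applies on evaluation, can be sketched as follows (a plain-Python illustration of the formula above, not GTApprox code):

```python
import math

def lnp1(y):
    # No-thresholds case: y* = sgn(y) * ln(|y| + 1)
    return math.copysign(math.log(abs(y) + 1.0), y)

def lnp1_inverse(t):
    # Reverse transformation, applied automatically on model evaluation.
    return math.copysign(math.exp(abs(t)) - 1.0, t)

values = [-10.0, 0.0, 3.0, 1000.0]
roundtrip = [lnp1_inverse(lnp1(y)) for y in values]
```

The transform compresses large magnitudes (1000 maps to about 6.9) while preserving sign, which is what reduces the skew of exponentially distributed outputs.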

**GTApprox/PartialDependentOutputs/RRMSThreshold**

Specifies the RRMS error threshold for the internal model of linear dependency between outputs.

Value: floating point number in range \([10^{-15}, 1]\) Default: \(10^{-5}\)

New in version 6.29.

When the search for linear dependencies between outputs in the training data is enabled (GTApprox/DependentOutputs is set to `"PartialLinear"`), GTApprox attempts to fit output data to a linear model (see Output Dependency Modes). This option sets the maximum allowed error of that model; if GTApprox cannot reach the error threshold, it is assumed that there is no linear dependency between outputs.

**GTApprox/RSMCategoricalVariables**

Specifies categorical variables.

Value: a list of zero-based indexes of input variables Default: `[]` (no categorical variables)

Deprecated since version 6.3: kept for compatibility only, use GTApprox/CategoricalVariables instead.

Deprecated option previously used to specify categorical variables for the RSM technique.

**GTApprox/RSMElasticNet/L1_ratio**

Specifies the ratio between L1 and L2 regularization.

Works only for the Response Surface Model technique.

Value: list of floats in range \([0, 1]\). Default: `[]`

New in version 6.1.

Each element of the list sets the trade-off between L1 and L2 regularization: 1 means L1 regularization only, while 0 means L2 regularization only. The best value among those given is chosen via a cross-validation procedure. If none is given (default), RSM with pure L1 regularization is constructed.

**GTApprox/RSMFeatureSelection**

Specifies the regularization and term selection procedures.

Works only for the Response Surface Model technique.

Value: `"LS"`, `"RidgeLS"`, `"MultipleRidgeLS"`, `"ElasticNet"`, `"StepwiseFit"`, or `"Auto"` Default: `"Auto"`

Changed in version 6.17: added the `"Auto"` setting, which is now the default.

Sets the technique to use for regularization and term selection:

- `"LS"` — ordinary least squares (no regularization, no term selection).
- `"RidgeLS"` — least squares with Tikhonov regularization (no term selection).
- `"MultipleRidgeLS"` — multiple ridge regression that also filters out non-important terms.
- `"ElasticNet"` — linear combination of L1 and L2 regularizations.
- `"StepwiseFit"` — ordinary least squares regression with stepwise inclusion and exclusion for term selection.
- `"Auto"` — primarily intended for compatibility with `build_smart()`, where it explicitly “unlocks” the option for smart training. In `build()`, defaults to `"RidgeLS"`.

**GTApprox/RSMMapping**

Specifies mapping type for data pre-processing.

Works only for the Response Surface Model technique.

Value: `"None"`, `"MapStd"`, or `"MapMinMax"` Default: `"MapStd"`

The technique to use for data pre-processing:

- `"None"` — no data pre-processing.
- `"MapStd"` — linear mapping of standard deviation for each variable to the \([-1, 1]\) range.
- `"MapMinMax"` — linear mapping of values for each variable to the \([-1, 1]\) range.
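The two mappings can be illustrated with plain-Python sketches (hypothetical helpers mirroring the descriptions above; the exact scaling used by GTApprox may differ in detail):

```python
import math

def map_min_max(column):
    # Linear mapping of a variable's values to the [-1, 1] range.
    lo, hi = min(column), max(column)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]

def map_std(column):
    # Center the variable and scale by its standard deviation, so one
    # standard deviation maps onto a unit interval around zero.
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

col = [10.0, 20.0, 30.0, 40.0]
scaled = map_min_max(col)
standardized = map_std(col)
```

Min-max mapping is sensitive to outliers (a single extreme value compresses all other points), while standard-deviation mapping is more robust but does not bound the mapped values.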

**GTApprox/RSMStepwiseFit/inmodel**

Selects the starting model for stepwise-fit regression.

Works only for the Response Surface Model technique, when stepwise regression is selected.

Value: `"IncludeAll"`, `"ExcludeAll"`, or `"Auto"` Default: `"Auto"`

Changed in version 6.14: added the `"Auto"` setting, which is now the default instead of `"IncludeAll"`.

This option specifies the terms initially included in the model when stepwise-fit regression is used (GTApprox/RSMFeatureSelection is set to `"StepwiseFit"`).

- `"IncludeAll"` starts with a full model (all terms included).
- `"ExcludeAll"` assumes none of the terms are included at the starting step.
- `"Auto"` selects the type of the initial model automatically according to the number of terms. If the number of terms is low enough, regression starts with a full model (similar to `"IncludeAll"`); otherwise, if the number of terms is high, no terms are included in the initial model (similar to `"ExcludeAll"`).

Note that depending on the terms included in the initial model and the order in which terms are moved in and out, the method may build different models from the same set of potential terms.

**GTApprox/RSMStepwiseFit/penter**

Specifies the p-value of inclusion for stepwise-fit regression.

Works only for the Response Surface Model technique.

Value: floating point number in range \((0,\) GTApprox/RSMStepwiseFit/premove\(]\) Default: 0.05

The option value is the maximum p-value of the F-test for a term to be added into the model. Generally, the higher the value, the more terms are included into the final model.

**GTApprox/RSMStepwiseFit/premove**

Specifies the p-value of exclusion for stepwise-fit regression.

Works only for the Response Surface Model technique.

Value: floating point number in range \([\)GTApprox/RSMStepwiseFit/penter\(, 1)\) Default: 0.10

The option value is the minimum p-value of the F-test for a term to be removed from the model. Generally, the higher the value, the more terms are included into the final model.

**GTApprox/RSMType**

Specifies the type of a response surface model.

Value: `"Linear"`, `"Interaction"`, `"Quadratic"`, `"PureQuadratic"`, or `"Auto"` Default: `"Auto"`

Changed in version 6.8: default is `"Linear"` (was `"PureQuadratic"`).

Changed in version 6.17: added the `"Auto"` setting, which is now the default.

This option restricts the type of terms that may be included into the regression model.

- `"Linear"` — only constant and linear terms may be included.
- `"Interaction"` — constant, linear, and interaction terms may be included.
- `"Quadratic"` — constant, linear, interaction, and quadratic terms may be included.
- `"PureQuadratic"` — only constant, linear, and quadratic terms may be included (interaction terms are excluded).
- `"Auto"` — primarily intended for compatibility with `build_smart()`, where it explicitly “unlocks” the option for smart training. In `build()`, defaults to `"Linear"`.
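The term sets allowed by each type can be illustrated with a small enumeration sketch (`rsm_terms` is a hypothetical helper, not part of the GTApprox API; terms are represented as tuples of variable indices, with the empty tuple for the constant):

```python
from itertools import combinations

def rsm_terms(n_inputs, rsm_type):
    # Enumerate the monomial terms allowed by each RSM type for inputs
    # x0..x{n-1}: (i,) is a linear term, (i, j) an interaction term,
    # and (i, i) a quadratic term.
    const = [()]
    linear = [(i,) for i in range(n_inputs)]
    interaction = list(combinations(range(n_inputs), 2))
    quadratic = [(i, i) for i in range(n_inputs)]
    if rsm_type == "Linear":
        return const + linear
    if rsm_type == "Interaction":
        return const + linear + interaction
    if rsm_type == "Quadratic":
        return const + linear + interaction + quadratic
    if rsm_type == "PureQuadratic":
        return const + linear + quadratic
    raise ValueError("unknown RSM type")

n_quadratic_terms = len(rsm_terms(3, "Quadratic"))
```

For 3 inputs, `"Linear"` allows 4 terms, `"PureQuadratic"` 7, and `"Quadratic"` 10, which shows how quickly the candidate term count grows with the model type.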

**GTApprox/Seed**

Fixed seed used in the deterministic training mode.

Value: positive integer Default: 15313

New in version 5.0.

In the deterministic training mode, GTApprox/Seed sets the seed for randomized initialization algorithms in certain techniques. See GTApprox/Deterministic for more details.

**GTApprox/SGPNumberOfBasePoints**

The number of base points used to approximate the full covariance matrix of the points from the training sample.

Works only for the Sparse Gaussian Process technique.

Value: integer in range \([1, 4000]\) Default: 1000

Changed in version 6.14: upper limit is now 4000 (was \(2^{31}-2\)).

Base points (a subset of regressors) are selected randomly from the training sample and used for a reduced-rank approximation of the full covariance matrix of the training points. The reduced-rank approximation is computed using the Nystrom method for the selected subset of regressors. Note that if the value of this option is greater than the dataset size, the GP technique is used instead of SGP.
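The base-point selection and the fallback to GP can be sketched as follows (a simplified illustration; `select_base_points` is a hypothetical helper, and the actual Nystrom computation is omitted):

```python
import random

def select_base_points(sample_size, n_base_points, seed=0):
    # Randomly pick the subset of regressors used by the Nystrom
    # reduced-rank approximation.  If the requested number of base points
    # exceeds the sample size, fall back to the full GP technique.
    if n_base_points > sample_size:
        return "GP", list(range(sample_size))
    rng = random.Random(seed)
    return "SGP", sorted(rng.sample(range(sample_size), n_base_points))

technique, base = select_base_points(sample_size=5000, n_base_points=1000)
```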

**GTApprox/SPLTContinuity**

Smoothness requirement for SPLT approximation.

Value: `"C1"`, `"C2"`, or `"Auto"` Default: `"Auto"`

Changed in version 6.17: added the `"Auto"` setting, which is now the default.

Sets the smoothness requirement for the SPLT technique (see 1D Splines with tension).

- `"C1"` requires a continuous first derivative.
- `"C2"` requires a continuous second derivative.
- `"Auto"` is primarily intended for compatibility with `build_smart()`, where it explicitly “unlocks” the option for smart training. In `build()`, defaults to `"C2"`.

This option is ignored by all techniques other than SPLT.

**GTApprox/StoreTrainingSample**

Save a copy of training data with the model.

Value: Boolean or `"Auto"` Default: `"Auto"`

New in version 6.6.

If on, the trained model stores a copy of the training sample in `training_sample`. If off, this attribute is an empty list. The `"Auto"` setting currently defaults to “off”.

Note that in case of GBRT incremental training (see Incremental Training), setting GTApprox/StoreTrainingSample saves only the last (most recent) training sample on each training iteration.

**GTApprox/SubmodelTraining**

Select whether to train submodels in parallel or sequentially.

Value: `"Sequential"`, `"Parallel"`, or `"Auto"` Default: `"Auto"`

New in version 6.14.

This option can be used to force or disable parallel training of submodels.

- `"Sequential"` — different submodels are never trained simultaneously. Parallel threads are used only if the selected approximation technique supports parallelization internally (on the algorithm level).
- `"Parallel"` — parallel threads are used to train multiple submodels simultaneously. If some submodel is trained by a technique which supports parallelization internally, it can use several threads if available.
- `"Auto"` (default) — determines the mode to use automatically, depending on the approximation settings and the properties of the training sample.

See Submodels and Parallel Training for details.

**GTApprox/TADiscreteVariables**

Specifies discrete input variables.

Value: a list of zero-based indexes of input variables Default: `[]` (no discrete variables)

Deprecated since version 6.3: kept for compatibility only, use GTApprox/CategoricalVariables instead.

Deprecated option previously used to specify categorical variables for the TA technique.

**GTApprox/TALinearBSPLExtrapolation**

Use linear extrapolation for BSPL factors.

Works for the Tensor Products of Approximations and Incomplete Tensor Products of Approximations techniques.

Value: Boolean or `"Auto"` Default: `"Auto"`

New in version 1.9.4.

This option switches the extrapolation type for BSPL factors to linear. By default, BSPL factors extrapolate to constant. If GTApprox/TALinearBSPLExtrapolation is `True`, extrapolation is linear in the range specified by the GTApprox/TALinearBSPLExtrapolationRange option, and falls back to constant outside this range.

- `True` — use linear extrapolation in the range specified by GTApprox/TALinearBSPLExtrapolationRange.
- `False` — do not use linear extrapolation (always use constant extrapolation).
- `"Auto"` — defaults to `False`.

This option affects only the Tensor Products of Approximations (including Incomplete Tensor Products of Approximations) models that contain BSPL factors. It does not affect non-BSPL factors at all, and if a Tensor Products of Approximations model is built using only non-BSPL factors, this option is ignored.

**GTApprox/TALinearBSPLExtrapolationRange**

Sets linear BSPL extrapolation range.

Works for the Tensor Products of Approximations and Incomplete Tensor Products of Approximations techniques.

Value: floating point number in range \((0, \infty)\) Default: 1.0

New in version 1.9.4.

Sets the range in which BSPL factor extrapolation is linear (see GTApprox/TALinearBSPLExtrapolation), relative to the variable range of the factor in the training sample. This setting “expands” the sample range: let \(x_{min}\) and \(x_{max}\) be the minimum and maximum value of a variable found in the sample (BSPL factors are always 1-dimensional); then the extrapolation range is \((x_{max} - x_{min}) \cdot (1 + 2r)\), where \(r\) is the GTApprox/TALinearBSPLExtrapolationRange option value (the range is expanded by \((x_{max} - x_{min}) \cdot r\) on each bound).

This option affects only the Tensor Products of Approximations (including Incomplete Tensor Products of Approximations) models that contain BSPL factors, and only if GTApprox/TALinearBSPLExtrapolation is set to `True`. It does not affect non-BSPL factors at all, and if a Tensor Approximation model is built using only non-BSPL factors, this option is ignored.
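The expanded interval can be computed directly from the formula above (a plain-Python sketch; `linear_extrapolation_bounds` is a hypothetical helper):

```python
def linear_extrapolation_bounds(x_min, x_max, r=1.0):
    # The sample range [x_min, x_max] is expanded by (x_max - x_min) * r
    # on each bound, giving a linear-extrapolation interval of total
    # width (x_max - x_min) * (1 + 2 * r).
    width = x_max - x_min
    return (x_min - r * width, x_max + r * width)

lo, hi = linear_extrapolation_bounds(0.0, 10.0, r=1.0)
```

With the default \(r = 1\), a factor trained on \([0, 10]\) extrapolates linearly on \([-10, 20]\) and falls back to constant extrapolation outside it.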

**GTApprox/TAModelReductionRatio**

Sets the ratio of model complexity reduction.

Value: floating point number in \([1, \infty)\), or 0 (auto) Default: 0 (auto)

New in version 6.2.

Sets the complexity (the number of basis functions) for TA and iTA models as a ratio to the default complexity (see Model Complexity Reduction). For example, `2` sets the number of basis functions to half the default. The reduction affects only BSPL factors; all other factors ignore this option.

This option slightly increases model size but reduces memory consumption in model evaluation and the size of a model exported to C or Octave. Model accuracy decreases in most cases.

Note that there is a lower limit for model complexity, so the actual reduction ratio may be less than the GTApprox/TAModelReductionRatio value you set.

The exact fit requirement may be impossible to satisfy if GTApprox/TAModelReductionRatio has any meaningful non-default value (greater than 1). Generally, this option is not compatible with exact fit.

This option is also not compatible with GTApprox/TAReducedBSPLModel, because both options reduce model complexity but use different algorithms.

**GTApprox/TAReducedBSPLModel**

Deprecated since version 6.2: kept for compatibility only, use GTApprox/TAModelReductionRatio instead.

Since version 6.2 this option is deprecated in favor of the more advanced GTApprox/TAModelReductionRatio option, which allows setting the desired complexity of the final model.

**GTApprox/Technique**

Specify the approximation algorithm to use.

Value: `"RSM"`, `"SPLT"`, `"HDA"`, `"GP"`, `"SGP"`, `"HDAGP"`, `"TA"`, `"iTA"`, `"TGP"`, `"MoA"`, `"GBRT"`, `"PLA"`, `"TBL"`, or `"Auto"` Default: `"Auto"`

New in version 1.9.2: added the incomplete Tensor Approximation technique.

New in version 1.10.0: added the Mixture of Approximators technique.

New in version 3.0 Release Candidate 1: added the Tensor Gaussian Processes technique.

New in version 5.1: added the Gradient Boosted Regression Trees technique.

New in version 6.3: added the Piecewise Linear Approximation technique.

New in version 6.8: added the Table Function technique.

Changed in version 6.8: removed the deprecated Linear Regression (LR) technique. This technique is no longer supported; instead, use RSM with GTApprox/RSMType set to `"Linear"`.

This option allows the user to explicitly specify the approximation algorithm. Its default value is `"Auto"`, meaning that the tool will automatically determine and use the best algorithm (except TGP and GBRT, which are never selected automatically, and TA and iTA, which are by default excluded from automatic selection — see GTApprox/EnableTensorFeature). Manual settings are:

- `"RSM"` — Response Surface Model
- `"SPLT"` — 1D Splines with tension
- `"HDA"` — High Dimensional Approximation
- `"GP"` — Gaussian Processes
- `"SGP"` — Sparse Gaussian Process
- `"HDAGP"` — High Dimensional Approximation combined with Gaussian Processes
- `"TA"` — Tensor Products of Approximations
- `"iTA"` — Incomplete Tensor Products of Approximations (added in 1.9.2)
- `"TGP"` — Tensored Gaussian Processes (added in 3.0 Release Candidate 1)
- `"MoA"` — Mixture of Approximators (added in 1.10.0)
- `"GBRT"` — Gradient Boosted Regression Trees (added in 5.1)
- `"PLA"` — Piecewise Linear Approximation (added in 6.3)
- `"TBL"` — Table Function (added in 6.8)

Sample size requirements taking effect when the approximation technique is selected manually are described in section Sample Size Requirements.

Note

Smart training of the GBRT technique (using `da.p7core.gtapprox.Builder.build_smart()`) can be time-consuming even for small training samples. Details on smart training can be found in section Smart Training.

**GTApprox/TensorFactors**

Describes tensor factors to use in the Tensor Approximation technique.

Value: list of factor definitions, see description Default: `[]` (automatic factorization)

Changed in version 6.18: added the PLA technique support.

This option enables specifying input factorization for the TA technique manually. It also works with TGP, but in that case does not allow changing factor techniques, except for specifying discrete variables. iTA and other techniques ignore this option completely.

Note

The incomplete tensor approximation (iTA) technique ignores the factorization specified by GTApprox/TensorFactors because it always uses 1-dimensional BSPL factors. The tensor Gaussian processes (TGP) technique applies the factorization, but in this case the option value cannot include technique labels (see below). The only valid label for TGP is `"DV"`, but it is better to use the GTApprox/CategoricalVariables option instead.

Option value is a list of user-defined tensor factors, each factor being a subset of input dataset components selected by the user. A factor is defined by a list of component indices and optionally includes a label, specifying the approximation technique to use, as the last element of the list. Indices are zero-based; lists are comma-separated and enclosed in square brackets.

For example, `[[0, 2], [1, "BSPL"]]` specifies factorization of a 3-dimensional input dataset into two factors. The first factor includes the first and third components, and the approximation technique for this factor will be selected automatically (no technique specified by the user). The second factor includes the second component, and splines (the `"BSPL"` label) will be used in the approximation of this factor.

The technique label must be the last element of the list defining a factor. Valid labels are:

- `"Auto"` — automatic selection (same as no label).
- `"BSPL"` — use 1-dimensional cubic smoothing splines.
- `"PLA"` — use piecewise linear approximation.
- `"GP"` — use Gaussian processes.
- `"SGP"` — use Sparse Gaussian Process (added in 6.2).
- `"HDA"` — use high dimensional approximation.
- `"LR"` — linear approximation (linear regression).
- `"LR0"` — constant approximation (zero order linear regression).
- `"DV"` — discrete variable. The only valid label for the tensor Gaussian processes (TGP) technique. The GTApprox/CategoricalVariables option can also be used to specify discrete variables; the interaction between these two options is described in section Categorical Variables for TA, iTA and TGP techniques.

Note

The splines technique (`"BSPL"`) is available only for 1-dimensional factors.

Note

The PLA technique has specific requirements regarding the sample size and input dimension — see Sample Size Requirements for details. When using PLA as a factor technique in TA, these requirements apply to the factor, and GTApprox issues a warning if they are not met.

Note

For factors using sparse Gaussian processes (`"SGP"`), the number of base points is specified by GTApprox/SGPNumberOfBasePoints. Note that this number is the same for all SGP factors. If a factor’s cardinality is less than the number of base points, a warning is generated and the Gaussian processes (`"GP"`) technique is used for this factor instead.

Warning

The `"DV"` label may conflict with the GTApprox/CategoricalVariables option — see its description for details. For this reason, when using the TGP technique, GTApprox/CategoricalVariables should be used instead of specifying discrete variables using the `"DV"` label.

Factorization has to be full (it has to include all components). If there is a component not included in any of the factors, an exception is raised.
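The “factorization must be full” requirement can be illustrated with a small validation sketch (a hypothetical helper, not part of the GTApprox API):

```python
def check_factorization(factors, input_dim):
    # Validate a GTApprox/TensorFactors-style value: every input component
    # must appear in exactly one factor (a trailing string is a technique
    # label, not an index).
    seen = []
    for factor in factors:
        seen.extend(i for i in factor if not isinstance(i, str))
    if sorted(seen) != list(range(input_dim)):
        raise ValueError("factorization must include every component exactly once")
    return True

# The example from the text: a 3-dimensional input split into two factors.
ok = check_factorization([[0, 2], [1, "BSPL"]], input_dim=3)
```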

**GTApprox/TrainingAccuracySubsetSize**

Limits the number of points selected from the training set to calculate model accuracy on the training set.

Value: integer in range \([1, 2^{32}-1]\), or 0 (no limit) Default: 100 000

New in version 1.9.0.

After a model has been built by GTApprox, it is evaluated on the input values from the training set to test model accuracy (calculate model errors, or the deviation of model output values from the original output values). The result is an integral characteristic named “Training Set Accuracy”, which is found in model info. For very large samples this test is time consuming and may significantly increase the build time. If the number of points in the training set exceeds the GTApprox/TrainingAccuracySubsetSize option value, some of the points are dropped to make the test take less time, and the training set accuracy statistic is based only on the model errors calculated using the limited subset of points (the size of which is equal to the GTApprox/TrainingAccuracySubsetSize option value). The number of points actually used in the test is also found in model info.

If the sample size is less than the GTApprox/TrainingAccuracySubsetSize value, this option in fact has no effect. In this case the number of points used in the model accuracy test is equal to the number of points used to build the model (which may still be different from the number of points in the training set — for example, if the training set contains duplicate values).

When this option does take effect, it always produces a warning in the model build log stating that only a limited subset of points selected from the training set will be used to calculate model accuracy.

To cancel the limit, set this option to 0. With this setting, the model will always be evaluated on the same set of points which were used to build the model.
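The subsetting behavior can be sketched as follows (a simplified illustration with a hypothetical helper; GTApprox’s actual point selection may differ):

```python
import random

def accuracy_subset(points, limit=100000, seed=0):
    # If the training set exceeds the limit, drop points at random so the
    # accuracy statistic is computed on at most `limit` points; a limit of
    # 0 disables subsetting entirely.
    if limit == 0 or len(points) <= limit:
        return list(points)
    rng = random.Random(seed)
    return rng.sample(points, limit)

subset = accuracy_subset(list(range(1000)), limit=100)
```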