July 10, 2019
Initial Sample in the Design Space Exploration and Uncertainty Quantification Studies
Introduction
The initial sample is optional input data sent to a block: a sample of existing designs and, possibly, their responses, which can be used in several ways depending on the block configuration. Typical examples are an optimization history, a previous solution, or problem evaluation data. Support for initial data makes it possible to improve existing solutions or, for example, to continue an interrupted optimization. It can also be used to select the best point from a sample or to check the feasibility of sample points. When an initial sample is available, it can help to obtain a more accurate solution faster. Two blocks in pSeven can work with initial samples: the Design space exploration (DSX) block and the Uncertainty quantification (UQ) block (Fig. 1). Let’s consider how the initial sample is used in each of them.
Fig. 1. Design space exploration and Uncertainty quantification blocks in pSeven
Initial sample in Design Space Exploration block
The Design space exploration (DSX) block solves a variety of design exploration tasks using design of experiments (DoE) and optimization methods. It can generate DoE samples with specific properties, collect response samples from blackboxes that evaluate these designs, and solve optimization problems. The initial sample can contain either values of variables only or values of both variables and responses (Fig. 2). Response values may be empty (see below) or NaN, which is useful for some optimization methods (see Surrogate Based Optimization in the pSeven documentation).
Fig. 2. Initial sample containing only values of variables (left) or values of both variables and responses (right)
The initial sample can be supplied as a single matrix or separately for each variable and response (Fig. 3). The Initial sample input accepts a single initial sample matrix. The order of the matrix columns must be the same as the order of variables and responses on the Variables and Responses pane, with variables first. For a multi-dimensional variable or response, the number of corresponding matrix columns is equal to the variable (response) Size.
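The column layout described above can be illustrated with a minimal sketch in plain Python. This is not pSeven API code: the function `build_initial_sample`, the names `x1`, `x2`, `f`, and the `sizes` dictionary are all hypothetical and serve only to show how variables come first, how a multi-dimensional item spans several columns, and how an unevaluated response is left empty.

```python
def flatten(value):
    """Return a scalar as a one-element list and a sequence as a plain list."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def build_initial_sample(designs, variable_names, response_names, sizes):
    """Build one matrix row per design: variable columns first, then responses.

    `sizes[name]` is the number of columns the item occupies (its Size).
    A missing response is stored as None, meaning "not evaluated".
    """
    rows = []
    for design in designs:
        row = []
        for name in variable_names + response_names:
            value = design.get(name)
            if value is None:
                if name in variable_names:
                    raise ValueError(f"variable {name!r} has no value")
                row.extend([None] * sizes[name])
                continue
            cells = flatten(value)
            if len(cells) != sizes[name]:
                raise ValueError(f"{name!r} must have {sizes[name]} column(s)")
            row.extend(cells)
        rows.append(row)
    return rows


# Hypothetical problem: scalar x1, two-component x2, scalar response f.
sizes = {"x1": 1, "x2": 2, "f": 1}
designs = [
    {"x1": 0.1, "x2": [1.0, 2.0], "f": 3.5},  # fully evaluated design
    {"x1": 0.4, "x2": [1.5, 2.5]},            # response not evaluated yet
]
matrix = build_initial_sample(designs, ["x1", "x2"], ["f"], sizes)
for row in matrix:
    print(row)
```

Each row has four columns here: one for `x1`, two for `x2`, and one for `f`.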
Fig. 3. Initial sample ports on Ports and Parameters pane in DSX block
Let’s consider how different techniques in the DSX block can use the initial sample. The Initial sample evaluation technique processes the initial sample in the same way as the other techniques but does not generate new design points (Fig. 4).
Fig. 4. Operation of the Initial sample evaluation technique
The DSX block receives the values of variables and, after evaluation, outputs a sample containing the corresponding response values (Fig. 5).
Fig. 5. Scheme of pSeven workflow
Non-adaptive DoE techniques add the newly generated designs to the initial sample and then process the combined sample according to the current block configuration (Fig. 6).
Fig. 6. How Non-adaptive DoE techniques work with the initial sample
For instance, if there are constraint responses, all designs are checked for feasibility; if a blackbox is connected, all feasible designs are evaluated; and so on. Fig. 7 shows a pSeven workflow consisting of the DSX block, a Model composite block, and a CSV Parser block which imports the initial sample from an external folder and transfers it into the DSX block.
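The processing order described above (append new designs, check feasibility, evaluate feasible designs) can be sketched as follows. This is an illustration under stated assumptions, not the DSX block's implementation: the function `process_sample` is hypothetical, plain uniform random sampling stands in for a real DoE technique, and the constraint is assumed to be a cheap function of the design.

```python
import random


def process_sample(initial, n_new, bounds, is_feasible, blackbox=None, seed=0):
    """Append new designs to the initial sample, then process all of them.

    Returns a list of (design, feasible, response) tuples. The response is
    None when the design is infeasible or when no blackbox is connected.
    """
    rng = random.Random(seed)
    # Stand-in for a DoE technique: uniform random points inside the bounds.
    new_designs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_new)]
    combined = [list(d) for d in initial] + new_designs

    results = []
    for x in combined:
        feasible = bool(is_feasible(x))
        # Only feasible designs are sent to the blackbox.
        response = blackbox(x) if (blackbox is not None and feasible) else None
        results.append((x, feasible, response))
    return results


results = process_sample(
    initial=[[0.2, 0.2], [0.9, 0.9]],
    n_new=3,
    bounds=[(0.0, 1.0), (0.0, 1.0)],
    is_feasible=lambda x: x[0] + x[1] <= 1.0,    # hypothetical constraint
    blackbox=lambda x: x[0] ** 2 + x[1] ** 2,    # hypothetical model
)
for design, feasible, response in results:
    print(design, feasible, response)
```

The initial designs appear first in the output and are treated exactly like the generated ones.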
Fig. 7. pSeven workflow
If the initial sample is a Latin hypercube design, the Latin hypercube sampling technique can update it (generate new designs) in such a way that the new sample is also a Latin hypercube. This is a special mode enabled by the Property preservation option.
Adaptive design techniques can use the initial sample to train initial approximations of the response functions, thus improving generation quality (Fig. 8).
Fig. 8. Using the initial sample in Adaptive design of experiments
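To make the Latin hypercube property mentioned above concrete, here is a small check based on the standard definition: with N points, each variable's range is split into N equal bins, and every bin must contain exactly one point. The function `is_latin_hypercube` is a hypothetical helper for illustration; it does not reproduce how pSeven verifies or extends a sample.

```python
def is_latin_hypercube(sample, bounds):
    """Check that every one of the N equal-width bins along each dimension
    contains exactly one of the N sample points."""
    n = len(sample)
    if n == 0:
        return False
    for dim, (lo, hi) in enumerate(bounds):
        occupied = set()
        for point in sample:
            # Index of the bin holding this coordinate; the upper bound
            # itself is assigned to the last bin.
            index = min(int((point[dim] - lo) / (hi - lo) * n), n - 1)
            occupied.add(index)
        if len(occupied) != n:
            return False
    return True


unit_square = [(0.0, 1.0), (0.0, 1.0)]
print(is_latin_hypercube([[0.1, 0.8], [0.5, 0.1], [0.9, 0.5]], unit_square))
print(is_latin_hypercube([[0.1, 0.1], [0.2, 0.5], [0.9, 0.9]], unit_square))
```

The second sample fails the check because two of its points fall into the same bin along the first variable.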
Another application of the initial sample is a configuration that includes an adaptive type response which is not evaluated by a blackbox. In this case, the technique requires an initial sample for this response and uses it to train an internal model, which is evaluated instead of the blackbox (Fig. 9).
Fig. 9. Adaptive design generation without the blackbox
The Adaptive design technique generates new points and returns empty response values for all new designs. An empty (None) value means that a response was not evaluated for the corresponding design point. You can find more information about the Adaptive design technique in the "Adaptive Design of Experiments in pSeven" Tech Tip.
The Surrogate-based optimization (SBO) technique uses the initial sample to train the initial approximation models of responses. If the initial sample does not contain response values, or some of them are missing, these designs are evaluated first.
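The "evaluate missing responses first" step can be sketched as below. The helper `complete_initial_sample` and the quadratic blackbox are hypothetical; the sketch only shows that designs with an empty (None) response are sent to the blackbox while already known responses are reused without spending evaluations.

```python
def complete_initial_sample(designs, responses, blackbox):
    """Fill in missing responses by calling the blackbox.

    Returns the completed response list and the number of blackbox calls.
    """
    completed = []
    n_evaluations = 0
    for x, y in zip(designs, responses):
        if y is None:  # response was not evaluated for this design
            y = blackbox(x)
            n_evaluations += 1
        completed.append(y)
    return completed, n_evaluations


designs = [[1.0], [2.0], [3.0]]
responses = [1.0, None, 9.0]  # the second design has no response yet
completed, n_evaluations = complete_initial_sample(
    designs, responses, blackbox=lambda x: x[0] ** 2
)
print(completed, n_evaluations)
```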
The main benefit of an initial sample in Surrogate-based optimization is that the optimization budget can be reduced by the number of points in the initial sample while obtaining virtually the same result. Let’s compare two Surrogate-based optimization runs for a high-speed rotating disk geometry: SBO with a budget of 120 designs, and SBO with a budget of 20 designs plus an initial sample of 100 points. You can learn more about this problem in the series of pSeven video tutorials; design of experiments and the solution of optimization problems are shown in detail in the third and fourth videos of the series. The initial sample is generated by the LHS design of experiments technique. The results are compared in Table 1.
Table 1. Comparison of the two Surrogate-based optimization (SBO) runs
| | SBO (120 points) | SBO (20 points) + Initial LHS (100 points) |
| Optimal designs | 41 | 22 |
The Pareto frontiers resulting from SBO are presented in Fig. 10. The solution obtained by SBO with the initial sample and a small number of designs is of high quality and nearly matches that of SBO with 120 points. This means that using the initial sample can be very helpful and can save evaluation budget. Moreover, it is recommended to perform a preliminary design of experiments as a design space exploration before optimization: all the collected data will be used in the optimization process, and the exploration itself gives additional insight into the problem.
Fig. 10. Pareto frontiers of Surrogate-based optimization (SBO)
The Gradient-based optimization (GBO) technique uses the initial sample as an extended initial guess. If the initial sample contains both variable and response values, the technique selects the optimum designs from the initial sample and starts searching from these points. If response values are missing, the initial designs are first evaluated and then analyzed for optimality. The results of Gradient-based optimization of a high-speed rotating disk geometry with and without the initial sample are presented in Table 2. The initial sample is generated by LHS and contains 200 points. The optimization budget is 300 designs.
Table 2. Comparison of the two Gradient-based optimization (GBO) runs
| | GBO | GBO + Initial LHS |
| Processing time | 1 m 10 s | 57 s |
| Optimal designs | 52 | 78 |
The Pareto frontiers obtained in these two runs are shown in Fig. 11. The Pareto frontier covers a much wider range when the initial sample is used.
Fig. 11. Pareto frontiers of Gradient-based optimization (GBO)
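The idea of selecting optimum designs from the initial sample as starting points for GBO can be sketched with a simple non-dominated filter. This is an assumption-laden illustration, not the rule pSeven applies: all objectives are assumed to be minimized, constraints are ignored, and `dominates` and `select_starting_points` are hypothetical helpers.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b in every objective
    and strictly better in at least one (minimization)."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(
        ai < bi for ai, bi in zip(a, b)
    )


def select_starting_points(designs, objectives):
    """Keep the designs whose objective vectors are not dominated by any other."""
    selected = []
    for i, fi in enumerate(objectives):
        dominated = any(
            dominates(fj, fi) for j, fj in enumerate(objectives) if j != i
        )
        if not dominated:
            selected.append(designs[i])
    return selected


designs = [[0.0], [1.0], [2.0], [3.0]]
objectives = [[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]]
starting_points = select_starting_points(designs, objectives)
print(starting_points)
```

In this toy sample the third design is dropped because the second one is better in both objectives; the remaining three would serve as starting points for the search.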
Initial sample in Uncertainty Quantification block
The Uncertainty Quantification (UQ) block performs an uncertainty quantification study for the output of a computational model of interest. Fig. 12 shows the operational scheme of the UQ block with an initial sample.
Fig. 12. Operational scheme of the UQ block with initial sample.
If a sample is available, the block can determine from it the distribution parameters for a given distribution type. A pSeven workflow performing an uncertainty quantification analysis with a sample is presented in Fig. 13.
Fig. 13. pSeven workflow with the UQ block
The UQ block provides a Sample port for each variable, so a data sample can be used instead of specifying a parametric distribution. The CSV Parser block imports this sample from an external folder. The UQ block analyzes the sample and automatically determines the distribution parameters for the variable. To illustrate this, let’s connect a sample for variable t3. Only the distribution type for this sample has to be selected in the UQ block configuration (Fig. 14). As an example, “Loguniform” is specified, and the distribution parameters are left at their defaults.
Fig. 14. UQ block configuration
The matrix output port of the CSV Parser block is connected to the Sample port of variable t3 in the UQ block (Fig. 15). Note that every variable has a Sample port.
Fig. 15. Sample port in UQ block for variable t3
As a result, the distribution parameters for the selected variable will be obtained.
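As an illustration of what determining distribution parameters from a sample can mean for a log-uniform distribution, the sketch below uses maximum likelihood estimates, which for this distribution are simply the sample minimum and maximum. The estimator used inside the UQ block is not described here, so this is an assumption; `fit_loguniform`, `loguniform_pdf`, and the sample values are hypothetical.

```python
import math


def fit_loguniform(sample):
    """Maximum likelihood bounds of a log-uniform distribution:
    the smallest and largest observed values (all must be positive)."""
    if not sample or min(sample) <= 0:
        raise ValueError("a log-uniform sample must be non-empty and positive")
    return min(sample), max(sample)


def loguniform_pdf(x, low, high):
    """Density of the log-uniform distribution on [low, high]."""
    if x < low or x > high:
        return 0.0
    return 1.0 / (x * math.log(high / low))


t3_sample = [2.0, 0.5, 8.0, 1.0]  # hypothetical values read from a CSV file
low, high = fit_loguniform(t3_sample)
print(low, high)
print(loguniform_pdf(1.0, low, high))
```

With a small sample these bounds are necessarily narrower than the true range, which is one reason a real tool may use a different estimator.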
Summary
The scenarios above show how the initial sample can be used in pSeven. Two blocks support it: Design space exploration and Uncertainty quantification. The initial sample is useful in design of experiments and in optimization, and it can also help in uncertainty quantification analysis when the distribution parameters are unknown.