Solving modern scientific and engineering problems usually requires complex sequences of computations involving various task-specific programs. pSeven Enterprise automates such computations by representing computational processes as workflows and executing those workflows. Each workflow consists of interrelated tasks that run in a well-defined order.
In a workflow diagram, tasks are represented by named blocks, and task relationships are represented by links between blocks. Each block is a functional unit that takes input data, performs computations using that data, and outputs the computation result data. Links between blocks designate channels of data flow from one block to another.
Data is input and output through the named ports of the block, so the links between blocks actually link their ports. When configuring a port, you can specify the direction of data flow as well as the type of data passing through that port. When a link connects two ports with different data types, the incoming data is converted, if possible, to a type allowed on the receiving port.
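The conversion on a link can be pictured with a small Python model. This is a conceptual sketch only, not pSeven Enterprise code: the `Port` and `Link` classes are hypothetical, and Python constructors stand in for whatever conversion rules the product actually applies.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Port:
    """Hypothetical model of a block port: a name plus the data type it accepts."""
    name: str
    data_type: type


@dataclass
class Link:
    """Hypothetical model of a link: a data channel from a source port to a target port."""
    source: Port
    target: Port

    def transfer(self, value: Any) -> Any:
        # Pass the value through unchanged when it already has the target type.
        if isinstance(value, self.target.data_type):
            return value
        # Otherwise try to convert. A Python constructor is an assumption here;
        # it only illustrates "convert if possible, otherwise fail".
        try:
            return self.target.data_type(value)
        except (TypeError, ValueError) as exc:
            raise TypeError(
                f"cannot convert {value!r} for port {self.target.name!r}"
            ) from exc


link = Link(Port("count", int), Port("x", float))
converted = link.transfer(3)  # an int is sent, a float is delivered
```

In this model a value that cannot be converted raises an error instead of being passed on, which mirrors the "if possible" condition in the text.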
Block inputs and outputs¶
Workflows are made up of blocks, each of which can receive data from other blocks and issue data to other blocks. The block receives and issues data through its ports. Ports for receiving data are called input ports or block inputs, ports for issuing data are called output ports or block outputs. Each port has a configurable name.
To start the task represented by the block, data must be available at all its inputs, and upon completion, the task issues data to the block outputs. Some blocks can start even if no data is set on certain block inputs. The input data in this case must be specified in the configuration of the block.
The direction of data flow through the port is specified in the port settings. A unidirectional port can serve as either a block input or a block output. A bidirectional port can serve as both an input and an output of the block. Think of such a port as a pair of unidirectional ports.
The input port settings allow you to:
- Add the port to the workflow parameters. During the execution of the workflow, the block will use the parameter value for this port if no data from other blocks arrives at the port. Parameter values are part of the workflow run configuration.
- Set the port to some default value. During the execution of the workflow, the block will use this value if no data from other blocks or workflow parameters arrives at the port. The default input port value is part of the block configuration.
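The order of precedence described above (data from other blocks, then the workflow parameter, then the default value) can be sketched as a small Python function. The function and argument names are hypothetical and serve only as an illustration; they are not part of pSeven Enterprise.

```python
_MISSING = object()  # sentinel: distinguishes "no value" from a legitimate None


def resolve_input(linked=_MISSING, parameter=_MISSING, default=_MISSING):
    """Pick the value a block input uses, in the order described above."""
    # 1. data arriving from another block, 2. workflow parameter, 3. port default
    for candidate in (linked, parameter, default):
        if candidate is not _MISSING:
            return candidate
    # In this sketch, an input with no value at all prevents the block from starting.
    raise LookupError("no data available for this input")


from_link = resolve_input(linked=7.0, parameter=2.5, default=1.0)
from_parameter = resolve_input(parameter=2.5, default=1.0)
from_default = resolve_input(default=1.0)
```

A sentinel object is used instead of `None` so that `None` itself can be passed as a valid input value.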
Some blocks, so-called driver blocks, can perform iterative computations by exchanging data with other blocks in a loop. A block can control a loop if it has at least a pair of driver ports: a request port and a response port. Request ports serve to issue data, and response ports serve to receive data.
During the execution of the workflow, the driver block issues data to one or more request ports and waits for data to arrive at a specific response port. When data arrives at this response port, new data is issued to the same request ports, and the block again waits for data on the response port. This looping behavior of the block allows driver ports to be used to perform iterations.
To organize a loop using a driver block, it is necessary to link its driver ports to the ports of another block that implements the loop body. The loop body block receives data from the request ports, performs computations using that data, and sends the computation result to the response port. Having received the response, the driver block issues new data to the loop body block, thereby starting the next iteration. This cycle stops when the driver block stops issuing new data to the request ports.
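The request and response exchange can be simulated in plain Python. The sketch below is a simplified model under stated assumptions: `run_driver_loop` and `square` are hypothetical names, the driver issues a fixed sequence of requests, and a function call stands in for the data passing over links.

```python
from typing import Callable, Iterable, List


def run_driver_loop(requests: Iterable[float],
                    loop_body: Callable[[float], float]) -> List[float]:
    """Simulate a driver block: issue a request, wait for the response, repeat."""
    responses = []
    for request in requests:            # driver issues data to its request port
        response = loop_body(request)   # loop body block computes the result
        responses.append(response)      # driver receives it on the response port
    # The loop ends when the driver has no more data to issue.
    return responses


def square(x: float) -> float:
    """Hypothetical loop body block."""
    return x * x


results = run_driver_loop([1.0, 2.0, 3.0], square)
```

A real driver block may choose each new request based on the responses received so far; the fixed request list here keeps the example short.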
Sometimes it is advisable to represent some part of the workflow as a single block containing a group of blocks together with their links. Such a block, which actually encloses a partial workflow, is called a composite block. Composite blocks greatly expand the capabilities of workflows, enabling batch processing and parallel execution of tasks, complex iterative computation, and data provenance tracking.
From an external perspective, a composite block has a lot in common with an ordinary one. It has inputs and outputs and it is triggered by data on its inputs. However, within a composite block, you can selectively link its ports with the ports of the blocks it contains, since the inputs and outputs of the composite block are at the same time the inputs and outputs of its internal workflow. Setting up links within the composite block provides its internal workflow with the necessary data and allows the workflow results to be passed on to the block outputs. The ability to flexibly configure those links within a composite block is one of its important advantages.
A composite block can be configured to receive and process data arrays by setting its inputs to batch mode. As a result, when running the block, its internal workflow will be executed for each element of the input data in parallel mode, where multiple elements can be processed concurrently. This ability to operate in batch mode enables the implementation of iterative computations with parallel data processing.
The content of a composite block is a certain workflow, consisting of one or more blocks, and therefore a composite block can enclose other blocks, including composite ones. Thus, the task of the parent composite block gets split into subtasks represented by its child blocks. This capability allows you to implement a hierarchy of tasks such as nested loops and condition or constraint checks, and also provides a convenient way to structure your workflow.
Working directory settings¶
Each block has a working directory, where the files created by its task are stored, and in which the task's input files can also be located. The working directory of the parent block serves as the working directory for all of its non-composite child blocks. As for composite child blocks, their working directory can be configured individually for each block by choosing one of the Working directory options at the top of the Composite properties pane:
- parent - The working directory of the child block is the root of the working directory of the parent block. This is the lowest level of file isolation. Child block files may conflict with or be overwritten by files from other blocks. However, this option can make it easier to access files created by other blocks.
- single - The working directory of the child block is a predefined single folder at the root of the parent block's working directory. This folder isolates the files of the child block, preventing conflicts with files from other blocks. However, this level of file isolation may not be sufficient when executing the child block in a loop, because the output files are then overwritten at each iteration.
- indexed - When executing the child block in a loop, each iteration creates a separate working directory, which is a folder with a unique name in the root of the parent block's working directory. This option is used when the analysis of the computation results requires files generated by the block at each iteration.
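The difference between the three options can be illustrated with a short Python sketch. The folder naming used here (the block name, optionally followed by an iteration number) is an assumption made for illustration; the actual folder names created by pSeven Enterprise are not specified in this section.

```python
from pathlib import PurePosixPath


def child_working_dir(parent_dir: str, option: str, block_name: str,
                      iteration: int = 0) -> PurePosixPath:
    """Return the working directory of a composite child block (illustrative model)."""
    parent = PurePosixPath(parent_dir)
    if option == "parent":
        # No isolation: the child shares the parent's working directory.
        return parent
    if option == "single":
        # One dedicated folder, reused (and overwritten) on every iteration.
        return parent / block_name
    if option == "indexed":
        # A separate folder per iteration; the naming scheme is hypothetical.
        return parent / f"{block_name}_{iteration}"
    raise ValueError(f"unknown working directory option: {option!r}")


shared = child_working_dir("/run", "parent", "Model")
isolated = child_working_dir("/run", "single", "Model")
per_iteration = child_working_dir("/run", "indexed", "Model", iteration=3)
```

With the indexed option, each iteration gets a distinct path, so files from earlier iterations are preserved.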
Port settings specific to composite blocks¶
The inputs and outputs of composite blocks have the same settings as other blocks, plus the following additional settings:
- Batch - Set the input port to batch mode, enabling parallel processing of data arrays from that port. When multiple ports are in batch mode, the workflow must ensure that the arrays sent to each of those ports are the same size.
- Results - Monitor and save port data while the workflow is running. When the workflow completes, the record of the port data is available for analysis in the workflow execution results. This setting applies to both input and output ports.
Composite blocks provide control over the parallelization of computations: you can set the maximum allowed number of parallel processes and specify the input ports intended to receive batches of values for parallel processing. When running within a workflow, a composite block creates a number of processes for executing its nested blocks and automatically distributes the input data between them. The current number of such processes is limited by the Maximum parallelization setting at the top of the Composite properties pane.
To enable parallelization, the Maximum parallelization setting must be 2 or greater; in addition, at least one of the input ports must be configured as a batch mode port using the Batch setting (see Port settings specific to composite blocks). Data arrays (for example, lists of values) must be fed to the batch mode ports, and the workflow must ensure the same array size on all those ports.
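The equal-size requirement for batch mode ports can be checked before the data is sent to the block. The following Python helper is a hypothetical sketch of such a check; it represents the batch inputs as a dictionary mapping port names to arrays, which is an assumption of this example.

```python
from typing import Dict, Sequence


def check_batch_sizes(batch_inputs: Dict[str, Sequence]) -> int:
    """Return the common batch size, or raise if the arrays differ in length."""
    sizes = {name: len(values) for name, values in batch_inputs.items()}
    if len(set(sizes.values())) > 1:
        raise ValueError(f"batch inputs differ in size: {sizes}")
    # With no batch inputs at all there is nothing to distribute.
    return next(iter(sizes.values()), 0)


batch_size = check_batch_sizes({"x": [1, 2, 3], "y": [10, 20, 30]})
```

The error message lists the size found on each port, which makes a mismatch easy to locate.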
When parallelization is enabled, the input data of the batch mode ports is distributed among the nested block execution processes in such a way that each process handles one of the batch data items at a time. Data from input ports that are not in batch mode is copied so that the same data is sent to each of the parallel processes. Normally, there is no need to enable batch mode for ports that receive constants and other values that remain unchanged for all nested block execution processes.
The Maximum parallelization setting determines the maximum allowed number of parallel nested block execution processes, from 1 to 32. When you set a value of 1, only one process is allowed; the block, however, can receive data batches and will process their data items sequentially, one by one. This mode of composite block operation can serve as an alternative to a data processing loop (see Iterative computations).
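The distribution of batch items and the effect of the Maximum parallelization limit can be modeled in Python with a worker pool. This is a conceptual sketch, not the product's implementation: `run_batch`, `scaled_sum`, and the argument names are hypothetical, and threads stand in for the nested block execution processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence


def run_batch(inner_workflow: Callable[..., Any],
              batch_inputs: Dict[str, Sequence],
              shared_inputs: Dict[str, Any],
              max_parallelization: int = 1) -> List[Any]:
    """Run inner_workflow once per batch item, with a limit on concurrency."""
    if not 1 <= max_parallelization <= 32:
        raise ValueError("max_parallelization must be between 1 and 32")
    names = list(batch_inputs)
    # One set of keyword arguments per batch item. Inputs that are not in
    # batch mode are copied into every call. Equal array sizes are assumed.
    items = [
        {**shared_inputs, **dict(zip(names, values))}
        for values in zip(*batch_inputs.values())
    ]
    with ThreadPoolExecutor(max_workers=max_parallelization) as pool:
        # map() returns results in the order of the input items.
        return list(pool.map(lambda kwargs: inner_workflow(**kwargs), items))


def scaled_sum(x, y, factor):
    """Hypothetical internal workflow of the composite block."""
    return (x + y) * factor


batch = {"x": [1, 2, 3], "y": [10, 20, 30]}
outputs = run_batch(scaled_sum, batch, {"factor": 2}, max_parallelization=2)
sequential = run_batch(scaled_sum, batch, {"factor": 2}, max_parallelization=1)
```

With a limit of 1 the items are still processed as a batch, but one at a time, which corresponds to the sequential mode described above.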
Each composite block provides a number of built-in ports for conveying current information about the workflow execution in progress. Internal information ports can be connected to input ports of nested blocks inside the composite block, while external information ports can be connected to input ports of workflow blocks located outside the composite block. You can view all information ports in the Info section of the Composite properties pane.
Internal information ports provide the following information:
- @Step index - the number of the current loop iteration executing this composite block. If the composite block is part of a loop, its @Step index port enables nested blocks to get the number of the loop iteration currently executing their parent block.
- @Working directory path - the path to the current working directory of this composite block. The internal @Working directory path port of the composite block enables its nested blocks to get a string representing the path to their current working directory. Normally, nested blocks have the same working directory as their parent composite block - see Block working directories for further details.
External information ports provide the following information:
- @Run statistics - advanced information about the current execution of this composite block. The Results setting for the @Run statistics port allows you to save this information in the workflow run results, which can be helpful in diagnosing and troubleshooting workflow issues.
- @Error - error information in case the execution of this composite block fails. The @Error port of the composite block enables other blocks in the workflow to get details of the failure that occurred while executing this block.
- @Working directory path - the path to the current working directory of this composite block. The external @Working directory path port of the composite block enables other blocks in the workflow to get a string representing the path to the current working directory of that composite block. This information can be useful when the working directory changes depending on the call index of the composite block (see the indexed option in Working directory settings).
Workflow as a composite block¶
In pSeven Enterprise, a workflow is organized as a composite block, which enables batch mode and parallel processing, as well as many settings similar to those of composite blocks, including working directory settings, inputs and outputs, batch mode, and information ports.
Compared to composite blocks, the working directory settings for a workflow have the following differences:
- The parent working directory for the workflow is its run directory. The run directory is created when configuring workflow runs, and each run has its own separate directory.
- With the parent option, the working directory is the run directory.
- With the indexed option, the working directory is a folder located in the run directory root. The working directory of the workflow is the parent working directory for all its child blocks, including the composite ones.
Like a composite block, a workflow can have inputs and outputs that can be linked to blocks that make up the workflow. Block input ports can be linked to (and receive data from) workflow inputs, and block output ports can be linked to (and issue data to) workflow outputs. The workflow inputs receive initial data for the computation, and its outputs are used to issue the computation results.
Similar to a composite block, a workflow can be configured to receive and process data arrays by setting its inputs to batch mode. As a result, the workflow will be executed for each element of the input data in parallel mode, where multiple elements can be processed concurrently. This ability to operate in batch mode improves computational efficiency by implementing parallel processing of large amounts of data.
A workflow has the same information ports as any composite block. Thus, workflow blocks can use the internal @Working directory path port to get the path to the current working directory of the workflow (see Workflow working directory). External information ports can be used to append the respective information to the workflow run results.
The above settings of the workflow are located on its Composite properties pane. This pane is similar to the composite block configuration pane and appears if the root element is selected in the Workflow contents pane, or the [Out] area is selected in the workflow diagram. For more details, see Understanding the setup controls.