ProActive Workflows & Scheduling (PWS)

1. Overview

ProActive Scheduler is a comprehensive Open Source job scheduler and Orchestrator, also featuring Workflows and Resource Management. The user specifies the computation in terms of a series of computation steps along with their execution and data dependencies. The Scheduler executes this computation on a cluster of computation resources, each step on the best-fit resource and in parallel wherever possible.

architecture

On the top left there is the Studio interface which allows you to build Workflows. It can be interactively configured to address specific domains, for instance Finance, Big Data, IoT, Artificial Intelligence (AI). See for instance the Documentation of ProActive AI Orchestration here, and try it online here. In the middle there is the Scheduler which enables an enterprise to orchestrate and automate multi-user, multi-application Jobs. Finally, at the bottom right is the Resource Manager interface which manages and automates resource provisioning on any Public Cloud, on any virtualization software, on any container system, and on any Physical Machine of any OS. All these components come with fully open and modern REST APIs.

automation dashboard catalog

The screenshot above shows the Catalog Portal, where one can store Workflows, Calendars, Scripts, etc. Powerful versioning together with full access control (RBAC) is supported, and users can easily share Workflows and templates between teams and various environments (Dev, Test, Staging and Prod).

job planner

The screenshot above shows the Job Planner Portal, which allows you to automate and schedule recurring Jobs. From left to right, you can define and use Calendar Definitions, associate Workflows to calendars, and visualize both the execution planning for the future and the actual executions of the past.

calendar planning gantt

The Gantt screenshot above shows the Gantt View of Job Planner, featuring past job history, the current Jobs being executed, and future Jobs that will be submitted, all in a comprehensive interactive view. You can easily see the potential differences between the Planned Submission Time and the Actual Start Time of the Jobs, get estimations of the Finished Time, visualize Jobs that stayed PENDING for some time (in yellow) and Jobs that had issues and were KILLED, CANCELLED, or FAILED (in red).

automation dashboard cloud watch

The screenshot above shows the Event Orchestration portal, where a user can manage and benefit from Event Orchestration, a smart monitoring system. This ProActive component detects complex events and then triggers user-specified actions according to a predefined set of rules.

cloud automation on demand paas

The screenshot above shows the Service Automation portal which is actually a PaaS automation tool. It allows you to easily manage any On-Demand Services with full Life-Cycle Management (create, deploy, suspend, resume and terminate).

scalable elastic resources

The screenshots above, taken from the Resource Manager portal, show that you can configure ProActive Scheduler to dynamically scale the infrastructure being used up and down (e.g. the number of VMs you buy on Clouds) according to the actual workload to execute.

1.1. Glossary

The following terms are used throughout the documentation:

ProActive Workflows & Scheduling

The full distribution of ProActive Workflows & Scheduling. It contains the ProActive Scheduler server, the REST & Web interfaces, and the command line tools. It is the commercial product name.

ProActive Scheduler

Can refer to any of the following:

  • A complete set of ProActive components.

  • An archive that contains a released version of ProActive components, for example activeeon_enterprise-pca_server-OS-ARCH-VERSION.zip.

  • A set of server-side ProActive components installed and running on a Server Host.

Resource Manager

ProActive component that manages ProActive Nodes running on Compute Hosts.

Scheduler

ProActive component that accepts Jobs from users, orders the constituent Tasks according to priority and resource availability, and eventually executes them on the resources (ProActive Nodes) provided by the Resource Manager.

Please note the difference between Scheduler and ProActive Scheduler.

REST API

ProActive component that provides RESTful API for the Resource Manager, the Scheduler and the Catalog.

Resource Manager Web Interface

ProActive component that provides a web interface to the Resource Manager. Also called Resource Manager portal.

Scheduler Web Interface

ProActive component that provides a web interface to the Scheduler. Also called Scheduler portal.

Workflow Studio

ProActive component that provides a web interface for designing Workflows.

Automation Dashboard

Centralized interface for the following web portals: Workflow Execution, Catalog, Analytics, Job Planner, Service Automation, Event Orchestration and Notifications.

Analytics

ProActive component responsible for gathering monitoring information about Jobs and ProActive Nodes.

Job Analytics

ProActive component, part of the Analytics service, that provides an overview of executed workflows along with their input variables and results.

Job Gantt

ProActive component, part of the Analytics service, that builds a Gantt diagram of past Jobs executions.

Node Gantt

ProActive component, part of the Analytics service, that provides an overview of ProActive Nodes usage over time.

Analytics portal

Web interface of the Analytics component.

Notification Service

ProActive component that allows a user or a group of users to subscribe to different types of notifications (web, email, sms or a third-party notification) when certain event(s) occur (e.g., job In-Error state, job in Finished state, scheduler in Paused state, etc.).

Notification portal

Web interface to visualize notifications generated by the Notification Service and manage subscriptions.

Notification Subscription

In Notification Service, a subscription is a per-user configuration to receive notifications for a particular type of events.

Third Party Notification

A Notification Method which executes a script when the notification is triggered.

Notification Method

Element of a Notification Subscription which defines how a user is notified (portal, email, third-party, etc.).

Notification Channel

In the Notification Service, a channel is a notification container that can be notified and that displays notifications to groups of users.

Service Automation

ProActive component that allows a user to easily manage any On-Demand Services (PaaS, IaaS and MaaS) with full Life-Cycle Management (create, deploy, pause, resume and terminate).

Service Automation portal

Web interface of the Service Automation.

Workflow Execution portal

The main web portal of ProActive Workflows & Scheduling and the entry point for end users to submit workflows manually, monitor their executions and access job outputs, results, service endpoints, etc.

Catalog

ProActive component that provides storage and versioning of Workflows and other ProActive Objects. It is also possible to query the Catalog for specific Workflows through a REST API as well as a GraphQL API.

Catalog portal

Web interface of the Catalog component.

Event Orchestration

ProActive component that monitors a system according to a predefined set of rules, detects complex events and then triggers user-specified actions.

Event Orchestration portal

Web interface of the Event Orchestration component.

Scheduling API

ProActive component that offers a GraphQL endpoint for getting information about a ProActive Scheduler instance.

Connector IaaS

ProActive component that enables CRUD operations on different infrastructures, on public or private Clouds (AWS EC2, OpenStack, VMware, Kubernetes, etc.).

Job Planner

ProActive component providing advanced scheduling options for Workflows.

Job Planner portal

Web interface to manage the Job Planner service.

Calendar Association

Component of the Job Planner that defines the Workflows associated with a Calendar Definition.

Bucket

ProActive notion used with the Catalog to refer to a specific collection of ProActive Objects and in particular ProActive Workflows.

Server Host

The machine on which ProActive Scheduler is installed.

SCHEDULER_ADDRESS

The IP address of the Server Host.

ProActive Node

One ProActive Node can execute one Task at a time. This concept is often tied to the number of cores available on a Compute Host. We assume a task consumes one core (more is possible, see multi-nodes tasks), so on a 4-core machine you might want to run 4 ProActive Nodes. One (by default) or more ProActive Nodes can be executed in a Java process on the Compute Hosts and will communicate with the ProActive Scheduler to execute tasks. We distinguish two types of ProActive Nodes:

  • Server ProActive Nodes: Nodes that run on the same host as the ProActive server;

  • Remote ProActive Nodes: Nodes that run on machines other than the ProActive server.

Compute Host

Any machine which is meant to provide computational resources to be managed by the ProActive Scheduler. One or more ProActive Nodes need to be running on the machine for it to be managed by the ProActive Scheduler.

Examples of Compute Hosts: a Physical Machine, a Virtual Machine, a container or a Cloud instance.

Node Source

A set of ProActive Nodes deployed using the same deployment mechanism and sharing the same access policy.

Node Source Infrastructure

The configuration attached to a Node Source which defines the deployment mechanism used to deploy ProActive Nodes.

Node Source Policy

The configuration attached to a Node Source which defines the ProActive Nodes acquisition and access policies.

Scheduling Policy

The policy used by the ProActive Scheduler to determine how Jobs and Tasks are scheduled.

PROACTIVE_HOME

The path to the extracted ProActive Scheduler release archive, either on the Server Host or on a Compute Host.

Workflow

User-defined representation of a distributed computation. Consists of the definitions of one or more Tasks and their dependencies.

Workflow Revision

ProActive concept that reflects the changes made on a Workflow during its development. Generally speaking, the term Workflow is used to refer to the latest version of a Workflow Revision.

Generic Information

Additional information attached to Workflows or Tasks. See generic information.

Calendar Definition

A JSON object attached to a Workflow by adding it to the Workflow's Generic Information.

Job

An instance of a Workflow submitted to the ProActive Scheduler. Sometimes also used as a synonym for Workflow.

Job Id

An integer identifier which uniquely represents a Job inside the ProActive Scheduler.

Job Icon

An icon representing the Job and displayed in portals. The Job Icon is defined by the Generic Information workflow.icon.

Task

A unit of computation handled by ProActive Scheduler. Both Workflows and Jobs are made of Tasks. A Task must define a ProActive Task Executable and can also define additional task scripts.

Task Id

An integer identifier which uniquely represents a Task inside a Job. Task ids are only unique inside a given Job.

Task Executable

The main executable definition of a ProActive Task. A Task Executable can either be a Script Task, a Java Task or a Native Task.

Script Task

A Task Executable defined as a script execution.

Java Task

A Task Executable defined as a Java class execution.

Native Task

A Task Executable defined as a native command execution.

Additional Task Scripts

A collection of scripts, part of a ProActive Task definition, which can be used in complement to the main Task Executable. Additional Task scripts can be a Selection Script, Fork Environment Script, Pre Script, Post Script, Control Flow Script or Cleaning Script.

Selection Script

A script part of a ProActive Task definition and used to select a specific ProActive Node to execute a ProActive Task.

Fork Environment Script

A script part of a ProActive Task definition and run on the ProActive Node selected to execute the Task. Fork Environment script is used to configure the forked Java Virtual Machine process which executes the task.

Pre Script

A script part of a ProActive Task definition and run inside the forked Java Virtual Machine, before the Task Executable.

Post Script

A script part of a ProActive Task definition and run inside the forked Java Virtual Machine, after the Task Executable.

Control Flow Script

A script part of a ProActive Task definition and run inside the forked Java Virtual Machine, after the Task Executable, to determine control flow actions.

Control Flow Action

A dynamic workflow action performed after the execution of a ProActive Task. Possible control flow actions are Branch, Loop or Replicate.

Branch

A dynamic workflow action performed after the execution of a ProActive Task similar to an IF/THEN/ELSE structure.

Loop

A dynamic workflow action performed after the execution of a ProActive Task similar to a FOR structure.

Replicate

A dynamic workflow action performed after the execution of a ProActive Task similar to a PARALLEL FOR structure.

Cleaning Script

A script part of a ProActive Task definition and run after the Task Executable and before releasing the ProActive Node to the Resource Manager.

Script Bindings

Named objects which can be used inside a Script Task or inside Additional Task Scripts and which are automatically defined by the ProActive Scheduler. The type of each script binding depends on the script language used.

Task Icon

An icon representing the Task and displayed in the Studio portal. The Task Icon is defined by the Task Generic Information task.icon.

ProActive Agent

A daemon installed on a Compute Host that starts and stops ProActive Nodes according to a schedule, restarts ProActive Nodes in case of failure and enforces resource limits for the Tasks.

2. Get started

To submit your first computation to ProActive Scheduler, install it in your environment (default credentials: admin/admin) or just use our demo platform try.activeeon.com.

ProActive Scheduler provides comprehensive web interfaces to create, submit and monitor Jobs and to manage computing resources.

We also provide a REST API and command line interfaces for advanced users.

3. Create and run your computation

3.1. Jobs, workflows and tasks

In order to use the Scheduler for executing various computations, one needs to write the execution definition, also known as the Workflow definition. A workflow definition is an XML file that adheres to the XML schema for ProActive Workflows.

It uses a number of XML tags for specifying execution steps, their sequence and their dependencies. Each execution step corresponds to a task, which is the smallest unit of execution that can be performed on a computation resource (ProActive Node). There are several types of tasks which cater to different use cases.

3.1.1. Task Types

ProActive Scheduler currently supports three main types of tasks:

  • Native Task, a command with optional parameters to be executed

  • Script Task, a script written in Groovy, Ruby, Python and other languages supported by the JSR-223

  • Java Task, a task written in Java extending the Scheduler API

For instance, a Script Task can be used to execute an inline script definition or a script file as an execution step whereas a Native Task can be used to execute a native executable file.

We recommend using script tasks, which are more flexible, rather than Java tasks. You can easily integrate with any Java code from a Groovy script task.
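
As a minimal sketch of that idea (not taken from the ProActive samples), a Groovy script task body can call standard Java classes directly and expose its output through the result binding:

// Groovy script task body: call plain Java classes directly
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

def stamp = LocalDateTime.now().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
println "Task executed at " + stamp
// the 'result' binding becomes the task result
result = stamp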

3.1.2. Additional Scripts

In addition to the main definition of a Task, scripts can also be used to provide additional actions (see the sketch after this list). The following actions are supported:

  • one or more selection scripts to control the Task resource (ProActive Node) selection.

  • a fork environment script to control the Task execution environment (a separate Java Virtual Machine or Docker container).

  • a pre script executed immediately before the main task definition (and inside the forked environment).

  • a post script executed immediately after the main task definition if and only if the task did not trigger any error (also run inside the forked environment).

  • a control flow script executed immediately after the post script (if present) or main task definition to control flow behavior such as branch, loop or replicate (also run inside the forked environment).

  • finally, a clean script executed after the task is finished, whether the task succeeded or not, and directly on the ProActive Node which hosted the task.
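
As a hedged illustration of two of these additional scripts, assuming Groovy and the variables binding described later in this documentation, a pre script and a post script attached to the same task could cooperate like this:

// pre script (Groovy): executed before the Task Executable,
// record the start time in a workflow variable visible to later scripts
variables.put("START_TIME", String.valueOf(System.currentTimeMillis()))

// post script (Groovy): executed after the Task Executable completed without error,
// report how long the main execution took
def elapsed = System.currentTimeMillis() - Long.parseLong(variables.get("START_TIME"))
println "Main execution took " + elapsed + " ms"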

3.1.3. Task Dependencies

A workflow in ProActive Workflows & Scheduling can be seen as a directed graph of Tasks:

flow spec dependency

In this task graph, we see that task 4 is preceded by task 1, which means that the ProActive Scheduler will wait for the end of task 1's execution before launching task 4. More concretely, task 1 could compute a part of the problem to solve, and task 4 could take the result provided by task 1 and compute another step of the calculation. We introduce here the concept of passing data between tasks. This relation is called a Dependency, and we say that task 4 depends on task 1.

We also see that tasks 1, 2 and 3 are not linked, so these three tasks can be executed in parallel, because they are independent of each other.

The task graph is defined by the user at the time of workflow creation, but can also be modified dynamically during the job execution by control flow actions such as Replicate.

A finished job contains the results and logs of each task. Upon failure, a task can be restarted automatically or cancelled.

3.2. Native Tasks

A Native Task is a ProActive Workflow task whose main execution is defined as a command to run.

Native tasks are the simplest type of Task, where a user provides a command with a list of parameters.

Using native tasks, one can easily reuse existing applications and embed them in a workflow.

Once the executable is wrapped as a task you can easily leverage some of the workflow constructs to run your executable commands in parallel.

You can find an example of such integration in this XML workflow or you can also build one yourself using the Workflow Studio.

Native applications can by nature be tied to a given operating system, so if you have heterogeneous nodes at your disposal, you might need to select a suitable node to run your native task. This can be achieved using a selection script.
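
For example, a minimal selection script (here in Groovy, using the selected output binding introduced in the Script Bindings section) could restrict a task to Linux nodes; the exact condition is only illustrative:

// selection script (Groovy): accept this ProActive Node only if it runs Linux
selected = System.getProperty("os.name").toLowerCase().contains("linux")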

Native Tasks are not the only way to run executables or commands; this can also be achieved using shell-language script tasks.

3.3. Script Tasks

A Script Task is a ProActive Workflow task whose main execution is defined as a script.

ProActive Workflows supports tasks in many scripting languages. The currently supported dynamic languages or backends are Groovy, Jython, Python, JRuby, Javascript, Scala, Powershell, VBScript, R, Perl, PHP, Bash, Any Unix Shell, Windows CMD, Dockerfile, Docker Compose and Kubernetes.

Dockerfile, Docker Compose and Kubernetes are not really scripting languages but rather description languages; ProActive tasks can interpret Dockerfiles, Docker Compose files or Kubernetes yaml files to start containers or pods.

As seen in the list above, Script Tasks can be used to run native operating system commands by providing a script written in bash, ksh, Windows cmd, Powershell, etc. Simply set the language attribute to bash, shell, or cmd and type your command(s) in the workflow. In this sense, it can replace a Native Task in most cases.
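
A hedged sketch of such a task, following the same XML structure as the dependency example later in this document (the task name and command are only illustrative):

<task name="Bash_Task">
    <scriptExecutable>
        <script>
            <code language="bash">
                echo "Hello from $(hostname)"
            </code>
        </script>
    </scriptExecutable>
</task>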

You can easily embed small scripts in your workflow. The nice thing is that the workflow will be self-contained and will not require compiling your code before executing. However, we recommend that you keep these scripts small to keep the workflow easy to understand.

You can find an example of a script task in this XML workflow or you can also build one yourself using the Workflow Studio. The quickstart tutorial relies on script tasks and is a nice introduction to writing workflows.

Scripts can also be used to decorate any kind of task (Script, Java or Native) with specific actions, as described in the additional scripts section.

It is thus possible to combine various scripting languages in a single Task. For example, a pre script could be written in bash to transfer some files from a remote file system, while the main script is written in Python to process those files.

Finally, a Script Task may or may not return a result materialized by the result script binding.

3.3.1. Script Bindings

A Script Task or additional scripts can use automatically-defined script bindings. These bindings connect the script with its workflow context, giving it, for example, access to the current workflow variables as submitted by a user, the workflow job id as run inside the ProActive Scheduler, etc.

Bindings also allow passing information between various scripts inside the same task or across different tasks.
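
As a minimal sketch of cross-task propagation, assuming Groovy tasks and the variables binding (the variable name is illustrative):

// in a parent task (Groovy): store a value in the workflow variables
variables.put("COMPUTED_VALUE", "42")

// in a dependent task (Groovy): the propagated value is available through the same binding
println "Value computed by the parent task: " + variables.get("COMPUTED_VALUE")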

A script binding can be either:

  • Defined automatically by the ProActive Scheduler before the script starts its execution. In this case, we call it also a provided or input script binding.

  • Expected to be defined by the script during its execution to provide meaningful results. In this case, we also call it a result or output script binding.

Below is an example of Script Task definition which uses both types of bindings:

jobId = variables.get("PA_JOB_ID")
result = "The Id of the current job is: " + jobId

As we can see, the variables binding is provided automatically by the ProActive Scheduler before the script execution and it is used to compute a result binding as task output.

Bindings are stored internally inside the ProActive Scheduler as Java Objects. Accordingly, a conversion may be performed when targeting (or when reading from) other languages.

For example, Generic Information or Workflow Variables are stored as Java objects implementing the Map interface. When creating the variables or genericInformation bindings prior to executing a Groovy, JRuby, Jython or Python script, the ProActive Scheduler will convert them to various types accepted by the target language. In those languages, printing the type of the genericInformation binding will show:

Groovy: class com.google.common.collect.RegularImmutableBiMap
JRuby: Java::ComGoogleCommonCollect::RegularImmutableBiMap
Jython: <type 'com.google.common.collect.RegularImmutableBiMap'>
Python: <class 'py4j.java_collections.JavaMap'>

We see here that Groovy, JRuby and Jython did not perform any conversion, whereas Python (CPython) did. This behavior is expected, as CPython script execution runs as a standalone python process with a custom type conversion, while the former languages run directly inside the task Java Virtual Machine with full Java type compatibility.
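
In practice, scripts can usually treat these bindings as plain Java Maps regardless of the concrete implementation class. A small Groovy sketch (the printed output depends on the submitted workflow):

// iterate over the workflow variables as a regular Map
variables.each { name, value -> println "variable " + name + " = " + value }
// the concrete Map implementation is an internal detail
println genericInformation.getClass().getName()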

Depending on the script type (task script, selection script, etc.), the script may need to define an output binding to return some information to the scheduler.

Below are some examples of output bindings for various kind of scripts:

  • result and resultMetadata for a Script Task main execution script.

  • selected for a selection script.

  • loop for a Loop construct flow script.

  • runs for a Replicate construct flow script.

The complete list of script bindings (both input and output) is available in the Script Bindings Reference section.

Below are descriptions of some specific scripting language support, which can be used in Script Tasks main execution script or in any Task additional scripts.

3.4. Java Tasks

A workflow can execute Java classes thanks to Java Tasks. In terms of XML definition, a Java Task consists of a fully-qualified class name along with parameters:

<task name="Java_Task">
  <javaExecutable class="org.ow2.proactive.scheduler.examples.WaitAndPrint">
    <parameters>
      <parameter name="sleepTime" value="20"/>
      <parameter name="number" value="2"/>
    </parameters>
  </javaExecutable>
</task>

The provided class must extend the JavaExecutable abstract class and implement the execute method.

Any parameter must be defined as a public attribute. For example, the above WaitAndPrint class contains the following attribute definitions:

/** Sleeping time before displaying. */
public int sleepTime;

/** Parameter number. */
public int number;

A parameter conversion is performed automatically by the JavaExecutable super class. If this automatic conversion is not suitable, it is possible to override the init method.

Finally, several utility methods are provided by JavaExecutable and should be used inside execute. A good example is getOut, which allows writing textual output to the workflow task, or getLocalSpace, which gives access to the task execution directory.

The complete code for the WaitAndPrint class is available below:

public class WaitAndPrint extends JavaExecutable {

    /** Sleeping time before displaying. */
    public int sleepTime;

    /** Parameter number. */
    public int number;

    /**
     * @see JavaExecutable#execute(TaskResult[])
     */
    @Override
    public Serializable execute(TaskResult... results) throws Throwable {
        String message = null;

        try {
            getErr().println("Task " + number + " : Test STDERR");
            getOut().println("Task " + number + " : Test STDOUT");

            message = "Task " + number;
            int st = 0;
            while (st < sleepTime) {
                Thread.sleep(1000);
                try {
                    setProgress((st++) * 100 / sleepTime);
                } catch (IllegalArgumentException iae) {
                    setProgress(100);
                }
            }

        } catch (Exception e) {
            message = "crashed";
            e.printStackTrace(getErr());
        }

        getOut().println("Terminate task number " + number);

        return ("No." + this.number + " hi from " + message + "\t slept for " + sleepTime + " Seconds");
    }

}

3.5. Run a workflow

To run a workflow, the user submits it to the ProActive Scheduler. It will be possible to choose the values of all Job Variables when submitting the workflow (see section Job Variables). A verification is performed to ensure the well-formedness of the workflow. A verification will also be performed to ensure that all variables are valid according to their model definition (see section Variable Model). Next, a job is created and inserted into the pending queue and waits until free resources become available. Once the required resources are provided by the ProActive Resource Manager, the job is started. Finally, once the job is finished, it goes to the queue of finished jobs where its result can be retrieved.

You can submit a workflow to the Scheduler using the Workflow Studio, the Scheduler Web Interface or command line tools. For advanced users we also expose REST and Java APIs.

During the submission, you will be able to edit Workflow variables, so you can effectively use them to parameterize workflow execution and use workflows as templates.

3.5.1. Job & Task status

During their execution, jobs go through different statuses.

Table 1. Job Status for Pending and Current Jobs

  • PENDING (Pending): The job is waiting to be scheduled. None of its tasks have been Running so far.

  • RUNNING (Running): The job is running. At least one of its tasks has been scheduled.

  • STALLED (Stalled): The job has been launched but no task is currently running.

  • PAUSED (Paused): The job is paused, waiting for the user to resume it.

  • IN_ERROR (In-Error): The job has one or more In-Error tasks that are suspended along with their dependencies. User intervention is required to fix the causing issues and restart the In-Error tasks to resume the job.

Table 2. Job Status for Past Jobs

  • FINISHED (Finished): The job is finished. Tasks are finished or faulty.

  • CANCELED (Cancelled): The job has been canceled because of an exception. This status is used when an exception is thrown by the user code of a task and when the user has asked to cancel the job on exception.

  • FAILED (Failed): The job has failed. One or more tasks have failed (due to resources failure). There is no more executionOnFailure left for a task.

  • KILLED (Killed): The job has been killed by the user.

Similarly to jobs, tasks go through different statuses during their execution.

Table 3. Task Status

  • ABORTED (Aborted): The task has been aborted by an exception on another task while this task was running (the job has cancelOnError=true). A task can also be in this status if the job is killed while the task was running.

  • FAILED (Resource down): The task has failed (only if the maximum number of executions has been reached and the node on which it was started is down).

  • FAULTY (Faulty): The task has finished execution with an error code (!=0) or an exception.

  • FINISHED (Finished): The task has finished execution.

  • IN_ERROR (In-Error): The task is suspended after the first error, if the user has asked to suspend it. The task is waiting for a manual restart action.

  • NOT_RESTARTED (Could not restart): The task could not be restarted after an error during the previous execution.

  • NOT_STARTED (Could not start): The task could not be started due to dependencies failure.

  • PAUSED (Paused): The task is paused.

  • PENDING (Pending): The task is in the scheduler pending queue.

  • RUNNING (Running): The task is executing.

  • SKIPPED (Skipped): The task was not executed: it was the non-selected branch of an IF/ELSE control flow action.

  • SUBMITTED (Submitted): The task has just been submitted by the user.

  • WAITING_ON_ERROR (Faulty…): The task is waiting for restart after an error (i.e. native code != 0 or exception, and the maximum number of executions is not reached).

  • WAITING_ON_FAILURE (Failed…): The task is waiting for restart after a failure (i.e. node down).

3.5.2. Job Priority

A job is assigned a default priority of NORMAL but the user can increase or decrease the priority once the job has been submitted. When they are scheduled, jobs with the highest priority are executed first.

The following values are available:

  • IDLE

  • LOWEST

  • LOW

  • NORMAL

  • HIGH can only be set by an administrator

  • HIGHEST can only be set by an administrator

3.6. Retrieve logs

It is possible to retrieve multiple logs from a job; these logs can be:

  • The standard output/error logs associated with a job.

  • The standard output/error logs associated with a task.

  • The scheduler server logs associated with a job.

  • The scheduler server logs associated with a task.

Unless your account belongs to an administrator group, you can only see the logs of a job that you own.

3.6.1. Retrieve logs from the portal

  • Job standard output/error logs:
    To view the standard output or error logs associated with a job, select a job from the job list and then click on the Output tab in the bottom-right panel.
    Click on the Streaming Output checkbox to auto-fetch logs for the running tasks of the entire Job. The logs panel is updated as soon as new log lines are printed by this job.
    You cannot select a specific Task in streaming mode. If you activate streaming while some Tasks are already finished, you will get the logs of those Tasks as well.
    Click on the Finished Tasks Output button to retrieve logs for already finished tasks, either for all the finished Tasks within the Job or for the selected Task.
    This works when the Job is still in Running status, as well as when the Job is Finished.
    Logs are limited to 1024 lines. Should your job logs be longer, you can select the Full logs (download) option from the drop down list.

  • Task standard output/error logs:
    To view the standard output or error logs associated with a task, select a job from the job list and a task from the task list.
    Then in the Output tab, choose Selected task from the drop down list.
    Once the task is terminated, you will be able to click on the Finished Tasks Output button to see the standard output or error associated with the task.
    It is not possible to view the streaming logs of a single task; only the job streaming logs are available.

  • Job server logs:
    Whether a job is running or finished, you can access the associated server logs by selecting a job, opening the Server Logs tab in the bottom-right panel and then clicking on Fetch logs.
    Server logs contain debugging information, such as the job definition, the output of selection or cleaning scripts, etc.

  • Task server logs:
    In the Server Logs tab, you can choose Selected task to view the server logs associated with a single task.

3.6.2. Retrieve logs from the command line

The chapter command line tools explains how to use the command line interface. Once connected, you can retrieve the various logs using the following commands. Server logs cannot be accessed from the command line.

  • Job standard output/error logs: joboutput(jobid)

> joboutput(1955)
[1955t0@precision;14:10:57] [2016-10-27 14:10:057 precision] HelloWorld
[1955t1@precision;14:10:56] [2016-10-27 14:10:056 precision] HelloWorld
[1955t2@precision;14:10:56] [2016-10-27 14:10:056 precision] HelloWorld
[1955t3@precision;14:11:06] [2016-10-27 14:11:006 precision] HelloWorld
[1955t4@precision;14:11:05] [2016-10-27 14:11:005 precision] HelloWorld
  • Task standard output/error logs: taskoutput(jobid, taskname)

> taskoutput(1955,'0_0')
[1955t0@precision;14:10:57] [2016-10-27 14:10:057 precision] HelloWorld
  • Streaming job standard output/error logs: livelog(jobid)

> livelog(2002)
Displaying live log for job 2002. Press 'q' to stop.
> Displaying live log for job 2002. Press 'q' to stop.
[2002t2@precision;15:57:13] [ABC, DEF]

3.7. Retrieve results

Once a job or a task is terminated, it is possible to get its result. Unless you belong to the administrator group, you can only get the result of the job that you own. Results can be retrieved using the Scheduler Web Interface or the command line tools.

When running a native application, the task result will be the exit code of the application. Results usually make more sense when using script or Java tasks.

3.7.1. Retrieve results from the portal

In the scheduler portal, select a job, then select a task from the job’s task list. Click on Preview tab in the bottom-right panel.

In this tab, you will see two buttons:

  • Open in browser: when clicking on this button, the result will be displayed in a new browser tab. By default, the result will be displayed in text format. If your result contains binary data, it is possible to specify a different display mode using Result Metadata.
    If the task failed, clicking on the Open in browser button will display the task error instead.

  • Save as file: when clicking on this button, the result will be saved on disk in binary format. By default, the file name will be generated automatically using the job and task ids, without an extension. It is possible to customize this behavior and specify in the task a file name, or a file extension using Result Metadata.

The following example reads a PNG image, and adds metadata that helps the browser display it and gives it a name when downloading.

// fileName is expected to be defined earlier in the task (e.g. as a workflow variable)
file = new File(fileName)
result = file.getBytes()
resultMetadata.put("file.name", fileName)
resultMetadata.put("content.type", "image/png")

3.7.2. Retrieve results from the command line

The chapter command line tools explains how to use the command line interface. Once connected, you can retrieve the task or job results:

  • Result of a single task: taskresult(jobid, taskname)

> taskresult(2002, 'Merge')
task('Merge') result: true
  • Result of all tasks of a job: jobresult(jobid)

> jobresult(2002)
job('2002') result:
Merge : true
Process*1 : DEF
Process : ABC
Split : {0=abc, 1=def}

4. ProActive Studio

ProActive Workflow Studio is used to create and submit workflows graphically. The Studio allows you to simply drag-and-drop various task constructs and draw their dependencies to form complex workflows. It also provides various flow control widgets such as conditional branch, loop, replicate, etc., to construct workflows with dynamic structures.

The studio usage is illustrated in the following example.

4.1. A simple example

The quickstart tutorial on try.activeeon.com shows how to build a simple workflow using script tasks.

Below is an example of a workflow created with the Studio:

GI documentation finance url

The left part shows the General Parameters of the workflow with the following information:

  • Name: the name of the workflow.

  • Project: the project name to which the workflow belongs.

  • Description: the textual description of the workflow.

  • Documentation: if the workflow has a Generic Information named "Documentation", then its URL value is displayed as a link.

  • Job Priority: the priority assigned to the workflow. It is by default set to NORMAL, but can be increased or decreased once the job is submitted.

In the right part, we see the graph of dependent tasks composing the workflow. Each task can be selected to show the task-specific attributes and definitions.

Finally, above the workflow graph, we see the various palettes which can be used to drag & drop sample task definitions or complex constructs.

The following chapters describe the three main types of tasks which can be defined inside ProActive Workflows.

4.2. Use the ProActive Catalog from the Studio

The GUI interaction with the Catalog can be done in two places: the Studio and Catalog Portal. The portals follow the concepts of the Catalog: Workflows are stored inside buckets, a workflow has some metadata and can have several revisions.

The Catalog view in the Studio or in the Catalog Portal allows browsing all the workflows in the Catalog, grouped by buckets, and listing all the revisions of a workflow along with their commit messages.

Inside the Studio, there is a Catalog menu from which a user can directly interact with the Catalog to import or publish Workflows.

studio catalog menu

Additionally, the Palette of the Studio lists the user’s favourite buckets to allow easy import of workflows from the Catalog with a simple Drag & Drop.

studio palette

4.2.1. Publish current Workflow to the Catalog

Workflows created in the studio can be saved inside a bucket in the Catalog by using the Publish current Workflow to the Catalog action. When saving a Workflow in the Catalog, users can add a specific commit message. If a Workflow with the same name already exists in the specified bucket, a new revision of the Workflow is created. We recommend always specifying a commit message for each commit, to make it easier to differentiate between stored versions.

4.2.2. Get a Workflow from the Catalog

When a Workflow is selected from the Catalog, you have two options:

  1. Open as a new Workflow: Open the Workflow from Catalog as a new Workflow in Studio.

  2. Append to current Workflow: Append the selected Workflow from the Catalog to the Workflow already opened inside the Studio.

4.2.3. Add Bucket Menu to the Palette

The Palette makes it easier to use workflows stored in a specific Bucket of the Catalog as templates when designing a Workflow.

To add a bucket from the Catalog as a dropdown menu in the Palette, click on View → Add Bucket Menu to the Palette or press the + button in the palette. A window will show up to ask you which bucket you want to add to the Palette, as in the image below.

studio catalog main menu3

The selected Data Connectors Bucket now appears as a menu in the Studio. See the image below.

studio catalog main menu4

You can repeat the operation as many times as you wish in order to add other specific Menus you might need for your workflow design. See the image below.

studio catalog main menu5

4.2.4. Change Palette Preset

The Studio offers the possibility to set a Preset for the Palette. This makes it easier to load the Palette with a predefined set of default buckets. Presets are generally arranged by theme (Basic Examples, Machine Learning, Deep Learning, Big Data, etc.). The Palette’s preset can be changed by clicking View → Change Palette Preset. This will show a window with the current list of presets, as in the image below.

studio catalog main menu1

After changing the preset, its name appears at the top of the Studio and all its buckets are added to the Palette.

studio catalog main menu2

5. Workflow concepts

Workflows come with constructs that help you distribute your computation. The tutorial Advanced workflows is a nice introduction to workflows with ProActive.

The following constructs are available:

  • Dependency

  • Replicate

  • Branch

  • Loop

Use the Workflow Studio to create complex workflows; it is much easier than writing XML!

5.1. Dependency

Dependencies can be set between tasks in a TaskFlow job. They provide a way to execute your tasks in a specified order, but also to forward the result of an ancestor task to its children as a parameter. A dependency between tasks is therefore both a temporal dependency and a data dependency.

flow spec dependency

Dependencies between tasks can be added either in ProActive Workflow Studio or simply in workflow XML as shown below:

<taskFlow>
    <task name="task1">
        <scriptExecutable>
            <script>
                <code language="groovy">
                    println "Executed first"
                </code>
            </script>
        </scriptExecutable>
    </task>
    <task name="task2">
        <depends>
            <task ref="task1"/>
        </depends>
        <scriptExecutable>
            <script>
                <code language="groovy">
                    println "Now it's my turn"
                </code>
            </script>
        </scriptExecutable>
    </task>
</taskFlow>
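
In the example above, task2 only starts after task1 finishes. To also consume the result produced by a parent task, a child script task can read the results binding (listed in the Script Bindings Reference); a hedged Groovy sketch, assuming the parent task set a result:

// inside task2 (Groovy): 'results' holds the results of the parent tasks
if (results.size() > 0) {
    println "Result of task1: " + results[0].value()
}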

5.2. Replicate

Replication allows the execution of multiple tasks in parallel when only one task is defined and the number of tasks to run may change.

flow spec duplicate
  • The target is the direct child of the task initiator.

  • The initiator can have multiple children; each child is replicated.

  • If the target is a start block, the whole block is replicated.

  • The target must have the initiator as its only dependency: the action is performed when the initiator task terminates. If the target has another pending task as a dependency, the behavior cannot be specified.

  • There should always be a merge task after the target of a replicate: if the target is not a start block, it should have at least one child; if the target is a start block, the corresponding end block should have at least one child.

  • The last task of a replicated task block (or the replicated task if there is no block) cannot perform a branching or replicate action.

  • The target of a replicate action cannot be tagged as end block.

  • The current replication index (from 0 to runs) can be accessed via the PA_TASK_REPLICATION variable (see the sketch below).

If you are familiar with programming, you can see the replication as forking tasks.
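
As a hedged illustration in Groovy, the number of replicas is chosen in the initiator's control flow script through the runs binding, and each replica can read its own index, as referenced in the list above (the value 4 is arbitrary):

// control flow script of the initiator (Groovy): replicate the target 4 times
runs = 4

// inside the replicated task (Groovy): identify which replica this is
println "I am replica " + variables.get("PA_TASK_REPLICATION")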

5.3. Branch

The branch construct provides the ability to choose between two alternative task flows, with the possibility to merge back to a common flow.

flow spec if
  • There is no explicit dependency between the initiator and the if/else targets. These are optional links (i.e. A → B or E → F) defined in the if task.

  • The if and else flows can be merged by a continuation task referenced in the if task, playing the role of an endif construct. After the branching task, the flow will either be that of the if or the else task, but it will be continued by the continuation task.

  • If and else targets are executed exclusively. The initiator however can be the dependency of other tasks, which will be executed normally along the if or the else target.

  • A task block can be defined across if, else and continuation links, and not just plain dependencies (i.e. with A as start and F as end).

  • If using no continuation task, the if and else targets, along with their children, must be strictly distinct.

  • If using a continuation task, the if and else targets must be strictly distinct and valid task blocks.

  • if, else and continuation tasks (B, D and F) cannot have an explicit dependency.

  • if, else and continuation tasks cannot be entry points for the job, they must be triggered by the if control flow action.

  • A task can be the target of only one if or else action. A continuation task cannot merge two different if actions.

If you are familiar with programming, you can see the branch as an if/then/else.

5.4. Loop

The loop provides the ability to repeat a set of tasks.

flow spec loop
  • The target of a loop action must be a parent of the initiator following the dependency chain; this action goes back to a previously executed task.

  • Every task is executed at least once; loop operates in a do/while fashion.

  • The target of a loop should have only one explicit dependency. It will receive different parameters (dependencies) depending on whether it is executed for the first time or not. The cardinality should stay the same.

  • The loop scope should be a task block: the target is a start block task, and the initiator its related end block task.

  • The current iteration index (from 0 to n until loop is false) can be accessed via the PA_TASK_ITERATION variable.

If you are familiar with programming, you can see the loop as a do/while.
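
A hedged Groovy sketch of the control flow script placed on the loop initiator, using the loop output binding and the iteration index mentioned above (the iteration count of 3 is arbitrary):

// control flow script of the loop initiator (Groovy):
// loop while fewer than 3 iterations have run (PA_TASK_ITERATION starts at 0)
loop = (variables.get("PA_TASK_ITERATION") as int) < 2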

5.5. Task Blocks

Workflows often rely on task blocks. Task blocks are defined by pairs of start and end tags.

  • Each task of the flow can be tagged either start or end

  • Tags can be nested

  • Each start tag needs to match a distinct end tag

Task blocks are very similar to the parentheses of most programming languages: anonymous and nested start/end tags. The only difference is that a parenthesis is syntactical information, whereas task blocks carry semantic information.

The role of task blocks is to restrain the expressiveness of the system so that a workflow can be statically checked and validated. A computation that can be looped or iterated is isolated in a well-defined task block.

  • A loop flow action only applies on a task block: the initiator of the loop must be the end of the block, and the target must be the beginning.

  • When using a continuation task in an if flow action, the if and else branches must be task blocks.

  • If the child of a replicate task is a task block, the whole block will be replicated and not only the child of the initiator.

5.6. Variables

Variables can come in handy while developing ProActive applications. Propagating data between tasks, customizing jobs, and storing useful debug information are a few examples of how variables can ease development. ProActive has essentially two groups of variables:

5.6.1. Workflow variables

Workflow variables are declared in the XML job definition. Within the definition, variables can exist at several levels: at job level, variables are visible and shared by all tasks; at task level, variables are visible only within the task execution context.

Job variables

In a workflow, you can define job variables that are shared and visible by all tasks.

A job variable is defined using the following attributes:

  • name: the variable name.

  • value: the variable value.

  • model: the variable value type or model (optional). See section Variable Model.

  • description: the variable description (optional). The description can be written using html syntax (e.g. This variable is <b>mandatory</b>.).

  • group: the group name associated with the variable (optional). When the workflow has many variables, defining variable groups helps in understanding the various parameters. Variables with no associated group are called Main Variables.

  • advanced: when a variable is advanced, it is not displayed by default in the workflow submission form. A checkbox allows the user to display or hide advanced variables.

  • hidden: a hidden variable is not displayed to the user in the workflow submission form. Hidden variables have the following usages:

Variables are very useful to use workflows as templates where only a few parameters change for each execution. Upon submission, you can define variable values in several ways: from the CLI, using the ProActive Workflow Studio, by directly editing the XML job definition, or even through the REST API.

The following example shows how to define a job variable in the XML:

<job ... >
    <variables>
        <variable name="variable" value="value" model=""/>
    </variables>
    ...
</job>

Job variables can be referenced anywhere in the workflow, including other job variables.

The syntax for referencing a variable is the pattern ${variable_name} (case-sensitive), for example:

<nativeExecutable>
    <staticCommand value="${variable}" />
</nativeExecutable>
Task variables

Similarly to job variables, Task variables can be defined within task scope in the job XML definition. Task variables scope is strictly limited to the task definition.

A task variable is defined using the following attributes:

  • name: the variable name.

  • value: the variable value.

  • inherited: indicates that the content of this variable is propagated from a job variable or a parent task. If true, the value defined in the task variable will only be used if no variable with the same name is propagated. The value field can be left empty; it can also act as a default value in case a previous task fails to define the variable.

  • model: the variable value type or model (optional). See section Variable Model

  • description: the variable description (optional). The description can be written using html syntax (e.g. This variable is <b>mandatory</b>.).

  • group: the group name associated with the variable (optional). When the workflow has many variables, defining variable groups helps in understanding the various parameters. Variables with no associated group are called Main Variables.

  • advanced: when a variable is advanced, it is not displayed by default in the workflow submission form. A checkbox allows the user to display or hide advanced variables.

  • hidden: a hidden variable is not displayed to the user in the workflow submission form. Hidden variables have the following usages:

For example:

<task ... >
    <variables>
        <variable name="variable" value="value" inherited="false" model=""/>
    </variables>
    ...
</task>

Task variables can be used similarly to job variables using the pattern ${variable_name} but only inside the task where the variable is defined.

Task variables override job variables: if a job variable and a task variable are defined with the same name, the task variable value will be used inside the task (unless the task variable is marked as inherited), and the job variable value will be used elsewhere in the job.
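
A hedged XML sketch of this overriding behavior, following the same structure as the examples above (the variable name and values are illustrative):

<job ... >
    <variables>
        <variable name="GREETING" value="hello from the job"/>
    </variables>
    <taskFlow>
        <task name="task1">
            <variables>
                <variable name="GREETING" value="hello from the task" inherited="false"/>
            </variables>
            ...
        </task>
    </taskFlow>
</job>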

Global Variables

Global variables are job variables that are configured inside ProActive server and apply to all workflows or to certain categories of workflows (e.g. workflows with a given name).

See Configure Global Variables and Generic Information section to understand how global variables can be configured on the ProActive server.

Variable Model

Job and Task variables can define a model attribute which lets the workflow designer control the variable value syntax. A workflow designer can for example decide that the variable NUMBER_OF_ENTRIES is expected to be convertible to an Integer ranged between 0 and 20.

The workflow designer will then provide a default value to the variable NUMBER_OF_ENTRIES, for example "10". This workflow is valid as "10" can be converted to an Integer and is between 0 and 20.

When submitting the workflow, it will be possible to choose a new value for the variable NUMBER_OF_ENTRIES. If the user submitting the workflow chooses for example the value -5, the validation will fail and an error message will appear.
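
In XML, this example would look like the following sketch (same declaration syntax as the job variable example above):

<variables>
    <variable name="NUMBER_OF_ENTRIES" value="10" model="PA:INTEGER[0,20]"/>
</variables>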

Available Models

The following list describes the various model syntaxes available:

  • PA:BOOLEAN : variable is either "true", "false", "0" or "1".

  • PA:INTEGER , PA:INTEGER[min,max] : variable can be converted to java.lang.Integer, and optionally is contained in the range [min, max].
    Examples: PA:INTEGER will accept "-5" but not "1.4", PA:INTEGER[0,20] will accept "12" but not "25".

  • PA:LONG , PA:LONG[min,max] : same as above with java.lang.Long.

  • PA:FLOAT , PA:FLOAT[min,max] : same as above with java.lang.Float.
    Examples: PA:FLOAT[-0.33,5.99] will accept "3.5" but not "6".

  • PA:DOUBLE , PA:DOUBLE[min,max] : same as above with java.lang.Double.

  • PA:SHORT , PA:SHORT[min,max] : same as above with java.lang.Short.

  • PA:URL : variable can be converted to java.net.URL.
    Examples: PA:URL will accept "http://mysite.com" but not "c:/Temp".

  • PA:URI : variable can be converted to java.net.URI.
    Examples: PA:URI will accept "/tmp/file" but not "c:\a^~to" due to invalid characters.

  • PA:DATETIME(format) , PA:DATETIME(format)[min,max] : variable can be converted to a java.util.Date using the format specified (see the format definition syntax in the SimpleDateFormat class).
    A range can also be used in the PA:DATETIME model. In that case, each bound of the range must match the date format used.

    Examples:
    PA:DATETIME(yyyy-MM-dd) will accept "2014-12-01" but not "2014".
    PA:DATETIME(yyyy-MM-dd)[2014-01-01, 2015-01-01] will accept "2014-12-01" but not "2015-03-01".
    PA:DATETIME(yyyy-MM-dd)[2014, 2015] will result in an error during the workflow definition as the range bounds [2014, 2015] are not using the format yyyy-MM-dd.

  • PA:LIST(item1,item2,…​) : variable must be one of the values defined in the list.
    Examples: PA:LIST(a,b,c) will accept "a", "b", "c" but no other value.

  • PA:REGEXP(pattern) : variable syntax must match the regular expression defined in the pattern. The regular expression syntax is described in class Pattern.
    Examples: PA:REGEXP([a-z]+) will accept "abc", "foo", but not "Foo".

  • PA:MODEL_FROM_URL(url) : variable syntax must match the model fetched from the given URL. This can be used for example when the model needs to represent a list of elements which may evolve over time and is updated inside a file. Such as a list of machines in an infrastructure, a list of users, etc.
    See Variable Model using Resource Manager data for premade models based on the Resource Manager state.
    Examples: PA:MODEL_FROM_URL(file:///srv/machines_list_model.txt), if the file machines_list_model.txt contains PA:LIST(host1,host2), will accept only "host1" and "host2", but may accept other values as the machines_list_model file changes.

  • PA:CRON : variable syntax must be a valid cron expression as defined in the cron4j manual.
    Examples: PA:CRON will accept "5 * * * *" but not "* * * *" (missing minutes sub-pattern).

  • PA:SPEL(SpEL expression) : variable syntax will be evaluated by a SpEL expression. Refer to next paragraph.

  • PA:CATALOG_OBJECT, PA:CATALOG_OBJECT(kind), PA:CATALOG_OBJECT(,contentType), PA:CATALOG_OBJECT(kind,contentType) : variable syntax must be a valid expression that matches the following pattern: bucketName/objectName[/revision]. Note that the revision sub-pattern is a hash code number represented by 13 digits.
    A filter can be specified in the PA:CATALOG_OBJECT model. In that case, the variable value must be a catalog object which matches the kind and content type requirements. Note that these filters are case insensitive and require a "startsWith" matching. That is, the kind and the content type of the provided catalog object must start with the filters specified in the model. The scheduler server verifies that the object exists in the catalog and fulfills the requirements when the workflow is submitted.
    Examples: PA:CATALOG_OBJECT will accept "bucket-example/object-example/1539310165443" and "bucket-example/object-example" but not "bucket-example/object-example/153931016" (invalid revision number) and "bucket-example/" (missing object name). PA:CATALOG_OBJECT(Script,text/x-python) will only accept a catalog object which is a Python script but not a workflow object or a Groovy script.

  • PA:JSON : variable syntax must be a valid JSON expression as defined in JSON doc.
    Examples: PA:JSON will accept {"name": "John", "city":"New York"} and empty values like {} or [{},{}], but not ["test" : 123] (Unexpected character ':') and {test : 123} (Unexpected character 't').

  • PA:NOT_EMPTY_STRING : variable must be provided with a non-empty string value.

  • PA:HIDDEN : variable which allows the user to enter its value securely (i.e., each character is shown as an asterisk, so that it cannot be read) while submitting the workflow.

  • PA:CREDENTIAL : variable whose value is a key of the ProActive Scheduler Third-Party Credentials (which are stored on the server side in encrypted form). The variable allows the user to access a credential through credentials.get(variables.get("MY_CRED_KEY")) instead of in plain-text in the workflow definition. At workflow submission, the scheduler server verifies that the key exists in the 3rd party credentials of the user.

  • PA:GLOBAL_FILE : variable whose value is the relative path of a file in the Global Data Space. At workflow submission, the scheduler server verifies that the file exists in the global dataspace.

  • PA:USER_FILE : variable whose value is the relative path of a file in the User Data Space. At workflow submission, the scheduler server verifies that the file exists in the user dataspace.

  • PA:GLOBAL_FOLDER : variable whose value is the relative path of a folder in the Global Data Space. At workflow submission, the scheduler server verifies that the folder exists in the global dataspace. Note that the variable value should not end with a slash, to avoid duplicate slashes when the value is used.

  • PA:USER_FOLDER : variable whose value is the relative path of a folder in the User Data Space. At workflow submission, the scheduler server verifies that the folder exists in the user dataspace. Note that the variable value should not end with a slash, to avoid duplicate slashes when the value is used.
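
As an illustrative sketch (variable names and values are examples, not prescribed), these models are attached to workflow variables using the model attribute of the variable XML tag:

  <variables>
    <variable name="ENTRY_COUNT" value="10" model="PA:INTEGER[0,1000]"/>
    <variable name="START_DATE" value="2014-12-01" model="PA:DATETIME(yyyy-MM-dd)"/>
    <variable name="TARGET_HOST" value="host1" model="PA:LIST(host1,host2)"/>
    <variable name="DB_PASSWORD" value="" model="PA:HIDDEN"/>
  </variables>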

Variable Model (Type) using a type defined dynamically in another Variable

A Variable can use as its type a model that is defined in another variable.

To use this possibility, the workflow designer simply uses, as the Model definition, the name of another variable preceded by the character $.

When submitting the workflow, the user can select the model dynamically by changing the value of the referenced variable, and then provide the value of the first variable according to the selected type.

For example, if we have:
variable1 has as its model PA:LIST(PA:GLOBAL_FILE, PA:INTEGER)
variable2 has as its model $variable1
Then the model of variable2 is the value that the variable variable1 will have at runtime. Thus, it will be either PA:GLOBAL_FILE or PA:INTEGER.
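
A minimal sketch of this mechanism, reusing the variable names from the example above:

  <variables>
    <variable name="variable1" value="PA:GLOBAL_FILE" model="PA:LIST(PA:GLOBAL_FILE,PA:INTEGER)"/>
    <variable name="variable2" value="" model="$variable1"/>
  </variables>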

Optional Variable

To define an optional variable, the workflow designer can simply add ? at the end of the model attribute, such as PA:INTEGER?.

When submitting the workflow, it will be allowed to not provide a value for the optional variables. The validation will only fail when the user fills in an invalid value.

For example, a variable MY_OPTIONAL_INTEGER defined as the model PA:INTEGER? will accept an empty string as the variable value, but it will refuse 1.4.

All the available model syntaxes, except PA:NOT_EMPTY_STRING, can be defined as optional.

Variable Model using Resource Manager data

ProActive Resource Manager provides a set of REST endpoints which allow creating dynamic models based on the Resource Manager state.

These models are summarized in the following table. The returned models are PA:LIST types, which allow selecting a value in ProActive portals through a drop-down list. The list always contains an empty value choice.

Table 4. Resource Manager Models
Metric Name

Description

Model Syntax

Example returned data

Hosts

All machine host names or ip addresses registered in the Resource Manager

PA:MODEL_FROM_URL(${PA_SCHEDULER_REST_PUBLIC_URL}/rm/model/hosts)

PA:LIST(,try.activeeon.com,10.0.0.19)

Node Sources

All node sources registered in the Resource Manager

PA:MODEL_FROM_URL(${PA_SCHEDULER_REST_PUBLIC_URL}/rm/model/nodesources)

PA:LIST(,Default,LocalNodes,GPU,Kubernetes)

Tokens

All tokens registered in the Resource Manager (across all registered ProActive Nodes). See NODE_ACCESS_TOKEN.

PA:MODEL_FROM_URL(${PA_SCHEDULER_REST_PUBLIC_URL}/rm/model/tokens)

PA:LIST(,token1,token2)
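
For instance, a workflow variable restricted to the hosts known by the Resource Manager could be declared as follows (a sketch based on the Hosts endpoint above; the variable name is illustrative):

  <variables>
    <variable name="TARGET_HOST" value="" model="PA:MODEL_FROM_URL(${PA_SCHEDULER_REST_PUBLIC_URL}/rm/model/hosts)"/>
  </variables>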

Spring Expression Language Model

The PA:SPEL(expr) model allows defining expressions that validate or reject a variable value. Additionally, this model can be used to validate multiple variable values or to dynamically update other variables. The syntax of the SpEL expression is defined by the Spring Expression Language reference.

For security concerns, we apply a restriction on the authorized class types. Besides the commonly used data types (Boolean, String, Long, Double, etc.), we authorize the use of ImmutableSet, ImmutableMap, ImmutableList, Math, Date types, JSONParser and ObjectMapper for JSON type and DocumentBuilderFactory for XML type.

In order to interact with variables, the expression has access to the following properties:

  • #value: this property will contain the value of the current variable defined by the user.

  • variables['variable_name']: this property array contains all the variable values of the same context (for example of the same task for a task variable).

  • models['variable_name']: this property array contains all the variable models of the same context (for example of the same task for a task variable).

  • valid: can be set to true or false to validate or invalidate a variable.

  • temp: can be set to a temporary object used in the SpEL expression.

  • tempMap: an empty Hash Map structure which can be populated and used in the SpEL expression.

The expression has also access to the following functions (in addition to the functions available by default in the SpEL language):

  • t(expression): evaluate the expression and return true.

  • f(expression): evaluate the expression and return false.

  • s(expression): evaluate the expression and return an empty string.

  • hideVar('variable name'): hides the variable given in parameter. Used to build Dynamic form example using SpEL. Returns true to allow chaining actions.

  • showVar('variable name'): shows the variable given in parameter. Used to build Dynamic form example using SpEL. Returns true to allow chaining actions.

  • hideGroup('group name'): hides all variables belonging to the variable group given in parameter. Used to build Dynamic form example using SpEL. Returns true to allow chaining actions.

  • showGroup('group name'): shows all variables belonging to the variable group given in parameter. Used to build Dynamic form example using SpEL. Returns true to allow chaining actions.

The SpEL expression must either:

  • return a boolean value, true if the value is correct, false otherwise.

  • set the valid property to true or false.

Any other behavior will raise an error.

  • Example of SpEL simple validations:

    PA:SPEL(#value == 'abc') : will accept the value if it’s the 'abc' string
    PA:SPEL(T(Integer).parseInt(#value) > 10) : will accept the value if it’s an integer greater than 10.

    Note that #value always contains a string and must be converted to other types if needed.

  • Example of SpEL multiple validations:

    PA:SPEL(variables['var1'] + variables['var2'] == 'abcdef') : will be accepted if the string concatenation of variable var1 and var2 is 'abcdef'.
    PA:SPEL(T(Integer).parseInt(variables['var1']) + T(Integer).parseInt(variables['var2']) < 100) : will be accepted if the sum of variables var1 and var2 is smaller than 100.

  • Example of SpEL variable inference:

    PA:SPEL( variables['var2'] == '' ? t(variables['var2'] = variables['var1']) : true ) : if the variable var2 is empty, it will use the value of variable var1 instead.

  • Example of SpEL variable using ObjectMapper type:

    PA:SPEL( t(variables['var1'] = new org.codehaus.jackson.map.ObjectMapper().readTree('{"abc": "def"}').get('abc').getTextValue()) ) : will assign the value 'def' to the variable var1.

  • Example of SpEL variable using DocumentBuilderFactory type:

    PA:SPEL( t(variables['var'] = T(javax.xml.parsers.DocumentBuilderFactory).newInstance().newDocumentBuilder().parse(new org.xml.sax.InputSource(new java.io.StringReader('<employee id="101"><name>toto</name><title>tata</title></employee>'))).getElementsByTagName('name').item(0).getTextContent()) ): will assign the value 'toto' to the variable var.

    The SpEL expression must return a boolean value; this is why, in the above expressions, we use the t(expression) function to perform assignments and return a boolean true value.
Dynamic form example using SpEL

Using SpEL expressions, it is possible to show or hide variables based on the values of other variables, which makes it possible to create dynamic forms.

Consider the following variables definition (attributes xml escaping has been removed for clarity):

  <variables>
    <variable name="type" value="vegetable" model="PA:LIST(vegetable,fruit)" description="" group="" advanced="false" hidden="false"/>
    <variable name="potatoes" value="0" model="PA:INTEGER" description="Amount of potatoes to order (in kilograms)" group="vegetables" advanced="false" hidden="false"/>
    <variable name="leek" value="0" model="PA:INTEGER" description="Amount of leek to order (in kilograms)" group="vegetables" advanced="false" hidden="false"/>
    <variable name="apples" value="0" model="PA:INTEGER" description="Amount of apples to order (in kilograms)" group="fruits" advanced="false" hidden="true"/>
    <variable name="oranges" value="0" model="PA:INTEGER" description="Amount of oranges to order (in kilograms)" group="fruits" advanced="false" hidden="true"/>
    <variable name="type_handler" value="" model="PA:SPEL(variables['type'] == 'vegetable' ? showGroup('vegetables') && hideGroup('fruits') : showGroup('fruits') && hideGroup('vegetables'))" description="" group="" advanced="false" hidden="true"/>
  </variables>

The first variable type presents a choice to the user: select fruits or vegetables. The last variable type_handler, which is hidden to the user, analyses this choice and displays either the variables belonging to the fruits group or those belonging to the vegetables group. The SpEL model associated with type_handler performs this operation:

PA:SPEL(variables['type'] == 'vegetable' ? showGroup('vegetables') && hideGroup('fruits') : showGroup('fruits') && hideGroup('vegetables'))

When type is equal to fruit, the variables belonging to the vegetables group are hidden, and the variables belonging to the fruits group are shown. Conversely, the vegetables group is shown and the fruits group is hidden when type is equal to vegetable.

The complete workflow example can be downloaded here.

Here is how the variables are displayed when submitting the workflow:

Vegetables

Fruits

5.6.2. Dynamic Variables

As opposed to Workflow variables, dynamic variables are created or manipulated directly when executing workflow task scripts, through the use of the variables script binding map (see the Script Bindings chapter or Script Bindings Reference for more information about script bindings). We have mainly two types of dynamic variables:

ProActive system variables

Some variables are implicitly defined by the Scheduler to retrieve runtime information about a job or a task.

Here is the list of ProActive system variables:

Table 5. ProActive variables

Variable name

 Description

Type

PA_JOB_ID

The current job ID.

String

PA_JOB_NAME

The current job name.

String

PA_TASK_ID

The current task ID.

String

PA_TASK_NAME

The current task name.

String

PA_TASK_ITERATION

The current iteration index, when using looping, starts at 0.

Integer

PA_TASK_REPLICATION

The current replication index, when using replication, starts at 0.

Integer

PA_TASK_PROGRESS_FILE

The path to the progress file, used to set the task’s progress.

You can import and use the utility class org.ow2.proactive.scripting.helper.progress.ProgressFile to read/write values to the file identified by this variable.

String

PA_SCHEDULER_HOME

The path to Scheduler home, where the Scheduler or the Node is installed.

String

PA_NODESFILE

The path to the hostfile when using a multi nodes task.

String

PA_NODESNUMBER

The number of acquired nodes when using a multi nodes task.

Integer

PA_USER

The username of the ProActive user who has submitted the job.

String

PA_SCHEDULER_REST_URL

The URL of scheduler REST api.

String

PA_SCHEDULER_REST_PUBLIC_URL

The URL of scheduler REST public api if the setting pa.scheduler.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini.

String

PA_CATALOG_REST_URL

The URL of catalog REST api.

String

PA_CATALOG_REST_PUBLIC_URL

The URL of catalog REST public api if the setting pa.catalog.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini.

String

PA_CLOUD_AUTOMATION_REST_URL

The URL of Service Automation REST api.

String

PA_CLOUD_AUTOMATION_REST_PUBLIC_URL

The URL of Service Automation REST public api if the setting pa.cloud-automation.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini.

String

PA_JOB_PLANNER_REST_URL

The URL of job planner REST api.

String

PA_JOB_PLANNER_REST_PUBLIC_URL

The URL of job planner REST public api if the setting pa.job-planner.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini.

String

PA_NOTIFICATION_SERVICE_REST_URL

The URL of notification service REST api.

String

PA_NOTIFICATION_SERVICE_REST_PUBLIC_URL

The URL of notification service REST public api if the setting pa.notification-service.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini.

String

They can be used inside the workflow with the pattern syntax, for example:

<task ...>
    ...
     <forkEnvironment workingDir="/opt/${PA_JOB_ID}"></forkEnvironment>
    ...
</task>
Script variables

In addition to the ability to declare variables directly inside the job XML definition, it is also possible to dynamically read and write new variables while executing a task script, using the variables map. This map is bound to a hash-like type depending on the script engine you are using, for instance a Map in Java, or a global map in Groovy, as shown below:

String var = variables.get("one_variable")
variables.put("other_variable", "foo")

In the Groovy example above, the first line retrieves the value of the variable one_variable from the variables map. The second line creates a new entry other_variable with the value foo. The variables map is propagated down the execution chain: if a task modifies a variable in the variables map, or adds a new variable, all dependent tasks will have access to this modification. For example:

// task1
String var = variables.get("one_variable")
variables.put("one_variable", "foo")
variables.put("other_variable", "bar")
// task2 depends on task1
println variables.get("one_variable") // will display "foo"
println variables.get("other_variable") // will display "bar"

If a task depends on several tasks and each of them modifies the same variable, the final value of the variable that is propagated down the execution chain depends on the order of task execution. Therefore, users need to take appropriate measures to prevent undesired effects such as race conditions.

System variables can also be accessed from the variables map, for example:

println variables.get("PA_JOB_ID") // will display the id of the current job

If using a Bash shell script, variables are accessible through environment variables.

#!/bin/bash
echo $variables_PA_TASK_NAME
In native tasks and Bash/CMD script engines, variables can be read but not written to.

For Java-based script engines, you can set any Java serializable object as a variable value. These values will be converted into strings using the toString() method when required, for instance, to make them available as environment variables in native tasks.

The variable tag has an inherited attribute. If this attribute is set to true, a task variable will read its value from the variables map instead of the value field in the XML definition. In this case, the value field becomes optional and acts as a default value. Below is an XML example that shows two tasks: first and second. Task first inserts a new variable using variables.put("inherited_var", "somevalue"). Task second declares inherited_var with defaultvalue; this value will be overwritten by the first task with variables.put("inherited_var", "somevalue"). Defining a default value might be useful if, for some reason, the first task fails before inserting inherited_var into the variables map. In that case, defaultvalue remains unchanged.

<task name="first" >
    <scriptExecutable>
        <script>
            <code language="groovy">
                <![CDATA[
                variables.put("inherited_var", "somevalue")
                ]]>
            </code>
        </script>
    </scriptExecutable>
</task>
...
<task name="second">
    <variables>
        <variable name="inherited_var" value="defaultvalue" inherited="true"/>
    </variables>
    <depends>
        <task ref="first"/>
    </depends>
...
</task>
Dynamic replacements

We’ve seen in Job variables that we can use the ${varname} syntax to create and access variables. Dynamic replacement is in charge of resolving variables just before the execution of a task script or, in the case of job variables, just before submitting the job. We can also combine recursive definitions of variables.

For example, one could write:

<job ...>
    ...
    <variables>
        <variable name="job_var" value="hello" model=""/>
        <variable name="job_var2" value="${job_var}" model=""/>
        <variable name="job_var3" value="${job_var2}" model=""/>
    </variables>
    ...
</job>

The dynamic replacement will resolve job_var3 to hello just before submitting the job to the scheduler.

We can use job variable references in task variables, but not the other way around.

When dynamically replacing a task variable, the resolution happens just before running the task, so values from the variables map can be used. The example below will print "hello world": ${job_var2} is recursively resolved to hello, so that inherited_var resolves to hello world. When task_2 starts, its local variable task_var2 is dynamically replaced by hello world, i.e., the content of inherited_var that was inserted in the variables map by the previous task.

<job ... >
  <variables>
    <variable name="job_var" value="hello" model=""/>
    <variable name="job_var2" value="${job_var}" model=""/>
  </variables>
  <taskFlow>
    <task name="task_1" >
      <scriptExecutable>
        <script>
          <code language="groovy">
            <![CDATA[
            variables.put("inherited_var", "\${job_var2} world")
            ]]>
          </code>
        </script>
      </scriptExecutable>
    </task>
    <task name="task_2">
      <variables>
        <variable name="task_var2" value="${inherited_var}" inherited="false"/>
      </variables>
      <depends>
        <task ref="task_1"/>
      </depends>
      <scriptExecutable>
        <script>
          <code language="groovy">
            <![CDATA[
            println ""+variables.get("task_var2")
            ]]>
          </code>
        </script>
      </scriptExecutable>
    </task>
  </taskFlow>
</job>

Note that variable references can also be nested.

For example, one could write:

<task ...>
    ...
     <forkEnvironment workingDir="/opt/${DIRECTORY_${PA_TASK_REPLICATION}}"></forkEnvironment>
    ...
</task>

In that case, the variable DIRECTORY_0 or DIRECTORY_1, etc. (depending on the replication index) will be used in the workingDir attribute.

Pattern variable replacements may be performed at submission time or at execution time:

  • A replacement performed at execution time means that the replacement is executed only when the task enclosing the replacement is executed.

  • A replacement performed at submission time means that the replacement is directly executed when the job is submitted to the scheduler.

Replacements directly using global job or task variables will always be performed at submission time.

In the following example, the description replacement is performed at submission time:

<job ... >
    <variables>
        <variable name="description" value="My ProActive workflow"/>
    </variables>
    ...
    <task ... >
        <description>${description}</description>

    </task>
</job>

Replacements using system variables, such as the workingDir example above, will always be performed at execution time.

5.7. Generic Information

In addition to variables, another key/value structure can be accessed inside a script: Generic Information.

Generic information semantics differ from Job variables semantics in the following way:

  • Generic information can be accessed inside a script, but cannot be modified.

  • Generic information can be defined at job-level or at task-level. If the same generic information is defined both at job-level and at task-level, the task-level value takes precedence inside the task scope.

  • Generic information cannot be used directly inside the workflow with the syntax ${} (See Job variables).

  • Generic information is generally used internally by the scheduler, for example to provide information to the scheduling loop on how to handle the task. An example of such generic information is the START_AT info used inside Cron Tasks.

  • Generic information can use job/task variable patterns in their definition; pattern replacements can be performed at execution time when using task-level generic information.

Example of a generic information definition:

<task ... >
    <genericInformation>
        <info name="ginfo" value="${test}"/>
    </genericInformation>
    ...
</task>

Some generic information can also be configured globally on the ProActive server. These global generic information will then apply to all submitted workflows or to certain categories of workflows (e.g. workflows with a given name). Global generic information are job-level only.

See Configure Global Variables and Generic Information section to understand how global generic information can be configured on the ProActive server.

The following table describes all available generic information:

Table 6. Generic Information

Name

Description

Scope

Example

START_AT

delay a job or task execution at a specified date/time

job-level, task-level

2020-06-20T18:00:00+02:00

PARENT_JOB_ID

contains the id of the parent job (if the current job has been submitted from another workflow)

job-level

24

NODE_ACCESS_TOKEN

execute task(s) on node(s) with token-based usage restriction

job-level, task-level

my_token

EMAIL

send email to recipient(s) based on job state events

job-level

user@example.com

NOTIFICATION_EVENTS

a list of job events associated with email notifications

job-level

JOB_PENDING_TO_RUNNING, JOB_RUNNING_TO_FINISHED

REMOVE_DELAY

once the job is terminated, this setting controls the delay after which it will be removed from the scheduler database

job-level

3d 12h

REMOVE_DELAY_ON_ERROR

once the job is terminated with errors, this setting controls the delay after which it will be removed from the scheduler database. This generic information should be set in addition to REMOVE_DELAY when there is a need to keep the job longer in the scheduler database in case of error.

job-level

3d 12h

JOB_DDL

job deadline used in Earliest Deadline First Scheduling Policy

job-level

+5:30:00

JOB_EXEC_TIME

job expected execution time used in Earliest Deadline First Scheduling Policy

job-level

1:00:00

DISABLE_PTK

skip sub-process cleaning after the task is terminated

task-level

true

WALLTIME

maximum execution time of a task

job-level, task-level

10:00

PRE_SCRIPT_AS_FILE

skip pre-script execution and store its content as a file

task-level

my_prescript.py

NS

run the task on a Native Scheduler Node Source. Value is the Node Source name

task-level

SLURM

NS_BATCH

extra parameters given to the native scheduler command

task-level

-q queue1 -lnodes=2:ppn=2

NS_PRE

command run before the ProActive Node is started inside the Native Scheduler infrastructure

task-level

module load mpi/gcc/openmpi-1.6.4

NODESOURCENAME

run the task on a Node Source controlled by a Dynamic Policy (generally cloud node sources). Value is the Node Source name

task-level

Azure

Documentation

Add a documentation link to the workflow

job-level

http://my-server/my-doc.html

task.documentation

Add a documentation link to the task

task-level

http://my-server/my-doc.html

workflow.icon

Add an icon to the workflow

job-level

http://my-server/my-icon.png

task.icon

Add an icon to the task

task-level

http://my-server/my-icon.png

content.type

Assign a MIME content type to a byte array task result

task-level

image/png

file.name

Assign a file name to a byte array task result

task-level

image_balloon.png

file.extension

Assign a file extension to a byte array task result

task-level

.png

RUNAS_METHOD

Allows overriding the impersonation method used when executing the task. Can be pwd, key or none.

job-level, task-level

pwd

RUNAS_USER

Allows overriding the login name used during the impersonation. This allows running a task under a different user than the one who submitted the workflow.

job-level, task-level

bob

RUNAS_DOMAIN

Allows defining or overriding a user domain that will be attached to the impersonated user. User domains are only used on Windows operating systems.

job-level, task-level

MyOrganisation

RUNAS_PWD

Allows overriding the password attached to the impersonated user. This can be used only when the impersonation method is set to pwd.

job-level, task-level

MyPassword

RUNAS_PWD_CRED

Similar to RUNAS_PWD but the password will be defined inside Third-Party Credential instead of inlined in the workflow. This method of defining the password should be preferred to RUNAS_PWD for security reasons. The value of RUNAS_PWD_CRED must be the third-party credential name containing the user password.

job-level, task-level

MyPasswordCredName

RUNAS_SSH_KEY_CRED

Allows overriding the SSH private key attached to the impersonated user. This can be used only when the impersonation method is set to key. The private key will be defined inside Third-Party Credential instead of inlined in the workflow. The value of RUNAS_SSH_KEY_CRED must be the third-party credential name containing the SSH key.

job-level, task-level

MySSHKeyCredName

PYTHON_COMMAND

Python command to use in CPython script engine.

job-level, task-level

python3

docker-compose-options

general parameters given to the docker-compose command in Docker Compose script engine.

job-level, task-level

--verbose

docker-compose-up-options

general parameters given to the docker-compose up command in Docker Compose script engine.

job-level, task-level

--exit-code-from helloworld

docker-compose-options-split-regex

declare a string to be used as options separator in Docker Compose script engine.

job-level, task-level

!SPLIT!

docker-actions

actions to perform in Dockerfile script engine.

task-level

build,run

docker-image-tag

tag identifying the docker image in Dockerfile script engine.

task-level

my-image

docker-container-tag

tag identifying the docker container in Dockerfile script engine.

task-level

my-container

docker-build-options

options given to the docker build command in Dockerfile script engine.

job-level, task-level

--no-cache

docker-run-options

options given to the docker run command in Dockerfile script engine.

job-level, task-level

--detach

docker-exec-command

command given to docker exec, if used in docker-actions. See Dockerfile script engine.

job-level, task-level

/bin/sh -c echo 'hello'

docker-exec-options

options given to the docker exec command in Dockerfile script engine.

job-level, task-level

-t -w /my/work/dir

docker-stop-options

options given to the docker stop command in Dockerfile script engine.

job-level, task-level

--time 20

docker-rm-options

options given to the docker rm command in Dockerfile script engine.

job-level, task-level

--volumes

docker-rmi-options

options given to the docker rmi command in Dockerfile script engine.

job-level, task-level

--force

docker-file-options-split-regex

declare a string to be used as options separator in Dockerfile script engine.

job-level, task-level

!SPLIT!

calendar.name

generic information automatically added by the Job Planner. It contains the name of the calendar based on which the job is planned.

job-level

every_10_minutes

next.execution

generic information automatically added by the Job Planner. It contains the next execution date of the job.

job-level

2022-04-26 12:50:00 CEST

5.7.1. START_AT

The START_AT Generic Information can be used to delay a job or task execution until a specified date/time. Its value should be ISO 8601 compliant. See Cron Tasks for more details.

Examples:

  • START_AT = "2020-06-20T18:00:00" will delay the job execution until 20th June 2020 at 6pm GMT.

  • START_AT = "2020-06-20T18:00:00+02:00" will delay the job execution until 20th June 2020 at 6pm GMT+02:00.

START_AT can be defined at job-level (delay the execution of the whole job) or at task-level (delay the execution of a single task).
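
For example, using the genericInformation XML syntax shown in Generic Information, a job-level definition could look like the following sketch (the date is illustrative):

<job ... >
    <genericInformation>
        <info name="START_AT" value="2020-06-20T18:00:00+02:00"/>
    </genericInformation>
    ...
</job>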

5.7.2. PARENT_JOB_ID

The PARENT_JOB_ID Generic Information is set automatically by ProActive when the current job has been submitted from another workflow using the Scheduler API. It contains the id of the parent job which submitted the current job.

PARENT_JOB_ID is defined at job-level.

5.7.3. NODE_ACCESS_TOKEN

The NODE_ACCESS_TOKEN Generic Information can be used to restrict the execution of a task, or of all tasks of a workflow, to specific nodes protected by tokens.

The value of NODE_ACCESS_TOKEN must contain the token value. Workflows or tasks with NODE_ACCESS_TOKEN enabled will run exclusively on nodes containing the token.

See Node Source Policy Parameters for further information on node token restrictions.

NODE_ACCESS_TOKEN can be defined at job-level (applies to all tasks of a workflow) or at task-level (applies to a single task).

5.7.4. Email

Email notifications on job events can be enabled on workflows using the following generic information:

EMAIL: contains the email address(es) of recipient(s) which should be notified.

NOTIFICATION_EVENTS: contains the set of events which should trigger a notification.

These generic information can be defined at job-level only.

See Get Notifications on Job Events for further information.
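
A minimal job-level sketch combining both generic information (the recipient and events are illustrative, taken from the table above):

<job ... >
    <genericInformation>
        <info name="EMAIL" value="user@example.com"/>
        <info name="NOTIFICATION_EVENTS" value="JOB_PENDING_TO_RUNNING, JOB_RUNNING_TO_FINISHED"/>
    </genericInformation>
    ...
</job>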

5.7.5. REMOVE_DELAY

The REMOVE_DELAY generic information can be used to control when a job is removed from the scheduler database after its termination.

The housekeeping mechanism must be configured to allow usage of REMOVE_DELAY.

REMOVE_DELAY overrides the global pa.scheduler.core.automaticremovejobdelay setting for a particular job. It allows a job to be removed either before or after the delay configured globally on the server.

The general format of the REMOVE_DELAY generic information is VVd XXh YYm ZZs, where VV is the number of days, XX the number of hours, YY the number of minutes and ZZ the number of seconds.

The format allows flexible combinations of the elements:

  • 12d 1h 10m: 12 days, 1 hour and 10 minutes.

  • 26h: 26 hours.

  • 120m 12s: 120 minutes and 12 seconds.

REMOVE_DELAY can be defined at job-level only.
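
For example, the following job-level sketch keeps the job in the database for 3 days and 12 hours after its termination (value taken from the table above):

<job ... >
    <genericInformation>
        <info name="REMOVE_DELAY" value="3d 12h"/>
    </genericInformation>
    ...
</job>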

5.7.6. REMOVE_DELAY_ON_ERROR

The REMOVE_DELAY_ON_ERROR generic information can be used to control when a job is removed from the scheduler database after its termination, if the job has terminated with errors.

The housekeeping mechanism must be configured to allow usage of REMOVE_DELAY_ON_ERROR.

REMOVE_DELAY_ON_ERROR overrides the global pa.scheduler.core.automaticremove.errorjob.delay setting for a particular job. It allows a job to be removed either before or after the delay configured globally on the server.

The general format of the REMOVE_DELAY_ON_ERROR generic information is VVd XXh YYm ZZs, where VV is the number of days, XX the number of hours, YY the number of minutes and ZZ the number of seconds.

The format allows flexible combinations of the elements:

  • 12d 1h 10m: 12 days, 1 hour and 10 minutes.

  • 26h: 26 hours.

  • 120m 12s: 120 minutes and 12 seconds.

REMOVE_DELAY_ON_ERROR can be defined at job-level only.

5.7.7. Earliest Deadline First Policy

The Earliest Deadline First Policy is a Scheduling Policy which can be enabled in the ProActive Scheduler server.

When enabled, this policy uses the following generic information to determine jobs deadlines and expected duration:

  • JOB_DDL: represents the job deadline in absolute (e.g. 2018-08-14T08:40:30+02:00) or relative to submission (e.g. +4:30) format.

  • JOB_EXEC_TIME: represents job expected execution time in the format HH:mm:ss, mm:ss or ss (e.g. 4:30)

See Earliest Deadline First Policy for further information.

JOB_DDL and JOB_EXEC_TIME can be defined at job-level only.
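
As a sketch, a job with a deadline 5 hours and 30 minutes after submission and an expected execution time of 1 hour could declare (values are illustrative):

<job ... >
    <genericInformation>
        <info name="JOB_DDL" value="+5:30:00"/>
        <info name="JOB_EXEC_TIME" value="1:00:00"/>
    </genericInformation>
    ...
</job>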

5.7.8. DISABLE_PTK

The DISABLE_PTK Generic Information can be used to prevent the Process Tree Killer from running after a task execution.

Disabling the Process Tree Killer is mostly useful when a task needs to start a background process which must remain alive after the task terminates.

Simply define a DISABLE_PTK=true generic information on any given task to prevent the Process Tree Killer from running.

More information is available in the Task Termination Behavior section.

DISABLE_PTK can be defined at task-level only.

5.7.9. WALLTIME

The WALLTIME Generic Information can be used to enforce a maximum execution time for a task, or all tasks of a workflow.

The general format of the walltime value is hh:mm:ss, where h is hours, m is minutes and s is seconds. The format also allows shorter forms: 5 corresponds to 5 seconds, 10 to 10 seconds, 4:10 to 4 minutes and 10 seconds, and so on.

When used at job-level, the configured walltime will not be applied to the workflow globally but to each individual task of the workflow.

For example, if the walltime is configured at job-level to be ten minutes, each task of the workflow can run no more than ten minutes, but the workflow itself has no time limitation.

As the walltime can also be configured directly in the workflow (xml attribute) or globally on the scheduler server (scheduler property), an order of priority applies.

More information is available in the Maximum execution time for a task section.

WALLTIME can be defined at job-level or task-level.
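
For example, the following task-level sketch limits the task execution to 10 minutes (the value is illustrative):

<task ... >
    <genericInformation>
        <info name="WALLTIME" value="10:00"/>
    </genericInformation>
    ...
</task>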

5.7.10. PRE_SCRIPT_AS_FILE

The PRE_SCRIPT_AS_FILE Generic Information can be used to store a task pre-script into a file and skip its execution. It can be used for example to embed inside a workflow a data file or a file written in a script language not supported by ProActive tasks and delegate its execution to a command-line interpreter.

More information is available in the Save script section.

PRE_SCRIPT_AS_FILE can be defined at task-level only.

5.7.11. Native Scheduler

NS (short for Native Scheduler), NS_BATCH and NS_PRE are Generic Information used to deploy and configure workflow tasks inside a Native Scheduler infrastructure.

  • NS: execute a task associated with this generic information inside a ProActive Node Source interacting with a native scheduler. The value of this generic information must be equal to the node source name. Example: NS=Slurm.

  • NS_BATCH: allows to provide additional parameters to the native scheduler. Example: NS_BATCH=-q queue1 -lnodes=2:ppn=2.

  • NS_PRE: allows to provide a single-line command which will be executed before the ProActive Node is started on the cluster. Example: NS_PRE=module load mpi/gcc/openmpi-1.6.4.

NS, NS_BATCH and NS_PRE can be defined at job-level (applies to all tasks of a workflow) or at task-level (applies to a single task).
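
A task-level sketch targeting a node source named Slurm with extra batch parameters (values taken from the examples above):

<task ... >
    <genericInformation>
        <info name="NS" value="Slurm"/>
        <info name="NS_BATCH" value="-q queue1 -lnodes=2:ppn=2"/>
    </genericInformation>
    ...
</task>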

5.7.12. NODESOURCENAME

NODESOURCENAME is used to deploy workflow tasks in a Node Source controlled by a Dynamic Policy.

See Dynamic Policy for more information.

NODESOURCENAME can be defined at job-level (applies to all tasks of a workflow) or at task-level (applies to a single task).

5.7.13. Documentation

The Documentation generic information allows associating an HTML documentation page with a workflow. Its value must contain a URL pointing to the workflow documentation.

Documentation can be defined at job-level only.

The task.documentation generic information allows associating an HTML documentation page with a task. Its value must contain a URL pointing to the task documentation.

task.documentation can be defined at task-level only.

Documentation and task.documentation values can also be a relative path. In that case, the html file containing the documentation must be put inside SCHEDULER_HOME/dist/war/getstarted/doc.

5.7.14. Icon Management

There are specific generic information that are dedicated to icon management. The icon of a workflow is specified inside the job-level Generic Information using the keyword workflow.icon. The icon of a task is specified inside task-level Generic Information using the keyword task.icon.

These generic information are used in ProActive portals for proper visualization of workflow and task icons.

The value of these generic information can contain either a URL or a path to the icon. The ProActive server stores workflow icons by default in SCHEDULER_HOME/dist/war/automation-dashboard/styles/patterns/img/wf-icons/.

Example value with the default icon path: /automation-dashboard/styles/patterns/img/wf-icons/postgresql.png

5.7.15. Result Metadata

The following generic information can be used to assign result metadata to a workflow task.

Can only be used if the task result content is an array of bytes.

  • content.type: define the MIME type of the result.

  • file.name: allows to store (Save as) the result from the scheduler or workflow-automation portals as a specific file name.

  • file.extension: allows to store (Save as) the result from the scheduler or workflow-automation portals as a specific file extension with auto-generated file name.

See Assigning metadata to task result for further information.

Result metadata generic information can be defined at task-level only.

5.7.16. PYTHON_COMMAND

When using CPython tasks, the PYTHON_COMMAND generic information can be used to define the command starting the python interpreter.

The interpreter is started by default using the python command, but this generic information can be defined to use for example python3.

See Python script language for further information.

PYTHON_COMMAND generic information should be defined at task-level but can be defined at job-level to apply to all workflow tasks.

5.7.17. Docker Compose options

When using Docker Compose tasks, the following generic information can be used to control options given to docker-compose commands:

  • docker-compose-options: general parameters given to the docker-compose command.

  • docker-compose-up-options: options given to the docker-compose up command.

  • docker-compose-options-split-regex: declare a string to be used as options separator.

See Docker Compose script language for further information.

The Docker Compose generic information should be defined at task-level but can be defined at job-level to apply to all workflow tasks.

5.7.18. Dockerfile options

When using Dockerfile tasks, the following generic information can be used to control options given to docker commands:

  • docker-actions: actions to perform. A comma separated list of possible actions (build, run, exec, stop, rmi). Default is build,run,stop,rmi.

  • docker-image-tag: tag identifying the docker image. Default is image_${PA_JOB_ID}t${PA_TASK_ID}

  • docker-container-tag: tag identifying the docker container. Default is container_${PA_JOB_ID}t${PA_TASK_ID}

  • docker-build-options: options given to the docker build command.

  • docker-run-options: options given to the docker run command.

  • docker-exec-command: command given to docker exec, if used in docker-actions. If the command contains spaces, docker-file-options-split-regex should be used to split command parameters.

  • docker-exec-options: options given to the docker exec command. Default is -t (which should always be included).

  • docker-stop-options: options given to the docker stop command.

  • docker-rm-options: options given to the docker rm command.

  • docker-rmi-options: options given to the docker rmi command.

  • docker-file-options-split-regex: declare a string to be used as options separator, instead of the space character.

See Dockerfile script language for further information.

The Dockerfile generic information should be defined at the task level. Some (docker-file-options-split-regex, or command options) may be defined at the job level to apply to all tasks of the workflow.

5.7.19. Job Planner

The Job Planner automatically adds two generic information to the jobs it submits:

  • calendar.name: the calendar name based on which the job is planned.

  • next.execution: the next execution date of the planned job. The date is formatted with respect to the pattern yyyy-MM-dd HH:mm:ss z and considers the time zone specified in the Job Planner configuration. When no time zone is specified, the default time zone of ProActive server is considered.

5.8. Task result

Another way to propagate data from one task to another is to use the result script binding. Anywhere in a parent task (usually at the end), you can set a value to a reserved variable named result. This value will be available in the child tasks, i.e., the tasks that depend on the parent task. If a task has one or several parents, results will always be an array.

Assuming that we have two tasks task1 and task2 written in Groovy:

// task1
result = "task1";
// task2
result = "task2";

and task3 that depends on tasks task1 and task2, then, you can access result values defined by the parents as follows:

// task3
println(results[0]);
// will print "task1"
println(results[1]);
// will print "task2"
results will be aggregated according to the order declared in the dependency list. Consequently, if the xml depends tag of task3 contains the list [task1, task2] (see the xml example below), then results[0] will contain the result of task1 and results[1] the result of task2. On the contrary, if the depends list is [task2, task1], then results[0] will contain the result of task2 and results[1] the result of task1.
<depends>
        <task ref="task1"/>
        <task ref="task2"/>
</depends>

For nearly all script languages, the results variable contains a list of TaskResult Java objects. In order to access the result value, the value() method of this object must be called. Example for Python/Jython:

print results[0].value()

5.8.1. Assigning metadata to task result

Result metadata can contain additional information associated with the result. In order to store metadata information, you have to use the resultMetadata script binding. As for the result binding, the resultMetadata value of a task is available in its child tasks.

// task1
result = "task1";
resultMetadata.put("mymetadata", "myvalue")
// task2
println(results[0].getMetadata());

It is the user code that specifies the metadata semantics. However, some specific metadata have a meaning when downloading or previewing results from the Scheduler portal:

  • file.name: the name of the file, including the extension, which will be used when storing the result in binary format.

    // fileNameTask
    file = new File("balloon13.png")
    result = file.getBytes()
    resultMetadata.put("file.name","balloon13.png")
  • file.extension: the extension, which will be appended to the automatically generated name, when storing the result in binary format

    // fileExtensionTask
    file = new File("balloon13.png")
    result = file.getBytes()
    resultMetadata.put("file.extension",".png")
  • content.type: the display format, which will be used when previewing the file in the browser. Open the following link for the complete list of mime-types.

    // contentTypeTask
    file = new File("balloon13.png")
    result = file.getBytes()
    resultMetadata.put("content.type","image/png")

A Task result can be added to the list of results of a Job, see section Result List below.

The specific result metadata described above can also be set using Generic Information. For example, defining the generic information content.type = text/html is equivalent to adding this value to the resultMetadata map.
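
For instance, the following task-level sketch is equivalent to putting content.type into the resultMetadata map from the task script:

<task ... >
    <genericInformation>
        <info name="content.type" value="text/html"/>
    </genericInformation>
    ...
</task>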

5.9. Job Results

Job Results are composed of two elements: Result List and Result Map. Each task can contribute to both. Once the job is finished, the user can download/visualize Job Results from the Scheduler portal as shown in the screenshot below. The label in Result List corresponds to the name of the task that generated the value.

scheduler result map

5.9.1. Result List

Each task has its own result (see section Task result). Some intermediate task results may be precious, such that the user would like to access them easily at the job level. To do so, the user has to select the precious task(s) and check the Task Result Added to Job Result checkbox as shown in the screenshot below. In that case, the task result will be added to the Result List when the job execution is finished.

precious result

In the Job xml schema it is available as preciousResult:

<task name="Init_Task" preciousResult="true" >

5.9.2. Result Map

In addition to the Result List, a Job can also store some results into a dedicated key-value map, called the Result Map. Every task can contribute to this Result Map. It is a write-only map: a task cannot see what other tasks have already written to it. When several tasks write the same key, the last write always takes precedence. This map contains String keys and Serializable values.

Consider the following groovy example.

Task 1 writes to Result Map:

resultMap.put("filename", "results.txt")
resultMap.put("latestError", "no error")

Task 2 writes to Result Map:

resultMap.put("latestError", "wrong user id")

Once the job is finished, Result Map would contain:

"filename" -> "results.txt"
"latestError" -> "wrong user id"

5.10. Control Flow Scripts

To perform a control flow action such as if, replicate or loop, a Control Flow Script is executed on the ProActive node. This script takes the result of the task as input; meaning a Java object if it was a Java or Script task, or nothing if it was a native task.

The script is executed on the ProActive node, just after the task’s executable. If the executable is a Java Executable and returns a result, the variable result will be set in the script’s environment so that dynamic decisions can be taken with the task’s result as input. Native Java objects can be used in a Groovy script.

Decide whether to keep looping or not based on the task’s result:
loop = result

Similarly to how parameters are passed through the result variable to the script, the script needs to define variables specific to each action to determine what action the script will lead to.

  • A replicate control flow action needs to define how many parallel runs will be executed, by defining the variable runs:

// assuming result is a java.lang.Integer
runs = result % 4 + 1

The assigned value needs to be a strictly positive integer.

  • An if control flow action needs to determine whether the if or the else branch is selected, it does this by defining the boolean variable branch:

// assuming result is a java.lang.Integer
if (result % 2) {
  branch = "if"
} else {
  branch = "else"
}

The assigned value needs to be the string value if or else.

  • The loop control flow action requires setting the loop variable, which will determine whether looping to the statically defined target is performed, or whether the normal flow is executed, as a continue instruction would do in a programming language:

loop = result as Boolean

The assigned value needs to be a boolean.

Failure to set the required variables or to provide a valid control flow script will not be treated gracefully and will result in the failure of the task.

5.11. Loop and Replicate awareness

When Control Flow actions such as replicate or loop are performed, some tasks are replicated. To identify replicated tasks uniquely, each replicated task has an iteration index, a replication index, and a tag. Beyond unique identification, these tags are useful to filter tasks, for example by iteration.

5.11.1. Task name

First, those indexes are reflected inside the names of the tasks themselves. Indeed, task names must be unique inside a job. The indexes are added to the original task name as a suffix, separated by a special character.

  • If a task named T is replicated after a loop action, the newly created tasks will be named T#1, T#2, etc. The number following the # separator character represents the iteration index.

  • The same scheme is used upon replicate actions: newly created tasks are named T*1, T*2, and so on. The number following the * separator represents the replication index.

  • When combining both of the above, the resulting task names are of the form: T#1*1, T#2*4, etc., in that precise order.

5.11.2. Task tag

Tags are assigned automatically by the scheduler when a task is created by replication from another task. They are designed to reflect the task that initiated the replication for the first time, the type of replication (loop or replicate) and the iteration index or replication indexes. So the tag is formed like this: (LOOP|REPLICATE)-Name_of_the_initiator-index.

  • If the task T1 initiates a loop that contains the tasks T2 and T3, then the tasks T2#1 and T3#1 will have the tag LOOP-T1-1. The tasks T2#2 and T3#2 will have the tag LOOP-T1-2.

  • If the loop is a cron loop, the index is replaced by the resolved time of the initiated looping. For example, in a cron loop that was initiated the 21/12/2015 at 14h00, the task T1#1 will have the tag LOOP-T1#1-21_12_15_14_00.

  • If the task T1 replicates a block that contains the tasks T2 and T3, then the tasks T2*1 and T3*1 will have the tag REPLICATE-T1-1. The tasks T2*2 and T3*2 will have the tag REPLICATE-T1-2.

  • If the task T1#1, inside a loop, replicates tasks, the new tasks will have the tags REPLICATE-T1#1-1, REPLICATE-T1#1-2, etc…​

  • If the replicated task T1*1 initiates a loop inside a replicate, the new created tasks will have the tags LOOP-T1*1-1, LOOP-T1*1-2, etc…​

5.11.3. Task definition

Those indexes are also available as workflow variables. They can be obtained using the variable names:

  • PA_TASK_REPLICATION

  • PA_TASK_ITERATION

Here is how to access those variables:

  • Native Executable arguments:

<staticCommand value="/path/to/bin.sh">
  <arguments>
    <argument value="/some/path/${PA_TASK_ITERATION}/${PA_TASK_REPLICATION}.dat" />
  • Dataspace input and output:

<task name="t1" retries="2">
  <inputFiles>
    <files includes="foo_${PA_TASK_ITERATION}_${PA_TASK_REPLICATION}.dat" accessMode="transferFromInputSpace"/>
  </inputFiles><outputFiles>
    <files includes="bar_${PA_TASK_ITERATION}_${PA_TASK_REPLICATION}.res" accessMode="transferToOutputSpace"/>
Scripts affected by the variable substitutions are: Pre, Post, Control Flow. No substitution will occur in selection scripts or clean scripts.

5.11.4. Task executable

The iteration and replication indexes are available inside the executables launched by tasks.

In script tasks, the indexes are exported through the following workflow variables: PA_TASK_ITERATION and PA_TASK_REPLICATION.

int it  = variables.get("PA_TASK_ITERATION")
int rep = variables.get("PA_TASK_REPLICATION")

In a similar fashion, environment variables are set when launching a native executable: PAS_TASK_ITERATION and PAS_TASK_REPLICATION:

#!/bin/sh
myApp.sh /path/to/file/${variables_PAS_TASK_ITERATION}.dat

5.12. Example: Embarrassingly Parallel problem

An Embarrassingly Parallel problem is a problem that is easy to split into smaller independent tasks. With ProActive Scheduler you can tackle this type of problem with the Replicate construct.

Familiar with MapReduce? Well, a workflow using replication uses similar concepts.

The Advanced workflows example illustrates an embarrassingly parallel problem where the computation is easily distributed across ProActive Nodes.

5.13. Example: MPI application

MPI is often used in the area of parallel computing. The Scheduler integrates with MPI through the concept of multi-node tasks. Such a task acquires several nodes and exposes these nodes to the MPI environment.

Applications built with MPI are often executed using the mpirun command. It accepts several parameters to choose how many CPUs and how many machines will be used. One particular parameter is the machine file that is built by the Scheduler when using multi-node tasks. Here is what an MPI application invocation looks like when executed with the Scheduler:

mpirun -hostfile $variables_PA_NODESFILE myApplication

The variable $variables_PA_NODESFILE contains the path to a machine file created by the Scheduler that will be similar to:

compute-host
compute-host
another-host

Meaning that myApplication will use 2 CPUs on compute-host and 1 CPU on another-host. You can also use $variables_PA_NODESNUMBER to retrieve the number of acquired nodes.

You can find an example of a native task in this XML workflow or you can also build one yourself using the Workflow Studio.

To achieve the best performance for MPI applications, nodes can be selected taking into account their network topology. One might want to select nodes that are as close to each other as possible to obtain better network performance or nodes on different hosts to split the I/O load. Refer to Topology Types for more details.

5.14. Example: Install specific software tools and libraries before launching tasks

Some workflow tasks require the prior installation of specific software tools and libraries that are necessary to properly execute these tasks. ProActive offers two methods to enable the installation of such tools and libraries:

  1. By using software containers encapsulating the required libraries and tools. ProActive enables then the execution of computational tasks in containers such as Docker, Podman or Singularity containers. See Section Fork environment for more details about executing tasks inside software containers.

  2. By using pre-scripts, i.e., scripts that are executed just before the considered computational task, and on the same compute resource. Such pre-scripts implement the instructions for installing and configuring the required tools and libraries. The ProActive solution allows writing pre-scripts in various languages, including operating system languages (shell, batch, powershell, etc.), scripting languages (groovy, javascript, etc.) and programming languages (python, java, etc.). See Section Pre, Post & Clean Scripts for more details about pre-scripts.

6. Workflow Execution Control

ProActive Scheduler supports portable script execution through the JSR-223 Java Scripting capabilities. Scripts can be written in any language supported by the underlying Java Runtime Environment.

They are used in the ProActive Scheduler to:

  • Execute some simple pre, post and cleaning processing (pre scripts, post scripts and cleaning scripts)

  • Select among available resources the node that suits the execution (selection scripts).

6.1. Selection of ProActive Nodes

A selection script gives the Scheduler the ability to execute tasks on particular ProActive nodes. For example, you can specify that a task must be executed on a Unix/Linux system.

A selection script is always executed before the task itself on any candidate node: the execution of a selection script must set the boolean variable selected, that indicates if the candidate node is suitable for the execution of the associated task.

A Java helper org.ow2.proactive.scripting.helper.selection.SelectionUtils is provided to help users write common kinds of selections. Some script samples are available in PROACTIVE_HOME/samples/scripts/selection.

The following example selects only nodes running on Windows:

importClass(org.ow2.proactive.scripting.helper.selection.SelectionUtils);

/* Check if OS name is Windows */
if (SelectionUtils.checkOSName("windows")) {
    selected = true;
} else {
    selected = false;
}

You can use variables inside the selection script; these variables must be visible within the task context, i.e., declared at job scope, at task scope, or as a result variable from a previous task. In the last case, the task which declares the inherited variable must precede the task using the variable. Below is an example job in XML format which uses a task variable to select Unix/Linux as the operating system. In this example, the variable operating_system (task scope) resolves to the variable operating_system_workflow (job scope).

<?xml version="1.0" encoding="UTF-8"?>
<job
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:proactive:jobdescriptor:3.10"
     xsi:schemaLocation="urn:proactive:jobdescriptor:3.10 https://www.activeeon.com/public_content/schemas/proactive/jobdescriptor/3.10/schedulerjob.xsd"
    name="variable solving"
    priority="normal"
    onTaskError="continueJobExecution"
     maxNumberOfExecution="2"
>
  <variables>
    <variable name="operating_system_workflow" value="linux" model=""/>
  </variables>
  <taskFlow>
    <task name="Groovy_Task0">
      <description>
        <![CDATA[ The simplest task, ran by a groovy engine. ]]>
      </description>
      <variables>
        <variable name="operating_system" value="$operating_system_workflow" inherited="false" />
      </variables>
      <scriptExecutable>
        <script>
          <code language="groovy">
            <![CDATA[
                import org.ow2.proactive.scripting.helper.selection.SelectionUtils
                selected = SelectionUtils.checkOSName(variables.get("operating_system"))
            ]]>
          </code>
        </script>
      </scriptExecutable>
    </task>
  </taskFlow>
</job>

6.2. Fork environment

A fork environment is a new forked Java Virtual Machine (JVM) which is started exclusively to execute a task. For example, a task may need third-party libraries to execute properly; in that case, the list of path elements to be added when starting the new JVM is specified in the Additional Classpath field. Starting a new JVM means that the task inside it runs in a dedicated, configurable environment.

6.2.1. Fork environment modes

The fork environment can be activated through one of these four execution modes:

  • Standard forked mode,

  • Fork mode with customized configuration,

  • Run as me mode,

  • Fork mode with docker.

fork environment

By default, all workflow tasks are executed in a standard fork mode, that is, executed in a forked JVM separated from the ProActive Node’s JVM. This behavior can be controlled globally at the ProActive server level by modifying the global setting pa.scheduler.task.fork in the scheduler configuration file and also locally at the task level via the optional checkbox property Fork.

  • In case the server is configured in non-fork mode (i.e. pa.scheduler.task.fork = false), the checkbox property Fork is unchecked and uneditable and all the fields under Fork Environment are uneditable.

  • In case the server is configured in a fork mode (i.e. pa.scheduler.task.fork = true), the checkbox property Fork is checked and uneditable and all the fields under Fork Environment are editable.

  • In case the global setting is not set (i.e. the related configuration is commented out, as in #pa.scheduler.task.fork = true), each task can specify its execution mode through the checkbox property Fork. If the checkbox property Fork is unchecked, then all the other fields under Fork Environment are disabled.
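
For reference, a minimal sketch of this global setting in the scheduler configuration file (assumed here to be PROACTIVE_HOME/config/scheduler/settings.ini, the same file that holds other scheduler properties such as the global walltime):

# Force forked execution for all tasks (set to false to forbid forking,
# or comment the line out to let each task decide through its Fork property)
pa.scheduler.task.fork=true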

6.2.2. Fork environment customization

The fork execution environment mechanism can be customized by writing a fork environment script. Such a script has access to a specific script binding called forkEnvironment, a ForkEnvironment Java object giving access to various configuration settings.

For example, the following fork environment groovy script can be written to change the working directory of the task, and the path to the java executable:

forkEnvironment.setWorkingDir("/data/folder")
forkEnvironment.setJavaHome("/usr/java/default")
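
Other aspects of the forked JVM can typically be configured through the same object; the hedged sketch below adds a JVM argument and an extra classpath entry (the method names addJVMArgument and addAdditionalClasspath are assumptions about the ForkEnvironment API and should be checked against your ProActive version):

// Hedged sketch of additional fork environment settings (method names assumed)
forkEnvironment.addJVMArgument("-Xmx2g")                       // extra JVM argument for the forked JVM
forkEnvironment.addAdditionalClasspath("/opt/libs/my-lib.jar") // extra classpath entry for the task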

6.2.3. Environment cleanup

After a task is executed, an internal mechanism called the Process Tree Killer ensures that all children processes started by the task are automatically terminated. This mechanism is explained in detail in the Task Termination Behavior section.

The Process Tree Killer mechanism can be disabled using the DISABLE_PTK generic information.

6.2.4. Docker fork execution environment

A Docker fork execution environment is a specific usage of the fork environment mechanism. In that case, the ProActive Node’s JVM starts a Docker container which runs the forked JVM described above. This forked JVM can be customized through JVM settings and benefits from the tool-set and isolation provided by Docker containers.

If your task needs to install new software or update existing software, then run it inside a Docker container. That way, other tasks will not be affected by the changes, because Docker containers provide isolation so that the host machine’s software stays the same.
The simplest way to use a Docker fork execution environment

From the Studio Portal, when editing a workflow task, open the Fork Environment section, and select Docker in the Fork Execution Environment Dropdown. A sample Docker environment script will appear.

These settings ensure that the current task is executed inside a Docker container, or make it fail if Docker is not installed or the user has no permission to run Docker containers.

If the executing Node does not have Docker installed, the task will fail. A selection script can ensure that tasks are executed only on Nodes which have Docker installed.
Docker fork environment script

The default Docker fork environment script contains the following groovy code:

// This script creates a docker fork environment using java as docker image

// If used on windows:
//  - currently, only linux containers are supported
//  - make sure the drives containing the scheduler installation and TEMP folders are shared with docker containers
//  - the container used must have java installed by default in the /usr folder. Change the value of the java home parameter to use a different installation path
// On linux, the java installation used by the ProActive Node will be also used inside the container

// Prepare Docker parameters
import org.ow2.proactive.utils.OperatingSystem;
import org.ow2.proactive.utils.OperatingSystemFamily;

dockerImageName = "java"
cmd = []
cmd.add("docker")
cmd.add("run")
cmd.add("--rm")
cmd.add("--env")
cmd.add("HOME=/tmp")

String osName = System.getProperty("os.name");
println "Operating system : " + osName;
OperatingSystem operatingSystem = OperatingSystem.resolveOrError(osName);
OperatingSystemFamily family = operatingSystem.getFamily();

switch (family) {
    case OperatingSystemFamily.WINDOWS:
    	isWindows = true;
    	break;
    default:
        isWindows = false;
}
forkEnvironment.setDockerWindowsToLinux(isWindows)

// Prepare ProActive home volume
paHomeHost = variables.get("PA_SCHEDULER_HOME")
paHomeContainer = (isWindows ? forkEnvironment.convertToLinuxPath(variables.get("PA_SCHEDULER_HOME")) : variables.get("PA_SCHEDULER_HOME"))
cmd.add("-v")
cmd.add(paHomeHost + ":" + paHomeContainer)
// Prepare working directory (For Dataspaces and serialized task file)
workspaceHost = localspace
workspaceContainer = (isWindows ? forkEnvironment.convertToLinuxPath(workspaceHost) : workspaceHost)
cmd.add("-v")
cmd.add(workspaceHost + ":" + workspaceContainer)

cachespaceHost = cachespace
cachespaceContainer = (isWindows ? forkEnvironment.convertToLinuxPath(cachespaceHost) : cachespaceHost)
cmd.add("-v")
cmd.add(cachespaceHost + ":" + cachespaceContainer)

if (!isWindows) {
     // when not on windows, mount and use the current JRE
    currentJavaHome = System.getProperty("java.home")
	forkEnvironment.setJavaHome(currentJavaHome)
    cmd.add("-v")
    cmd.add(currentJavaHome + ":" + currentJavaHome)
}

// Prepare container working directory
cmd.add("-w")
cmd.add(workspaceContainer)

if (isWindows) {
    // linux on windows does not allow sharing identities (such as AD identities)
} else {
    sigar = new org.hyperic.sigar.Sigar()
    try {
        pid = sigar.getPid()
        creds = sigar.getProcCred(pid)
        uid = creds.getUid()
        gid = creds.getGid()
        cmd.add("--user=" + uid + ":" + gid)
    } catch (Exception e) {
        println "Cannot retrieve user or group id : " + e.getMessage()
    } finally {
        sigar.close()
    }
}
cmd.add(dockerImageName)

forkEnvironment.setPreJavaCommand(cmd)

// Show the generated command
println "DOCKER COMMAND : " + forkEnvironment.getPreJavaCommand()

This script registers a list composed of a command and its arguments via forkEnvironment.setPreJavaCommand; this list is picked up by the Node executing the Task and added in front of the fork execution command. As an example, if the fork execution command is:

/usr/bin/java -cp [Classpath] [ExecutedClass]

And the Docker run command is:

docker run java

Then both will be combined to:

 docker run java /usr/bin/java -cp [Classpath] [ExecutedClass]

The combined command will execute a JVM inside a Docker container. Internally, the command is split into docker, run, java, /usr/bin/java, -cp, [Classpath], [ExecutedClass]. This is important to know because the fork execution command is split by spaces, which means that paths cannot contain spaces. The Docker container mounts the ProActive home and a folder in the temp directory of the operating system; those paths cannot contain spaces either, because if they do, the fork execution command will be split incorrectly and fail.

6.3. Pre, Post & Clean Scripts

Another functionality is the possibility to define pre and post scripts. For a given task (Java, Script or Native task), it is possible to launch a script before and after its execution. This possibility can be useful to copy files to a node, or clean a directory before or after task execution, for example.

This is a way to separate the preparation and cleanup of the execution environment from the business code. A good example is a script that removes files from a specific directory once the task is finished.

Clean script example
import java.nio.file.Files

// Remove the task working directory if it still exists (the directory must be empty)
Files.deleteIfExists(new File("/tmp/working_dir/").toPath())
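
Conversely, a minimal Groovy pre-script preparing that same working directory before the task starts might look like this (the path is purely illustrative):

import java.nio.file.Files
import java.nio.file.Paths

// Create the working directory used by the task if it does not exist yet
Files.createDirectories(Paths.get("/tmp/working_dir/"))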

6.3.1. File transfer

Input/Output Data in Shared Directory

The easiest way to use your data on computing machines is to set up a shared directory available on all Compute Hosts. For Linux it’s typically an NFS directory, for Windows it could be an SMB network share. Then, in tasks you can manipulate files seamlessly across ProActive Nodes.

File inputData = new File("/data/username/inputfile");

This approach has some limitations though:

  • The path to the shared folder should be identical on all Compute Hosts

  • It can be difficult to use with heterogeneous resources, e.g. Linux and Windows hosts

6.3.2. Save script

Save the pre-script in a file and skip its execution

Instead of executing a task pre-script, ProActive also provides a way to save the pre-script into a file and skip its execution. To do so, add a Generic Information entry to the desired task with the key PRE_SCRIPT_AS_FILE and, as its value, the path of the file to which you want to save the pre-script.

The path should contain a file name. If it is an absolute path, then the file will be stored in this absolute path. If it is a relative path, then the file will be stored in the Local Space.

If you don’t give a specific extension in the path, an extension corresponding to the language selected for the pre-script will be assigned automatically.

6.4. Run Computation with a user’s system account

When workflow tasks are executed inside a ProActive Node, they run by default under the system account used to start the Node.

The RunAsMe mode, also called user impersonation, allows starting a workflow Task either:

  • under the job owner system account (default RunAsMe).

  • under a specific system account (RunAsUser).

The RunAsMe mode is defined in the XML representation of a Task:

<task name="Linux_Bash_Task"
    fork="true"
    runAsMe="true" >
      <description>
        <![CDATA[ A task, ran by a bash engine. ]]>
      </description>
      <scriptExecutable>
        <script>
          <code language="bash">
            <![CDATA[
whoami
]]>
          </code>
        </script>
      </scriptExecutable>
    </task>

In the ProActive Studio, the RunAsMe mode is available in the Task Properties Fork Environment section.

RunAsMe in Studio

In order to impersonate a user, RunAsMe can use one of the following methods:

  • PWD: impersonate a user with a login name and password (this is the default, it is also the only mode available on Windows Nodes).

  • KEY: impersonate a user with a login name and SSH private key.

  • NONE: impersonate a user with a login name and sudo.

The above modes are defined by the ProActive server administrator when ProActive Nodes are deployed. A different RunAsMe mode can be defined for each Node. The underlying operating system needs to be configured appropriately in order to allow the impersonation.

See RunAsMe section in the administration guide for further information.

The default mode for RunAsMe is to impersonate as the current user with the user login and password. Accordingly, the user’s login and password inside ProActive must match the login and password of the user in the target machine’s operating system.

It is however possible to impersonate a different user, use a different password, or even use a different impersonation method than the one configured by default.

This is done by adding the Generic Information described in the list below to the workflow task. Each of these generic information entries can be defined either at task level or at job level. When defined at job level, it applies to all tasks of the workflow with RunAsMe enabled; tasks without RunAsMe enabled ignore these entries.

Table 7. RunAsMe Generic Information

  • RUNAS_METHOD (example value: pwd): overrides the impersonation method used when executing the task. Can be pwd, key or none.

  • RUNAS_USER (example value: bob): overrides the login name used during the impersonation. This allows running a task under a different user than the one who submitted the workflow.

  • RUNAS_DOMAIN (example value: MyOrganisation): defines or overrides a user domain attached to the impersonated user. User domains are only used on Windows operating systems.

  • RUNAS_PWD (example value: MyPassword): overrides the password attached to the impersonated user. Can only be used when the impersonation method is set to pwd.

  • RUNAS_PWD_CRED (example value: MyPasswordCredName): similar to RUNAS_PWD, but the password is read from Third-party credentials instead of being inlined in the workflow. This method of defining the password should be preferred to RUNAS_PWD for security reasons. The value of RUNAS_PWD_CRED must be the name of the third-party credential containing the user password.

  • RUNAS_SSH_KEY_CRED (example value: MySSHKeyCredName): overrides the SSH private key attached to the impersonated user. Can only be used when the impersonation method is set to key. The private key is read from Third-party credentials; the value of RUNAS_SSH_KEY_CRED must be the name of the third-party credential containing the SSH key.

6.5. Reserve more than one node for a task execution

To create a multi-node execution (often used for MPI applications), you need to add a parallel environment to the task definition. A parallel environment describes how many nodes are needed for a task execution as well as where these nodes should be located. Thus, in addition to the number of ProActive Nodes, you can specify the nodes' network topology, e.g. the set of nodes within some latency threshold, nodes on the same host, and more.

Here is the list of topologies we support:

  • Arbitrary topology does not imply any restrictions on nodes location

  • Best proximity - the set of closest nodes among those which are not executing other tasks.

  • Threshold proximity - the set of nodes within a threshold proximity (in microseconds).

  • Single Host - the set of nodes on a single host.

  • Single Host Exclusive - the set of nodes of a single host exclusively. The host with selected nodes will be reserved for the user.

  • Multiple Hosts Exclusive - the set of nodes spread across multiple hosts. Hosts with selected nodes will be reserved for the user.

  • Different Hosts Exclusive - the set of nodes each from a different host. All hosts with selected nodes will be reserved for the user.

For instance, if a user wants to run 4 processes in parallel inside the scope of one task and in the same compute host, then she/he should specify that in the Multi-Nodes Execution section as shown in the following screenshot.

multi nodes execution

In the Job XML schema, the multi-node execution is configured via the parallel tag:

<parallel numberOfNodes="4">
        <topology>
          <singleHostExclusive/>
        </topology>
</parallel>

6.6. Handling failures

Errors can occur while your Job is executed. ProActive Scheduler provides several mechanisms to deal with such exceptional situations. If a Task fails during Job execution, the ProActive Scheduler automatically restarts it. The number of automatic restarts is defined by the Workflow owner by assigning an integer value to the counter Number of Automatic Restarts. The user also defines where the scheduler should restart In-Error tasks, i.e., on the same ProActive Node or on another ProActive Node.

If automatic restarts do not fix the errors, the Scheduler applies the Task error management policy defined by the user during Workflow creation, among the following:

  • Ignore error and continue Job execution - each In-Error Task inside the Job transits to Faulty and its successor Tasks are executed.

  • Only suspend dependencies of In-Error Tasks - successor Tasks of each In-Error Task are Pending until errors are fixed. The Job is in In-Error status. The user can restart In-Error Tasks as many times as he wishes.

  • Pause Job execution (running Tasks can terminate) - successor Tasks of each In-Error Task are paused until errors are fixed. The Job is paused. The user can restart In-Error Tasks as many times as he wishes.

  • Kill Job (running Tasks are killed) - The Job is killed with all Tasks.

6.6.1. Maximum execution time for a task

A timeout (also known as walltime) can be set at the task’s level to stop a task’s execution when the timeout is reached.

This timeout is defined as an attribute of the task element, for example:

<task name="Task_Walltime"
    fork="true"
    walltime="12:00">
...
</task>

The general format of the walltime attribute is [hh:mm:ss], where h is hours, m is minutes and s is seconds. The format also allows shorter forms: for example, 5 corresponds to 5 seconds, 10 to 10 seconds, 4:10 to 4 minutes and 10 seconds, and so on.

The walltime mechanism is started just before a task is launched. If a task finishes before its walltime, the mechanism is canceled. Otherwise, the task is terminated. Note that tasks are terminated without any prior notice.

The walltime can also be configured using the WALLTIME Generic Information. The syntax of this generic information is similar to the task walltime attribute. The advantage of using a generic information entry rather than a task attribute is that the WALLTIME generic information, if defined at job level, applies to all tasks of a workflow.

When used at job-level, the configured walltime will not be applied to the workflow globally but to each individual task of the workflow.

For example, if the walltime is configured at job-level to be ten minutes, each task of the workflow can run no more than ten minutes, but the workflow itself has no time limitation.

Finally, an administrator can configure the walltime globally at the scheduler server level. To do so, in PROACTIVE_HOME/config/scheduler/settings.ini, the property pa.scheduler.task.walltime must be defined with a global walltime.

For example, to define a global walltime of one hour:

pa.scheduler.task.walltime=1:00:00

As the walltime can be configured in multiple ways, an order of priority applies to the various configurations (from highest to lowest priority):

  1. Walltime configured as a task attribute

  2. Walltime configured as generic information

  3. Walltime configured globally at the scheduler-level.

This order of priority ensures that a user can easily override a larger scope configuration.

6.7. Cron Workflows

The ProActive Scheduler allows scheduling Workflows with the Job Planner. More information can be found in the Job Planner User Guide.

6.8. Cron Tasks

ProActive Scheduler supports the execution of time-based tasks or cron tasks. Users can assign a cron expression to the loop variable of a Loop in a ProActive Job. All tasks which belong to that Loop are executed iteratively and indefinitely. The start time of the next iteration is the next point in time specified by the cron expression.

6.8.1. Cron Expression Syntax

The cron expression syntax follows the UNIX crontab pattern: a string consisting of five space-separated fields (minute, hour, day of month, month, day of week). For example, 0 12 * * 1 triggers at 12:00 every Monday. Please refer to the cron4j documentation for further details.

6.8.2. Setting up Cron Tasks

Setting up a cron task requires two steps:

  • Define a Loop with the desired cron tasks

  • Assign a cron expression as the loop variable value

Example that prints Hello every 10 minutes:

<job name="cron_job">
     <genericInformation>
        <info name="START_AT" value="2014-01-01T00:00:00+01:00"/>
    </genericInformation>
    <task name="cron_task">
        <scriptExecutable>
            <script>
                <code language="javascript">
                    print("Hello");
                </code>
            </script>
        </scriptExecutable>
        <controlFlow>
            <loop target="cron_task">
                <script>
                    loop = "*/10 * * * *"
                </script>
            </loop>
        </controlFlow>
    </task>
</job>

You can find a complete example running a simple task every minute in this XML workflow.

6.8.3. Notes

  • The execution of the first iteration occurs immediately and does not depend on the cron expression specified. However, the user can specify a start time for the first iteration using the START_AT generic information. Its value should be ISO 8601 compliant.

  • Each iteration yields one or more task results and the user can query the job status and task status to retrieve information of each iteration.

  • It is assumed that the execution time of each iteration is less than the time gap specified by the cron expression. If the execution time of an iteration is longer, the next iteration will start at the next point in time with respect to the current time. As a consequence, fewer iterations than expected may be executed within a certain time period if some iterations take longer to finish.

  • If the execution of the task defining the loop (where the <controlFlow> block is defined) fails, the cron execution is terminated.

  • ProActive Scheduler executes any cron task indefinitely. Therefore, the status of the ProActive Job will remain RUNNING unless the cron task execution is terminated either by the user (for instance by killing the job or the task) or due to an error in the task execution.

6.9. Remote Visualization

Using the Scheduler Web Interface, it is possible to access the display of the ProActive Node running a given task. Remote visualization must first be activated via the Scheduler configuration files (set novnc.enabled to true).

Remote visualization works via VNC: the REST API connects to the node running the task, typically a graphical application. The VNC output is forwarded to your browser, where you can interact with the application.

The task that you want to visualize must output a special string using the following format: PA_REMOTE_CONNECTION;JOB_ID;TASK_ID;vnc;HOSTNAME:PORT where JOB_ID must be the current job id, TASK_ID the current task id and HOSTNAME:PORT the hostname and port where the VNC server runs.
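
As an illustration, here is a hedged Groovy sketch printing this string from within a task, assuming a VNC server is already listening on the current host on port 5901 (the port is an assumption for this example):

// Print the remote visualization magic string using the standard job/task variables
jobId  = variables.get("PA_JOB_ID")
taskId = variables.get("PA_TASK_ID")
host   = java.net.InetAddress.getLocalHost().getHostName()
println "PA_REMOTE_CONNECTION;" + jobId + ";" + taskId + ";vnc;" + host + ":5901"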

It is the task’s responsibility to print the string. Here is a sample script that starts an Xvnc server and runs a graphical application. It can be used in a native task for instance.

In the Scheduler Web Interface, select the running job and use the Preview tab to enable remote visualization.

Since it runs a graphical application, the task will never exit on its own and would run forever. The common use case is that you manually check some results via the graphical application and then exit the application; the task then terminates normally.

6.10. Troubleshoot a Workflow Execution

When a task is in IN_ERROR status, a right click on it allows the user to select Mark as finished and resume. This sets the task status to FINISHED and then resumes job execution: the task will not be executed again, and subsequent tasks will execute as if it had completed correctly. This allows the user to manually fix the IN_ERROR task without cancelling the whole job execution.

If the task is not in IN_ERROR status, the Mark as finished and resume option is greyed out, meaning that the action is disabled.

6.11. Get Notifications on Job Events

Add a "NOTIFICATION_EVENTS" Generic information in the workflow level, to list events on which a user wants to be notified. List is comma-separated and should be taken from(events are not case sensitive):

  • Job change priority (or JOB_CHANGE_PRIORITY)

  • Job In-Error (or JOB_IN_ERROR)

  • Job paused (or JOB_PAUSED)

  • Job pending to finished (or JOB_PENDING_TO_FINISHED)

  • Job pending to running (or JOB_PENDING_TO_RUNNING)

  • Job restarted from error (or JOB_RESTARTED_FROM_ERROR)

  • Job resumed (or JOB_RESUMED)

  • Job running to finished (or JOB_RUNNING_TO_FINISHED)

  • Job submitted (or JOB_SUBMITTED)

  • Job running to finished with errors (or JOB_RUNNING_TO_FINISHED_WITH_ERRORS)

  • Job aborted (or JOB_ABORTED)

The Scheduler can send an email to the users when one of the above events occurs.

If a user wants to be notified of all these events, the "NOTIFICATION_EVENTS" Generic Information should have "All" (not case-sensitive) as its value.

Notifications are only sent for jobs whose generic information (at workflow level) contains the user’s email address(es) under the "EMAIL" key. If multiple email addresses are needed, they should be separated by commas. Example:

<genericInformation>
    <info name="EMAIL" value="user0@example.com, user1@example.com"/>
    <info name="NOTIFICATION_EVENTS" value="Job paused, Job resumed"/>
</genericInformation>

To enable this functionality, the configuration needs to be done following the steps described in Get Notification on Job Events Configuration.

6.12. Chaining Workflows: Submit a Workflow from another Workflow

In ProActive Studio, the SubmitJobNoWait and SubmitJobAndWait templates under the Controls menu allow you to submit a workflow from another workflow using the schedulerapi binding.

These templates accept a single parameter:

  • called_workflow: a ProActive Catalog path whose syntax is checked by the PA:CATALOG_OBJECT model. The path must be a valid expression matching the pattern bucketName/workflowName[/revision] and referencing the workflow to submit from the Catalog (e.g. basic-examples/Native_Task). To promote the called_workflow variable to the Workflow level, so that the submitted Workflow can be selected at submission time, simply duplicate the variable at the Workflow level and mark the one at the Task level as inherited.

By default, these templates do not pass any parameter to the submitted workflow (using only default values provided by the workflow).

It is possible to modify this behavior by changing the SubmitJobNoWait or SubmitJobAndWait implementation.

For example, the following line can be replaced:

workflow_variables = Collections.EMPTY_MAP

by

workflow_variables = [firstName:'John', lastName:'Doe']

to provide the firstName and lastName variables as parameters to the submitted workflow.

The difference between the SubmitJobNoWait and SubmitJobAndWait templates is that the latter waits until the submitted workflow terminates.

In case an error occurs in the chained jobs, a user may want to escalate this error from the submitted (child) job to the parent job by throwing an exception. It is possible to integrate this behavior in the SubmitJobAndWait implementation using the schedulerapi binding as described in the code below.

jobid = variables.get("jobID")
jobState = schedulerapi.getJobState(jobid)
jobStatus = jobState.getStatus().toString().toUpperCase()
isJobSuccessful = !(jobStatus.equals("FAILED") || jobStatus.equals("CANCELED") || jobStatus.equals("KILLED") || (jobStatus.equals("FINISHED") && jobState.getNumberOfFaultyTasks() > 0))
if (isJobSuccessful) {
    println("Submitted job id=" + jobid + " (" + variables.get("called_workflow") + ") has been successfully run")
} else {
    throw new IllegalStateException("Submitted job id=" + jobid + " (" + variables.get("called_workflow") + ") has failed. The error is escalated to the parent job.")
}

6.13. Control and Validation

In order to better control your workflows, two ProActive Studio templates, Web Validation and Email Validation, are provided under the Manuals menu. These templates let you validate parts of your workflows manually. They suspend the execution of the validation task's dependent tasks and set the job as Paused. You can put your validation description message in the MESSAGE variable of the Web Validation template, and in the notification_message variable of the Email Validation template. Three options are provided: the Yes option indicates that you want to validate the workflow and continue the execution of the remaining tasks; the Maybe option means that you do not know what to do for the moment; the No option denotes that you do not want to validate the workflow, and the remainder of the workflow stays paused.

In order to validate your workflow using Web Validation, go to the ProActive Notification & Validation portal and click one of the three buttons. With Email Validation, you receive an email with three links, and the workflow can be validated by clicking on one of them.

7. Data Spaces

If a shared folder or file system is not an option in your environment, the ProActive Scheduler provides a convenient way to transfer files between server and tasks.

This mechanism is called ProActive Data Spaces.

Data Spaces come in two categories:

  • Server Data Spaces: Global and User Spaces belong to this category. It represents data spaces located on the server side.

  • Node Data Spaces: Local and Cache Spaces belong to this category. It represents data spaces located on the Node side, used during a Task execution.

7.1. Global and User Spaces

Server Data Spaces have two types of storage on the host where the ProActive server is running:

  • A Global Space where anyone can read/write files

  • A User Space which is a personal user data storage

By default, these spaces are linked to folders in the data directory of the ProActive Scheduler host:

  • PROACTIVE_HOME/data/defaultglobal

  • PROACTIVE_HOME/data/defaultuser

But it can be changed in PROACTIVE_HOME/config/scheduler/settings.ini.

See Data Spaces configuration for further details on how to configure the Global and User spaces.

7.2. Local Space

The Local Space is a Node Data Space created dynamically when a Task is executed.

The Local Space allows to:

  • Transfer files from one of the Server Spaces (Global or User) to the task execution context, before the task starts its execution.

  • Transfer files from the task execution context to one of the Server Spaces (Global or User) at the end of the task execution.

These two types of transfers must be declared in the Task definition in order to occur (transfer is not implicit). You need to define which files have to be transferred from a data space to computing nodes by using input files and reciprocally from computing nodes to a data space by using output files:

<task name="task1">
    ...
    <inputFiles>
        <files includes="tata" accessMode="transferFromGlobalSpace"/>
    </inputFiles>
    <outputFiles>
        <files includes="titi*" accessMode="transferToGlobalSpace"/>
    </outputFiles>
    ...
</task>

Upon the execution of the previous example, ProActive Scheduler automatically transfers the declared input files to the ProActive Node where the task will be executed. The files are put in a special place called the Local Space on the ProActive Node, which corresponds to the working directory of the task when the task is in fork mode (the default). From the task's point of view, files can be accessed normally, as shown below:

File inputFile = new File("tata")

Then, based on the previous task definition, all the files whose name starts with titi (e.g. titi, titi2, etc.) produced by task1 will be automatically transferred back to the Global Space by the Scheduler once the task is finished.
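
For illustration, a Groovy task script producing such an output file in its Local Space (working directory) could be as simple as the sketch below; the file name is chosen only to match the titi* pattern above:

// Write an output file into the task working directory (Local Space)
new File("titi_output.txt").text = "some result produced by task1"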

Files can be also transferred from/to User Space by specifying transferFromUserSpace/transferToUserSpace.

Before running your jobs, you should first upload your files into either the Global or the User Space. To do so, you can export the folders linked to these spaces through one of the standard protocols like FTP or NFS, or use available web interfaces like Pydio.

It is possible to use wildcards in transfer directives, such as:

<task name="task1">
    ...
    <inputFiles>
        <files includes="**/tata" accessMode="transferFromGlobalSpace"/>
    </inputFiles>
    <outputFiles>
        <files includes="*" accessMode="transferToGlobalSpace"/>
    </outputFiles>
    ...
</task>

More information about the pattern syntax can be found in the FileSystem class JDK documentation.

7.3. Cache Space

In addition to transferring to the Task Local Space, it is also possible to transfer files to the Node host Cache Space.

From the Node host point of view, the Cache Space is unique and shared for all tasks and all ProActive Nodes deployed on the host.

Transfers to the cache are not concurrent: only a single task can transfer to the cache at a time. That is, if multiple tasks run in parallel on the same host and they all transfer to the cache, these transfers are executed sequentially.

If a file declared for the transfer is newer than the existing version in the cache, the file will be updated.

In order to transfer files to the cache, simply use the cache transfer directives inside your job:

<task name="task1">
    ...
    <inputFiles>
        <files includes="tata" accessMode="cacheFromGlobalSpace"/>
        <files includes="toto" accessMode="cacheFromUserSpace"/>
    </inputFiles>
    ...
</task>

The files will not be transferred to the Local Space of the task, so it will not be possible to access these files from the current directory.

In order to access these files, use the cachespace variable which contains the location of the cache, for example in groovy:

inputFile = new File(cachespace, "tata")

There are no transfer directives to transfer output files from the cache. Output files transfers are only possible from the Local Space.

Files in the cache space are automatically deleted after a given period. The default invalidation period is two weeks, which means that files older than two weeks will be automatically deleted.

In order to change this period, the ProActive Node must be started with the property node.dataspace.cache.invalidation.period (in milliseconds).

The property node.dataspace.cachedir can also be used to control the cache location on the node.
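
As a hedged sketch, and assuming these properties are passed as JVM system properties on the node command line (the exact startup mechanism depends on how your nodes are deployed), a one-week invalidation period and a custom cache directory could look like:

# Illustrative values: 7 days = 604800000 ms, and an assumed cache directory path
-Dnode.dataspace.cache.invalidation.period=604800000
-Dnode.dataspace.cachedir=/opt/proactive/node-cache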

8. Script Languages

8.1. Script Engines Names

This section lists the available script engines and the operating systems where they run.

Script engine naming is case-sensitive
  • shell: an engine able to run several Linux shell scripts using the shebang notation. Operating systems: Linux, Mac.

  • bash: an engine able to run Linux bash shell scripts. Operating systems: Linux, Mac.

  • cmd: an engine able to run Windows CMD scripts. Operating systems: Windows.

  • powershell: an engine able to run PowerShell scripts. Operating systems: Windows. Additional requirements: PowerShell 2.0 Engine and .NET Framework 3.5 must be installed.

  • vbscript: an engine able to run VBScript commands and scripts. Operating systems: Windows.

  • docker: an engine able to run Docker containers defined as a Dockerfile. Operating systems: Linux, Mac, Windows. Additional requirements: Docker must be installed.

  • docker-compose: an engine able to run composite Docker containers defined as a Docker Compose description file. Operating systems: Linux, Mac, Windows. Additional requirements: Docker and Docker Compose must be installed.

  • kubernetes: an engine able to run Kubernetes pods using a Kubernetes YAML job description file. Operating systems: Linux, Mac, Windows. Additional requirements: the Kubernetes client kubectl must be installed.

  • javascript: an engine able to run JavaScript code using the Nashorn JSR-223 implementation. Operating systems: Linux, Mac, Windows.

  • groovy: an engine able to run Groovy scripts. Operating systems: Linux, Mac, Windows.

  • ruby: an engine able to run JRuby scripts. Operating systems: Linux, Mac, Windows.

  • python: an engine able to run Jython scripts. Operating systems: Linux, Mac, Windows.

  • cpython: an engine able to run native CPython scripts. Operating systems: Linux, Mac, Windows. Additional requirements: CPython and the py4j module must be installed.

  • R: an engine able to run R language scripts. Operating systems: Linux, Mac, Windows. Additional requirements: R and the rJava library must be installed.

  • perl: an engine able to run Perl scripts. Operating systems: Linux, Mac, Windows. Additional requirements: Perl must be installed.

  • php: an engine able to create HTML content using the PHP interpreter. Operating systems: Linux, Mac, Windows. Additional requirements: PHP must be installed.

  • scalaw: an engine able to run Scala scripts. Operating systems: Linux, Mac, Windows.

8.2. Bash and Shell

The Bash and Shell script engines allow running Linux shell scripts inside ProActive Tasks. The main difference between the two is that the Shell script engine must use the shebang instruction to define the shell in use inside the script.

Both engines feature the following behaviors:

  • Script bindings are translated as environment variables.

  • Workflow variables cannot be modified by shell scripts except when adding a post script written in another language such as the update_variables_from_file script.

  • The result of the script engine execution corresponds to the return code of the bash or shell process executed.

Example bash script (the shebang notation is omitted):

echo "Hello world"

Example shell script:

#!/bin/ksh
print "Hello world"

Writing Bash or Shell scripts in ProActive Workflows is rather straightforward, but a few considerations need to be taken into account.

The automatic binding mechanism works differently from other scripting languages: it relies on environment variables.

For example, suppose that the workflow variable foo is defined. This workflow variable can be accessed in the script engine through a specific environment variable created by the ProActive Scheduler:

echo "Here is the value of the foo variable:"
echo "$variables_foo"

The Script Bindings Reference chapter shows how various bindings can be used in bash or shell script engines.

In chapter Script variables, we explain how a given task script can modify the variable map and propagate its modifications inside the workflow. This does not apply to shell script engines, i.e. a shell script engine cannot modify the variables map directly.

For example, suppose the initial value of the foo variable is "Old Value" and Task_A contains the script:

echo "Here is the value of the foo variable:"
echo "$variables_foo"
echo "I want to modify the value of the foo variable:"
variables_foo="New Value"

And Task_B a child of Task_A contains:

echo "Here is the value of the foo variable:"
echo "$variables_foo"

Then, Task_B will display

Here is the value of the foo variable:
Old Value

We can see that the execution of Task_A had no impact on the foo variable.

In general, when a modification of a workflow variable is needed, other script engines should be used. In ProActive Catalog, an example groovy post script called update_variables_from_file allows propagating a variable modification from a shell main Task script using a temporary file:

Task_A main bash script:

echo "Here is the value of the foo variable:"
echo "$variables_foo"
echo "I want to modify the value of the foo variable:"
variables_foo="New Value"
echo "foo=$variables_foo">>$localspace/.variables

Task_A groovy post script (update_variables_from_file):

// Read each "key=value" line written by the main script and propagate it to the workflow variables map
new File(localspace, ".variables").eachLine { line ->
    keyvalue = line.split("\\s*=\\s*")
    if (keyvalue.length == 2) {
        variables.put(keyvalue[0], keyvalue[1])
    }
}

Task_B a child of Task_A:

echo "Here is the value of the foo variable:"
echo "$variables_foo"

Displays:

Here is the value of the foo variable:
New Value

8.3. Windows Cmd

The Windows Cmd script engine behaves similarly to the bash script engine, but for Microsoft Windows operating systems.

It features the following behaviors:

  • Script bindings are translated as environment variables.

  • Workflow variables cannot be modified by Windows Cmd scripts except when adding a post script written in another language such as the update_variables_from_file script (see Bash script language for further explanations).

  • The result of the script engine execution corresponds to the return code of the cmd.exe process executed.

Example variable usage:

echo "Here is the value of the foo variable:"
echo "%variables_foo%"

8.4. VBScript

The VBScript engine behaves similarly to Windows CMD but uses Visual Basic syntax instead of batch. It relies on the cscript.exe command to execute vbs scripts.

It features the following behaviors:

  • Script bindings are translated as environment variables. They must be accessed inside the script as a process variable.

  • Workflow variables cannot be modified by VB scripts except when adding a post script written in another language such as the update_variables_from_file script (see Bash script language for further explanations).

  • The result of the script engine execution corresponds to the return code of the cscript.exe process executed.

Example variable usage:

Set wshShell = CreateObject( "WScript.Shell" )
Set wshProcessEnv = wshShell.Environment( "PROCESS" )
Wscript.Echo "Hello World from Job " & wshProcessEnv( "variables_PA_JOB_ID" )

Any error occurring during the script execution does not necessarily imply that the workflow task running the script will fail. This is because the cscript command does not immediately exit upon error.

In order to report a failure to the task when running a VB script, the following instruction can be added in the beginning of the script:

On Error Quit 1

Example:

On Error Quit 1
Wscript.Echo "This will produce an error" & (1 / 0)

8.5. Javascript

The Javascript engine allows running Javascript code inside ProActive Tasks. It is based on the Nashorn Javascript engine, which provides a Rhino compatibility mode described in the following documentation.

Example of Javascript ProActive Task which modifies Job Variable:

var status = variables.get("orderStatus")
if (status == "delivered") {
    status = "returned"
}
variables["orderStatus"] = status
variables.put("otherVar", "someConstValue")

Example of Javascript ProActive Task which modifies Job Variable containing JSON object:

var jsonVariable = JSON.parse(variables.get("jsonVariable")) // you need to parse your variable first
jsonVariable["newVar"] = "newValue" // then you can modify it as regular json
var serializedJsonVariable = JSON.stringify(jsonVariable) // then you need to serialize it to string
variables["jsonVariable"] = serializedJsonVariable // and finally you can overwrite existing variable

You have to use the variables["myVarName"] = somevar syntax, instead of the variables.put("myVarName", somevar) syntax, when the value you store is held in a Javascript variable. You can still use the variables.put("myVarName", "someConstantValue") syntax for storing constants.

Example of Javascript ProActive Task which returns JSON object as a result:

var jsonObject = {"orderStatus": "done", "customerName": "privatePrivacy"};
var serializedJsonObject = JSON.stringify(jsonObject) // then you need to serialize it to string
result = serializedJsonObject // and finally you can return it as the task result

To save a JSON object as a variable or as a task result, you have to serialize it to a string first, using JSON.stringify().

8.6. Python

We support both Jython and Python Script Engines. Jython is an implementation of the Python programming language designed to run on the Java platform; Jython programs use Java classes instead of Python modules. The advantage of using our Jython Script Engine is that you do not need to install anything. It includes some modules of the standard Python distribution, but lacks the modules originally implemented in C. Libraries such as numpy, matplotlib, pandas, etc. are not supported by Jython, and neither are libraries which depend on numpy, such as TensorFlow, PyTorch and Keras.

In order to support native Python, we also provide a Python Script Engine. To use it, the 'Language' field should be set to 'cpython'. With the Python Script Engine, you can choose the Python version you want to use. Since there are many different Python versions (mainly Python 2 and Python 3) which are not compatible, ProActive supports all of them (Python 2, Python 3, etc.). By default, the Python used to execute the script is the default Python version on your machine. In order to use another Python version, add a 'PYTHON_COMMAND' Generic Information whose value contains the symbolic or absolute path to the desired python command (for example 'python3' or '/usr/local/opt/python3/bin/python3.6'). If the Generic Information is set at task level, this version of Python is only used for this task; if it is set at job level, it is used for all the tasks of the job.

For every task which uses the native Python script engine:

  • Python must be installed on the ProActive Node which will be used to execute the task.

  • The py4j module must be installed. Please refer to Python Script Engine (Python task) for the introduction about the installation of Python Script Engine.

Here is a workflow example (in XML format) of a simple Python task:

  <taskFlow>
    <task name="Python_Task" >
      <description>
        <![CDATA[ The simplest task, ran by a python engine. ]]>
      </description>
      <genericInformation>
        <info name="PYTHON_COMMAND" value="python3"/>
      </genericInformation>
      <scriptExecutable>
        <script>
          <code language="cpython">
            <![CDATA[
import platform
print("The current version of python you are using is: " + platform.python_version())
print("Hello World")
]]>
          </code>
        </script>
      </scriptExecutable>
    </task>
  </taskFlow>

A jython script engine execution runs in the same Java process as the Task execution. A cpython script engine execution runs inside a separate python process.

8.7. R

ProActive R script engine is based on the Java R Interface. In order to use the R script engine inside a ProActive Node (container which executes a workflow Task), the following prerequisites are needed:

  • An R distribution must be installed.

  • The rJava package must be installed.

  • The R_HOME environment variable needs to be configured to allow the script engine to find the R distribution.

  • The R_LIBS environment variable might need to be configured if R libraries cannot be found automatically in $R_HOME/library.

The ProActive R script engine works on both Linux and Windows.

Here is an example of R script:

jobName <- variables[['PA_JOB_NAME']]
cat('My job is called', jobName, '\n')

The following paragraphs describe the R-specific syntax.

The progress variable is set as follows (notice the leading dot):

.set_progress(50)

Contrary to other languages such as groovy or jruby, the parent tasks' results (results variable) are accessed directly:

print(results[[0]])

Variable assignment can be done via:

variables[["myvar"]] <- "some value"

Access to dataspaces variables is similar to other languages:

print(userspace)
print(globalspace)
print(inputspace)
print(localspace)
print(cachespace)
print(outputspace)

Some internal R types (such as lists, vectors, strings) are automatically converted when stored as a result or in the workflow variable map, but other types such as data.table are not automatically converted. Conversion for these types should be done manually, for example using json serialization or an output file.

Java objects such as the fork environment variable and the scheduler, userspace or globalspace APIs are not available in R.

8.8. PowerShell

ProActive PowerShell script engine is based on jni4net to call the Powershell API from Java.

It requires that Powershell 2.0 Engine and .NET Framework 3.5 are installed on the relevant machines.

An example of Powershell script:

$variables.Set_Item('myvar', 'value')
$result = Get-Date

Contrary to other languages such as groovy or jruby, the parent tasks' results (results variable) are accessed directly:

Write-Output $results[0]

Variable assignment can be done via:

$variables.Set_Item('myvar', 'value')

Internal PowerShell types such as Dates are automatically serialized to an internal format which can be understood by another powershell task, for example in the following two tasks:

Task1:

$result = Get-Date

Task2:

Write-Output $results[0].Day

The second task is able to automatically use the Date object received from the first task.

When an internal PowerShell type needs to be used by another language than PowerShell, a manual conversion such as json must be performed.

8.9. Perl

The Perl script engine features the following behaviors:

  • Script bindings are translated as environment variables.

  • Workflow variables cannot be modified by Perl scripts except when adding a post script written in another language such as the update_variables_from_file script (see Bash script language for further explanations).

  • The result of the script engine execution corresponds to the return code of the perl process executed.

In that sense, the Perl script engine behaves similarly to the Bash or Cmd script engines.

Please refer to Script Bindings Reference for the complete list of bindings accessible through environment variables.

Inside Perl, you can access environment variables using the %ENV hash.

Next examples clarify how script engine bindings can be accessed inside Perl scripts.

A workflow variable can be accessed using system environment variables:

my $jobName= $ENV{"variables_PA_JOB_NAME"};

Similarly, the result of a parent task:

my $parent_task_result= $ENV{"results_0"};

Or another script binding (for example, USERSPACE):

my $USERSPACE= $ENV{"USERSPACE"};

8.10. PHP

The PHP script engine features the following behaviors:

  • Script bindings are translated as environment variables.

  • Workflow variables cannot be modified by PHP scripts except when adding a post script written in another language such as the update_variables_from_file script (see Bash script language for further explanations).

  • PHP script engine is meant to return HTML content and makes use of result metadata through the content.type="text/html" generic information. Using the script engine without defining this generic information in the task is not recommended.

  • The result of the script engine execution is a byte array containing the HTML content produced by the php command. In case the php command returns a non zero value, the result of the script engine execution will be the command exit code and the associated workflow task will be in error.

The php command returns a non-zero exit code only in case of syntax errors. When runtime issues occur, PHP will generally output warnings and the php command will return a zero exit code, not triggering a workflow task failure. This can be changed by using the PHP exception mechanism and calling exit(error_code) in case of serious failures.
  • The PHP script engine is not well adapted to pre, post or clean script execution. Even though pre, post or clean scripts can be defined using the PHP script engine, they will not produce any HTML output in these contexts.

Inside PHP, you can access environment variables using the getenv() function.

Next examples clarify how script engine bindings can be accessed inside PHP scripts. Please refer to Script Bindings Reference for the complete list of bindings accessible through environment variables.

A workflow variable can be accessed using system environment variables:

<?php
    echo "<p>Hello World from <b>job ".getenv("variables_PA_JOB_ID")."</b></p>";
?>

Similarly, the result of a parent task:

<?php
    echo "<p>Result of parent task is <b>".getenv("results_0")."</b></p>";
?>

Or another script binding (for example, USERSPACE):

<?php
    echo "<p>Userspace is located at <b>".getenv("USERSPACE")."</b></p>";
?>

8.11. Docker Compose

The execution of a Docker_Compose task requires the installation of both Docker and Docker Compose on the host machine of the ProActive Node. Please refer to the official Docker documentation to see how to install Docker and Docker Compose.

A Docker_Compose task expects the content of a Docker Compose file to be implemented inside the Script editor of the Task Implementation section. You can find out how to write Docker Compose files in the official Docker Compose documentation.

To get started, we present below the content of the Script editor of a simple example of a Docker_Compose task.

helloworld:
    image: busybox
    command: echo "Hello ProActive"

The above example describes a container called 'helloworld'. That container is created from a busybox image and runs the command 'echo "Hello ProActive"'.

The Docker_Compose task allows setting parameters of the docker-compose tool, as defined in the docker-compose CLI reference.

docker-compose [general parameters] COMMAND [options]

It supports general parameters as well as command options (currently, only options for the up command are supported). You can specify these options by supplying a space-separated list in the Generic Information.

  • To define a general parameter, use the key docker-compose-options and supply "--verbose" as an example value.

  • To define a docker-compose up option, use the key docker-compose-up-options and supply "--exit-code-from helloworld".

The two generic information entries above will be used to generate the following command:

docker-compose --verbose up --exit-code-from helloworld

If splitting by spaces is problematic, you can specify the split regex in the Generic Information with the key docker-compose-options-split-regex. If you supply e.g. "!SPLIT!" as the value, then your docker-compose-up-options value will need to look like this: "--option1!SPLIT!--option2".

8.12. Dockerfile

8.12.1. Introduction

The Dockerfile engine allows defining a Docker image using the Dockerfile syntax, building this image and running a container instance from it. Once the execution is done, the container is stopped and the built image is deleted. The build, start, stop and remove actions can be parametrized through advanced command line options, which are detailed in the Docker actions section.

The execution of a Dockerfile task requires the installation of Docker engine on the host machine of the ProActive Node. Please refer to the official Docker documentation to see how to install Docker.

To use a Dockerfile task, you have to put the content of a Dockerfile inside the Script editor of the Task Implementation section. You can find out how to write a Dockerfile in the official Dockerfile documentation. A Dockerfile task executes a succession of docker commands according to the lifecycle of Docker containers: in order, docker build, docker run, docker stop, docker rm and docker rmi are run when a Dockerfile task is executed. This sequence of actions can be configured (see below).

To get started, we present below the content of the Script editor of a simple example of a Dockerfile task.

FROM ubuntu:18.04
RUN echo "Hello ProActive" 
RUN sleep 30

The execution of this example will create an image using the Docker build command and start a container from this image using the specified commands (echo and sleep). At the end, the built image and the started container are deleted.

The created image default tag is image_${PA_JOB_ID}t${PA_TASK_ID} and the default container tag is container_${PA_JOB_ID}t${PA_TASK_ID}.

8.12.2. Docker actions

The default sequence of docker commands (build, run, stop, rm, rmi) can be modified to use the docker script engine in different configurations such as multiple workflow tasks.

For example, an image could be built in a first task, and the same image can be reused in other tasks (and removed eventually in a terminating task):

DockerFile Reuse Image

Another example would be to build an image and start a container in a first task, and then reuse this container in a different task:

DockerFile Reuse Container

These different modes are configured using the docker-actions Generic Information.

For example, docker-actions=build,run builds an image and runs a container, without performing the stop, rm and rmi operations.

docker-actions supports the following commands: build, run, exec, stop (which includes both stop and rm) and rmi.

As the default image and container tag names contain the job id and task id, it is important to use the following generic information when sharing images or containers across multiple tasks or multiple workflows:

  • docker-image-tag : override default tag identifying the docker image.

  • docker-container-tag : override default tag identifying the docker container.

When starting the docker container in one task and reusing this container in another task, it is necessary to start the container in detached mode (see below for an explanation on how to define additional options to docker commands).
When a build action is not present in docker-actions, the content of the Dockerfile in the task script editor will be ignored. In that case, the docker-image-tag generic information will be used to identify the image, which must be present in the local docker cache.
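As an illustrative sketch (the tag values and the echo command below are arbitrary examples), two tasks sharing the same image and container could set their Generic Information as follows. The first task builds the image and starts the container in detached mode; the second task runs a command inside the still-running container and then cleans up:

Task 1 (build the image and start the container in detached mode):
docker-actions=build,run
docker-image-tag=my_shared_image
docker-container-tag=my_shared_container
docker-run-options=--detach

Task 2 (run a command in the container, then stop and remove it and delete the image):
docker-actions=exec,stop,rmi
docker-image-tag=my_shared_image
docker-container-tag=my_shared_container
docker-exec-command=echo Hello from the shared container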

8.12.3. Additional options

Parsing options

The following command line options need to be parsed by the Java Virtual Machine before generating the real command which will be executed. The default options parsing uses <space> as separator, but when an individual option contains spaces (such as a character string containing spaces), this could lead to parsing issues. In order to solve these parsing issues, the generic information docker-file-options-split-regex can be used to define a pattern which will be used as options separator. This pattern must comply with the Java Regular Expression syntax and will be given as a parameter to the split function when parsing options.

For instance, if the docker-file-options-split-regex generic information is equal to !SPLIT!, then the option /bin/sh!SPLIT!-c!SPLIT!echo 'hello', will be parsed as [/bin/sh, -c, echo 'hello']

Docker build

The Dockerfile task allows you to pass parameters to the docker build command, as defined in the docker-build CLI reference.

docker build [OPTIONS] PATH | URL | -

To define a docker-build option, use the generic information docker-build-options.

For instance, by using the docker-build-options generic information with the --add-host option (which expects a host:IP value), a custom host-to-IP mapping will be added during the image build.

Docker run

The Dockerfile task allows you to pass parameters to the docker run command, as defined in the docker-run CLI reference.

docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]

To define a docker-run option, use the generic information docker-run-options.

For instance, by using the docker-run-options generic information with the value --detach, the container will be started in detached mode.

Use the detached mode option to run a container and share this running container between multiple workflow tasks.
Docker exec

The docker exec command is executed optionally, if included in Docker actions.

The Dockerfile task allows you to pass parameters to the docker exec command, as defined in the docker-exec CLI reference.

docker exec [OPTIONS] CONTAINER COMMAND [ARG...]

To define the command and arguments which need to be executed inside a running container, use the generic information docker-exec-command.

For instance by using the docker-exec-command generic information with the value my-program arg1 arg2, the resulting command will be docker exec <container-tag> my-program arg1 arg2.

When command arguments contain spaces, use the generic information docker-file-options-split-regex to define an alternate option parsing pattern. For example: /bin/sh!SPLIT!-c!SPLIT!echo 'hello'

To define a docker-exec option, use the generic information docker-exec-options.

For instance, by using the docker-exec-options generic information with the value --workdir /my/workdir, the command will be executed in the specified working directory.

Docker stop

The Dockerfile task allows you to pass parameters to the docker stop command, as defined in the docker-stop CLI reference.

docker stop [OPTIONS] CONTAINER [CONTAINER...]

To define a docker-stop option, use the generic information docker-stop-options.

For instance, by using the docker-stop-options generic information with the value --time 30, the container will be given 30 seconds to stop gracefully before being killed.

Docker rm

The Dockerfile task allows you to pass parameters to the docker rm command, as defined in the docker-rm CLI reference.

docker rm [OPTIONS] CONTAINER [CONTAINER...]

To define a docker-rm option, use the generic information docker-rm-options.

For instance by using the docker-rm-options generic information with the value --volumes, the anonymous volumes associated with the container will be automatically deleted after the container is removed.

Docker rmi

The Dockerfile task allows you to pass parameters to the docker rmi command, as defined in the docker-rmi CLI reference.

docker rmi [OPTIONS] IMAGE [IMAGE...]

To define a docker-rmi option, use the generic information docker-rmi-options.

9. ProActive Catalog: Object Storage and Versioning

The ProActive Catalog provides storage and versioning of Workflows and several other objects inside Buckets. It also features a search functionality, based on the GraphQL query language, to fetch Workflows matching particular parameters. The Catalog is a RESTful service. A Swagger interface is provided to understand and interact with the service.

This chapter will cover:

  • ProActive Catalog main concepts.

  • How to interact with the Catalog from the various ProActive Workflows and Scheduling Portals.

The Catalog administration guide describes how the catalog can be configured and how access control of buckets can be enforced.

9.1. Bucket concept

A Bucket is a collection of ProActive Objects and in particular ProActive Workflows that can be shared between users.

When the ProActive Scheduler is first started, it already contains a predefined set of buckets, such as:

  • controls (set of workflows containing basic constructs such as branch, loop or replicate)

  • basic_examples (set of publicly readable Workflow samples)

  • notification_tools (set of tasks which can be used to send email or web notifications and/or validations).

  • data_connectors (set of Workflows which can be used to transfer files and data from various sources as explained in the Data Connectors chapter)

Listing the existing Buckets of the Catalog is done using the following HTTP request (protocol, port and hostname need to be adapted accordingly):

GET http://localhost:8080/catalog/buckets

It is also possible to list all objects inside a particular Bucket using its bucketName:

GET http://localhost:8080/catalog/buckets/{bucketName}/resources

Then, we can fetch a particular object based on its name:

GET http://localhost:8080/catalog/buckets/{bucketName}/resources/{name}
By default, the Catalog will always return the latest version of the objects.
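As an illustration, the following minimal Groovy sketch (for example inside a Script Task) queries the first endpoint above and prints the name of each bucket. It assumes the Catalog is reachable at http://localhost:8080 and that the buckets are publicly readable; for restricted buckets an authenticated session would additionally be required, and the exact response schema should be checked in the Swagger interface.

import groovy.json.JsonSlurper

// Base URL of the Catalog REST service (adapt protocol, host and port)
def catalogUrl = "http://localhost:8080/catalog"

// GET /buckets returns the list of buckets as a JSON array
def bucketsJson = new URL(catalogUrl + "/buckets").getText("UTF-8")
def buckets = new JsonSlurper().parseText(bucketsJson)

// Print one line per bucket ('name' is assumed to be the bucket name attribute)
buckets.each { bucket -> println bucket.name }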

9.2. Bucket naming requirements

Every bucket is identified by a unique bucket name. The bucket naming should match the following rules:

  • The bucket name must be between 3 and 63 characters long and can contain only lower-case characters, numbers, and dashes.

  • A bucket name must start with a lowercase letter and cannot terminate with a dash.

9.3. Object naming requirements

Each Catalog object is identified by a unique name. The object naming should respect the following rules:

  • Empty values are not allowed.

  • Leading or trailing whitespace is not allowed.

  • All characters are accepted except the forward slash and the comma ('/', ',').

9.4. Object versioning

The Catalog provides versioning of objects. You can list all the revisions of a particular object using the HTTP request:

GET http://localhost:8080/catalog/buckets/{bucketName}/resources/{name}/revisions

Each object revision is identified by its commit_time, which specifies the time when the object’s version was added to the Catalog. You can fetch an object’s detailed metadata using its name and commit_time:

GET http://localhost:8080/catalog/buckets/{bucketName}/resources/{name}/revisions/{commit_time}

When an object revision is added to the catalog, depending on its type, it may be parsed by the catalog service to extract some meaningful information.

For example, from a ProActive Workflow, the workflow name, project_name, description, documentation, Generic Information and Workflow variables are extracted.

This information is used in the various portals which interact with the catalog, such as the Workflow Execution Portal.

9.5. Retrieving and searching Objects

9.5.1. Get an Object from the REST API

From the REST API, it is possible to retrieve the raw content of a specific revision of an object. In this case, the name and commit_time of the object’s revision must be specified:

GET http://localhost:8080/catalog/buckets/{bucketName}/resources/{name}/revisions/{commit_time}/raw

To get the raw content of the latest revision of an object, use the following REST API endpoint:

GET http://localhost:8080/catalog/buckets/{bucketName}/resources/{name}/raw

In this case, only the name of the object needs to be provided.

A complete documentation of the Catalog REST API is available on the following link.

9.5.2. GraphQL usage

GraphQL is a standardized query language which makes it possible to search the Catalog and retrieve Workflows matching specified criteria. NOTE: Please use the graphiql endpoint in order to query for a specific Workflow.

9.6. Catalog Portal

The Catalog Portal provides the capability to store objects like Workflows, Calendars, Scripts, etc. Powerful versioning together with full access control (RBAC) is supported, and users can easily share Workflows, templates, and other objects between teams, and various environments (Dev, Test, Staging, Prod).

catalog portal context menu

9.6.1. Creating new buckets

Create a new bucket easily by clicking on the plus button.

create new bucket

In the Create the new bucket window, it is possible to associate a bucket with a specific user group.

By doing so, only users from the same group will be able to access the content of the bucket.

The content of buckets associated with the default - no group option is accessible to any user, regardless of the groups they belong to. A bucket with no associated group is also known as a public bucket.

More information regarding bucket access control is available in the Catalog administration guide.

9.6.2. Removing Objects Inside a Bucket

Select the objects to remove (CTRL + left click to select multiple objects) and click on the trash button.

All the object’s content will be removed from the database and it cannot be restored.

9.6.3. Exporting/Importing Objects into the Catalog

Catalog Objects can be exported to a local archive and easily shared by email, cloud storage, etc. The same archive can be imported back into the Catalog of the same scheduler or of a different one.

Use this feature to easily share Catalog objects across different environments.

To export the selected objects as a ZIP archive, press the download button from portal view.

download catalog objects

To import, either a downloaded catalog archive zip file or a single object, click on the upload button and follow the instructions:

upload catalog objects
  • Object Kind: The kind of object to add. New Kinds can be added by clicking the plus button.

  • Content Type: The Content Type is used to indicate the media type of the resource. New content Types can be added by clicking the plus button.

  • Project Name: The name of the project to which the object belongs. The project name can be provided for all catalog objects (e.g. workflow, script, rule, etc.). WARNING: When uploading an archive, all the objects in the zip must have the same Kind and Content Type.

9.6.4. Create a zip archive with Catalog objects

When a user creates an archive in order to upload a set of Catalog objects, hidden files should be excluded from this archive.

To create a zip archive with Catalog objects on macOS, first go to the folder containing the objects that you want to archive. Then, execute the following command to archive the files without the .DS_Store and __MACOSX files:
zip -r ArchiveForCatalog.zip . -x "*.DS_Store" -x "__MACOSX"

9.6.5. Object Versioning in Catalog Portal

The HEAD of a Catalog object corresponds to the last commit and it is displayed on the portal.

In order to see all the commits of the selected Object, please click on the clock button inside the description area.

The commits are shown in chronological order by commit time.

It is possible to restore any previous commit to the HEAD position by clicking the restore button.

revisions catalog

9.6.6. Catalog RBAC management

RBAC, or Role Based Access Control, is an access control model in which each access decision is based on the role the user is associated with. In the catalog, RBAC operations are only performed by admins or bucket owners. RBAC operations fall into two types:

  1. Managing an authorization for a bucket

  2. Managing an authorization for a specific object in a bucket

Both bucket and object authorizations can be assigned either to a specific user (using his/her username) or to a user group (that contains multiple users). For each specific bucket or object, only one authorization can be assigned to a username or a user group. However, a user can still end up with multiple authorizations due to a mix of username-related and user-group-related authorizations for a bucket or an object. Furthermore, the created authorizations can be modified or deleted.

An authorization gives the specified user(s) some access permission to the specific bucket or object. We provide four access types that can be authorized:

  • Read: The user or user group can only read and execute the bucket and its objects (in case of a bucket authorization) or the object only (in case of a catalog-object authorization).

  • Write: The user or user group has, in addition to the read rights, the privilege to add new objects to the bucket and to modify or delete existing ones.

  • Admin: The user or user group has, in addition to the write rights, the ability to modify a bucket (e.g. change the group) or delete a bucket. Moreover, they can create, modify or delete the authorizations of other users for the bucket or its objects. The same applies in case of a catalog-object admin authorization, where the user or group of users, in addition to the write rights, can create, modify or delete catalog-object authorizations for other users. However, they cannot create, edit or delete bucket authorizations.

  • No Access: The user or group of users has no access to the bucket and its objects (in case of a bucket authorization) or to a specific object (in case of a catalog-object authorization). This access type is usually used when the admin wants to authorize a user group except for some specific users, or to authorize access to a bucket except for some specific objects.

To manage the authorizations, there is a dedicated authorization view for each bucket that is accessible by clicking on the shield-like button above the buckets' list, as shown in the figure below.

OpenAuthorizationView

The same view is also available for an object: when you select an object, the authorization view button will appear above the objects of the selected bucket (the middle view of the portal). The figure below shows the icon location.

objectAuthorizationView

Once you click on it (the bucket or object authorization view button), the authorization view will open. The figure below shows the authorization view when the admin has clicked on the authorization view of a bucket.

AuthorizationView

The authorization view consists of two parts:

  • First, there are three buttons that allow the admin to refresh the authorizations list, add a new authorization and delete an existing one

  • Second, under the buttons, there is the list that shows the authorizations assigned to the current bucket. In the case of a bucket authorization view, the list shows the authorizations created for the bucket and all of its objects. However, when the admin clicks on the authorization view button above the selected object, the list will show only the authorizations created for the selected object. The figure below shows an example.

objectGrantList

Each authorization has the following properties:

  • Bucket Name

  • Catalog Object (optional): If this attribute is empty, the authorization is automatically considered for the bucket. However, if the admin specifies an object name, the access is created for the specified object.

  • Grantee Type: Group or user

  • Grantee (required): the username or the user group name

  • Access Type: read, write, admin or no access

  • Priority: This attribute is only used for access associated with a group. Thus, if a user belongs to two groups, each of which has a different type of access, the system will choose the access with the highest priority for this user.

To create a new authorization, the admin just needs to click on the "+" button. The authorization creation view (shown in the figure below) will then pop up. The admin needs to fill in the authorization attributes (as presented above), then click the "Add authorization" button. The figure below shows an example of an authorization which grants the user named olivia write access to the bucket admin-bucket.

CreateAuthorization

Once the admin has created an authorization, it will appear in the authorizations list in the authorization view, as shown in the figure below.

ListOfGrants

Once the authorization has been created, the admin will see a "shield" icon next to the bucket or the object for which the authorization was created. The figure below shows an example.

admin grant icon

Moreover, when the user who benefits from the authorization logs in to his/her account, he/she will see an icon indicating the rights he/she possesses over the bucket or the object. These icons are the following:

Authorrizations icons
  • The first icon indicates that the user has "read" rights over the bucket or the object.

  • The second icon indicates that the user has "write" rights over the bucket or the object.

  • The third icon indicates that the user has "admin" rights over the bucket or the object.

  • The fourth icon appears only on a bucket for which the user has "no access" rights, but only when the user also has read, write or admin rights over at least one object in the bucket.

The figure below shows how the authorization icon is displayed next to the bucket.

user bucket grant icon

The figure below shows how the authorization icon is displayed next to the objects.

user object grant icon

If the object does not have an icon next to it (e.g. Native_Task_Windows), it means that it has no direct authorization assigned to it, and the user will have the same rights over it as he/she has over the bucket containing it.

To delete an authorization, the admin simply needs to select one from the list and the delete button will be activated. It is the button with a "bin" symbol next to the add a new authorization button "+". The figure below shows an example.

DeleteGrant

Once the admin deletes the authorization, it will be removed from the list, as shown in the figure below.

DeletedGrant

To update an authorization, the admin needs to click on the drop-down list of the access type or the priority level and select a new value. The figure below shows an example.

updateAGrant

Once the selection is made the authorization will be updated as shown in the figure below.

updatedGrant

However, when a user has admin rights over a bucket or an object, he/she cannot downgrade his/her own rights while updating the authorization. Another user with admin rights over the bucket or the object can do it for him/her.

9.6.7. Authorization calculation rules

Since each user might belong to multiple user groups, a user could have multiple authorizations over a bucket due to his/her username or user group(s) authorizations. In such case, the resulting access type will be calculated as follows:

  • If the username-assigned authorization exists, it is prioritized and its access type will be the user’s resulting rights over the bucket.

  • If multiple user-group authorizations exist, without a username authorization, the resulting user’s rights over the bucket will be the access type of the group authorization that has the highest priority.

In the case where a user has multiple authorizations over an object, the resulting access type will be calculated as follows:

  • If the username-assigned authorization for the object exists, it is prioritized and its access type will be the user’s resulting rights over the object.

  • If multiple user-group authorizations exist for the object, without a username authorization, the resulting user’s rights over the object will be the access type of the group authorization that has the highest priority.

  • If neither a username-assigned authorization nor user-group authorizations exist for the object, the resulting user’s rights over the object will be the same as the user’s resulting rights over the bucket that contains the object.

10. Notification-service

10.1. Overview

The Notification Portal allows users to subscribe, be notified and react when different types of events occur during Job execution, on the state of the ProActive Scheduler or when automating and scheduling jobs with the Job Planner.

notifications view
Figure 1. Notifications view

10.2. Subscriptions

10.2.1. Principle

A notification Subscription consists of a set of events selected by a user for a particular service, a type of workflows or a specific workflow.

Event reception is configurable and allows portal notifications, email sending and custom script execution.

A user creating a subscription will need to:

  • Select one of the subscription types

  • Select one or more events, from a predefined list, that the user will be notified of.

  • Activate one or more Notification Methods.

10.2.2. Subscription types

There are three types of Subscriptions:

  • Service subscriptions.

  • Job subscriptions.

  • Channel subscriptions.

Service subscriptions

Service subscriptions are unique for each service they are associated with and are created by default for every user. They enable subscribing to events that may occur on a particular service, such as the ProActive Scheduler or the Job Planner.

Service subscriptions are deactivated by default. To start receiving notifications, users only need to activate them:

jobplanner subscription
Figure 2. Example with job planner subscription

To modify a service subscription, simply click on it and make your adjustments. The subscription is automatically saved.

Job subscriptions

Job subscriptions must be defined and created in order to receive notifications. Users can have multiple job subscriptions with different granularity on the workflows they are associated to.

At least one field of Workflow data must be provided to create the subscription, which means it is not possible to create a job subscription that listens for all jobs submitted to the Scheduler.

A workflow name which references a variable (e.g., my_workflow_${my_var} or my_workflow_$my_var) is replaced by substituting the variable with its value, in case this variable is defined at the workflow level. The replacement is performed when the job subscription is created.
job subscription workflow data
Figure 3. Workflow data to create a job subscription

Two scenarios are possible when creating a job subscription:

  • If only a bucket name is set, then the user will be notified for all Workflows of the specified bucket.

  • Otherwise, there has to be an exact match between the bucket name, project name and workflow name of the workflow you wish to be notified of and those of the associated subscription.

To create a job subscription, go to the Notification Portal, job subscriptions tab and click on the + button

job subscriptions summary
Figure 4. Button to create a job subscription

Then fill and select the mandatory fields and click on Save

job subscription creation
Figure 5. Job subscription creation

The Submitted by switch may be activated to be notified of events only for jobs submitted by the user who created the subscription.

To modify a job subscription simply click on it, do your adjustments and save it.

Channel subscriptions

Like service subscriptions, channel subscriptions are unique to each user. They allow users to subscribe to channels created beforehand. A user who is a notification service administrator can subscribe to any channel. Otherwise, a user can subscribe to:

  • A public channel (a channel authorized for all groups).

  • A channel whose group authorization matches one of the user groups.

To visualize past notifications and start receiving new ones, a user must update his channel subscription with the following actions:

  • Activate a notification method for the channel subscription.

  • Select desired channels from the user’s available channels list

  • Finally, activate the subscription.

channel sub
Figure 6. Channel subscription

Go to the Channel section to learn more about channels.

10.2.3. Notification methods

To define the way to be notified, a user must select at least one notification method. It is mandatory to have at least one active notification method in order to create or update a subscription.

There are three notification methods available:

  • "Portal": The notification will be displayed in the Notification Portal.

  • "Email": The notification will be sent by email. In order to use this method, the SMTP settings must be defined in the dedicated configuration file, inside the ProActive server. Please refer to the corresponding Email notification configuration section in the administration guide.

  • "Third-party": To trigger a script execution. It also requires configuration to specify the location of the script, or optional options such as multiple script execution. Go to the Third party notification method section in the Admin guide to look at the full configuration.

Notification Method confirmation

An administrator can activate confirmation requests for notification methods other than portal. In such a case, after providing the notification method parameter (Email, Third-party parameter, Telephone), an input field will appear below the parameter asking for a confirmation code. It is also possible to resend a confirmation code from there, in case it hasn’t been received or if it has exceeded its 15-minute life span and is now expired.

confirmation request
Figure 7. Code confirmation request
  • "Email": An email will be sent to the provided address with the confirmation code in the body.

  • "Third-party": An execution will occur with the confirmation code in the accessible MESSAGE script field.

For notification methods whose parameters are mandatory (i.e. Email), the user may go to his/her subscriptions and activate the notification method to start receiving notifications by this means once the parameter is confirmed.

If a notification method parameter is deleted or changed and reused in the future, then a new confirmation request will be triggered.

10.3. Channels

10.3.1. Principle

A notification service Channel is a centralized space in the Notification Portal for groups of users to receive notifications created from a workflow using the Channel_Notification task. It consists of a unique name and a set of user groups authorized to subscribe to the channel.

In order to receive channel notifications, it is required to create one or more channels and update the Channel subscription to specify which channels we wish to be notified of.

10.3.2. Rules

  • Unlike other notifications, channel notifications are retroactive. This means that when a user subscribes to a channel that already had notifications, they will appear to the user.

  • Channel notifications cannot be deleted manually; the Housekeeping mechanism removes them once their life span is exceeded.

  • Notification service administrators can subscribe to any channels. Other users must be part of at least one of the authorized groups defined in the channel in order to subscribe to a channel, unless all groups are authorized (public channels).

10.3.3. Channel Creation

All users are able to create channels for any group. It is required to provide a unique channel name and a list of groups which will be authorized to subscribe to the channel.

It can either be authorized for all groups, a subset of the channel owner’s groups, or any free input.

create channel
Figure 8. Channel creation

10.3.4. Channel Management

Channel management is where users can update channels by modifying a channel’s name, adding or removing authorized groups, or deleting the channel.

A notification service administrator is able to update any channel, whether he/she is the creator of the channel or not. Other users can update only channels that they have created.

To update a channel’s name or authorized groups, simply make the update and click on the Save button.

To delete a channel, click on the Bin icon on the left of the channel definition.

channel management
Figure 9. Channel management

10.3.5. Channel Notification

In order to send a channel notification, a user must append the Channel_Notification task to a Workflow.

The notification’s message, severity and notified groups are configurable by the user in the Channel_Notification. Simply update the following Task variables values to do so.

channel notif task variables
Figure 10. Channel_Notification task variables
  • MESSAGE: The message that will be displayed to the notified users. This variable is a free input and cannot be empty.

  • SEVERITY: Users can define the notification’s severity by updating this variable. Available severities are INFO, WARNING, ERROR, CRITICAL. The default severity is INFO.

  • CHANNELS: channels that will be notified. It can be a single channel, a comma separated list of channels, or the All keyword.

All (default) means:

  • For a notification-service administrator, send a notification to all channels.

  • For a standard user, send a notification to all authorized channels (the user belongs to an authorized group of these channels).
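For illustration, a Channel_Notification task warning two channels could set its task variables as follows (the message and channel names are arbitrary examples):

MESSAGE  = The nightly backup workflow finished with warnings
SEVERITY = WARNING
CHANNELS = backups,ops-alerts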

10.4. Event types

Table 8. Job notifications

Type | Readable name | Severity | Message | Description
JOB_RUNNING_TO_FINISHED | Job finished | INFO | The job is finished | The job was running and is now finished
JOB_PENDING_TO_RUNNING | Job running | INFO | The job is running | The job was pending and is now running
JOB_IN_ERROR | In-error | ERROR | A task in the job encountered an error, the error handling is set to suspend in error task dependencies and set the job In-error | The job is in-error
JOB_PAUSED | Paused | WARNING | The job was running and has been paused | The job is paused
JOB_REMOVE_FINISHED | Removed | INFO | The job has been removed from the scheduler | The job is removed
JOB_RESTARTED_FROM_ERROR | Restarted | INFO | The job was in error and the user has restarted it | The job is restarted after being in-error for a while
JOB_RESUMED | Resumed | INFO | The job was paused and has been resumed | The job is resumed
JOB_SUBMITTED | Submitted | INFO | The job has been submitted to the scheduler | The job is submitted to the scheduler
JOB_FINISHED_WITH_ERRORS | Finished with issues | ERROR | A task in the job encountered an error, the error handling is set to ignore it and continue job execution | The job has finished with errors

Table 9. Task notifications

Type | Readable name | Severity | Message | Description
TASK_IN_ERROR | Task In_Error | ERROR | A task in the job encountered an error and has been set In-Error | A task in the job is In-Error
TASK_WAITING_FOR_RESTART | Task waiting on error | INFO | A task in the job is waiting for a new execution attempt after being in error, faulty or failed. | A task in the job is waiting for restart after an error
TASK_RUNNING_TO_FINISHED | Task finished | INFO | A task in the job is finished | A task in the job is finished
TASK_IN_ERROR_TO_FINISHED | In-Error Task finished | INFO | A task in the job was in error and is now finished | An In-Error task in the job is finished
TASK_PENDING_TO_RUNNING | Task running | INFO | A task in the job was pending and is now running | A task in the job is running
TASK_SKIPPED | Task skipped | INFO | The task was not executed, it was the non-selected branch of an IF/ELSE control flow action. | A task in the job is skipped
TASK_REPLICATED | Task replicated | INFO | A task in the job is replicated | A task in the job is replicated
TASK_FINISHED_WITH_ERRORS | Task finished with errors | ERROR | A task with no execution attempts left is faulty, failed or in error. | A task has finished with an error status
TASK_PROGRESS | Task progress updated | INFO | Task progress updated to x% | The task progress variable has been updated, it can be a value between 0 and 100.
TASK_VISU_ACTIVATED | Task visualization activated | INFO | Task visualization activated | A task’s remote visualisation has been activated

Table 10. Scheduler notifications

Type | Readable name | Severity | Message | Description
PAUSED | Paused | WARNING | A user has paused the scheduler, all running jobs run to completion but jobs won't be submitted | The Scheduler is paused, all running jobs run to completion
DB_DOWN | Database down | CRITICAL | The Scheduler's database is down | The Scheduler's database is down
KILLED | Killed | WARNING | A user has killed the Scheduler, ProActive server needs to be restarted | The Scheduler is killed! The process of the scheduler’s executable is killed. No interaction can be done anymore.
RESUMED | Resumed | INFO | The scheduler was paused or frozen and has been resumed | The Scheduler is resumed! Jobs can be submitted again
STOPPED | Stopped | WARNING | The scheduler has been stopped | The Scheduler is stopped! Jobs cannot be submitted anymore. Already running jobs run to completion, but not pending jobs.
FROZEN | Frozen | WARNING | The scheduler has been frozen | The Scheduler is frozen! All running tasks run to completion, but running jobs wait for the scheduler to resume.
SHUTTING_DOWN | Shutting down | WARNING | The scheduler has been shut down | The Scheduler is shutting down itself.
STARTED | Started | INFO | The scheduler has started | The Scheduler is started! Jobs can be submitted.

Table 11. Job Planner notifications (with example values)

Type | Readable name | Severity | Message | Description
ASSOCIATION_FAILED | Association failed | CRITICAL | The server was unable to retrieve the resource Resource_Name in the bucket Bucket_Name located URL. Changing the status of the association Association_Id of the calendar Calendar_Name to FAILED. | A Catalog resource is not found, the association changes to FAILED status
 | | ERROR | The association Association_Id of the workflow Workflow_Name to the calendar Calendar_Name had a failed execution, as it is configured, the association status is set to FAILED | An association’s status has been updated to FAILED due to the previous execution having an error
 | | | Update the association Association_Id of calendar Calendar_Name status to failed: Failed to retrieve planned calendar. | Job Planner could not retrieve the calendar of a Workflow-Calendar association. The association’s status has been updated to FAILED.
CANNOT_SUBMIT | Cannot submit | ERROR | Job-planner was unable to submit the Date_Time the workflow Workflow_Name in the bucket Bucket_Name from the association Association_Id of the calendar Calendar_Name to the scheduler | Job Planner was unable to submit the workflow from the bucket to the scheduler
SUBMISSION_POSTPONED | Submission postponed | WARNING | Submission of the workflow Workflow_Name planned at Date_Time by association Association_Id of calendar Calendar_Name is postponed because job Job_Id is on-going | A submission has been postponed because the previous execution is still under way
SUBMISSION_CANCELED | Submission canceled | WARNING | Submission of the workflow Workflow_Name planned at Date_Time by association Association_Id of calendar Calendar_Name is canceled because job Job_Id is on-going | A submission has been canceled because the previous execution is still under way
 | | | The workflow Variables_Propagation in the bucket Bucket_Name from the association Association_Id of calendar Calendar_Name was not executed at Date_Time | The workflow execution time is before the current time, which means an execution was missed and the association is not configured to postpone the execution
 | | | The workflow Workflow_Name in the bucket Bucket_Name from the association Association_Id of Calendar_Name was not submitted 10 times, skipping execution to the next association | Linked to the notification above, it indicates how many executions were missed
DELAYED_SUBMISSION_EXECUTING | Delayed submission executing | INFO | The workflow Workflow_Name in the bucket Bucket_Name from the association Association_Id of calendar Calendar_Name planned the Date_Time was postponed and executed the Date_Time | A planned workflow execution that has been postponed is now executing
 | | | Job Planner has detected that the execution of the workflow Workflow_Name in the bucket Bucket_Name from the association Association_Id of calendar Calendar_Name has been missed the Date_Time. As it is configured, Job-planner will submit an execution now at Date_Time | The workflow execution time is before the current time, which means an execution was missed and the association is configured to submit a new execution

11. Third-party credentials

11.1. Overview

Tasks executed by ProActive Scheduler might need to use a password or another kind of credentials to access third-party services or applications. For example, a user might wish to pass his/her MySQL password to a Native Task which runs a mysql command to dump a database.

In such cases it is often desirable to store the credentials separately from the Workflow definition. ProActive Scheduler provides such a facility.

Third-party credentials in ProActive Scheduler are user-defined key-value pairs of strings stored on the server side in encrypted form. The credentials are accessible in the tasks executed by the user via API (Script and Java Tasks) or variable substitution (arguments of Native Tasks and parameters of Java Tasks).

11.2. Managing third-party credentials

Methods to add or remove the credentials and to list the stored credential keys are exposed in both Java and REST APIs.

End users can manage their credentials using the Scheduler Web Interface (Portal → Manage third-party credentials) or the Command Line Interface.

11.2.1. Example CLI usage

Add credential:

$ ./bin/proactive-client --put-credential MY_MYSQL_PASSWORD mypassword
Credential "MY_MYSQL_PASSWORD" successfully stored.

List credentials (only keys are listed):

$ ./bin/proactive-client --list-credentials
[...]
MY_MYSQL_PASSWORD

Remove credential:

$ ./bin/proactive-client --remove-credential MY_MYSQL_PASSWORD
Credential "MY_MYSQL_PASSWORD" successfully removed.

11.3. Using third-party credentials

11.3.1. In a Script Task

In a Script Task, the credentials script binding is a hash map containing all user’s credentials:

// mysqlPassword will contain the value corresponding to "MY_MYSQL_PASSWORD" key
mysqlPassword = credentials.get('MY_MYSQL_PASSWORD')
The credentials script binding can also be used in additional scripts defined inside a ProActive Task. To know in which scripts credentials is available, study the Script Bindings Reference.

11.3.2. In Java Task

In a Java Task, the method getThirdPartyCredential of JavaExecutable returns the credential corresponding to the key passed as parameter:

public class PrintAndReturnCredentialsTask extends JavaExecutable {
  @Override
  public Serializable execute(TaskResult... results) throws Throwable {
    // mysqlPassword will contain the value corresponding to "MY_MYSQL_PASSWORD" key
    String mysqlPassword = getThirdPartyCredential("MY_MYSQL_PASSWORD");
    [...]
  }
}

Another way to use credentials in Java tasks is via parameters. If a parameter contains a string of the form $credentials_<credential key>, the string is replaced with the value of the corresponding credential:

<task name="...">
  <javaExecutable class="...">
    <parameters>
      <!-- mysqlPassword will contain the value corresponding to "MY_MYSQL_PASSWORD" key -->
      <parameter name="mysqlPassword" value="$credentials_MY_MYSQL_PASSWORD"/>
    </parameters>
  </javaExecutable>
</task>

11.3.3. In a Native Task

Finally, a credential can be passed to a Native Task via arguments. Note that, as with Java Task parameters, the $credentials_ prefix is necessary. In the example below, /bin/echo will be called with the value corresponding to the key MY_MYSQL_PASSWORD (if the key is not present, no replacement occurs):

<task name="Task1">
  <nativeExecutable>
    <staticCommand value="/bin/echo" >
      <arguments>
        <argument value="$credentials_MY_MYSQL_PASSWORD"/>
      </arguments>
    </staticCommand>
  </nativeExecutable>
</task>

12. Scheduler, Resource Manager, DataSpace, Synchronization, and Signal APIs

From a Script Task, several APIs are accessible via Java objects bound to dedicated script bindings:

  • Scheduler API: using the schedulerapi binding, it is possible to interact directly with the scheduler server and do several operations (submit jobs, wait for tasks termination, list running jobs, etc.).

  • Resource Manager API: using the rmapi binding, it is possible to interact directly with the RM and do several operations (lock nodes, change node tags, monitor nodes, get thread dump, etc.).

  • DataSpace API: the userspaceapi or globalspaceapi bindings allow manipulating files present in the User or Global spaces. Operations include listing, uploading, downloading, or deleting files.

  • Synchronization API: using the synchronizationapi binding, it is possible to handle advanced task synchronization dynamically.

12.1. Scheduler API

From inside a Script Task, you can use the Scheduler API binding like the following:

// importing necessary classes
import org.ow2.proactive.scheduler.common.job.*
import org.ow2.proactive.scheduler.common.task.*
import org.ow2.proactive.scripting.*


// connect to the scheduler
schedulerapi.connect()

// create a hello world job
job = new TaskFlowJob()
job.setName("HelloJob")
task = new ScriptTask()
task.setName("HelloTask")
task.setScript(new TaskScript(new SimpleScript("println 'Hello World'", "groovy")))
job.addTask(task)

// submitting the job
jobid = schedulerapi.submit(job)

// Wait for the task termination
taskResult = schedulerapi.waitForTask(jobid.toString(), "HelloTask", 120000)

// displaying the task output
println taskResult.getOutput().getAllLogs(false)

The complete API description can be found in the SchedulerNodeClient JavaDoc.

The schedulerapi script binding can also be used in additional scripts defined inside a ProActive Task. To know in which scripts schedulerapi is available, study the Script Bindings Reference.

12.2. Resource Manager API

From inside a Script Task, you can use the Resource Manager API binding like the following:

// connect to the rm
rmapi.connect()

// displaying the RM state
full = rmapi.getRMStateFull()
full.getNodesEvent().each { event ->
    println(event.getNodeUrl())
    println(event.getNodeState())
}

// add node token to the current node
// get current node url
nodeUrl = variables.get("PA_NODE_URL")
rmapi.addNodeToken(nodeUrl, "MyOwnToken")


println "Test rmapi.getTopology() " + rmapi.getTopology()
println "Test rmapi.getExistingNodeSources() " + rmapi.getExistingNodeSources()
println "Test rmapi.getInfrasToPoliciesMapping() " + rmapi.getInfrasToPoliciesMapping()
println "Test rmapi.getConnectionInfo() " + rmapi.getConnectionInfo()
println "Test rmapi.getRMStateFull() " + rmapi.getRMStateFull()
println "Test rmapi.getRMThreadDump() " + rmapi.getRMThreadDump()
println "Test rmapi.getState() " + rmapi.getState()
println "Test rmapi.getSupportedNodeSourceInfrastructures() " + rmapi.getSupportedNodeSourceInfrastructures()
println "Test rmapi.getSupportedNodeSourcePolicies() " + rmapi.getSupportedNodeSourcePolicies()
println "Test rmapi.getVersion() " + rmapi.getVersion()
println "Test rmapi.isActive() " + rmapi.isActive()


// do not forget to disconnect from the rm
rmapi.disconnect();

The complete API description can be found in the documentation of RMRestInterface Java interface. The sessionid argument has to be ignored when using the rmapi script binding.

The rmapi script binding can also be used in additional scripts defined inside a ProActive Task. To know in which scripts rmapi is available, study the Script Bindings Reference.

12.3. DataSpace API

The traditional way to transfer files from the user or global space to a task is by using file transfer directives as described in chapter Data Spaces.

While file transfer directives are sufficient to cover usual cases, it is sometimes necessary to use the dataspace API directly to get a finer level of control.

Below is an example use of this api inside a Script Task:

// connect to the user space
userspaceapi.connect()

// push file
inFile = new File("inFile.txt");
inFile.write("HelloWorld")
userspaceapi.pushFile(inFile, "remoteDir/inFile.txt")

// list files

remotefiles = userspaceapi.listFiles("remoteDir", "*.txt")
println remotefiles

// pull File
outFile = new File("outFile.txt")
userspaceapi.pullFile("remote/inFile.txt", outFile)

println outFile.text

The complete description of user and global space APIs can be found in the DataSpaceNodeClient JavaDoc.

The userspaceapi and globalspaceapi script bindings can also be used in additional scripts defined inside a ProActive Task. To know in which scripts these bindings are available, study the Script Bindings Reference.

12.4. Synchronization API

This Synchronization API allows for both communication and synchronization between Tasks and between Jobs dynamically at execution time. It is based on a Key-Value Store offering atomic operations. It makes it easy to put into action all classical synchronization patterns: Critical Sections and Mutual Exclusion, Rendez-vous, Barriers, Producer/Consumer, etc.

In summary and for simple use, just go to ProActive Studio and use the predefined templates in the Controls menu (e.g., Key_Value). Use those templates in your own workflow to see how it works. You can further try the examples below to better understand the usage of the Synchronization API.

Task synchronization is traditionally handled by the ProActive Scheduler through Task Dependencies, where dependent ProActive Tasks are started only after their parent Tasks have completed.

In some cases, static task dependencies are not enough to synchronize tasks execution.

This is the case when a task depends on the status of another task executed in a different branch of a workflow, a sibling task or even another job/workflow.

For example, let’s suppose workflow A contains a single task A_1 starting an apache server. Task A_1 starts the server and remains alive while the server is up. When workflow A is killed, task A_1 stops the Apache server.

A task B_1 of another workflow B wants to interact with this server.

Problem: task B_1 does not know if the apache server is up and how to contact it.

Using Task Synchronization API, task A_1 can advertise the server URL to the Synchronization Service.

Task B_1 can use the Task Synchronization API to wait until the apache server is advertised. It can also perform the necessary operations and then notify task A_1 that the operation is complete and the apache server can be terminated.

As we can see, the Task Synchronization API can be used for two-way synchronization.

Synchronization can be used in any part of a Task (Pre/Post/Clean Scripts, Task Implementation, Selection Script). Using the Synchronization API within Selection Scripts is a way to avoid a Task being started (and consuming a Node) while actually doing nothing but waiting for a specific condition to occur.

12.4.1. Usage

Task Synchronization API is a Java API defined by the following API documentation.

The Synchronization API is connected to a centralized Synchronization Service. This service ensures that each request is executed atomically, and that changes are persisted on disk.

The Synchronization API is available inside ProActive Tasks through the synchronizationapi script binding.

Channels

The Synchronization Service works as a Key-Value Store organized in channels. Each channel acts as a private Key-Value Store.

This means that each put operation is identified by a 3-tuple: <Channel_identifier, Key_binding, Value>.

Users of the Synchronization API are responsible for creating channels as fits their needs. Channels have no notion of ownership: a channel created by workflow A started by user Alice can be used by workflow B started by user Bob.

This is because Synchronization API is often used to synchronize workflow tasks across multiple users.

Therefore, channel naming is important and should be precise enough to avoid accidental overlap. Channels should be created when needed and deleted after use.

For example, a channel which is used only inside a specific job instance (to synchronize tasks within this job) should be created inside some initialization task at the beginning of this job, and deleted in some cleaning task at the end. The channel name should contain the Job Id.

When channels are created, the user can decide if the channel should be persisted on disk or kept in memory only.

On startup, the Synchronization Service contains only the persistent channels which were previously created using the Synchronization API. In-memory channels created before the reboot of the scheduler are lost.
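For instance (a minimal Groovy sketch based on the createChannel call used in the examples below, where the second boolean argument is assumed to select persistence, false producing the in-memory channel shown in Example 1), a channel meant to survive a server restart could be created with:

// assumed: true requests a persistent (on-disk) channel, false an in-memory one
synchronizationapi.createChannel("my_persistent_channel", true)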

API Description

The Synchronization API is similar to the Java 8 Map interface, with the following differences:

  • It contains methods to create/delete channels and methods to interact with the channel Map itself.

  • Each channel Map operation must take a channel identifier parameter.

  • The Synchronization API supports Java 8 lambdas, but lambdas must be passed as string arguments defining Groovy Closures. For example, compute("channel1", "key1", "{k, x -> x+1}") instead of compute("channel1", "key1", (k, x) -> x+1).

  • The Synchronization Service guarantees that multiple operations, coming from different ProActive Tasks, will all be executed atomically.

  • In addition to the methods present in the Java 8 Map interface, the Synchronization API provides blocking methods to wait for a specific condition on a channel. This is mainly why this API is called Synchronization: it allows different ProActive Tasks to be blocked until specific conditions are met.

  • When an error occurs during the Groovy Closure compilation or execution, dedicated exceptions (CompilationException or ClosureEvaluationException) will be thrown.

12.4.2. Examples

Example 1: Simple Key/Value With Selection Script

In this first example, we will show how one task can delay the execution of another task using the Synchronization API and a selection script.

Key/Value With Selection Script
Figure 11. Key/Value With Selection Script

This workflow contains the following Tasks:

  • Init: Initialize the channel. This task creates a channel using the current job id. It also sets a lock binding in the key/value store with an initial true value.

    jobid = variables.get("PA_JOB_ID")
    synchronizationapi.createChannel(jobid, false)
    synchronizationapi.put(jobid, "lock", true)
    println "Channel " + jobid + " created."
  • TaskB_Wait: Wait to be unlocked. This task will not be executed until the lock binding changed to false. A selection script allows to handle this verification:

    selected = !(synchronizationapi.get(variables.get("PA_JOB_ID"), "lock"))
  • TaskA: Unlock TaskB. This task will sleep for a few seconds and then unlock TaskB using the Synchronization API:

    println "Sleeping 5 seconds"
    Thread.sleep(5000)
    println "Unlocking Task B"
    synchronizationapi.put(variables.get("PA_JOB_ID"), "lock", false)
  • Clean: Delete the channel. This task simply deletes the channel used in this job. As there is no automatic mechanism to remove channels, it is necessary to delete them explicitly when they are not used any more.

    jobid = variables.get("PA_JOB_ID")
    synchronizationapi.deleteChannel(jobid )
    println "Channel " + jobid + " deleted."

Here is output of the Key/Value workflow:

[28t2@trydev.activeeon.com;14:24:22] Channel 28 created.
[28t0@trydev.activeeon.com;14:24:28] Sleeping 5 seconds
[28t0@trydev.activeeon.com;14:24:33] Unlocking Task B
[28t1@trydev.activeeon.com;14:24:40] Task B has been unlocked by Task A
[28t3@trydev.activeeon.com;14:24:46] Channel 28 deleted.
It is also possible to debug the operations performed on the Synchronization Store by using the Server Logs associated with the job. Server Logs are very verbose; the [Synchronization] pattern helps to find the logs associated with the Synchronization API:
[2018-04-24 14:24:22,685 de72716883 INFO            o.o.p.s.u.TaskLogger] task 28t2 (Init) [Synchronization]Created new memory channel '28'
[2018-04-24 14:24:22,786 de72716883 INFO            o.o.p.s.u.TaskLogger] task 28t2 (Init) [Synchronization][Channel=28] Put true on key 'lock', previous value was null
[2018-04-24 14:24:34,431 de72716883 INFO            o.o.p.s.u.TaskLogger] task 28t0 (TaskA) [Synchronization][Channel=28] Put false on key 'lock', previous value was true
[2018-04-24 14:24:46,882 de72716883 INFO            o.o.p.s.u.TaskLogger] task 28t3 (Clean) [Synchronization]Deleted memory channel '28'

The Complete Workflow example can be downloaded here.

Example 2: Simple Key/Value with Explicit Wait

In this second example, we will show how one task can delay the execution of another task using the Synchronization API and an explicit wait call.

This example is very similar to the first example, but instead of delaying the execution of TaskB using a selection script, TaskB will start its execution and explicitly call the Synchronization API to wait.

  • Tasks Init, TaskA and Clean do not change

  • TaskB_Wait is modified. The selection script is removed and instead the following code is used in the main task definition:

    println "Waiting for Task A"
    synchronizationapi.waitUntil(variables.get("PA_JOB_ID"), "lock", "{k, x -> x == false}")
    println "Task B has been unlocked by Task A"

    As you can see, the waitUntil method expects a BiPredicate defined as a string Groovy Closure. The predicate receives two parameters: k contains the key, and x contains the binding associated with this key. This predicate is evaluated on the lock binding, and waits until this binding == false.

Here is the output of the Example 2 workflow:

[29t2@trydev.activeeon.com;14:43:42] Channel 29 created.
[29t0@trydev.activeeon.com;14:43:49] Sleeping 5 seconds
[29t1@trydev.activeeon.com;14:43:48] Waiting for Task A
[29t0@trydev.activeeon.com;14:43:54] Unlocking Task B
[29t1@trydev.activeeon.com;14:43:55] Task B has been unlocked by Task A
[29t3@trydev.activeeon.com;14:44:01] Channel 29 deleted.

The Complete Workflow example can be downloaded here.

Example 3: Critical Section

In this last example, we will show how to create a critical section inside a workflow, that is, a section of the workflow which can only be accessed by a configurable number of slots.

The Critical Section workflow is defined with a Replicate Control Structure:

Critical Section
Figure 12. Critical Section

It contains the following Tasks:

  • Init: Initialize the channel. This task creates a channel using the current job id. It also sets an available_slots binding in the key/value store with an initial value based on the workflow user-defined NB_SLOTS job variable.

    jobid = variables.get("PA_JOB_ID")
    synchronizationapi.createChannel(jobid, false)
    synchronizationapi.put(jobid, "available_slots", Integer.parseInt(variables.get("NB_SLOTS")))
    println "Channel " + jobid + " created."

    This task also defines a control flow script used to set the number of replicated tasks, which is parametrized with the user-defined NB_TASKS job variable:

    runs=Integer.parseInt(variables.get("NB_TASKS"))
  • Section: Wait for the critical section to be available. This task will block until the number of available slots is greater than 0. As soon as at least one slot is available, it decrements the available_slots counter and performs some computation. When the computation is terminated, it releases the slot by incrementing back the available_slots counter.

    println "Waiting for available slot"
    jobid = variables.get("PA_JOB_ID")
    synchronizationapi.waitUntilThen(jobid, "available_slots", "{k, x -> x > 0}", "{k, x -> x - 1}")
    println "Enter critical section"
    // simulate some computation
    Thread.sleep(10000)
    println "Release slot"
    synchronizationapi.compute(jobid,"available_slots", "{k, x -> x + 1}")

    As you can see, the first method used in this section is waitUntilThen. This method expects both a BiPredicate and a BiFunction, each defined as a Groovy closure string.

    The semantics of this method are to wait until the predicate is met and, as soon as it is, immediately perform a remapping.

    In our scenario, the number of available slots is immediately decremented to reserve a slot and enter the critical section.

    Releasing of the slot is performed at the end of the task using the compute method.

  • Clean: Delete the channel. Same code as the previous examples.

Here is the output of the Example 3 workflow:

[3t0@trydev.activeeon.com;16:14:14] Channel 3 created.
[3t1@trydev.activeeon.com;16:14:22] Waiting for available slot
[3t4@trydev.activeeon.com;16:14:22] Waiting for available slot
[3t3@trydev.activeeon.com;16:14:22] Waiting for available slot
[3t5@trydev.activeeon.com;16:14:22] Waiting for available slot
[3t1@trydev.activeeon.com;16:14:23] Enter critical section
[3t4@trydev.activeeon.com;16:14:23] Enter critical section
[3t4@trydev.activeeon.com;16:14:33] Release slot
[3t1@trydev.activeeon.com;16:14:33] Release slot
[3t3@trydev.activeeon.com;16:14:33] Enter critical section
[3t5@trydev.activeeon.com;16:14:34] Enter critical section
[3t6@trydev.activeeon.com;16:14:38] Waiting for available slot
[3t3@trydev.activeeon.com;16:14:43] Release slot
[3t5@trydev.activeeon.com;16:14:44] Release slot
[3t6@trydev.activeeon.com;16:14:44] Enter critical section
[3t6@trydev.activeeon.com;16:14:54] Release slot
[3t2@trydev.activeeon.com;16:14:59] Channel 3 deleted.

We can see that no more than two replicated tasks enter the critical section at the same time, and that releasing slots allows new tasks to enter the critical section.

The complete Workflow example can be downloaded here.

Implementing the Critical Section example using selection scripts rather than explicit wait methods is more complex. The reason is that selection scripts are meant to be stateless: the same selection script is executed on all ProActive Nodes by the Scheduler before deciding whether a Node can be used to execute a single Task. In our use case, we must both decide atomically whether a slot is available (stateless) and reserve a slot (stateful), and this decision is performed simultaneously on all available ProActive Nodes. An example implementation can be downloaded here; it makes use of a reserved_slots binding containing a set of task ids. A minimal sketch of this approach is shown below.
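The sketch below gives a rough idea of what such a selection script can look like. This is a simplified, hypothetical sketch, not the exact downloadable implementation: it assumes an Init task already created the channel and initialized reserved_slots to an empty list, and that the synchronizationapi binding is available in selection scripts.

// Hypothetical selection script sketch for a slot-based critical section
jobId   = variables.get("PA_JOB_ID")
taskId  = variables.get("PA_TASK_ID")
nbSlots = Integer.parseInt(variables.get("NB_SLOTS"))

// Atomically add this task id to the reserved set, but only if a slot is still free
// (the whole remapping closure is evaluated atomically by the Synchronization service)
synchronizationapi.compute(jobId, "reserved_slots",
        "{k, s -> (s.size() < " + nbSlots + " && !s.contains('" + taskId + "')) ? s + ['" + taskId + "'] : s}")

// Select this node only if the task actually obtained a slot
selected = synchronizationapi.get(jobId, "reserved_slots").contains(taskId)

The slot then has to be released once the task is done (for example in a clean script), by removing the task id from reserved_slots with another compute call.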

12.4.3. Conclusion

The Task Synchronization API can be used for multiple dynamic synchronization patterns, such as:

  • Cross-workflow synchronization

  • Critical Sections

  • Barriers (Wait for several parallel tasks to reach a certain point)

  • Producer / Consumer

A complete example of Producer / Consumer, ServiceDeployerConsumer can be downloaded here.
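As a simple illustration of the Producer / Consumer pattern (not the code of the ServiceDeployerConsumer example itself), two tasks of the same job can coordinate through a shared counter created by an Init task (for example with synchronizationapi.put(jobid, "items", 0)). The producer announces each produced item:

// Producer task: announce one produced item
jobid = variables.get("PA_JOB_ID")
synchronizationapi.compute(jobid, "items", "{k, x -> x + 1}")
println "Produced one item"

while the consumer blocks until an item is available and takes it atomically:

// Consumer task: wait until at least one item is available, then consume it atomically
jobid = variables.get("PA_JOB_ID")
synchronizationapi.waitUntilThen(jobid, "items", "{k, x -> x > 0}", "{k, x -> x - 1}")
println "Consumed one item"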

12.5. Signal API and Templates

The Signal API is a high-level API that enables users to interact with running jobs via signals (we also call them actions). That is, it allows a job to expose specific signals (e.g., soft_stop to properly finish a job) that can be triggered by users from the Workflow Execution portal or Scheduler portal. You can try the examples below to better understand how it works.

In summary, for simple use, just go to ProActive Studio and use the predefined templates in the Controls bucket (4. Signal Templates project), for example Check_For_Signals, Wait_For_Signals, Wait_For_Signals_Reactive or Wait_For_Signals_With_Variables. Fill in the Task Variable SIGNALS and use these templates in your own workflow to be able to receive input from, for instance, the Workflow Execution portal or the Scheduler portal.

Check_For_Signals is a template workflow composed of two tasks. The first sends a ready notification for the signals specified in the variable SIGNALS. The second task checks whether one signal has been received; if not, the workflow runs in a loop where each iteration lasts one minute. This workflow allows users to do iterative processing and decide at each iteration whether to continue, i.e., (i) declare being ready to receive one or more signals, (ii) do some processing, then (iii) receive signals and decide whether to continue the processing or not.

Wait_For_Signals is a template workflow that contains a single task, which sends a ready notification for the specified signals in the 'SIGNALS' variable, then checks if one or more signals are received by the job. If not, the workflow runs in a loop where each iteration lasts one minute. This workflow allows users to iteratively wait for the reception of one or more signals, then trigger some processing.

Wait_For_Signals_Reactive is a template workflow that sends a ready notification for all specified signals in the 'SIGNALS' variable, then waits until one signal among those specified is received. This workflow contains no loop. It performs a blocking wait, then immediately triggers some processing at a signal reception.

Wait_For_Signals_With_Variables is a template workflow that uses signals having input parameters. It contains a single task that sends a ready notification for all the signals (with input parameters) specified in the variable SIGNALS, then loops until one signal among those specified is received by the job. The signal input parameters have predefined values that can be changed when sending the signal.

At a more detailed level, and for advanced use, the Signal API allows jobs and users to exchange signals using a communication channel provided by the Synchronization API. The typical usage scenario of the Signal API is the following:

  1. Signals are defined by their name and, optionally, a list of variables. Variables attached to a signal have the same structure as Workflow variables but are declared exclusively using the Signal API.

  2. If the signal has variables, then, when a user sends it (see step 5), a window appears where the user can set the variable values before the signal is sent to the job. If the signal does not have variables, no window appears. Adding variables to a signal enhances manual control of the workflow execution: after receiving the signal with its attached variables, the workflow processes this information and reacts accordingly (e.g., continues a calculation with a new set of parameters).

  3. A job declares that it is ready to handle one or more signals, which can have attached variables. The signals are then displayed in the Workflow Execution portal or Scheduler portal (e.g., the soft_stop signal is displayed).

  4. The job waits to receive these signals, or checks for their reception regularly.

  5. A user clicks on the signal displayed in the Workflow Execution portal or Scheduler portal (e.g., click on the soft_stop signal), which sends this signal to the job.

  6. At the signal reception, the job executes the corresponding behavior (defined in the workflow). When the signal has attached variables, these variables must be valid according to their model specification (see Variable Model); sending the signal is not possible until they are validated.

As already mentioned, the Signal API uses the synchronization API to store jobs' signals, with respect to the following specifications:

  • It uses one persistent channel per job (named PA_SIGNALS_CHANNEL + {jobId}) to store that job's signals and variables. The signal channel contains an entry per signal.

  • Job signals are automatically removed when the job execution finishes (i.e., the job is in the status Finished, Cancelled, Failed or Killed).

The Signal API is based on the Synchronization API. However, these APIs are intended for different purposes. The Synchronization API is used to exchange information (e.g., messages) between tasks and between jobs, whereas the Signal API is used to exchange signals between a job and users (via the Workflow Execution portal or the REST API); it cannot be used to send signals from one job to another.

12.5.1. Usage

Task Signal API is a Java API defined by the following API documentation.

The Signal API is available inside ProActive Tasks through the signalapi script binding.

It is easy to use, and offers a set of basic operations on signals summarized as follows:

  • readyForSignal: declares that the current job is ready to receive and handle one or more signals, with or without variables. Variables are passed as a list of JobVariable objects. This operation constructs new signals (by adding the prefix ready_ to the input signals), then adds them to the job signals. For example, if a job declares ready for the signal soft_stop, the signal ready_soft_stop is added to the set of job signals. Additionally, if the input signal (e.g., soft_stop) already exists among the job signals (which means that it has been sent and handled previously), the ready for signal operation automatically removes it, so that the job is ready again to receive and handle the same signal.

  • sendSignal, sendManySignals: programmatically sends one or more signals, with or without variables, to the current job signals. This operation removes the corresponding ready signals if they already exist. For instance, sending soft_stop removes the signal ready_soft_stop if it already exists among the job signals. Although signals are meant to be sent manually by a human user through the ProActive web portals, it is also possible to use these methods to send them programmatically inside the workflow (see the sketch after this list).

  • checkForSignals: verifies whether all of the signals given as input exist among the set of job signals.

  • waitFor, waitForAny: blocking instructions that wait until one signal (among those given as input) is added to the set of job signals. waitFor returns a map containing the names and values of the signal variables; waitForAny returns the first signal object received among those given as input.

  • removeSignal, removeManySignals: removes one or more signals from the set of job signals.

  • getJobSignals: returns the current job signals and variables.
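For instance, one task of a job can programmatically raise a signal that another task of the same job is blocked on. The following is a minimal sketch; my_signal is just an example name.

// Task A: declare readiness and block until the signal arrives
signalapi.readyForSignal("my_signal")
signalapi.waitFor("my_signal")
println "my_signal received, continuing"

and, in another task of the same job:

// Task B: raise the signal programmatically instead of waiting for a user click
signalapi.sendSignal("my_signal")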

In addition to the above operations available within ProActive Tasks, users and third-party systems can send signals to a specific job via ProActive Signals REST endpoint.

12.5.2. Examples

Standard and simple use with predefined tasks in the Control template menu
  1. Check_For_Signals: a control pattern that executes a loop until one or more signals, given as input, are received.

  2. Wait_For_Signals: a blocking control pattern that blocks the task execution waiting for the reception of one or more signals given as input.

Advanced use with Signal templates

The Signal menu contains many workflow examples that use all signal operations. Below, we give an example of Groovy code that uses some Signal API operations.

//Declare that the current job is ready to receive and handle the 'soft_stop' signal
signalapi.readyForSignal("soft_stop")

//Display current job signals
println("Job signals "+ signalapi.getJobSignals())

//Wait until the 'soft_stop' signal is sent (via the Workflow Execution portal, Scheduler portal or REST API)
signalapi.waitFor("soft_stop")

//Remove the 'soft_stop' signal from the set of job signals
signalapi.removeSignal("soft_stop")

Here is another example of Groovy code that uses some Signal API operations with signal variables.

import org.ow2.proactive.scheduler.common.job.JobVariable

//Declare the signal variables (JobVariable objects: name and value), initialized from workflow variables
List<JobVariable> signalVariables = new java.util.ArrayList<JobVariable>()
signalVariables.add(new JobVariable("INTEGER_VARIABLE", variables.get("INTEGER_VARIABLE")))
signalVariables.add(new JobVariable("BOOLEAN_VARIABLE", variables.get("BOOLEAN_VARIABLE")))

//Declare that the current job is ready to receive and handle the 'soft_stop' signal with the given variables
signalapi.readyForSignal("soft_stop", signalVariables)

//Display current job signals
println("Job signals "+ signalapi.getJobSignals())

//Wait until the 'soft_stop' signal with variables is sent (via the Workflow Execution portal, Scheduler portal or REST API)
outputParameters = signalapi.waitFor("soft_stop")

//The outputParameters represent a hashmap containing the names of the variables and their values entered by the user
println "Output parameters = " + outputParameters

//Remove the 'soft_stop' signal from the set of job signals
signalapi.removeSignal("soft_stop")

Usage of the Signals REST endpoint

Below, we provide an example based on curl, which shows how to use the Signals REST endpoint for sending a signal to a specific job:

curl -X POST -H "sessionid:$sessionid" \
  "http://try.activeeon.com:8080/rest/scheduler/job/job_id/signals?signal=soft_stop"

12.5.3. Conclusion

The Signal API allows users to interact with jobs during their execution, thus enabling to impact and change jobs' behaviors at runtime. Many use cases can be fulfilled using this API. For instance:

  • Softly stopping a job: when the stop signal is received, the job executes some code that properly finishes the job.

  • Stop or continue job execution based on the user decision: a job suspends its execution waiting for the user to send a signal, either stop or continue.

  • Scaling up and down resources on user demand: a job that deploys resources (e.g., compute resources, workers/agents, etc) can scale up and down these resources when receiving the corresponding signals from users (via the Workflow Execution portal), or from third-party systems (via the Signals REST endpoint).

13. Scheduling policies

See Admin Guide to configure scheduling policies.

13.1. Earliest deadline first (EDF) policy

The Scheduler provides an Earliest Deadline First (EDF) policy that prioritizes jobs with deadlines. The general principle is the following: the job that must be started soonest in order to meet its deadline, taking its expected execution time into account, gets the highest priority.

The EDF policy considers two Generic Information items: JOB_DDL and JOB_EXEC_TIME.

JOB_DDL represents the job deadline. The deadline can be set in one of the following formats:

  1. absolute format: YYYY-MM-ddTHH:mm:ss+ZZ, e.g. 2018-08-14T08:40:30+02:00

  2. relative (to the submission time) format: +HH:mm:ss , e.g. +5:30:00, or +mm:ss, e.g. +4:30, or +ss, e.g. +45 (45 seconds)

If the job deadline is absolute, then its value equals JOB_DDL.

If the job deadline is relative, then its value equals the submission time plus JOB_DDL. For example, a job submitted at 10:00:00 with JOB_DDL=+2:00:00 has a deadline of 12:00:00.

JOB_EXEC_TIME represents the expected job execution time, in one of the following formats: HH:mm:ss (e.g. 12:30:00 or 5:30:00), mm:ss (e.g. 4:30), or ss (e.g. 45, i.e. 45 seconds).

The EDF policy considers jobs that are still pending and orders them by priority, applying the rules below in the given order until one rule actually applies. Here is the ordered list of rules for the EDF policy:

  1. compare job’s intrinsic priority (HIGHEST, HIGH, NORMAL, LOW, LOWEST)

  2. job with deadline has priority over job without deadline

  3. job that is already started has priority over job that has not yet started

  4. job that started earlier has priority over job that was started later

  5. the job with the earliest latest-possible start time (JOB_DDL minus JOB_EXEC_TIME) has priority (see the short illustration after this list)

  6. the job with the earliest submission time has priority.
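To illustrate rule 5, the quantity compared is the latest start time of each job, i.e. its deadline minus its expected execution time. The following Groovy snippet is purely illustrative (not scheduler code); it uses the absolute deadline example from this section and an assumed execution time of 10 minutes:

import java.time.Duration
import java.time.OffsetDateTime

// JOB_DDL in absolute format, and an assumed JOB_EXEC_TIME of 00:10:00
def deadline = OffsetDateTime.parse("2018-08-14T08:40:30+02:00")
def execTime = Duration.ofMinutes(10)

// The job must start at the latest at (deadline - execution time) to meet its deadline
def latestStart = deadline.minus(execTime)
println latestStart   // 2018-08-14T08:30:30+02:00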

13.1.1. Email notifications

The EDF policy sends three types of email notifications:

  1. (Arriving soon) When the job deadline is at risk (if the job is not started in the next X minutes, its deadline will be missed)

  2. When the job will likely miss its deadline (i.e., if the provided JOB_EXEC_TIME exceeds the time left until the deadline)

  3. (Arriving soon) When the job has already missed its deadline.

The policy sends emails to the address specified in the EMAIL Generic Information of the job.

To enable email notifications see Admin Guide.

14. Data Connectors

The data-connectors bucket contains diverse generic data connectors for the most frequently used data storage systems (File, SQL, NoSQL, Cloud, ERP). The aim of a data connector is to facilitate and simplify data access from your workflows to external data sources.

14.1. File

The File connectors allow to import and export data from HTTP, FTP and SFTP servers. We begin by presenting the FTP, FTPS and SFTP connectors, then the URL connector.

14.1.1. FTP, FTPS and SFTP connectors

The FTP, FTPS and SFTP file connector workflows consist of two tasks:

  • Import_from_FTP/SFTP_Server: Downloads files or folders recursively from an FTP/SFTP server to the data space.

  • Export_to_FTP/SFTP_Server: Uploads files or folders recursively from the data space (global or user space) to an FTP/SFTP server.

Variables:

The FTP, FTPS and SFTP connectors share the same list of variables, except that the FTPS connector has additional variables related to the SSL/TLS certificate configuration (described in the next table). Consequently, we describe the common variables in the following table using a unique notation.

Table 12. FTP/FTPS/SFTP Connector common variables

Variable name

Description

Scope

Required?

Type

Default/Examples

HOST

Server host.

Workflow, both Tasks

Yes

String

localhost

USERNAME

Username to use for the authentication.

Workflow, both Tasks

Yes

String

e.g. toto

PORT

Listening port.

Workflow, both Tasks

No

Integer

e.g. 21, 22

FILE_PATTERN

Either a file name or a wildcard string pattern.

Task

Yes

String

e.g. file.txt, *.txt, **

LOCAL_BASE

Either a global path or a local relative path from which we export (or to which we import) file(s). It can be either a directory terminated by / or an empty value for the root.

Task

No

String

e.g. localDirectory/, /global/path/localDirectory/

REMOTE_BASE

Remote relative path on the server to which we export (or from which we import) file(s). It can be either a path to a folder terminated by / or an empty value for the root.

Task

No

String

e.g. remoteDirectory/

The following table describes the specific workflow variables of the FTPS connector, used to import/export file(s) from/to an FTP server over SSL/TLS.

Table 13. FTPS Connector specific variables

Variable name

Description

Scope

Required?

Type

Default/Examples

CLIENT_CERTIFICATE_AUTHENTICATION

If true, enable client certificate authentication.

Workflow

Yes

Boolean

Default: False

CLIENT_CERTIFICATE_CRED

Client SSL certificate credential.

Workflow

No

String

e.g. client_certificate_cred

CLIENT_PRIVATE_KEY_CRED

The private key credential.

Workflow

No

String

e.g. private_key_cred

CLIENT_PRIVATE_KEY_PASSWORD

The password to protect the key.

Workflow

No

String

e.g. my_password

CLIENT_PRIVATE_KEY_ALIAS

The alias of the key to use; may be null, in which case the first key entry alias is used.

Workflow

No

String

e.g. private_key_alias

SERVER_CERTIFICATE_VERIFICATION

If true, enable server certificate verification.

Workflow

Yes

Boolean

Default: True

PROVIDE_SERVER_CERTIFICATE

If true, enable providing the server certificate.

Workflow

No

Boolean

Default: False

SERVER_CERTIFICATE_CRED

Public Server SSL certificate.

Workflow

No

String

e.g. server_certificate_cred

How to use this connector:

The connector requires the following third-party credentials: {key: <PROTOCOL>://<username>@<hostname>, value: PASSWORD} where <PROTOCOL> can take one of the following values: {ftp, ftps, sftp}. Please refer to the User documentation to learn how to add third-party credentials.

14.1.2. URL connector

The URL connector allows, in addition, to import data using HTTP and HTTPS protocols.

Variables:

Table 14. URL Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

FILE_URL

Link to a file accessible using the HTTP or HTTPS protocols.

Task

Yes

String

e.g. http://www.pdf995.com/samples/pdf.pdf

LOCAL_BASE

Local relative path to which we download file(s). It can contain either a path to a file, a directory terminated by / or an empty value for the root.

Task

No

String

e.g. localDirectory/, example.zip

EXTRACT_ARCHIVE

If set to true, the imported file will be extracted if it is an archive.

Task

Yes

Boolean [true or false]

false

14.2. SQL

In this part, we first present the data connector tasks that allow users to access and manipulate data of the most frequently used databases, then we show how to use pooled connection to SQL databases across multiple task executions.

14.2.1. Data Connectors

The data connectors allow to import and export data from and to Relational DataBase Management Systems (RDBMS). Currently, we have connectors for MySQL, Oracle, PostgreSQL, Greenplum and SQL Server.

All SQL connector workflows are composed of two tasks:

  • Import_from_<Database>: Import data from an SQL database server to be processed in the same workflow or just to be stored as a csv file in the data space.

  • Export_to_<Database>: Export data from the data space (global or user space) to an SQL database server.

where Database ∈ {MySQL, Oracle, PostgreSQL, Greenplum and SQL Server}

Variables:

All SQL connectors share the same list of variables. Consequently, we describe them in the following table using a unique notation.

Table 15. SQL Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

HOST

Label or the IP address of the server hosting the database.

Workflow, both Tasks

Yes

String

localhost

PORT

Listening port.

Workflow, both Tasks

No

Integer

e.g. 5432, 1521

USER

Username to use for connecting to the database.

Workflow, both Tasks

Yes

String

DATABASE

Database name.

Workflow, both Tasks

Yes

String

e.g. MY_DATABASE

QUERY

Requires an SQL query or a table name to fetch data from.

Import Task

Yes

String

e.g. SELECT * FROM …​

OUTPUT_FILE

Relative path in the data space used to save the results in a CSV file.

Import Task

No

String

e.g. path/to/my/output.csv

OUTPUT_TYPE

Task result output type (HTML or CSV). If set to HTML, it allows to preview the results in Scheduler Portal in HTML format.

Import Task

No

String

Values: {CSV, HTML}.

Default: CSV

TABLE

The table to insert data into.

Export Task

Yes

String

e.g. MY_TABLE

INSERT_MODE

Indicates the behavior to follow when the table exists in the database amongst:

- fail: If table exists, do nothing.

- replace: If table exists, drop it, recreate it, and insert data.

- append: (default) If table exists, insert data. Create if does not exist.

Export Task

Yes

String

Default: append

INPUT_FILE

- It can be a relative path in the dataspace of a csv file containing the data to import.

- or a valid URL amongst http, ftp, s3, and file.

Export Task

Yes

String

e.g. path/to/data.csv or http://link/to/my/data/csv

How to use this task:

When you drag & drop an SQL connector, two tasks will be appended to your workflow: an import task and an export task. Each task can be used as a separate component in an ETL pipeline; thus, you can keep one of them depending on your needs and remove the other or you can use them both.

The task requires the following third-party credential:

{key: <mysql|postgresql|sqlserver|oracle|greenplum>://<USER>@<HOST>:<PORT>, value: PASSWORD}.

e.g.

key | value
mysql://myuser@localhost:1234 | your MYSQL_PASSWORD
sqlserver://anotheruser@10.0.0.1:5432 | your SQLSERVER_PASSWORD

This is a one-time action and will ensure that your credentials are securely encrypted. Please refer to the User documentation to learn how to add third-party credentials.

In the import mode, the output containing the imported data takes one or many of the following forms:

  • in CSV format, saved to:

    • the OUTPUT_FILE in the data space, if specified by the user

    • and to the result variable, to make it previewable in the Scheduler Portal and accessible to the next tasks.

  • in JSON format, using the DATAFRAME_JSON variable.

14.2.2. Database Connections Pooling

To allow the persistence of a single connection across multiple SQL tasks (basically when running several tasks concurrently on the same machine), we have implemented the connection pooling feature to any external SQL database based on the HikariCP Framework. This feature is demonstrated through two workflow templates: SQL_Pooled_Connection_Query and SQL_Pooled_Connection_Update. Both are single-task workflows.

Variables:

The variables listed in the following table exist for both SQL_Pooled_Connection_Update and SQL_Pooled_Connection_Query workflows.

Table 16. Workflows' variables

Variable name

Description

Scope

Required?

Type

Default/Examples

RDBMS_NAME

Name of the relational database management system.

Workflow, Task

Yes

String

Values: {postgresql, mysql, greenplum, sqlserver, oracle, hsqldb}

HOST

Server host.

Workflow, Task

Yes

String

localhost

PORT

Listening port.

Workflow, Task

No

Integer

e.g. 5432, 1521

USERNAME

Username to use for connecting to the database.

Workflow, Task

Yes

String

e.g MY_USER

DATABASE

Database name.

Workflow, Task

Yes

String

e.g. MY_DATABASE

POOL_PROPERTY

The HikariCP framework provides many data source properties that can be used to configure the pooled DB connection, such as autoCommit, maximumPoolSize, maxLifetime, idleTimeout, etc. You can add as many properties as you want: for each one, add a new task variable whose name is the property name prefixed with "POOL_" and whose value is the property value. For more info, please refer to https://github.com/brettwooldridge/HikariCP.

Workflow, Task

No

String

e.g. POOL_autoCommit, POOL_maximumPoolSize

SQL_STATEMENTS

To change or update data, you have to use CREATE, INSERT, UPDATE, or DELETE statements. To query data, you have to perform SELECT statements.

Workflow, Task

Yes

String

e.g. SELECT * FROM …​, DROP TABLE …​

In addition to the aforementioned variables, SQL_Pooled_Connection_Query workflow has two more variables.

Table 17. SQL_Pooled_Connection_Query additional variables

Variable name

Description

Scope

Required?

Type

Default/Examples

OUTPUT_TYPE

If set to HTML, it allows to preview the results in the Scheduler Portal in a HTML format. If set to CSV, it allows to download the results from the Scheduler Portal in a CSV format.

Workflow, Task

Yes

String

Values: {CSV, HTML}.

Default: HTML

STORE_RESULT_VARIABLE

If not empty, the value is the name of a variable that will contain the result set of the query (converted into a List of Maps). This variable can then be used in other tasks (see the example after the usage notes below).

Workflow, Task

No

String

How to use these workflows:

The scheduler should be configured in non-fork mode to execute tasks in a single JVM rather than starting a dedicated JVM to run the task (In $PROACTIVE_HOME/config/scheduler/settings.ini, set pa.scheduler.task.fork=false).

Data connector workflows may need some third-party dependencies in their classpath (notably, JDBC jar libraries) to be executed properly. Please refer to the Admin Guide documentation to learn how to add custom third-party libraries to the ProActive server and nodes classpath.

In case the password is required, you have to add the following third-party credential: {key: <mysql|postgresql|sqlserver|oracle|hsqldb:hsql>://<USERNAME>@<HOST>, value: PASSWORD}. This is a one-time action and will ensure that your credentials are securely encrypted. Please refer to the User guide documentation to learn how to add third-party credentials.
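As an illustration of the STORE_RESULT_VARIABLE mechanism described above (not part of the templates themselves): if STORE_RESULT_VARIABLE is set to a hypothetical name such as QUERY_RESULT in SQL_Pooled_Connection_Query, a child Groovy task could read the propagated result set like this:

// Child task: read the result set propagated by SQL_Pooled_Connection_Query.
// "QUERY_RESULT" is only an example name; it must match the value given to STORE_RESULT_VARIABLE.
def rows = variables.get("QUERY_RESULT")   // a List of Maps, one Map per returned row
rows.each { row ->
    println row   // each row maps column names to values
}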

14.3. NoSQL

The NoSQL connectors allow to import data from NoSQL Databases. Currently, we have connectors for MongoDB, Cassandra and Elasticsearch. All NoSQL connector workflows are composed of two tasks:

  • Import_from_<Database>: Import data from a NoSQL database server to be processed in the same workflow or just to be stored as a csv file in the data space.

  • Export_to_<Database>: Export data from the data space (global or user space) to a NoSQL database server.

14.3.1. MongoDB

Variables:

Table 18. MongoDB Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

MONGODB_HOST

Label or the IP address of the server hosting the database.

Workflow, Import Task, Export Task

Yes

String

localhost

MONGODB_PORT

Listening port.

Workflow, Import Task, Export Task

No

Integer

27018

MONGODB_USER

Username to use for connecting to MongoDB server if authentication is required.

Workflow

No

String

MONGODB_DATABASE

Database to use. In export mode, it is created if it does not exist.

Workflow, Import Task, Export Task

Yes

String

e.g. my_database

MONGODB_COLLECTION

Collection to use. In export mode, it is created if it does not exist.

Workflow, Import Task, Export Task

Yes

String

e.g. my_collection

MONGODB_QUERY

Requires a NoSQL query to fetch data. If empty ({}), it will fetch all documents.

Import Task

No

String

{}

MONGODB_OUTPUT

Relative path in the data space used to save the results in a JSON file.

Import Task

No

String

e.g. path/to/my/output.json

LABEL

If the imported data is labeled for machine learning, a label attribute can be specified using this variable.

Import Task

No

String

e.g. class

MONGODB_INPUT

A JSON Object/Array to be inserted in MongoDB. This variable can be:

- A String describing the JSON Object/Array

- A relative path in the data space of a JSON file.

Export Task

Yes

String

e.g.

{"document":{"key":"value"}}

or path/to/input.json

How to use this task:

The task might require (if the MongoDB server requires authentication) a MONGODB_USER in addition to the following third-party credential: {key: mongodb://<MONGODB_USER>@<MONGODB_HOST>:<MONGODB_PORT>, value: MONGODB_PASSWORD}. Please refer to the User documentation to learn how to add third-party credentials.

In the import mode, the output containing the imported data takes one or many of the following forms:

  • in JSON format, saved to:

    • the MONGODB_OUTPUT file in the data space, if specified by the user

    • and to the result variable, to make it previewable in the Scheduler Portal and accessible to the next tasks.

14.3.2. Cassandra

Variables:

Table 19. Cassandra Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

CASSANDRA_HOST

Label or the IP address of the server hosting the database.

Workflow, Import Task, Export Task

Yes

String

localhost

CASSANDRA_PORT

Listening port.

Workflow, Import Task, Export Task

No

Integer

9042

CASSANDRA_KEYSPACE

Keyspace to use.

Workflow, Import Task, Export Task

Yes

String

e.g. my_keyspace

CASSANDRA_QUERY

Query to fetch data.

Import Task

Yes

String

SELECT * FROM …​

CASSANDRA_OUTPUT

Relative path in the data space used to save the results in a CSV file.

Import Task

No

String

e.g. path/to/my/output.csv

LABEL

If the imported data is labeled for machine learning, a label attribute can be specified using this variable.

Import Task

No

String

e.g. class

CASSANDRA_TABLE

Data is stored in tables containing rows of columns, similar to SQL definitions. It is created if it does not exist.

Export Task

Yes

String

e.g. my_table

CASSANDRA_PRIMARY_KEY

A primary key identifies the location and order of stored data. The primary key is defined when the table is created and cannot be altered.

Export Task

Yes

String

e.g. (my_primary_key) or (attr_1, attr_2)

CASSANDRA_INPUT

Path of the CSV file that contains data to be imported. This variable can be:

- A URL. Valid URL schemes include http, ftp, s3, and file.

- A relative path in the data space of a CSV file.

Export Task

Yes

String

e.g. path/to/input.csv

How to use this task:

The task might require (if applicable) the following third-party credentials: CASSANDRA_USERNAME and CASSANDRA_PASSWORD. Please refer to the User documentation to learn how to add third-party credentials.

In the import mode, the output containing the imported data takes one or many of the following forms:

  • in CSV format, saved to:

    • the CASSANDRA_OUTPUT file in the data space, if specified by the user

    • and to the result variable, to make it previewable in the Scheduler Portal and accessible to the next tasks.

14.3.3. ElasticSearch

Variables:

Table 20. ElasticSearch Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

ELASTICSEARCH_HOST

Label or the IP address of the ElasticSearch server.

Workflow

Yes

String

localhost

ELASTICSEARCH_PORT

Listening port.

Workflow

No

Integer

9200

ELASTICSEARCH_USER

Username to use for connecting to Elasticsearch server if authentication is required.

Workflow

No

String

ELASTICSEARCH_INDEX

Index to use. In export mode, it is created if it does not exist.

Import Task, Export Task

Yes

String

e.g. my_index

ELASTICSEARCH_QUERY

A query to fetch data. If empty, it will fetch all documents from the index by default.

Import Task

No

String

{ "query": { "match_all": {} } }

ELASTICSEARCH_SIZE

Maximum number of results to return.

Import Task

No

Integer

10

ELASTICSEARCH_DOC_TYPE

The documents type.

Import Task

Yes

String

e.g. my_doc_type

ELASTICSEARCH_INPUT

A JSON Object/Array to be indexed in ElasticSearch. This variable can be:

- A String describing the JSON Object/Array

- A relative path in the data space of a JSON file.

Export Task

Yes

String

e.g.

{"document":{"key":"value"}}

or path/to/input.json

How to use this task:

The task might require (if the Elasticsearch server requires authentication) an ELASTICSEARCH_USER in addition to the following third-party credential: {key: elasticsearch://<ELASTICSEARCH_USER>@<ELASTICSEARCH_HOST>:<ELASTICSEARCH_PORT>, value: ELASTICSEARCH_PASSWORD}. Please refer to the User documentation to learn how to add third-party credentials.

In the import mode, the output containing the imported data is saved, in JSON format, to the result variable, making it previewable in the Scheduler Portal and accessible to the next tasks.

14.4. Cloud

Cloud data connectors allow to interact with cloud storage services. Currently we provide support for Amazon S3, Azure Blob Storage, Azure Data Lake and Snowflake.

14.4.1. Amazon S3

The Amazon S3 connector allows to upload and download data from S3. The connector workflow consists of two tasks:

  • Import_from_S3: Downloads files or folders recursively from S3 to the global space.

  • Export_to_S3: Uploads files or folders recursively from the global space to S3.

Variables:

Table 21. Amazon S3 Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

ACCESS_KEY

S3 access key.

Workflow, Export and Import Tasks

Yes

String

e.g. 12DSS21EFGFR

URL

A valid URL to an existing S3 object that can be a file or a folder.

Import Task

Yes

String

e.g. https://s3.eu-west-2.amazonaws.com/activeeon-public/images/

or https://s3.eu-west-2.amazonaws.com/activeeon-public/images/logo.jpg

BUCKET

S3 Bucket name. If it does not exist, a new bucket is created (if possible) and assigned the specified region REGION.

Export Task

Yes

String

e.g. activeeon-public

REGION

A valid AWS region code that corresponds to the region of the indicated bucket.

Export Task

Yes

String

e.g. eu-west-2, us-east-1

LOCAL_RELATIVE_PATH

Relative path to a folder in the data space to which the downloaded file(s) will be saved in import mode. In export mode, it should point to an existing file/folder that needs to be uploaded.

Import Task, Export Task

Yes

String

e.g. path/to/input/file

or path/to/input/folder/

or path/to/output/folder/

REMOTE_RELATIVE_PATH

Prefix (relative path) used to store the uploaded data in the given bucket under the given path. If empty, the data will be uploaded to the bucket root folder.

Export Task

No

String

e.g. path/to/output/

or path/to/input/file/or/folder

How to use these tasks:

Amazon S3 connector tasks require your AWS credential keys to be set as a third-party credential (key:value pairs) where {key: ACCESS_KEY, value: SECRET_KEY}; this is a one-time action and will ensure that your credentials are securely encrypted. Please refer to the User documentation to learn how to add third-party credentials.

14.4.2. Azure Data Lake

The Azure Data Lake connector allows to upload U-SQL scripts and then execute them as Data Lake Analytics (DLA) jobs. It requires an existing Data Lake Analytics account with its corresponding Data Lake Store account. The connector workflow consists of three tasks:

  • Submit_job: Connects to Azure Data Lake and submits the provided script.

  • Wait_for_job: Periodically monitors the DLA job status until its finalization.

  • Display_result: Downloads the result file and displays it.

Variables:

Table 22. Azure Data Lake Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

AZ_DLA_ACCOUNT

Data Lake Analytics account to be used. It should already exist.

Workflow

Yes

String

e.g. my_dla_account

AZ_DLA_JOB

Name to identify the job to be submitted.

Workflow

Yes

String

e.g. my_dla_job

AZ_DLA_SCRIPT

File name of the U-SQL script to submit. The file must be located in the Global Space directory. An example file script.usql is provided.

Workflow

Yes

String

Sample file: script.usql e.g. my_usql_script.usql

AZ_DLA_OUTPUT

Name of the output file to store the result of the script.

Workflow

Yes

String

e.g. my_output_file.csv

AZ_CRON_MONITOR

Cron expression to determine how frequently to monitor the completion of the job.

Workflow

Yes

String

Default: "* * * * *" (every minute) e.g. "*/2 * * * *" (every 2 minutes)

AZ_SUBSCRIPTION

Optional variable to set an Azure subscription when not using the default one. Value can be a subscription’s name or id.

Workflow

No

String

e.g. Pay-As-You-Go

How to use these tasks:

Azure Data Lake tasks require your Azure login credentials to be set as third-party credentials (key:value pairs); this is a one-time action and will ensure that your credentials are securely encrypted. Please refer to the User documentation to learn how to add third-party credentials.

You have two options for providing your login credentials:

  • Standard Azure login: AZ_U:your_user (usually an email). AZ_P:your_password.

  • Using an Azure service principal: AZ_U:appId. AZ_P:password. AZ_T:tenant. By default, if AZ_T is set, the tasks will attempt to connect through a service principal.

The Output File
  • Instead of hardcoding the name of your output file in your U-SQL script, you can use the placeholder OUTPUT_FILE, which is automatically replaced by the value of AZ_DLA_OUTPUT.

  • Once downloaded, the output file will be stored in your User Space (and not in the Global Space).

  • You can visualize a table-like version of your output data in the Preview tab of the Display_result task.

14.4.3. Azure Blob Storage

The Azure Blob Storage connector allows to upload and download unstructured data (documents, videos, photos, audio files, presentations, web pages …​) from Azure Blob Storage. The connector workflow consists of two tasks:

  • Import_from_Azure_Blob: Downloads blob(s) from Azure Blob Storage to the data space. Note that a virtual folder structure is possible in Azure Blob Storage. In that case, blobs are locally downloaded respecting this structure.

  • Export_to_Azure_Blob: Uploads file(s) or directories recursively from the data space to Azure Blob Storage.

Variables:

Table 23. Azure Blob Storage Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

STORAGE_ACCOUNT

Storage account name.

Workflow, Export and Import Tasks

Yes

String

e.g. storageaccountexample

INPUT_PATH

Relative path to a directory in the data space to which the downloaded blob(s) will be saved. INPUT_PATH can contain either a path to a file, a directory terminated by / or an empty value for the root.

Import Task

No

String

e.g. path/to/input/file.file

or path/to/input/folder/

BLOB_NAME

In import mode, it should refer to an existing blob. If it’s empty, the entire container is downloaded. In export mode, this value corresponds either to the name of the blob to be uploaded or to the folder structure in which blob(s) will be uploaded.

Import Task, Export Task

No

String

e.g. doc.txt

or images/logo.jpg

or folder

CONTAINER_NAME

Azure storage container name. For the export mode, if it does not exist, a new container will be created (if possible).

Import Task, Export Task

Yes

String

e.g. azure-storage-container

How to use these tasks:

Azure Blob Storage connector tasks require your azure storage account and account key to be set as a third-party credential (key:value pairs) where {key: STORAGE_ACCOUNT, value: ACCOUNT_KEY}; this is a one-time action and will ensure that your credentials are securely encrypted. Please refer to the User documentation to learn how to add third-party credentials.

14.4.4. Snowflake

The Snowflake data connector allows to import and export structured as well as semi-structured data (JSON, Avro, XML) from and to a Snowflake cloud Data Warehouse. The Snowflake connector workflow is composed of two tasks:

  • Import_from_Snowflake: Import data from a Snowflake cloud Data Warehouse to be processed in the same workflow or just to be stored as a csv file in the data space.

  • Export_to_Snowflake: Export data from the data space (global or user space) to a Snowflake cloud Data Warehouse.

Variables:

Table 24. Snowflake Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

USER

Username to use for connecting to the data warehouse.

Workflow, both Tasks

Yes

String

e.g. MY_USERNAME

ACCOUNT

Name of your account (provided by Snowflake).

Workflow, both Tasks

Yes

String

e.g. xy12345.east-us-2.azure

WAREHOUSE

Warehouse name.

Workflow, both Tasks

Yes

String

e.g. MY_WAREHOUSE

DATABASE

Database name.

Workflow, both Tasks

Yes

String

e.g. MY_DATABASE

SCHEMA

Name of the default schema to use for the database. After login, you can use USE SCHEMA to change the schema.

Workflow, both Tasks

Yes

String

e.g. PUBLIC

ROLE

Name of the default role to use. After login, you can use USE ROLE to change the role.

Workflow, both Tasks

Yes

String

e.g. SYSADMIN

PROTOCOL

Protocol used for the connection.

Workflow, both Tasks

Yes

String

Default: https

QUERY

Requires an SQL query or a table name to fetch data from.

Import Task

Yes

String

e.g. SELECT * FROM …​

OUTPUT_FILE

Relative path in the data space used to save the results in a CSV file.

Import Task

No

String

e.g. path/to/my/output.csv

OUTPUT_TYPE

Task result output type (HTML or CSV). If set to HTML, it allows to preview the results in Scheduler Portal in HTML format.

Import Task

No

String

Values: {CSV, HTML}.

Default: CSV

TABLE

The table to insert data into.

Export Task

Yes

String

e.g. MY_TABLE

INPUT_FILE

It can be a relative path in the dataspace of a csv file containing the data to import.

Export Task

Yes

String

e.g. path/to/data.csv

How to use these tasks:

When you drag & drop a Snowflake connector, two tasks will be appended to your workflow: an import task and an export task. Each task can be used as a separate component in an ETL pipeline; thus, you can keep one of them depending on your needs and remove the other or you can use them both.

Each task requires the following third-party credential:

{key: snowflake://<USER>@<ACCOUNT>:<WAREHOUSE>, value: SNOWFLAKE_PASSWORD}.

14.5. ERP

ProActive’s ERP connectors enable communication and data exchange with popular ERP software providers.

14.5.1. SAP ECC

This connector allows you to interact with an SAP ECC system (not S/4HANA) by means of Remote Function Calls (RFC).

Variables:

Table 25. SAP ECC Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

JCO_FUNCTION

The name of a remote-enabled function module on your SAP System. You can verify the available functions in your ECC transaction SE37.

Workflow

Yes

String

e.g. RFC_SYSTEM_INFO

How to use this connector:

In order to securely connect to your SAP ECC system, you first need to store your SAP logon information as ProActive’s third-party credentials. The required credentials are:

Table 26. SAP connection credentials

Credential

Description

Example

JCO_ASHOST

Application server host IP.

10.10.10.1

JCO_SYSNR

System number.

00

JCO_CLIENT

Client number.

100

JCO_USER

SAP logon user name.

myuser

JCO_PASSWD

SAP logon password (it will be hidden).

mypassword

JCO_LANG

Preferred language.

EN

JCO_SAPROUTER

(Optional) An SAP Router string, in case you are connecting from an external network. It is important that you keep the /H/ tags.

/H/192.168.0.1/H/

Once the credentials are set, the workflow can be executed. If the connection is successful, the connection attributes will be listed as output. N.B.: ProActive’s SAP connector provides libraries to connect to an SAP server from a 64-bit OS (Windows, Linux, macOS); libraries for other operating systems can be obtained through the SAP Marketplace.

Querying for a function

The SAP connector will search for the function module provided as the JCO_FUNCTION variable. If the function module exists and is remote-enabled, the script will display the function’s parameters (import, export, tables) right after the Connection attributes.

Executing a function

The SAP connector includes an example of how to execute and handle the result of a function using the default RFC_SYSTEM_INFO, which returns a structure containing the System’s information. After the function is executed, the result can be accessed through the method getExportParameterList().

// execute RFC_SYSTEM_INFO function
sapFunction.execute(destination)
// obtain structure RFCSI_EXPORT
JCoStructure structure = sapFunction.getExportParameterList().getStructure("RFCSI_EXPORT")
if (structure != null) {
  println("System info:")
  // loop on structure to get the info
  for(JCoField field : structure) {
    println(field.getName() + " : " + field.getString())
  }
}
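For context, the destination and sapFunction objects used above are typically obtained through the standard SAP JCo API, roughly as follows. This is a simplified sketch: the destination name "PROACTIVE_DEST" is only an example, and it must correspond to a JCo destination registered from the JCO_* third-party credentials (which the connector sets up for you).

import com.sap.conn.jco.JCoDestinationManager

// Resolve a JCo destination registered from the JCO_* third-party credentials (name is an example)
destination = JCoDestinationManager.getDestination("PROACTIVE_DEST")
// Look up the remote-enabled function module given in the JCO_FUNCTION variable
sapFunction = destination.getRepository().getFunction(variables.get("JCO_FUNCTION"))
if (sapFunction == null) {
    throw new RuntimeException("Function " + variables.get("JCO_FUNCTION") + " not found in the SAP system")
}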

For a more detailed guide of ProActive’s SAP ECC connector, please refer to this document.

Further information and examples using the SAP JCO library are available in the SAP Cloud Platform SDK Documentation and the SAP Java Connector documentation, accessible through SAP Service Marketplace.

14.6. Business Intelligence

ProActive’s BI connectors enable exporting data to robust analytics and business intelligence servers.

14.6.1. Export_Data_to_Tableau_Server

This connector allows you to publish data to a Tableau Server.

Variables:

Table 27. Tableau_Server Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

DOCKER_ENABLED

True if you want to enable docker fork environment.

Workflow

Yes

Boolean

e.g. True

DOCKER_IMAGE

The name of the docker image that you want to use.

Task

Yes if DOCKER_ENABLED is set as True

String

e.g. activeeon/dlm3

TASK_ENABLED

True if you want to enable the execution of the task.

Task

Yes

Boolean

e.g. True

SERVER_ENDPOINT

The endpoint of the Tableau server.

Task

Yes

String

e.g https://eu-west-1a.online.tableau.com

SITE_ID

The site id defined in the Tableau server.

Task

Yes

String

e.g test_connector

PROJECT_NAME

The name of the project where you will save your data in the Tableau server.

Task

Yes

String

e.g. test

OUTPUT_FILE_NAME

The name of the file that will be created and saved in the Tableau server.

Task

Yes

String

e.g. test.hyper

How to use this connector:

In order to securely connect to your Tableau Server, you first need to store your Tableau login information as ProActive’s third-party credentials. The required credentials are: TABLEAU_SERVER_USERNAME and TABLEAU_SERVER_PASSWORD.

14.7. LDAP Query

The LDAP Query connector enables connecting to an existing Microsoft Active Directory server to perform search operations and to check users' credentials and their group access.

Variables:

Variable Name | Description | Example
ldapURL | URL of the LDAP server | ldap://localhost:389 (unsecured) or ldaps://127.0.0.1:636 (secured)
ldapDnBase | The base distinguished name of the catalog | dc=yourOrganization,dc=com
ldapUsername | The username | cn=admin for a global admin, or cn=yourName,ou=users for a specific user
ldapSearchBase | Base DN for LDAP search queries | dc=specificDepartment, or empty for a base DN search
ldapSearchFilter | LDAP search filter | (objectclass=*)
ldapSelectedAttributes | List of attributes to retrieve | attributeName1,attributeName2

If any of these variables must stay private, you’ll need to set them as third-party credentials (details are given below).

14.7.1. Authenticate with Third-party credentials

In order to authenticate to the LDAP server, you need to define the ldapPassword third-party credential for the relevant Scheduler user.

14.7.2. Result of task

The task provides a specific variable for the result, which is called result.

The task result is a String in JSON format.

Output example

Successful task:

{"status":"Ok","attributes":[{"loginShell":"/bin/csh","uid":"yaro","userPassword":"[B@2d6a9952","homeDirectory":"/home/users/yaro","uidNumber":"1000","givenName":"yaro","objectClass":"inetOrgPerson","sn":"yaya","gidNumber":"500","cn":"yaro"}]}

Failed task:

{"status":"Error","errorMessage":"javax.naming.NameNotFoundException: [LDAP: error code 32 - No Such Object]; remaining name 'cn=yaro,dc=activeeon,dc=com1'"}

14.7.3. Secure connection to LDAP server

In order to use secure connection for the LDAP server, specify a correct LDAP over SSL URL (for example ldaps://localhost:636).

In order to be able to connect to an LDAP server, a valid certificate is needed.

When a self-signed certificate is used, the certificate needs to be added to the trusted certificates of the JRE provided with the Scheduler.

Below is an example for adding a certificate to a Java cacerts file:

keytool -import -alias ldap-self-signed -keystore </pathToScheduler/jre/lib/security/cacerts> -file ldaps.crt -storepass changeit

15. Application Connectors

The application-connectors bucket contains diverse generic application connectors for the most frequently used cloud data management systems (ERP, ETL, LDAP, Reporting and Monitoring, and ServiceNow). The aim of an application connector is to facilitate and simplify the integration of a cloud application in your workflows.

15.1. ETL

The ETL connectors allow connecting to data servers and benefiting from their features. The following presents the various ETL connectors that PWS handles.

15.1.1. Informatica

The Informatica workflow consists of one task.

Variables:

The Informatica workflow uses the variables described in the following table.

Table 28. Informatica Connector variables

Variable name

Description

Scope

Required?

Type

Default/Examples

ENDPOINT

Server host.

Workflow

Yes

String

e.g. https://xxx.informaticacloud.com/test/test

USERNAME

Username to use for the authentication.

Workflow

No

String

e.g. toto

PASSWORD

Password to use for the authentication.

Workflow

No

String

e.g. toto

RESPONSE_FORMAT

Format of the server response.

Task

Yes

string, json, xml, html, contentView

json

RESPONSE_PATH

…​

Task

Yes

String

.

PARAM_var

Name of the api parameter (in case of multiple parameters, add several parameters fields).

Task

No

String

e.g. PARAM_input

HEADER_header

Name of the api request header (in case of multiple parameters, add several headers fields).

Task

No

String

e.g. HEADER_sessionId

JWT_TOKEN

JWT Token to use for the authentication.

Workflow

No

String

e.g. eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9

DEBUG

Allows printing more logs in the workflow results.

Task

Yes

Boolean

False

How to use this task:

This workflow performs a GET request to an Informatica API associated with an application already created by the Informatica user. This workflow template supports the following authentication methods: basic authentication using a username and a password, advanced authentication using a JWT access token, and anonymous authentication where no authorization is required. The authentication method is determined based on which parameters are defined when launching the workflow.

16. Addons

16.1. Statistics on ProActive Jobs and Resource Usage

This addon generates a Gantt chart to visualize executions of jobs and tasks, per node and over time.

An example of a generated gantt chart

job and tasks statistics

This chart can be generated on-demand, running Gantt_chart workflow from the Analytics bucket in the Catalog.

Please note that, to run properly, this workflow requires a third-party credential named "SCHEDULER_PASSWORD" containing your scheduler password (so the tool can connect to the scheduler and retrieve the information needed to build the chart). See Managing third-party credentials.

17. Reference

17.1. Job and task specification

Workflows are written in XML, and we provide an XSD grammar to help you write and validate them. The Workflows XSD is available here and you can also find the documentation here.

Set up your IDE to use the Workflow XSD and you will benefit from live validation and completion!

17.2. Studio Tasks

This section describes the available predefined tasks in the Tasks menu of ProActive Studio and the operating systems where they run. These tasks are grouped in three sections:

  1. OS Shells: Shell, Bash, Windows Cmd, PowerShell

  2. Languages: Java, Scalaw, Javascript, Groovy, Ruby, Jython, Python, Perl, PHP, R

  3. Containers: Dockerfile, Docker_Compose, Kubernetes

Please find in the table below a further description of these tasks.

Predefined Task Name | Description | Operating System | Additional Requirements
Shell | Sample Script Task using the generic posix shell script engine. | Linux, Mac | -
Bash | Sample Script Task using the posix bash script engine. | Linux, Mac | -
Windows Cmd | Sample Script Task using the windows cmd script engine. | Windows | -
PowerShell | Sample Script Task using the powershell script engine. | Windows | .NET Framework 3.5 and PowerShell 2.0 Engine must be installed
Java | Sample Java Task. | Linux, Windows, Mac | -
Scalaw | Sample Script Task using the scalaw script engine. | Linux, Windows, Mac | -
Javascript | Sample Script Task using the javascript script engine (Nashorn). | Linux, Windows, Mac | -
Groovy | Sample Script Task using the groovy script engine. | Linux, Windows, Mac | -
Ruby | Sample Script Task using the ruby (JRuby) script engine. | Linux, Windows, Mac | -
Jython | Sample Script Task using the python script engine, based on the Jython implementation of Python. | Linux, Windows, Mac | -
Python | Sample Script Task using the cpython script engine, based on native cpython commands. | Linux, Windows, Mac | CPython and the py4j module must be installed
Perl | Sample Script Task using the perl script engine. | Linux, Windows, Mac | Perl must be installed
PHP | Sample Script Task using the php script engine. | Linux, Mac, Windows | PHP interpreter must be installed
R | Sample Script Task using the R script engine. | Linux, Windows, Mac | R and the rJava library must be installed
Dockerfile | Sample Script Task using the dockerfile script engine. | Linux, Windows, Mac | Docker must be installed
Docker_Compose | Sample Script Task using the docker-compose script engine. | Linux, Windows, Mac | Docker Compose and Docker must be installed
Kubernetes | Sample Script Task using the kubernetes script engine. | Linux | Kubernetes CLI must be installed

17.3. Script Bindings Reference

This section describes useful bindings and variables, their scope, and their usage.

The following script names refer to:

  • selection: a script executed to select nodes suitable to execute a task. This script is executed on many nodes from the scheduling loop, before the task is deployed.

  • environment: the default mode to execute a task is to fork a dedicated JVM from the selected ProActive Node. The environment script, if defined in the workflow, is executed on the node to let the user define which environment parameters will be given to the forked JVM.

  • pre: a script which is executed before the task execution, in the same context (for example in the forked JVM).

  • task: the main script of a task when the task is a script task.

  • post: a script which is executed after the task execution, in the same context. The post script is executed only if there was no error inside the task.

  • flow: a script which is executed after the task execution, in the same context, and when the task is part of a control flow.

  • clean: a script which is executed after the task execution, directly on the node selected to execute the task (so not inside the forked JVM), whether there was a task error or not. The clean script is mainly used to clean the task host. If there was a node failure (the node is lost during the task execution), then the clean script is not executed and the task will be executed on another node.

The order in this list corresponds to the task life-cycle order. The term all scripts refers to all of the above scripts.
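
As a small illustration of where these scripts fit in the life-cycle, here is a minimal sketch of a Groovy flow script attached to a loop control (the loop and PA_TASK_ITERATION bindings are described in the reference below; the bound of 3 iterations is arbitrary):

// Minimal Groovy flow script sketch for a task that is the target of a loop control.
int iteration = variables.get("PA_TASK_ITERATION") as int
loop = iteration < 3   // true continues the loop, false ends it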

Each entry in the following reference lists the variable name and its description, then how to use it from a script, from a native task, and from the workflow, and finally the scripts in which the binding is available and those in which it is not. A dash (-) means not applicable.

  • Task result. Variable set by the user to return the result of a task. See Result variables.
    Script: result = "value"; | Native task: the result will be the exit code | Workflow: - | Available in: task | Not available in: -

  • Task arguments. Arguments of a script defined in the workflow, available in all kinds of scripts, when referencing an external file or directly as XML elements.
    Script: ... = args[0]; | Native task: passed to the native executable (can also be used with $args_0) | Workflow: - | Available in: all scripts, as external file, or XML element | Not available in: -

  • Results of previous tasks. An array containing the results of parent tasks. See Result variables.
    Script: ... = results[0]; | Native task: $results_0 | Workflow: - | Available in: task | Not available in: -

  • Branch. Either "if" or "else". See branch.
    Script: branch = "if"; | Native task: - | Workflow: - | Available in: flow (if) | Not available in: bash, cmd, perl, php, vbscript

  • Parallel runs. Number of replications. See replication.
    Script: runs = 3; | Native task: - | Workflow: - | Available in: flow (replicate) | Not available in: bash, cmd, perl, php, vbscript

  • Loop variable. A boolean variable used in a flow script to decide whether the loop continues or not. See loop.
    Script: loop = false; | Native task: - | Workflow: - | Available in: flow (loop) | Not available in: bash, cmd, perl, php, vbscript

  • Task progress. Represents the progress of the task. Can be set to a value between 0 and 100.
    Script: import org.ow2.proactive.scripting.helper.progress.ProgressFile; ProgressFile.setProgress(variables.get("PA_TASK_PROGRESS_FILE"), 50); | Native task: echo "50" > $variables_PA_TASK_PROGRESS_FILE | Workflow: - | Available in: task | Not available in: -

  • Workflow variables. Job variables defined in the workflow. See Workflow variables.
    Script: variables.get("key") | Native task: $variables_key | Workflow: ${key} | Available in: all scripts | Not available in: -

  • Generic information. Generic information defined in the workflow. See Generic Information.
    Script: genericInformation.get("key") | Native task: $genericInformation_key | Workflow: - | Available in: all scripts | Not available in: -

  • Job ID. ID of the current job.
    Script: variables.get("PA_JOB_ID") | Native task: $variables_PA_JOB_ID | Workflow: ${PA_JOB_ID} | Available in: all scripts | Not available in: -

  • Job name. Name of the current job.
    Script: variables.get("PA_JOB_NAME") | Native task: $variables_PA_JOB_NAME | Workflow: ${PA_JOB_NAME} | Available in: all scripts | Not available in: -

  • Task ID. ID of the task.
    Script: variables.get("PA_TASK_ID") | Native task: $variables_PA_TASK_ID | Workflow: ${PA_TASK_ID} | Available in: all scripts | Not available in: -

  • Task name. Name of the task.
    Script: variables.get("PA_TASK_NAME") | Native task: $variables_PA_TASK_NAME | Workflow: ${PA_TASK_NAME} | Available in: all scripts | Not available in: -

  • User. The name of the user who submitted the job.
    Script: variables.get("PA_USER") | Native task: $variables_PA_USER | Workflow: ${PA_USER} | Available in: all scripts | Not available in: -

  • Scheduler Home. Installation directory of the scheduler on the ProActive node.
    Script: variables.get("PA_SCHEDULER_HOME") | Native task: $variables_PA_SCHEDULER_HOME | Workflow: - | Available in: environment, pre, task, post, flow, clean | Not available in: -

  • Task iteration index. Index of iteration for this task inside a loop control. See Task executable.
    Script: variables.get("PA_TASK_ITERATION") | Native task: $variables_PA_TASK_ITERATION | Workflow: ${PA_TASK_ITERATION} | Available in: all scripts | Not available in: -

  • Task replication index. Index of replication for this task inside a replicate control. See Task executable.
    Script: variables.get("PA_TASK_REPLICATION") | Native task: $variables_PA_TASK_REPLICATION | Workflow: ${PA_TASK_REPLICATION} | Available in: all scripts | Not available in: -

  • Task Success. A variable accessible from a clean script, a flow script or a direct child task. It is a boolean value which is true if the task succeeded and false if the task failed.
    Script: variables.get("PA_TASK_SUCCESS") | Native task: $variables_PA_TASK_SUCCESS | Workflow: - | Available in: clean, flow | Not available in: -

  • Exit Value. A variable accessible from a clean script, a flow script or a direct child task. It is an integer value which contains the exit value of the task, if and only if the task contains a bash, shell, cmd, or vb script. If the task contains several of these scripts, the one executed last will prevail.
    Script: variables.get("EXIT_VALUE") | Native task: $variables_EXIT_VALUE | Workflow: - | Available in: clean, flow | Not available in: -

  • Node URL. The URL of the node which executes the current task.
    Script: variables.get("PA_NODE_URL") | Native task: $variables_PA_NODE_URL | Workflow: ${PA_NODE_URL} | Available in: environment, pre, task, post, flow, clean | Not available in: -

  • Node URL (Selection Script). The URL of the node which executes the selection script. See Selection of ProActive Nodes.
    Script: selected = (nodeurl == "pnp://mymachine:14200/Node1") | Native task: - | Workflow: - | Available in: selection | Not available in: -

  • Node Name. The name of the node which executes the current task.
    Script: variables.get("PA_NODE_NAME") | Native task: $variables_PA_NODE_NAME | Workflow: ${PA_NODE_NAME} | Available in: environment, pre, task, post, flow, clean | Not available in: -

  • Node Name (Selection Script). The name of the node which executes the selection script. See Selection of ProActive Nodes.
    Script: selected = (nodename == "Node1") | Native task: - | Workflow: - | Available in: selection | Not available in: -

  • Node Host. The hostname of the node which executes the current task.
    Script: variables.get("PA_NODE_HOST") | Native task: $variables_PA_NODE_HOST | Workflow: ${PA_NODE_HOST} | Available in: environment, pre, task, post, flow, clean | Not available in: -

  • Node Source. The node source name of the node which executes the current task.
    Script: variables.get("PA_NODE_SOURCE") | Native task: $variables_PA_NODE_SOURCE | Workflow: ${PA_NODE_SOURCE} | Available in: environment, pre, task, post, flow, clean | Not available in: -

  • Node Host (Selection Script). The hostname of the node which executes the selection script. See Selection of ProActive Nodes.
    Script: selected = (nodehost == "mymachine") | Native task: - | Workflow: - | Available in: selection | Not available in: -

  • Scheduler REST URL. The REST URL of the scheduler server. The public REST URL is also available (if the setting pa.scheduler.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini).
    Script: variables.get("PA_SCHEDULER_REST_URL") or variables.get("PA_SCHEDULER_REST_PUBLIC_URL") | Native task: $variables_PA_SCHEDULER_REST_URL or $variables_PA_SCHEDULER_REST_PUBLIC_URL | Workflow: ${PA_SCHEDULER_REST_URL} or ${PA_SCHEDULER_REST_PUBLIC_URL} | Available in: all scripts | Not available in: -

  • Catalog REST URL. The REST URL of the catalog service. The public REST URL is also available (if the setting pa.catalog.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini).
    Script: variables.get("PA_CATALOG_REST_URL") or variables.get("PA_CATALOG_REST_PUBLIC_URL") | Native task: $variables_PA_CATALOG_REST_URL or $variables_PA_CATALOG_REST_PUBLIC_URL | Workflow: ${PA_CATALOG_REST_URL} or ${PA_CATALOG_REST_PUBLIC_URL} | Available in: all scripts | Not available in: -

  • Service Automation REST URL. The REST URL of Service Automation. The public REST URL is also available (if the setting pa.cloud-automation.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini).
    Script: variables.get("PA_CLOUD_AUTOMATION_REST_URL") or variables.get("PA_CLOUD_AUTOMATION_REST_PUBLIC_URL") | Native task: $variables_PA_CLOUD_AUTOMATION_REST_URL or $variables_PA_CLOUD_AUTOMATION_REST_PUBLIC_URL | Workflow: ${PA_CLOUD_AUTOMATION_REST_URL} or ${PA_CLOUD_AUTOMATION_REST_PUBLIC_URL} | Available in: all scripts | Not available in: -

  • Job Planner REST URL. The REST URL of the job-planner service. The public REST URL is also available (if the setting pa.job-planner.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini).
    Script: variables.get("PA_JOB_PLANNER_REST_URL") or variables.get("PA_JOB_PLANNER_REST_PUBLIC_URL") | Native task: $variables_PA_JOB_PLANNER_REST_URL or $variables_PA_JOB_PLANNER_REST_PUBLIC_URL | Workflow: ${PA_JOB_PLANNER_REST_URL} or ${PA_JOB_PLANNER_REST_PUBLIC_URL} | Available in: all scripts | Not available in: -

  • Notification Service REST URL. The REST URL of the notification service. The public REST URL is also available (if the setting pa.notification-service.rest.public.url is defined in PROACTIVE_HOME/config/scheduler/settings.ini).
    Script: variables.get("PA_NOTIFICATION_SERVICE_REST_URL") or variables.get("PA_NOTIFICATION_SERVICE_REST_PUBLIC_URL") | Native task: $variables_PA_NOTIFICATION_SERVICE_REST_URL or $variables_PA_NOTIFICATION_SERVICE_REST_PUBLIC_URL | Workflow: ${PA_NOTIFICATION_SERVICE_REST_URL} or ${PA_NOTIFICATION_SERVICE_REST_PUBLIC_URL} | Available in: all scripts | Not available in: -

  • Third party credentials. Credentials stored on the server for this user account. See Managing third-party credentials.
    Script: credentials.get("pw") | Native task: $credentials_pw | Workflow: $credentials_pw (only in the task arguments) | Available in: environment, pre, task, post, clean, flow | Not available in: -

  • SSH private key. Private SSH key used at login. See Run Computation with a user’s system account.
    Script: credentials.get("SSH_PRIVATE_KEY") | Native task: $credentials_SSH_PRIVATE_KEY | Workflow: - | Available in: environment, pre, task, post, clean, flow | Not available in: -

  • Number of nodes. Number of nodes used by this task. See Example: MPI application.
    Script: nodesurl.size() | Native task: $variables_PA_NODESNUMBER | Workflow: - | Available in: environment, pre, task, post, flow | Not available in: -

  • Url of nodes. List of URLs of nodes. See Example: MPI application.
    Script: nodesurl.get(0) | Native task: $variables_PA_NODESFILE | Workflow: - | Available in: environment, pre, task, post, flow | Not available in: -

  • User space. Location of the user space. See Data Spaces.
    Script: println userspace | Native task: $userspace | Workflow: $userspace (only in task arguments) | Available in: environment, pre, task, post, flow | Not available in: -

  • Global space. Location of the global space. See Data Spaces.
    Script: println globalspace | Native task: $globalspace | Workflow: $globalspace (only in task arguments) | Available in: environment, pre, task, post, flow | Not available in: -

  • Input space. Location of the input space. See Data Spaces.
    Script: println inputspace | Native task: $inputspace | Workflow: - | Available in: environment, pre, task, post, flow | Not available in: -

  • Local space. Location of the local space. See Data Spaces.
    Script: println localspace | Native task: $localspace | Workflow: $localspace (only in task arguments) | Available in: environment, pre, task, post, flow | Not available in: -

  • Cache space. Location of the cache space. See Data Spaces.
    Script: println cachespace | Native task: $cachespace | Workflow: $cachespace (only in task arguments) | Available in: environment, pre, task, post, flow | Not available in: -

  • Output space. Location of the output space. See Data Spaces.
    Script: println outputspace | Native task: $outputspace | Workflow: - | Available in: environment, pre, task, post, flow | Not available in: -

  • Selection. Variable which must be set to select the node. See Selection of ProActive Nodes.
    Script: selected = true | Native task: - | Workflow: - | Available in: selection | Not available in: bash, cmd, perl, php, vbscript

  • Fork Environment. A ForkEnvironment java object allowing a script to set various initialization parameters of the forked JVM. See Fork Environment.
    Script: forkEnvironment.setJavaHome("/usr/java/default") | Native task: - | Workflow: - | Available in: environment | Not available in: bash, cmd, perl, php, R, powershell, vbscript

  • Scheduler API. A SchedulerNodeClient java object which can connect to the ProActive Scheduler frontend and interact directly with its API.
    Script: schedulerapi.connect() | Native task: - | Workflow: - | Available in: environment, pre, task, post, clean, flow | Not available in: bash, cmd, perl, php, R, powershell, vbscript

  • Resource Manager API. A Resource Manager API variable which can connect to the ProActive RM frontend and interact directly with its API.
    Script: rmapi.connect() | Native task: - | Workflow: - | Available in: environment, pre, task, post, clean, flow | Not available in: bash, cmd, perl, php, R, powershell, vbscript

  • UserSpace API. A DataSpaceNodeClient java object which can connect to the User Space and interact directly with its API.
    Script: userspaceapi.connect() | Native task: - | Workflow: - | Available in: environment, pre, task, post, clean, flow | Not available in: bash, cmd, perl, php, R, powershell, vbscript

  • GlobalSpace API. A DataSpaceNodeClient java object which can connect to the Global Space and interact directly with its API.
    Script: globalspaceapi.connect() | Native task: - | Workflow: - | Available in: environment, pre, task, post, clean, flow | Not available in: bash, cmd, perl, php, R, powershell, vbscript

  • Synchronization API. A Synchronization java object which can connect to the Synchronization Service and interact directly with its API.
    Script: synchronizationapi.createChannel("channel1", false) | Native task: - | Workflow: - | Available in: all scripts | Not available in: bash, cmd, perl, php, R, powershell, vbscript

  • Signal API. A SignalApi java object that allows managing job signals using its underlying API.
    Script: signalapi.sendSignal("stop") | Native task: - | Workflow: - | Available in: all scripts | Not available in: bash, cmd, perl, php, R, powershell, vbscript
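
To make this reference more concrete, here is a minimal Groovy task script that combines a few of the bindings listed above; the printed message and the result value are purely illustrative.

// Read some scheduler-provided variables (all listed in the reference above).
def jobId = variables.get("PA_JOB_ID")
def user  = variables.get("PA_USER")
def node  = variables.get("PA_NODE_NAME")

println "Job ${jobId}, submitted by ${user}, is running this task on node ${node}"

// Assigning 'result' sets the task result (see Result variables).
result = "done on " + node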

17.3.1. Variables maps

The syntax for accessing maps (like variables, credentials or genericInformation) is language dependent.

For Groovy:

print variables.get("key")

For Python/Jython:

print variables["key"]

For Ruby:

puts $variables["key"]

For R:

print(variables[["key"]])

For Bash (using environment variables):

echo $variables_key

For Cmd (using environment variables):

echo %variables_key%

For VBScript (using process environment variables):

Set wshShell = CreateObject( "WScript.Shell" )
Set wshProcessEnv = wshShell.Environment( "PROCESS" )
Wscript.Echo "Hello World from Job " & wshProcessEnv( "variables_key" )

For PowerShell:

Write-Output $variables.Get_Item('key')

For PHP:

<?php
    echo "<p>Value of variable: ".getenv("variables_key")."</b></p>";
?>
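
Scripts can also write into the variables map. Below is a minimal Groovy sketch, assuming it runs as a task script and that a workflow variable named key exists; the key_upper name is purely illustrative (see the Workflow variables section for how variables are defined).

// Read an existing workflow variable and publish a derived one under a new key.
def input = variables.get("key")
variables.put("key_upper", (input ?: "none").toUpperCase())
println variables.get("key_upper")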

17.3.2. Script results

The last statement of a script corresponds to the script result. The result can also be set explicitly by assigning a value to the result variable.

Different kinds of scripts (selection, flow, etc.) need to assign different result variables (for example selected, branch, runs, etc.).

Example for Groovy selection scripts:

selected = java.net.InetAddress.getLocalHost().getHostName() == "mymachine"

It is important to note that the result of a script will be converted to Java, and that some internal language types are not automatically convertible. If the task displays an error due to the result conversion, several approaches can be used:

  1. the script can manually convert the internal type to a more primitive type (a sketch follows this list).

  2. the result can instead be stored in a file and transferred as an output file.
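
The exact types that need manual conversion depend on the script engine, but the pattern of the first approach is always the same: end the script by assigning a primitive value (for instance a String) to result. A Groovy-flavoured sketch, where computed stands for whatever internal value the script produced:

// 'computed' stands in for an engine-internal value produced by the script.
def computed = [status: "OK", count: 3]
// Convert it explicitly to a plain String so the result converts cleanly to Java.
result = computed.toString()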

Results of parent tasks are stored in the results variable. Like the variables map, access to this results variable is language-dependent.

For the Ruby, Python, Jython and Groovy script languages, the parent task results (the results variable) contain a list of TaskResult Java objects. To access the actual result value, the value() method of this object must be called:

Example for Python/Jython:

print results[0].value()
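
The same call applies to Groovy:

println results[0].value()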

Other languages, such as R or PowerShell, can access the result values directly.

Example for R:

print(results[[0]])

More information about various script engines particularities and syntax is available in the Script Languages chapter.

17.4. Topology Types

ProActive Scheduler supports the following node topologies for multi-node tasks.

  • Arbitrary topology does not imply any restrictions on node location

<parallel numberOfNodes="4">
    <topology>
        <arbitrary/>
    </topology>
</parallel>
  • Best proximity - the set of closest nodes among those which are not executing other tasks.

<parallel numberOfNodes="4">
    <topology>
        <bestProximity/>
    </topology>
</parallel>
  • Threshold proximity - the set of nodes within a threshold proximity (in microseconds).

<parallel numberOfNodes="4">
    <topology>
        <thresholdProximity threshold="100"/>
    </topology>
</parallel>
  • Single Host - the set of nodes on a single host.

<parallel numberOfNodes="4">
    <topology>
        <singleHost/>
    </topology>
</parallel>
  • Single Host Exclusive - the set of nodes of a single host exclusively. The host with selected nodes will be reserved for the user.

<parallel numberOfNodes="4">
    <topology>
        <singleHostExclusive/>
    </topology>
</parallel>

For this task, the scheduler will try to find a host with 4 nodes. If there is no such host, another host with a bigger capacity will be used (if one exists). In this case, the extra nodes on this host will also be occupied by the task but will not be used for the execution.

  • Multiple Hosts Exclusive - the set of nodes filled in multiple hosts. Hosts with selected nodes will be reserved for the user.

<parallel numberOfNodes="4">
    <topology>
        <multipleHostsExclusive/>
    </topology>
</parallel>

For this task, the scheduler will try to find 4 nodes on a set of hosts. If there is no set of hosts providing exactly 4 nodes, a set with a bigger capacity will be used (if one exists). In this case, the extra nodes on these hosts will also be occupied by the task but will not be used for the execution.

  • Different Hosts Exclusive - the set of nodes each from a different host. All hosts with selected nodes will be reserved for the user.

<parallel numberOfNodes="4">
    <topology>
        <differentHostsExclusive/>
    </topology>
</parallel>

For this task, the scheduler will try to find 4 hosts with one node each. If there are not enough "single node" hosts, hosts with a bigger capacity will be selected. In this case, the extra nodes on these hosts will also be occupied by the task but will not be used for the execution.

17.5. Scheduler GraphQL API

17.5.1. Scheduler GraphQL API portal

ProActive Scheduler GraphQL API portal can be accessed in your environment using the following link:

Scheduler GraphQL API

In general, the GraphiQL interface can be opened at http://your-server:8080/scheduling-api/v1/graphiql or https://your-server:8443/scheduling-api/v1/graphiql when the scheduler server is configured in https mode.

In order to use it, you must first be connected to one of the ProActive portals. If you are not currently connected or if your session has expired, you will receive a MissingSessionIdException when trying to run queries.

This portal allows you to run GraphQL queries against the scheduler database to find jobs according to various filters. Queries are read-only; you cannot use this portal to perform stateful operations (e.g. job submission, job removal, etc.).

The GraphiQL interface looks as follows:

SchedGraphqlMain

You can explore the Docs section to see detailed information about all GraphQL nodes, filters and attributes:

SchedGraphqlDocs

GraphQL Basic Usage

The left panel allows you to write queries. You can start by entering the query { jobs }; it will be automatically expanded to match the correct syntax:

{jobs {
  edges {
    node {
      id
    }
  }
}}

Executing this query will return a json result such as:

{
  "data": {
    "jobs": {
      "edges": [
        {
          "node": {
            "id": "1"
          }
        },
        {
          "node": {
            "id": "2"
          }
        },
        {
          "node": {
            "id": "3"
          }
        }
      ]
    }
  }
}

In this basic example, only the job ids are displayed; it is of course possible to display more information:

{
  jobs {
    edges {
      node {
        id
        name
        owner
        submittedTime
        status
      }
    }
  }
}

This will return something similar to:

{
  "data": {
    "jobs": {
      "edges": [
        {
          "node": {
            "id": "1",
            "name": "script_groovy",
            "owner": "admin",
            "submittedTime": 1589799002016,
            "status": "FINISHED"
          }
        },
        {
          "node": {
            "id": "2",
            "name": "script_groovy",
            "owner": "admin",
            "submittedTime": 1589799356783,
            "status": "FINISHED"
          }
        },
        {
          "node": {
            "id": "3",
            "name": "Script_Cmd",
            "owner": "admin",
            "submittedTime": 1589810586677,
            "status": "FINISHED"
          }
        }
      ]
    }
  }
}

GraphQL Pagination

GraphQL results are paginated; a request can use one of the following attributes to control pagination:

  • last (integer): displays the specified last number of results.

  • first (integer): displays the specified first number of results.

  • before (cursor): displays results before the specified cursor.

  • after (cursor): displays results after the specified cursor.

The following request uses last to display the 5 most recent jobs and displays all page information.

{
  jobs(last: 5) {
    edges {
      node {
        id
      }
      cursor
    }
    pageInfo {
      hasNextPage
      hasPreviousPage
      startCursor
      endCursor
    }
  }
}

It returns something similar to:

{
  "data": {
    "jobs": {
      "edges": [
        {
          "node": {
            "id": "11"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3IxMQ=="
        },
        {
          "node": {
            "id": "12"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3IxMg=="
        },
        {
          "node": {
            "id": "13"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3IxMw=="
        },
        {
          "node": {
            "id": "14"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3IxNA=="
        },
        {
          "node": {
            "id": "15"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3IxNQ=="
        }
      ],
      "pageInfo": {
        "hasNextPage": false,
        "hasPreviousPage": true,
        "startCursor": "Z3JhcGhxbC1jdXJzb3IxMQ==",
        "endCursor": "Z3JhcGhxbC1jdXJzb3IxNQ=="
      }
    }
  }
}

To display the previous page, the following request is used:

{
  jobs(last: 5, before: "Z3JhcGhxbC1jdXJzb3IxMQ==") {
    edges {
      node {
        id
      }
      cursor
    }
    pageInfo {
      hasNextPage
      hasPreviousPage
      startCursor
      endCursor
    }
  }
}

Which returns a result such as:

{
  "data": {
    "jobs": {
      "edges": [
        {
          "node": {
            "id": "6"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3I2"
        },
        {
          "node": {
            "id": "7"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3I3"
        },
        {
          "node": {
            "id": "8"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3I4"
        },
        {
          "node": {
            "id": "9"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3I5"
        },
        {
          "node": {
            "id": "10"
          },
          "cursor": "Z3JhcGhxbC1jdXJzb3IxMA=="
        }
      ],
      "pageInfo": {
        "hasNextPage": true,
        "hasPreviousPage": true,
        "startCursor": "Z3JhcGhxbC1jdXJzb3I2",
        "endCursor": "Z3JhcGhxbC1jdXJzb3IxMA=="
      }
    }
  }
}

GraphQL Filters

The GraphQL API allows you to select jobs according to various filters. The following example filters jobs by name:

{
  jobs(filter: {name: "*Python"}) {
    edges {
      node {
        id name
      }
    }
  }
}

The wildcard "*Python" selects all jobs ending with "Python". The special character * can be used at the beginning or at the end of the expression (or both). It cannot be used in the middle.

Here are the results of this GraphQL query:

{
  "data": {
    "jobs": {
      "edges": [
        {
          "node": {
            "id": "8",
            "name": "Script_Python"
          }
        },
        {
          "node": {
            "id": "9",
            "name": "Script_Python"
          }
        },
        {
          "node": {
            "id": "10",
            "name": "Script_Python"
          }
        },
        {
          "node": {
            "id": "11",
            "name": "Script_Python"
          }
        }
      ]
    }
  }
}

The complete list of filters is available in the GraphQL Docs section.

17.5.2. Scheduler GraphQL REST API

After testing queries in the GraphQL Portal, you can use the scheduling-api REST interface to run these queries from any REST client.

A few aspects must be considered when running GraphQL queries from a REST client:

  • The url of the GraphQL REST API is different: http://your-server:8080/scheduling-api/v1/graphql (graphql instead of graphiql).

  • The REST request method must be POST and its Content-Type must be application/json.

  • The request must include, in its header section, a sessionid retrieved from a login to the ProActive Scheduler REST API.

  • The query itself is given as a string value of the "query" attribute inside the JSON body. This value must be escaped to form valid JSON. You can use, for example, an online tool such as https://codebeautify.org/json-escape-unescape to escape the query as a JSON string.

Here is an example of executing a GraphQL REST query using curl.

  • First, we acquire a sessionid by logging in to the scheduler REST API:

    sessionid=$(curl -d "username=mylogin&password=mypassword" http://try.activeeon.com:8080/rest/scheduler/login)
  • Second, we use a JSON escape tool to produce the query parameter.

    {
      jobs {
        edges {
          node {
            id
            name
          }
        }
      }
    }

    Converts into: {\r\n jobs {\r\n edges {\r\n node {\r\n id\r\n name\r\n }\r\n }\r\n }\r\n}

  • Third, we write the json body.

    {"query":"{\r\n  jobs {\r\n    edges {\r\n      node {\r\n        id\r\n        name\r\n      }\r\n    }\r\n  }\r\n}"}
  • Finally, we execute the query using curl:

     curl -H "sessionid:$sessionid" -H "Content-Type:application/json" --request POST --data '{"query":"{\r\n  jobs {\r\n    edges {\r\n      node {\r\n        id\r\n        name\r\n      }\r\n    }\r\n  }\r\n}"}' http://try.activeeon.com:8080/scheduling-api/v1/graphql
  • We receive the following json answer to our request:

    {"data":{"jobs":{"edges":[{"node":{"id":"14","name":"script_groovy"}},{"node":{"id":"66","name":"Script_Python"}},{"node":{"id":"67","name":"Script_Python"}},{"node":{"id":"68","name":"Script_Python"}},{"node":{"id":"69","name":"Script_Python"}},{"node":{"id":"70","name":"Script_Python"}},{"node":{"id":"71","name":"script_groovy"}},{"node":{"id":"72","name":"script_groovy"}},{"node":{"id":"73","name":"script_groovy"}},{"node":{"id":"74","name":"script_groovy"}},{"node":{"id":"75","name":"script_groovy"}}]}}}

17.6. Command Line

The ProActive client allows you to interact with the Scheduler and Resource Manager. The client starts in interactive mode if you do not provide any command.

The client usage is also available using the -h parameter as shown below:

$ PROACTIVE_HOME/bin/proactive-client -h

17.6.1. Command Line Examples

Deploy ProActive Nodes
In non-interactive mode
$ PROACTIVE_HOME/bin/proactive-client -cn 'moreLocalNodes' -infrastructure 'org.ow2.proactive.resourcemanager.nodesource.infrastructure.LocalInfrastructure' './config/authentication/rm.cred' 4 60000 '' -policy org.ow2.proactive.resourcemanager.nodesource.policy.StaticPolicy 'ALL' 'ALL'
In interactive mode
$ PROACTIVE_HOME/bin/proactive-client
> createns( 'moreLocalNodes', ['org.ow2.proactive.resourcemanager.nodesource.infrastructure.LocalInfrastructure', './config/authentication/rm.cred', 4, 60000, ''], ['org.ow2.proactive.resourcemanager.nodesource.policy.StaticPolicy', 'ALL', 'ALL'])
Install ProActive packages

To install a ProActive package, you can use the ProActive CLI by providing a path to your package. This path can be a local directory, a ZIP file, or a URL pointing to a web directory, a direct-download ZIP file, a GitHub repository (or its ZIP archive), or a directory inside a GitHub repository. Please note that URL forwarding is supported.

In non-interactive mode
  • To install a package located in a local directory or ZIP

$ PROACTIVE_HOME/bin/proactive-client -pkg /Path/To/Local/Package/Directory/Or/Zip
  • To install a package located in a web folder (Supports only Apache Tomcat directory listing)

$ PROACTIVE_HOME/bin/proactive-client -pkg http://example.com/installablePackageDirectory/
  • To install a package with a direct download ZIP URL:

$ PROACTIVE_HOME/bin/proactive-client -pkg https://s3.eu-west-2.amazonaws.com/activeeon-public/proactive-packages/package-example.zip
  • To install a package located in a GitHub repository (either in the repository root or in a sub-folder within a repository):

$ PROACTIVE_HOME/bin/proactive-client -pkg https://github.com/ow2-proactive/hub-packages/tree/master/package-example
In interactive mode
  • To install a package located in any of the aforementioned possible locations

$ PROACTIVE_HOME/bin/proactive-client
> installpackage(PackagePathOrURL)

Activeeon SAS, © 2007-2019. All Rights Reserved.

For more information, please contact contact@activeeon.com.