ProActive Workflows & Scheduling (PWS)

1. Introduction

The Job Planner schedules Workflows at specific points in time which are defined by Calendars.

A Workflow is scheduled by the Job Planner if it is associated to a Calendar Definition. A Calendar Definition can have several Workflows associated to it. Each of them can easily be activated and deactivated. A Workflow's variables can be set to specific values when associating it to a Calendar Definition.

1.1. Glossary

The following terms are used throughout the documentation:

ProActive Workflows & Scheduling

The full distribution of ProActive for Workflows & Scheduling. It contains the ProActive Scheduler server, the REST & Web interfaces, and the command line tools. It is the commercial product name.

ProActive Scheduler

Can refer to any of the following:

  • A complete set of ProActive components.

  • An archive that contains a released version of ProActive components, for example activeeon_enterprise-pca_server-OS-ARCH-VERSION.zip.

  • A set of server-side ProActive components installed and running on a Server Host.

Resource Manager

ProActive component that manages ProActive Nodes running on Compute Hosts.

Scheduler

ProActive component that accepts Jobs from users, orders the constituent Tasks according to priority and resource availability, and eventually executes them on the resources (ProActive Nodes) provided by the Resource Manager.

Please note the difference between Scheduler and ProActive Scheduler.

REST API

ProActive component that provides RESTful API for the Resource Manager, the Scheduler and the Catalog.

Resource Manager Web Interface

ProActive component that provides a web interface to the Resource Manager. Also called Resource Manager Portal.

Scheduler Web Interface

ProActive component that provides a web interface to the Scheduler. Also called Scheduler Portal.

Workflow Studio

ProActive component that provides a web interface for designing Workflows.

Catalog

ProActive component that provides storage and versioning of Workflows and other ProActive Objects through a REST API. It is also possible to query the Catalog for specific Workflows.

Job Planner

A ProActive component providing advanced scheduling options for Workflows.

Job Planner Portal

A ProActive portal to manage the Job Planner service.

Bucket

ProActive notion used with the Catalog to refer to a specific collection of ProActive Objects and in particular ProActive Workflows.

Server Host

The machine on which ProActive Scheduler is installed.

SCHEDULER_ADDRESS

The IP address of the Server Host.

ProActive Node

One ProActive Node can execute one Task at a time. This concept is often tied to the number of cores available on a Compute Host. We assume a task consumes one core (more is possible, see multi-nodes tasks), so on a 4-core machine you might want to run 4 ProActive Nodes. One (by default) or more ProActive Nodes can be executed in a Java process on the Compute Hosts and will communicate with the ProActive Scheduler to execute tasks. We distinguish two types of ProActive Nodes:

  • Server ProActive Nodes: Nodes that are running on the same host as the ProActive server;

  • Remote ProActive Nodes: Nodes that are running on machines other than the ProActive server.

Compute Host

Any machine which is meant to provide computational resources to be managed by the ProActive Scheduler. One or more ProActive Nodes need to be running on the machine for it to be managed by the ProActive Scheduler.

Examples of Compute Hosts:

Node Source

A set of ProActive Nodes deployed using the same deployment mechanism and sharing the same access policy.

Node Source Infrastructure

The configuration attached to a Node Source which defines the deployment mechanism used to deploy ProActive Nodes.

Node Source Policy

The configuration attached to a Node Source which defines the ProActive Nodes acquisition and access policies.

Scheduling Policy

The policy used by the ProActive Scheduler to determine how Jobs and Tasks are scheduled.

PROACTIVE_HOME

The path to the extracted archive of ProActive Scheduler release, either on the Server Host or on a Compute Host.

Workflow

User-defined representation of a distributed computation. Consists of the definitions of one or more Tasks and their dependencies.

Workflow Revision

ProActive concept that reflects the changes made on a Workflow during its development. Generally speaking, the term Workflow is used to refer to the latest version of a Workflow Revision.

Generic Information

Additional information attached to Workflows or Tasks. See generic information.

Calendar Definition

A JSON object attached to a Workflow by adding it to the Workflow's Generic Information.

Job

An instance of a Workflow submitted to the ProActive Scheduler. Sometimes also used as a synonym for Workflow.

Job Id

An integer identifier which uniquely represents a Job inside the ProActive Scheduler.

Job Icon

An icon representing the Job and displayed in portals. The Job Icon is defined by the Generic Information workflow.icon.

Task

A unit of computation handled by ProActive Scheduler. Both Workflows and Jobs are made of Tasks. A Task must define a ProActive Task Executable and can also define additional task scripts.

Task Id

An integer identifier which uniquely represents a Task inside a Job. Task Ids are only unique inside a given Job.

Task Executable

The main executable definition of a ProActive Task. A Task Executable can either be a Script Task, a Java Task or a Native Task.

Script Task

A Task Executable defined as a script execution.

Java Task

A Task Executable defined as a Java class execution.

Native Task

A Task Executable defined as a native command execution.

Additional Task Scripts

A collection of scripts that are part of a ProActive Task definition and can be used to complement the main Task Executable. Additional Task Scripts can be a Selection Script, Fork Environment Script, Pre Script, Post Script, Control Flow Script or Cleaning Script.

Selection Script

A script part of a ProActive Task definition and used to select a specific ProActive Node to execute a ProActive Task.

Fork Environment Script

A script part of a ProActive Task definition and run on the ProActive Node selected to execute the Task. The Fork Environment Script is used to configure the forked Java Virtual Machine process which executes the task.

Pre Script

A script part of a ProActive Task definition and run inside the forked Java Virtual Machine, before the Task Executable.

Post Script

A script part of a ProActive Task definition and run inside the forked Java Virtual Machine, after the Task Executable.

Control Flow Script

A script part of a ProActive Task definition and run inside the forked Java Virtual Machine, after the Task Executable, to determine control flow actions.

Control Flow Action

A dynamic workflow action performed after the execution of a ProActive Task. Possible control flow actions are Branch, Loop or Replicate.

Branch

A dynamic workflow action performed after the execution of a ProActive Task similar to an IF/THEN/ELSE structure.

Loop

A dynamic workflow action performed after the execution of a ProActive Task similar to a FOR structure.

Replicate

A dynamic workflow action performed after the execution of a ProActive Task similar to a PARALLEL FOR structure.

Cleaning Script

A script part of a ProActive Task definition and run after the Task Executable and before releasing the ProActive Node to the Resource Manager.

Script Bindings

Named objects which can be used inside a Script Task or inside Additional Task Scripts and which are automatically defined by the ProActive Scheduler. The type of each script binding depends on the script language used.

Task Icon

An icon representing the Task and displayed in the Studio portal. The Task Icon is defined by the Task Generic Information task.icon.

ProActive Agent

A daemon installed on a Compute Host that starts and stops ProActive Nodes according to a schedule, restarts ProActive Nodes in case of failure and enforces resource limits for the Tasks.

2. Creation of a Planning

2.1. Creating Calendar Definitions

To create a new Calendar Definition, use the Calendar Definition page of the Job Planner Portal. Clicking the "+" button creates a Calendar Definition with default parameters.

[Figure: calendar definition]

Click on a Calendar Definition to modify it. Its fields will be visible on the main panel ("Calendar Definition"). The name of the Calendar Definition must be unique in a bucket. The cron expression can be set using the widget, or manually with the "Advanced" tab. The resulting cron expression is shown under the widget.

Changes made in the Calendar Definition panel will be automatically saved. When a calendar definition is selected, it can be removed by clicking on the trash button.

2.2. Associating workflows to Calendar Definitions

Once a workflow has been published to the Catalog, it can be planned with the Job Planner Portal, Calendar Association page. To plan a workflow, you need to associate it to a Calendar Definition.

[Figure: calendar workflow association]

On this page, select an existing Calendar Definition and add or remove the workflows associated to it. When creating an association, it is possible to set the workflow's variables to specific values that can differ from the default values stored in the Catalog for that workflow.
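The override behaviour described above can be pictured as a simple merge. The following is only a sketch: the function name and the variable names are hypothetical, and it assumes that values set on the association take precedence over the Catalog defaults while unset variables keep their defaults.

```python
def effective_variables(catalog_defaults, association_values):
    """Merge variables: values set on the association override Catalog defaults."""
    return {**catalog_defaults, **association_values}


# Hypothetical variable names, for illustration only.
defaults = {"INPUT_DIR": "/data/in", "RETRIES": "3"}
overrides = {"RETRIES": "5"}
print(effective_variables(defaults, overrides))
```

The defaults dictionary is left untouched, which mirrors the fact that the values stored in the Catalog are not changed by an association.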

[Figure: calendar workflow association 2]

The created association can be deactivated by clicking on the unlink button. This means that the workflow won’t be submitted for the next calendar occurrences, but it will still be visible on the Execution Planning and the Gantt chart. You can reactivate it with the link button. On the left panel, you can easily see the number of workflows associated to each calendar, and how many of them are activated.

This page can also be opened by clicking on the "plan" button in the Studio or the Scheduler Portal.

3. Execution Planning visualisation and GANTT

On the Execution Planning page of the Job Planner Portal, you can see the recurrence of Calendar Definitions. You can select one or several Calendar Definitions on the left panel, and the associated workflows will appear on the right panel. There are 4 ways to visualize the planning of the selected Calendar Definitions:

  • By workflow

  • Chronologically

  • On a standard Calendar

  • With a GANTT chart

3.1. Visualization by workflow, Chronologically or on a standard Calendar

There are 3 different tabs for the planning visualization:

  • "Sort by workflow" tab: For each workflow, a list of execution times is displayed. The combo box allows you to choose the period of the execution planning. If several workflows are associated to one Calendar Definition, they will be grouped. If a workflow is associated to several Calendar Definitions, it will appear several times. The name of the Calendar Definition is always displayed next to the workflow name, so that you know which Calendar Definition the given executions relate to.

  • "Sort chronologically" tab: All executions are displayed chronologically in a list. Each execution is shown with the workflow that will be executed and the corresponding Calendar Definition. The combo box allows you to choose the period of the execution planning.

  • "Calendar" tab: You can see the execution planning of the selected Calendar Definitions, by year/month/week or day. In the year and month views, you can see the number of executions in one month/day. If you click on it, you can see more details on which workflows are executed and when. In the week view, if several workflows are executed at the same time, you will only be able to see one at a time.

[Figure: calendar planning]

If a calendar has no association, you can still select it and see its recurrence on the Execution Planning panel. If nothing is displayed, it means the selected Calendar Definitions have no recurrence on the selected period.

3.2. GANTT Visualization of the Job Planning

You can also use the Gantt chart ("Open Gantt" button at the top) to visualize the executions:

[Figure: calendar planning Gantt chart]

In addition to the Execution Planning panel, the Gantt chart allows you to see the past jobs, the ones currently being executed, and the future ones, all in a comprehensive interactive view. In this view, you will easily see the potential difference between the Planned Submission time and the Actual Start Time of the Jobs. You will also get estimations of the Finished Time, taking into account the Actual Start Time. Moreover, you can visualize the Jobs that stayed PENDING for some time (in yellow), as well as the Jobs that had issues and got KILLED, CANCELLED, or FAILED (in red).

The planned jobs (in orange) are on the row just above the actual jobs so that you can easily compare what was planned and what actually executed. You can use the legend button for a better understanding of the colors.

[Figure: calendar planning Gantt chart legend]

The length of the displayed planned jobs can be:

  • the expected execution time given by users in the workflow with the Generic Information: JOB_EXEC_TIME

  • if the workflow doesn’t have any JOB_EXEC_TIME Generic Information, it will be the average execution time computed from the past association executions

  • if the workflow doesn’t have any JOB_EXEC_TIME Generic Information and has never been executed with the association, it will be a default 15 minutes duration
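The three rules above form a fallback chain, which can be sketched as follows. This is an illustration under assumptions, not the Job Planner's code: the function name is hypothetical, and the JOB_EXEC_TIME value is assumed to be already parsed into a `timedelta` (its textual format is not covered here).

```python
from datetime import timedelta

DEFAULT_DURATION = timedelta(minutes=15)  # default stated in the list above


def planned_bar_duration(user_estimate, past_durations):
    """Pick the duration used to draw a planned job.

    user_estimate: the JOB_EXEC_TIME value already parsed into a timedelta,
                   or None when the Generic Information is absent.
    past_durations: durations of past executions of this association.
    """
    if user_estimate is not None:
        return user_estimate
    if past_durations:
        # Average of the past executions of the association.
        return sum(past_durations, timedelta()) / len(past_durations)
    return DEFAULT_DURATION


print(planned_bar_duration(None, [timedelta(minutes=10), timedelta(minutes=30)]))
```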

[Figure: calendar planning Gantt chart tooltip]

If a job has been PENDING before starting, depending on the scale you have selected, you can see a yellow part on the displayed job. This corresponds to the time the job was PENDING and helps to highlight a resource problem.

You can see more information about a job in its tooltip, by hovering over that job with your mouse:

  • Workflow and bucket name

  • Job ID: the ID given by the scheduler when a job has been submitted

  • Status: the status of the job when the Gantt chart was generated

  • User estimated execution duration: Generic Information JOB_EXEC_TIME given by users in the workflow. This information doesn’t appear if the Generic Information is not defined in the workflow.

  • Submission time: the time when the job was submitted by the job planner

  • Actual start time: the time when the job started. If it was PENDING before starting, Submission time and Actual start time will be different.

  • Actual finish time: the time when the job finished (even if it was killed, cancelled, or failed)

  • Full duration (wall time): the duration of the job, from the time it was submitted (Submission time) to the time it finished (Actual finish time)

  • Average execution time: the average duration of the workflow submitted with this specific calendar by the job planner

  • Minimal execution time: the minimal duration of the workflow submitted with this specific calendar by the job planner

  • Maximal execution time: the maximal duration of the workflow submitted with this specific calendar by the job planner

  • Displayed with […​]: which one of the values above was used to display the bar representing the job

Depending on the status of the job, the information won’t be the same. For example, if the job is RUNNING or STALLED, Actual finish time will be replaced by Planned finish time: the time when the job should finish, depending on when it started or how long it has been delayed.
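The relation between the time fields of the tooltip can be sketched with plain date arithmetic. The wall time follows the definition given in the list (Submission time to Actual finish time). The pending duration is computed here as Submission time to Actual start time, which is an inference from the field descriptions rather than a documented formula. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime


def job_timings(submission, start, finish):
    """Return (pending, wall_time) for a finished job."""
    pending = start - submission      # time spent waiting before starting (assumed definition)
    wall_time = finish - submission   # full duration, as defined in the tooltip list
    return pending, wall_time


pending, wall_time = job_timings(
    submission=datetime(2019, 6, 3, 9, 0, 0),
    start=datetime(2019, 6, 3, 9, 2, 30),
    finish=datetime(2019, 6, 3, 9, 12, 30),
)
print(pending, wall_time)
```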

If you select a calendar that occurs frequently (such as "every_10_min"), you might run into trouble with large scales (such as "year"): the Gantt chart will take a long time to load and events will be too condensed to be readable. For this kind of calendar, it is easier to select a smaller scale (such as "1 hour"). You can also select only the calendars you need to see before opening the Gantt chart modal, to make it load faster.

The "Save Gantt" button takes a screenshot of the visible part of the Gantt chart. As with loading the Gantt chart, this might take a while if there are too many events. In that case, you can also choose a smaller scale and select only the calendars you need.

4. Calendar Definition Syntax

Job Planner uses a Calendar Definition to know how the job will be planned over time. As shown in the example below, this definition is composed of 4 fields:

  • a description (saying what the cron expression means, when to use the Calendar Definition, etc.)

  • a cron expression to define the recurrence (every morning at 6am, etc.)

  • a set of inclusion calendars to add specific job executions which cannot be defined by a cron expression (holidays, etc.)

  • a set of exclusion calendars to exclude specific occurrences of the job executions defined in the cron and inclusion definitions (maintenance operations, holidays, etc.)

[Figure: calendar definition inclusions and exclusions]

Based on the above configuration, the following JSON object will be stored in the Catalog.

{
   "description":"Every Week Day at 9:00 AM including holidays (except Christmas and Easter holidays)",
   "cron":"0 0 9 ? * MON-FRI *",
   "inclusion_calendars":[
      {
         "calendar":{
            "url":"http://localhost:8080/all_holidays_calendar.ics"
         },
         "rule":{
            "action":"EXECUTE_AT_START"
         }
      }
   ],
   "exclusion_calendars":[
      {
         "calendar":{
            "url":"http://localhost:8080/christmas_holidays_calendar.ics"
         },
         "rule":{
            "action":"CANCEL_NEXT_EXECUTION"
         }
      },
      {
         "calendar":{
            "url":"http://localhost:8080/easter_holidays_calendar.ics"
         },
         "rule":{
            "action":"CANCEL_NEXT_EXECUTION"
         }
      }
   ]
}
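As a rough illustration, an object of this shape can be loaded and checked with a few lines of Python. This is a minimal sketch and not part of ProActive: the function name `check_calendar_definition` and the checks themselves are assumptions derived only from the example above (four top-level fields, each calendar entry carrying a `calendar.url` and a `rule.action`).

```python
import json

# Field names taken from the example above; the checks are illustrative
# assumptions, not the Job Planner's actual validation.
EXPECTED_FIELDS = ("description", "cron", "inclusion_calendars", "exclusion_calendars")


def check_calendar_definition(text):
    """Parse a Calendar Definition and return a list of problems found."""
    definition = json.loads(text)
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in definition]
    for key in ("inclusion_calendars", "exclusion_calendars"):
        for index, entry in enumerate(definition.get(key, [])):
            if "url" not in entry.get("calendar", {}):
                problems.append(f"{key}[{index}]: no calendar url")
            if "action" not in entry.get("rule", {}):
                problems.append(f"{key}[{index}]: no rule action")
    return problems


example = """
{
  "description": "Every Week Day at 9:00 AM",
  "cron": "0 0 9 ? * MON-FRI *",
  "inclusion_calendars": [
    {"calendar": {"url": "http://localhost:8080/all_holidays_calendar.ics"},
     "rule": {"action": "EXECUTE_AT_START"}}
  ],
  "exclusion_calendars": []
}
"""
print(check_calendar_definition(example))
```

An empty list means that no structural problem was detected by this sketch.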

4.1. Description

The description allows users who are not familiar with cron expressions to know when the calendar will occur. It might also be used for other purposes, for example saying when to use a Calendar Definition.

4.2. Cron

The cron expression determines when the planned workflow is launched. The example above uses the cron expression "0 0 9 ? * MON-FRI *", which follows the Quartz cron expression syntax explained in the Quartz Cron Expression Syntax section. This expression executes at 9:00 AM on working days (Monday to Friday).
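To make the structure of the expression easier to read, the sketch below splits it into the fields that Quartz defines: seconds, minutes, hours, day of month, month, day of week, and an optional year. The helper name `split_quartz_cron` is hypothetical. It only labels the fields and does not validate or evaluate them.

```python
# Field order of a Quartz cron expression; the seventh field (year) is optional.
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def split_quartz_cron(expression):
    """Map each part of a Quartz cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


fields = split_quartz_cron("0 0 9 ? * MON-FRI *")
for name, value in fields.items():
    print(f"{name:>12}: {value}")
```

In this expression, the "?" in the day-of-month field means "no specific value", because the days are already constrained by the day-of-week field (MON-FRI).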

4.3. Inclusion Calendar

The purpose of the inclusion calendar section is to use an ICS file to specify a workflow launching policy during calendar events, for instance automatically submitting a workflow at event start. Given an event, a predefined action will be applied on the workflow execution.

Inclusion action | Description
EXECUTE_AT_START | The workflow will be submitted at each event start.

4.4. Exclusion Calendar

The purpose of the exclusion calendar is to use an ICS file to prevent workflows from being executed during a calendar event. Given an event, a predefined action will be applied on the workflow execution.

Exclusion action      | Description
CANCEL_NEXT_EXECUTION | All workflow submissions are canceled during the calendar events.
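The way cron occurrences, inclusion calendars and exclusion calendars combine can be sketched as follows. This is a simplified model under stated assumptions, not the Job Planner's implementation: the function name is hypothetical, the cron occurrences are passed in as an already computed list, and each calendar event is treated as a half-open interval from its start (included) to its end (excluded).

```python
from datetime import datetime


def planned_submissions(cron_times, inclusion_events, exclusion_events):
    """Combine cron occurrences with inclusion and exclusion events.

    Events are (start, end) pairs of datetimes.
    """
    candidates = set(cron_times)
    # EXECUTE_AT_START: one extra submission at the start of each inclusion event.
    candidates.update(start for start, _end in inclusion_events)
    # CANCEL_NEXT_EXECUTION: drop every submission that falls inside an exclusion event.
    return sorted(
        t for t in candidates
        if not any(start <= t < end for start, end in exclusion_events)
    )


cron_times = [datetime(2019, 12, 24, 9), datetime(2019, 12, 25, 9), datetime(2019, 12, 26, 9)]
inclusion_events = [(datetime(2019, 12, 28, 0), datetime(2019, 12, 29, 0))]
exclusion_events = [(datetime(2019, 12, 25, 0), datetime(2019, 12, 26, 0))]
for t in planned_submissions(cron_times, inclusion_events, exclusion_events):
    print(t)
```

In this example the 25 December occurrence is cancelled by the exclusion event, and a submission is added at the start of the inclusion event on 28 December.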

4.5. External calendar retrieved from URL

If an inclusion or exclusion calendar is not retrievable, the Workflow submission is blocked. An inclusion or exclusion calendar is considered not retrievable when it cannot be downloaded from its URL and the Job Planner cache doesn’t hold a copy.
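The rule above can be sketched as a download with a cache fallback. Everything in this sketch is hypothetical: the function names, the `fetch` callable (assumed to raise `OSError` when the download fails) and the dictionary used as a cache. Trying the download before the cache is also an assumption, since the text only states the condition under which a calendar is not retrievable.

```python
def load_calendar(url, fetch, cache):
    """Return the ICS text for url, or None when the calendar is not retrievable.

    fetch: hypothetical callable that downloads the URL and raises OSError on failure.
    cache: dict mapping URLs to the last successfully downloaded copy.
    """
    try:
        content = fetch(url)
    except OSError:
        # Download failed: fall back to the cached copy, if there is one.
        return cache.get(url)
    cache[url] = content
    return content


def can_submit(calendar_urls, fetch, cache):
    """A submission is blocked as soon as one calendar cannot be retrieved."""
    return all(load_calendar(url, fetch, cache) is not None for url in calendar_urls)
```

With an empty cache and a failing download, `can_submit` returns False. Once a copy has been cached by a successful download, a later download failure no longer blocks the submission.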

Activeeon SAS, © 2007-2019. All Rights Reserved.

For more information, please contact contact@activeeon.com.