ProActive Workflows & Scheduling (PWS)

  • PWS User Guide

     (Workflows, Workload automation, Jobs, Tasks, Catalog, Resource Management, Big Data/ETL, …​)

  • PWS Modules

  • PWS Admin Guide

     (Installation, Infrastructure & Nodes setup, Agents,…​)

1. Main concepts of PCW

ProActive Cloud Watch (PCW) monitors a system to detect complex events, and triggers actions when activated rules are fired.
PCW is configured with rules to define the metrics to monitor, actions to be performed, and conditions to be met to trigger them.

ProActive Cloud Watch is organized as a Catalog of rules (which can be activated or deactivated) that will automatically trigger user-specified actions if defined conditions are fulfilled.
For example, PCW is able to monitor some computing resources (ProActive Nodes) and deploy new resources (scale up) when the monitored resources are overloaded. Another use case is to notify users with alerts about some deteriorated resources.
In general PCW allows to trigger any action by submitting a workflow to the ProActive Scheduler.

Definitions:

Rule

- defines a monitoring behavior, triggering actions according to some predefined conditions. Rules generally use a template for certain monitoring use cases, configurable using specific variables.

Rule Instance

- instantiated rule with concrete set of variables. One rule can have several rule Instances with different variable values. The rule instance can be activated/deactivated. PCW service will monitor metrics defined inside the rule only for activated instances.

PCW General overview

The diagram above illustrates the general overview of PCW functionality.

1.1. Monitoring

There are several types of metrics that can be monitored by PCW. ProActive Cloud Watch implements a polling mechanism to collect the monitoring metrics. A delay can also be added to resume monitoring after an action has been triggered only after a certain time. This feature is very useful in case of scale up/down use cases and helps to avoid triggering extra actions until the new resource is deployed/undeployed.
The design of PCW allows to add new types of monitoring metrics.

1.1.1. Monitoring physical metrics

ProActive provides a way to monitor system metrics of virtual and physical machines (Nodes) deployed by ProActive Resource Manager.
Thanks to the JMX protocol, ProActive Nodes can expose many system metrics from the underlying operating system (disk, RAM, CPU usage, network activity, etc…​). ProActive Nodes use the Sigar library to collect physical metrics.

ProActive Cloud watch can monitor any ProActive Node with JMX URL to access the following metrics:

  • System memory, swap, CPU, load average, uptime, logins.

  • Per-process memory, CPU, credential info, state, arguments, environment, open files.

  • File system threshold detection and metrics.

  • Network interface detection, configuration information and metrics.

  • Network route and connection tables.

1.1.2. Monitoring host accessibility

Another type of monitoring available in PCW is to check host accessibility. The implementation of this metric follows the ping host approach. With the help of this metric a user can easily check if a host is up or down and automatically trigger an action.

1.2. Processing mechanism

The core of rule processing is managed by the Drools Engine for performance processing of collected metrics. It features powerful mechanisms for complex event processing. The advanced Drools algorithm helps fast detection of fulfilling the conditions and triggering actions.

1.3. Rule’s action

According to the concepts of PCW, the action will be triggered when specified condition is met.
In order to make an action flexible to any user needs, the rule’s action is firing a job inside scheduler. A user just need to specify which workflow from Catalog will be fired as rule’s action.
For example, one can design an action workflow that sends a notification (alert) whenever a metric reaches a critical value.
For scalability use cases, the Cloud Automation service can be started as a rule’s action that will deploy the new resources.

2. PCW portal - Using existing rules

In the Automation Dashboard portal, the Cloud Watch view allows users to activate and parametrize a predefined set of rules and also to create new ones.

The next screenshot illustrates PCW portal.

automation dashboard cloud watch

The portal is divided into several sections:

  1. In the Rules panel a user can find the set of predefined PCW rules ready to be activated.

  2. Activated Rule Instances panel shows all rule instances after being activated with certain variables.

  3. Rule - Fired Jobs table provides information about triggered jobs inside Scheduling as actions of activated rules.

  4. Removed Rule Instances panel gives the list of all rule instances that were removed from the list of activated rules.

In order to better understand the usage of PCW portal, let’s take an example of using existing PCW rules provided by the ProActive distribution.

2.1. Activating a rule

Here is an example that illustrates a simple use case of PCW portal.

The next screenshot shows how to activate a rule, when a user selects a certain rule from the list of Rules.

PCW rule activation

A user can change the default values.
A user should Obtain a JMX node url and paste it in the nodeUrlVar variable.
After modifying the variables a user can Check (validate) the rule or directly Activate it.

2.2. Using rule variables

The rule variables depend on the definition and purpose of the rule. There are different variables for different rules. The designer of a new rule can choose the set of needed variables.

Let’s give explanation of rules for our example:

  • nodeUrlVar - URL of the JMX endpoint what will be monitored

  • pollingPeriodInSecondsVar - time frequency when collecting metrics from monitored resource

  • calmPeriodInSecondsVar - time period during which the metrics are not collected after the rule action was triggered

  • workflowNameAsActionVar - name of the workflow inside ProActive Catalog that will be fired as action

  • bucketNameAsActionVar - name of the Catalog bucket identifying the workflow that will be fired as action

  • usageCombinedCPUMetric - specific Sigar metric for this rule, identifying combined CPU usage kpi

  • percentageUsedLimit - threshold of CPU consumption (in percentage) which will trigger the action. This variable is directly connected to the rule condition.

2.3. Activated rule instances

After activation of a rule, the instance will appear in section Activated Rule Instances.

All the activated rule instances are monitoring the specified resources. If the triggering condition is met, then the rule will fire the job. This fired job description will appear in Rule - Fired Jobs table. The monitoring and triggering actions can be stopped by removing an Activated rule instance. After removing a rule instance from active list, it will appear in the Removed Rule Instances table. It is possible to see the history of all fired jobs for this table.

This behaviour is demonstrated in the next screenshot.

PCW activated with removed

2.4. Obtaining JMX node URLs

ProActive Resource Manager portal provides access to monitoring capabilities. During the activation of a rule, a user should enter as rule parameter a JMX endpoint corresponding to the monitored resource (for example a ProActive Node).
In order to get this information from a Node, a user can go to the RM portal, click on the right button of a Node which should be monitored and click Copy JMX Endpoint. Then, he can come back to Cloud Watch portal and paste the JMX URL in the corresponding field.

RM copy jmx ur

3. Create new Cloud Watch rules

ProActive Cloud Watch portal allows to customize the predefined Catalog of rules and also to create new rules from scratch.
The provided rules in ProActive Catalog are a good starting point to design more complex scenarios.

3.1. Rule’s structure explanation

The rule definition consists of the following main parts:

  1. Variables definition. A user can add variables according to its needs. All variables will be replaced with their values inside the rule implementation using the following pattern: #variableName

  2. Definition of polling metrics. Polling metrics are specific parameters which depends on the polling class implementation. Polling parameters can reuse the rule variables. The collecting metrics are usually the physical metrics explained in the section Monitoring physical metrics provided by JMX endpoint, like CPU, Memory, etc.

  3. Definition of Drool’s rule. In this part we use Drools standardized language to describe a triggering condition and an action.

    <?xml version="1.0" encoding="UTF-8"?>
    <rule xmlns="urn:proactive:ruledescriptor:1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="urn:proactive:ruledescriptor:1.0 https://www.activeeon.com/public_content/schemas/proactive/ruledescriptor/1.0/ruleschema.xsd">
    
        <variables>
            <variable name="nodeUrlVar" value="service:jmx:rmi:///jndi/rmi://tryqa.activeeon.com:59885/rmnode" />
            <variable name="percentageUsedLimit" value="90" />
            <variable name="bucketNameAsActionVar" value="basic-examples" />
            <variable name="workflowNameAsActionVar" value="Native_Task_Linux" />
            <variable name="pollingPeriodInSecondsVar" value="60" />
            <variable name="calmPeriodInSecondsVar" value="120" />
            <variable name="usageCombinedCPUMetric" value="sigar:Type=CpuUsage Combined" />
        </variables>
    
        <ruleImplementation>
            <pollingClass value="Jmx_Metrics">
                <parameters>
                    <parameter name="jmxCredentialsFile" value="#pathToCredentialFileVar" />
                    <parameter name="nodeUrl" value="#nodeUrlVar" />
                    <parameter name="pollingPeriodInSeconds" value="#pollingPeriodInSecondsVar" />
                    <parameter name="calmPeriodInSeconds" value="#calmPeriodInSecondsVar" />
                </parameters>
            </pollingClass>
            <kpis>
                <kpi>#usageCombinedCPUMetric</kpi>
            </kpis>
            <droolContent>
            <![CDATA[
    package org.ow2.proactive.cloud_watch.rules
    import java.util.*;
    import  org.ow2.proactive.cloud_watch.model.*;
    import  org.ow2.proactive.cloud_watch.service.*;
    global org.ow2.proactive.cloud_watch.action.ActionExecutorFromCatalog executorWorkflowFromCatalog;
    declare isNodeAction
     nodeMetrics: NodeMetrics
    end
    
    rule #autogenerated_rule_id
    when
        exists  kpi : NodeMetrics( url == "#nodeUrlVar" )
        $metric: NodeMetrics(metrics["#usageCombinedCPUMetric"] * 100  > #percentageUsedLimit)
    
    then
        Map<String, String> parameters = new HashMap();
        parameters.put("credentialsPath","#pathToCredentialFileVar");
        executorWorkflowFromCatalog.submitToScheduler(parameters, "#autogenerated_rule_id", "{'bucket_name':'#bucketNameAsActionVar','workflow_name':'#workflowNameAsActionVar','variables':{}}");
    end
    ]]>
            </droolContent>
        </ruleImplementation>
    </rule>

    You can see above an example of portable rule in xml format that can be saved in ProActive Catalog.

3.2. Rule’s variables definition

The above provided example describes a rule for monitoring CPU usage. The aim of this rule is to monitor the CPU usage and trigger an action when the threshold is reached.

 <variables>
    <variable name="nodeUrlVar" value="service:jmx:rmi:///jndi/rmi://tryqa.activeeon.com:59885/rmnode" />
    <variable name="percentageUsedLimit" value="90" />
    <variable name="bucketNameAsActionVar" value="basic-examples" />
    <variable name="workflowNameAsActionVar" value="Native_Task_Linux" />
    <variable name="pollingPeriodInSecondsVar" value="60" />
    <variable name="calmPeriodInSecondsVar" value="120" />
    <variable name="usageCombinedCPUMetric" value="sigar:Type=CpuUsage Combined" />
 </variables>

A user can define his own set of rule variables as it has a simple key-value structure.

3.3. Monitoring part

As explained in the document section Monitoring, PCW service can monitor 2 types of metrics: Monitoring physical metrics and Monitoring host accessibility. Each monitoring type has an implementation of a Polling class. The polling parameters depends on the chosen implementation of a Polling class. We will provide below examples of polling parameters for each monitoring type.

3.3.1. Polling parameters for Monitoring physical metrics

<ruleImplementation>
    <pollingClass value="Jmx_Metrics">
        <parameters>
            <parameter name="jmxCredentialsFile" value="#pathToCredentialFileVar" />
            <parameter name="nodeUrl" value="#nodeUrlVar" />
            <parameter name="pollingPeriodInSeconds" value="#pollingPeriodInSecondsVar" />
            <parameter name="calmPeriodInSeconds" value="#calmPeriodInSecondsVar" />
        </parameters>
    </pollingClass>
    <kpis>
        <kpi>#usageCombinedCPUMetric</kpi>
    </kpis>

The code above describes a polling configuration for Monitoring physical metrics. This particular example is for monitoring CPUs.
The polling and calm periods correspond to the frequency of collecting the monitoring metrics. The concrete set of parameters relies on the implementation of a Polling class. In our example, the class is Jmx_Metrics. This implementation takes two parameters: the JMX node URL and the JMX credentials. The actual metric to monitor is represented by the <kpi> tag. Several metrics can be monitored at the same time. For our example after variables replacement the monitoring metric will be "sigar:Type=CpuUsage Combined".

3.3.2. Polling parameters for Monitoring host accessibility

The other type of monitoring is checking the host accessibility. The example of the polling parameters for this case is shown next:

<ruleImplementation>
    <pollingClass value="Ping">
        <parameters>
            <parameter name="nodeUrl" value="google.com" />
            <parameter name="pollingPeriodInSeconds" value="60" />
            <parameter name="calmPeriodInSeconds" value="120" />
        </parameters>
    </pollingClass>
    <kpis> <kpi>upAndRunning</kpi> </kpis>

The polling class name for monitoring host accessibility is Ping. The nodeUrl parameter’s value defines the hostname or IP that should be monitored. The polling and calm periods specify the frequency of checking the provided hostname.

3.4. Rule’s condition and action

As said in Rule’s action section, we will specify rule’s condition and action in Drools language.
The possible actions are:

  1. submit workflow to ProActive Scheduler

  2. start ProActive Cloud Automation service.

Both examples are provided below.

3.4.1. Submit to ProActive Scheduler as rule’s action

The described condition and action follows an example of CPU monitoring. The job will be submitted to ProActive Scheduler as a rule’s action.

<droolContent>
<![CDATA[
package org.ow2.proactive.cloud_watch.rules
import java.util.*;
import  org.ow2.proactive.cloud_watch.model.*;
import  org.ow2.proactive.cloud_watch.service.*;
global org.ow2.proactive.cloud_watch.action.ActionExecutorFromCatalog executorWorkflowFromCatalog;
declare isNodeAction
 nodeMetrics: NodeMetrics
end

rule #autogenerated_rule_id
when
    exists  kpi : NodeMetrics( url == "#nodeUrlVar" )
    $metric : NodeMetrics(metrics["#usageCombinedCPUMetric"] * 100  > #percentageUsedLimit)

then
    Map<String, String> parameters = new HashMap();
    parameters.put("credentialsPath","#pathToCredentialFileVar");
    executorWorkflowFromCatalog.submitToScheduler(parameters, "#autogenerated_rule_id", "{'bucket_name':'#bucketNameAsActionVar','workflow_name':'#workflowNameAsActionVar','variables':{}}");
end
]]>
</droolContent>

The next part uses pure Drools syntax for specifying a rule condition and an action. In the provided JMX CPU metric example a when part defines two conditions:

  • the metric should arrive from the specified node URL

  • CPU usage percentage should be bigger than the specified limit

The then part is an actual action triggered when a condition is met. In the example, a workflow stored in ProActive Catalog will be submitted to ProActive Scheduler.

3.4.2. Start ProActive Cloud Automation service as rule’s action

This section will show the described condition and action of a host accessibility monitoring example. The ProActive Cloud Automation service will be started as a rule’s action.

<droolContent>
<![CDATA[
package org.ow2.proactive.cloud_watch.rules
import java.util.*;
import  org.ow2.proactive.cloud_watch.model.*;
import  org.ow2.proactive.cloud_watch.service.*;
global org.ow2.proactive.cloud_watch.action.ActionExecutorFromCatalog executorWorkflowFromCatalog;
declare isNodeAction
 nodeMetrics: NodeMetrics
end

rule #autogenerated_rule_id when
    exists  kpi : NodeMetrics( url == "#hostAddressVar" )
    $metric : NodeMetrics(metrics["upAndRunning"]  == false)

then
    Map<String, String> parameters = new HashMap();
    parameters.put("credentialsPath","#pathToCredentialFileVar");
    executorWorkflowFromCatalog.submitToCloudAutomation(parameters, "#autogenerated_rule_id", "{'bucket_name':'#bucketNameAsActionVar','workflow_name':'#workflowNameAsActionVar','variables':{}}");
end
]]>
</droolContent>

The provided host accessibility example will trigger an action when the specified host is down. This triggering behavior is declared in when part with 2 conditions:

  • the monitoring URL should be equal to specified host address

  • upAndRunning metric should be false (the host is not accessible in this case)

In then part of provided example you can observe how to start Cloud Automation service as rule’s action.

Using standard Drools language for description of the condition gives a big advantage. It allows to specify complex use cases and benefit from Drools Engine processing.
With ProActive distribution there are provided examples for monitoring with slicing time window, for example: monitoring an average of memory usage for last 30 seconds. In the Drools language, we can specify such condition with slicing time window and triggering action.

Even if the design of the xml rule looks complex, choosing Drools syntax gives a huge flexibility to express user requirements.

3.5. Display a created rule in Cloud Watch portal

Once the rule is defined, it should be saved in Catalog with the specific kind Rule. Please check out the Catalog documentation section for more information about saving objects in Catalog.

Activeeon SAS, © 2007-2019. All Rights Reserved.

For more information, please contact contact@activeeon.com.