public abstract class BatchJobInfrastructure extends InfrastructureManager
If getDeleteJobCommand() fails, the node is removed from the core anyway.
If the SSH command to the frontend or getSubmitJobCommand() fails, this IM
ensures that no more nodes will be registered once nodeTimeOut has elapsed
(this IM maintains an internal "black list").
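The "black list" idea above can be pictured with a minimal sketch. It assumes that timed-out deployments are tracked by their deploying node URL; the class and method names below are hypothetical and do not come from the ProActive code base, whose actual bookkeeping may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a "black list" of timed-out deployments.
// The real bookkeeping inside BatchJobInfrastructure may differ.
class TimedOutNodeBlackList {
    private final Set<String> timedOutUrls = new HashSet<>();

    // Called when nodeTimeOut elapses for a deploying node that never registered.
    void markTimedOut(String deployingNodeUrl) {
        timedOutUrls.add(deployingNodeUrl);
    }

    // A late registration is refused if its deployment already timed out.
    boolean mayRegister(String deployingNodeUrl) {
        return !timedOutUrls.contains(deployingNodeUrl);
    }
}

class BlackListDemo {
    public static void main(String[] args) {
        TimedOutNodeBlackList blackList = new TimedOutNodeBlackList();
        blackList.markTimedOut("deploying://node-1"); // placeholder URL
        System.out.println(blackList.mayRegister("deploying://node-1"));
        System.out.println(blackList.mayRegister("deploying://node-2"));
    }
}
```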
Service providers have to implement three methods:
getBatchinJobSystemName()
: The name of the target resource manager; PBS, Torque and LSF are examples. The returned string is not really significant: it is only used to build meaningful node names and for logging.
getDeleteJobCommand()
: The command required to delete a job on the target resource manager.
getSubmitJobCommand()
: The command required to submit a new job.
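As an illustration of the methods listed above, here is a minimal sketch of what a PBS-style provider could look like. To stay compilable without the ProActive libraries, it uses a hypothetical stand-in class (BatchHooks) that mirrors only the abstract method signatures; a real provider would extend BatchJobInfrastructure instead. The method summary also lists extractSubmitOutput(String) as abstract, so the sketch covers it too. The assumption that the submit command prints the job identifier on its first non-empty line is ours, not something this page states.

```java
// Hypothetical stand-in mirroring the abstract hooks of BatchJobInfrastructure,
// so that the sketch compiles without the ProActive libraries.
abstract class BatchHooks {
    protected abstract String getBatchinJobSystemName();
    protected abstract String getSubmitJobCommand();
    protected abstract String getDeleteJobCommand();
    protected abstract String extractSubmitOutput(String output);
}

// Illustrative PBS-style provider (names and parsing rule are assumptions).
class PbsLikeHooks extends BatchHooks {
    @Override
    protected String getBatchinJobSystemName() {
        return "PBS"; // only used for node names and logging
    }

    @Override
    protected String getSubmitJobCommand() {
        return "qsub"; // PBS/Torque job submission command
    }

    @Override
    protected String getDeleteJobCommand() {
        return "qdel"; // PBS/Torque job deletion command
    }

    @Override
    protected String extractSubmitOutput(String output) {
        // Assumption: the job ID is the first non-empty line of the output.
        for (String line : output.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                return trimmed;
            }
        }
        throw new IllegalArgumentException("no job ID found in submit output");
    }
}

class BatchHooksDemo {
    public static void main(String[] args) {
        BatchHooks hooks = new PbsLikeHooks();
        System.out.println(hooks.getBatchinJobSystemName());
        System.out.println(hooks.extractSubmitOutput("12345.frontend\n"));
    }
}
```

Keeping the parsing rule inside extractSubmitOutput means that a provider for another batch system only needs to change the two command strings and the parsing logic.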
InfrastructureManager.PersistedInfraVariablesHandler<T>, InfrastructureManager.RMDeployingNodeAccessor
Modifier and Type | Field and Description |
---|---|
protected String | javaOptions: Additional Java options to append to the command executed on the remote host |
protected String | javaPath: Path to the Java executable on the remote hosts |
protected int | maxNodes: Maximum number of nodes this infrastructure can request simultaneously from the Job Batching system |
protected int | nodeTimeOut: Timeout after which nodes are no longer expected to register |
protected File | rmCredentialsPath: Path to the credentials file used for RM authentication |
protected String | schedulingPath: Path to the Scheduling installation on the remote hosts |
protected String | serverName: Name of the server on which the job batching software is running |
protected String | sshOptions: SSHClient options (see SSHClient) |
protected String | submitJobOpt: Options for the submit job command executed on serverName |
ELASTIC, logger, nodeSource, persistedInfraVariables, readLock, RM_URL_KEY, writeLock
Constructor and Description |
---|
BatchJobInfrastructure() |
Modifier and Type | Method and Description |
---|---|
void | acquireAllNodes(): Acquires as many nodes as possible, making one distinct reservation per node |
void | acquireNode(): Acquires a single node through PBS |
void | configure(Object... parameters): Configures this infrastructure manager. parameters[0] = java path, parameters[1] = ssh options, parameters[2] = scheduling path, parameters[3] = java options, parameters[4] = max node, parameters[5] = node timeout, parameters[6] = scheduler server name, parameters[7] = PA scheduler credentials, parameters[8] = submit job options |
protected abstract String | extractSubmitOutput(String output): Parses the output of the submit command (getSubmitJobCommand()) to extract the job's ID |
protected abstract String | getBatchinJobSystemName(): Returns a string identifying the type of the target Batching Job System |
protected abstract String | getDeleteJobCommand(): Lets implementations supply the command that will be used to delete a job |
String | getDescription() |
Map<Integer,String> | getSectionDescriptions() |
protected abstract String | getSubmitJobCommand(): Lets implementations supply the command that will be used to submit a new job |
protected void | initializePersistedInfraVariables(): Should initialize a value in the runtime variables map for every runtime variable used in the class |
void | notifyAcquiredNode(org.objectweb.proactive.core.node.Node node): Notifies the implementation of the infrastructure manager that a new node has been registered |
protected void | notifyDeployingNodeLost(String pnURL): Called by the Infrastructure Manager when a pending node is removed |
void | notifyDownNode(String nodeName, String nodeUrl, org.objectweb.proactive.core.node.Node node): Defines what needs to be done in an infrastructure implementation when a node is detected as down |
void | removeNode(org.objectweb.proactive.core.node.Node node): Removes the node from the resource manager |
void | shutDown(): Notifies this infrastructure that it is going to be shut down along with its nodesource |
String | toString() |
acquireAllNodes, acquireNodes, acquireNodes, addDeployingNode, addDeployingNodeWithLockAndPersist, addLostNodeWithLockAndPersist, addMultipleDeployingNodes, buildDeployingNodeURL, checkAllNodesAreAcquiredAndDo, checkNodeIsAcquiredAndDo, declareDeployingNodeLost, getDefaultCommandLineBuilder, getDeployingAndLostNodes, getDeployingNodesWithLock, getDeployingOrLostNode, getEmptyCommandLineBuilder, getLostNodesWithLock, getMeta, getPersistedDeployingNodesUrl, getPersistedInfraVariable, getPersistedLostNodesUrl, getRmUrl, internalConfigure, internalNotifyDownNode, internalRegisterAcquiredNode, internalRemoveDeployingNode, internalRemoveNode, internalShutDown, isShutDown, isUsingDeployingNode, multipleDeclareDeployingNodeLost, onDownNodeReconnection, persistInfrastructureVariables, reconfigure, recoverPersistedInfraVariables, removeDownNodePriorToNotify, searchForNotAcquiredRmNode, setNodeSource, setPersistedInfraVariable, setPersistedNodeSourceData, setRmDbManager, setRmUrl, update, updateDeployingNodeDescription
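The positional contract of configure(Object... parameters) described in the method summary is easy to get wrong, so the following sketch labels each index with the meaning given there. The helper class, the placeholder values and their types are assumptions made for illustration only; this page does not state which Java types or units the real method expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: labels a configure(Object...) argument list
// using the index order given in the method summary.
class ConfigureParametersSketch {
    static final String[] PARAMETER_NAMES = {
        "java path",                // parameters[0]
        "ssh options",              // parameters[1]
        "scheduling path",          // parameters[2]
        "java options",             // parameters[3]
        "max node",                 // parameters[4]
        "node timeout",             // parameters[5]
        "scheduler server name",    // parameters[6]
        "PA scheduler credentials", // parameters[7]
        "submit job options"        // parameters[8]
    };

    static Map<String, Object> label(Object... parameters) {
        if (parameters.length != PARAMETER_NAMES.length) {
            throw new IllegalArgumentException(
                "expected " + PARAMETER_NAMES.length + " parameters, got " + parameters.length);
        }
        Map<String, Object> labelled = new LinkedHashMap<>();
        for (int i = 0; i < parameters.length; i++) {
            labelled.put(PARAMETER_NAMES[i], parameters[i]);
        }
        return labelled;
    }

    public static void main(String[] args) {
        // Placeholder values only; real types and units are not specified here.
        Object[] parameters = {
            "/usr/bin/java", "", "/opt/scheduling", "", "4", "300000",
            "frontend.example.org", "<credentials>", ""
        };
        label(parameters).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```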
protected String javaPath
protected String sshOptions
SSHClient options (see SSHClient)
protected String schedulingPath
protected String javaOptions
protected int maxNodes
protected int nodeTimeOut
protected String serverName
protected File rmCredentialsPath
protected String submitJobOpt
Options for the submit job command executed on serverName
public void acquireAllNodes()
acquireAllNodes
in class InfrastructureManager
public void acquireNode()
acquireNode
in class InfrastructureManager
public void configure(Object... parameters)
configure
in class InfrastructureManager
parameters
- of the infrastructure manager
public void notifyAcquiredNode(org.objectweb.proactive.core.node.Node node) throws RMException
InfrastructureManager.addDeployingNode(String, String, String, long)
was made), and is called only for deploying nodes for which no
timeout occurred.
notifyAcquiredNode
in class InfrastructureManager
node
- the newly registered node
RMException
- if the implementation does not approve the node acquisition request
protected void notifyDeployingNodeLost(String pnURL)
notifyDeployingNodeLost
in class InfrastructureManager
pnURL
- the URL of the deploying node for which the timeout occurred.
public void notifyDownNode(String nodeName, String nodeUrl, org.objectweb.proactive.core.node.Node node) throws RMException
InfrastructureManager
notifyDownNode
in class InfrastructureManager
node
- the ProActive Programming Node that is down.
RMException
- if any problems occurred.
public void removeNode(org.objectweb.proactive.core.node.Node node) throws RMException
removeNode
in class InfrastructureManager
node
- the node to release.
RMException
- if any problems occurred.
public String getDescription()
getDescription
in class InfrastructureManager
public void shutDown()
InfrastructureManager
shutDown
in class InfrastructureManager
protected abstract String getSubmitJobCommand()
protected abstract String getDeleteJobCommand()
protected abstract String getBatchinJobSystemName()
protected abstract String extractSubmitOutput(String output)
Parses the output of the submit command (getSubmitJobCommand()) to extract the job's ID.
output
- the submit command output
protected void initializePersistedInfraVariables()
InfrastructureManager
InfrastructureManager.configure(Object...)
method.
This method runs with the write lock acquired.
initializePersistedInfraVariables
in class InfrastructureManager
public Map<Integer,String> getSectionDescriptions()
getSectionDescriptions
in interface NodeSourcePlugin
getSectionDescriptions
in class InfrastructureManager