Askalon

Cloud and Grid Application Development and Computing Environment

Scheduling

Job scheduling is the process of determining how jobs should be executed in the environment: which resources they are mapped to and when their execution should take place. A scheduler is the service responsible for this task. A system can contain several levels of schedulers, depending on how far from the physical environment the scheduling is performed. The lowest-level scheduler, sometimes called a Local Resource Manager (LRM), interacts directly with the operating system to submit and execute jobs. Higher-level schedulers (metaschedulers) take a more global view of the execution environment and distribute jobs among different LRMs, which are then responsible for the mapping.
Scheduling is usually driven by metrics that feed a goal function, which evaluates the fitness of a candidate mapping; the schedule with the highest fitness value is chosen for execution. Different criteria can be used to compute the metrics, the most common being execution time, economic cost, fault tolerance and the quality of the expected results. A scheduler can also implement fault-tolerance and fair-sharing policies to make the environment more reliable and to give fair access to resources in multi-application, multi-user environments. For instance, failed jobs can be restarted or resubmitted to other resources.
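To make the goal-function idea concrete, here is a minimal Python sketch that scores candidate schedules with a weighted sum of two of the criteria named above, execution time and economic cost, and picks the candidate with the highest fitness. The weighting scheme, the normalization by the worst candidate, and all names (`CandidateSchedule`, `fitness`, `choose_schedule`) are illustrative assumptions, not the formula used by any particular scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CandidateSchedule:
    name: str
    makespan: float  # predicted execution time of the whole workflow
    cost: float      # predicted economic cost


def fitness(s: CandidateSchedule, worst_time: float, worst_cost: float,
            w_time: float = 0.7, w_cost: float = 0.3) -> float:
    """Weighted goal function: each metric is scaled by the worst candidate's
    value and inverted, so shorter and cheaper schedules score higher."""
    time_score = 1.0 - s.makespan / worst_time
    cost_score = 1.0 - s.cost / worst_cost
    return w_time * time_score + w_cost * cost_score


def choose_schedule(candidates: List[CandidateSchedule],
                    w_time: float = 0.7, w_cost: float = 0.3) -> CandidateSchedule:
    """Return the candidate with the highest fitness value."""
    worst_time = max(c.makespan for c in candidates) or 1.0  # avoid division by zero
    worst_cost = max(c.cost for c in candidates) or 1.0
    return max(candidates,
               key=lambda c: fitness(c, worst_time, worst_cost, w_time, w_cost))


candidates = [CandidateSchedule("A", 100, 10),
              CandidateSchedule("B", 60, 30),
              CandidateSchedule("C", 80, 12)]
print(choose_schedule(candidates).name)  # → C
```

Changing the weights changes the choice: with `w_time=1.0, w_cost=0.0` the fastest candidate (B) wins, while the default weights trade some speed for a lower cost.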
In the Askalon environment, lower-level scheduling is done by existing LRMs (PBS, Condor, etc.) wrapped by the Globus middleware. The Askalon Scheduler is a metascheduler that distributes jobs among the Grid resources (e.g., clusters) managed by the middleware services.
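The fault-tolerance policy mentioned above (resubmitting a failed job to another resource) can be sketched in Python as follows. Each LRM is modeled as a plain callable that returns whether the job succeeded; this stands in for a real submission through Globus to PBS or Condor, whose APIs are not shown here. The names `rank_by_load`, `dispatch` and `max_attempts` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical LRM stand-in: runs a job, returns True on success, False on failure.
Submit = Callable[[str], bool]


def rank_by_load(load: Dict[str, int]) -> List[str]:
    """Order resources from least to most loaded (e.g. by number of queued jobs)."""
    return sorted(load, key=load.get)


def dispatch(job: str, lrms: Dict[str, Submit], preference: List[str],
             max_attempts: int = 3) -> str:
    """Submit the job to the preferred resource; if it fails, resubmit it to the
    next resource in the list. Returns the resource where the job succeeded."""
    attempts = 0
    for name in preference:
        if attempts >= max_attempts:
            break
        attempts += 1
        if lrms[name](job):
            return name
    raise RuntimeError(f"job {job!r} failed after {attempts} attempt(s)")


lrms = {"cluster_a": lambda job: False,  # simulate a failing cluster
        "cluster_b": lambda job: True}
order = rank_by_load({"cluster_a": 1, "cluster_b": 5})
print(dispatch("job-1", lrms, order))  # → cluster_b
```

Ranking by current load is only one possible distribution policy; a fair-share policy would instead take each user's past resource usage into account.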

Scheduler in the ASKALON environment


Fig. 1: Scheduler in the ASKALON Grid environment.

The Scheduler (see Fig. 1) is a service that prepares workflow applications for execution on the Grid. It processes a workflow specification written in AGWL (Abstract Grid Workflow Language), converts the workflow into an executable form, and schedules it on the available Grid resources. The Scheduler consists of two main components: the Workflow Converter and the Scheduling Engine. A third component, the Event Generator, is planned as a future extension to make workflow processing more dynamic.
The Workflow Converter resolves all ambiguities by transforming complex workflow graphs into simple DAGs (Directed Acyclic Graphs), which makes the workflows suitable for graph scheduling algorithms. All transformations are based on the most probable assumptions and can be revised in a later phase of workflow execution. The assumptions concern workflow conditions and parameters that cannot be evaluated at the beginning of execution. Conditional constructs such as while loops, if-then-else or switch depend on conditions that determine their execution (the number of iterations or the branch taken), and a parallel for may have a different number of parallel branches depending on its input parameters. A correct prediction can yield a significant scheduling gain, particularly if a strong imbalance in the workflow is predicted. An incorrect prediction does not cause any execution problem; it only makes rescheduling necessary.
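A minimal Python sketch of this conversion: each ambiguous construct carries a predicted value (an iteration count, a branch count, or a branch probability), and the converter unrolls or selects accordingly to produce a plain DAG. The construct classes and the `DagBuilder` name are hypothetical and much simpler than AGWL; `switch` is omitted but could be handled like `IfThenElse` by keeping the most probable case.

```python
import itertools
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass
class Activity:
    name: str  # Activity Type of this node


@dataclass
class Seq:
    parts: List["Node"]


@dataclass
class While:
    body: "Node"
    predicted_iterations: int  # converter's guess for the loop count


@dataclass
class ParallelFor:
    body: "Node"
    predicted_branches: int  # converter's guess for the number of branches


@dataclass
class IfThenElse:
    then: "Node"
    orelse: "Node"
    p_then: float  # predicted probability that the condition holds


Node = Union[Activity, Seq, While, ParallelFor, IfThenElse]


class DagBuilder:
    def __init__(self) -> None:
        self.nodes: List[str] = []
        self.edges: Set[Tuple[str, str]] = set()
        self._ids = itertools.count()

    def build(self, n: Node) -> Tuple[List[str], List[str]]:
        """Add construct n to the DAG and return its (entry, exit) node ids."""
        if isinstance(n, Activity):
            nid = f"{n.name}#{next(self._ids)}"
            self.nodes.append(nid)
            return [nid], [nid]
        if isinstance(n, Seq):
            return self._chain(n.parts)
        if isinstance(n, While):  # unroll the predicted number of iterations
            return self._chain([n.body] * n.predicted_iterations)
        if isinstance(n, ParallelFor):  # replicate the body side by side
            entry: List[str] = []
            exit_: List[str] = []
            for _ in range(n.predicted_branches):
                e, x = self.build(n.body)
                entry += e
                exit_ += x
            return entry, exit_
        if isinstance(n, IfThenElse):  # keep only the more probable branch
            return self.build(n.then if n.p_then >= 0.5 else n.orelse)
        raise TypeError(f"unsupported construct: {n!r}")

    def _chain(self, parts: List[Node]) -> Tuple[List[str], List[str]]:
        """Build parts in sequence, linking each part's exits to the next part's entries."""
        entry: List[str] = []
        exit_: List[str] = []
        for p in parts:
            e, x = self.build(p)
            if not e:  # empty part, e.g. a loop predicted to run zero times
                continue
            if not entry:
                entry = e
            self.edges.update((a, b) for a in exit_ for b in e)
            exit_ = x
        return entry, exit_


workflow = Seq([Activity("prepare"),
                While(Activity("refine"), predicted_iterations=3),
                ParallelFor(Activity("simulate"), predicted_branches=2),
                IfThenElse(Activity("render"), Activity("skip"), p_then=0.8),
                Activity("collect")])
builder = DagBuilder()
entry, exit_ = builder.build(workflow)
print(len(builder.nodes), len(builder.edges))  # → 8 8
```

If a prediction later proves wrong (for example, the loop needs a fourth iteration), the DAG is simply rebuilt with the corrected value, which corresponds to the rescheduling described above.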
The Scheduling Engine is responsible for the actual scheduling of the converted workflows. It is based on a plug-in architecture in which different scheduling algorithms suited to the current workflow model can be used interchangeably. The algorithms may differ in running time and in the accuracy of their results, and they may optimize different metrics. We use the HEFT (Heterogeneous Earliest Finish Time) algorithm as the primary scheduling algorithm in our system. GridARM provides the current status of the resources available on the Grid. It also reports which Activity Deployments available on those resources match the Activity Types of the workflow nodes. When querying GridARM, the Scheduler sends a set of constraints that the requested resources must satisfy. The Performance Predictor supplies predicted activity execution times and data transfer times for performance-driven scheduling algorithms.
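The sketch below shows, in Python, how a plug-in interface and HEFT might fit together. HEFT computes each task's upward rank (its mean execution time over resources plus the longest path from it to an exit task, including transfer times), then visits tasks in decreasing rank order and places each on the resource that gives the earliest finish time, inserting it into an idle gap when one is long enough. The `Dag`, `Heft`, `SchedulingAlgorithm` and `ALGORITHMS` names are assumptions. Execution times in `comp` stand in for values from the Performance Predictor, and a task may only be placed on resources listed for it, mirroring the Activity Deployments reported by GridARM. For simplicity, one transfer time per edge is used instead of HEFT's average over resource pairs.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Protocol, Tuple

Schedule = Dict[str, Tuple[str, float, float]]  # task -> (resource, start, finish)


@dataclass
class Dag:
    succ: Dict[str, List[str]]         # task -> successor tasks
    comp: Dict[str, Dict[str, float]]  # task -> {resource: predicted run time}
    comm: Dict[Tuple[str, str], float] = field(default_factory=dict)  # edge -> transfer time

    def preds(self, t: str) -> List[str]:
        return [u for u, vs in self.succ.items() if t in vs]


class SchedulingAlgorithm(Protocol):
    """Interface every scheduling plug-in is assumed to implement."""
    def schedule(self, dag: Dag) -> Schedule: ...


class Heft:
    """Heterogeneous Earliest Finish Time with insertion-based placement."""

    def _upward_ranks(self, dag: Dag) -> Dict[str, float]:
        ranks: Dict[str, float] = {}

        def rank(t: str) -> float:
            if t not in ranks:
                tail = max((dag.comm.get((t, s), 0.0) + rank(s)
                            for s in dag.succ.get(t, [])), default=0.0)
                ranks[t] = mean(dag.comp[t].values()) + tail
            return ranks[t]

        for t in dag.comp:
            rank(t)
        return ranks

    def schedule(self, dag: Dag) -> Schedule:
        ranks = self._upward_ranks(dag)
        busy: Dict[str, List[Tuple[float, float]]] = {}  # resource -> sorted busy intervals
        result: Schedule = {}
        # Decreasing upward rank; with positive run times every task follows its predecessors.
        for t in sorted(dag.comp, key=ranks.__getitem__, reverse=True):
            best = None
            for r, w in dag.comp[t].items():  # only resources where t is deployed
                # Data from a predecessor on another resource arrives after its transfer time.
                ready = max((result[u][2] + (0.0 if result[u][0] == r
                                             else dag.comm.get((u, t), 0.0))
                             for u in dag.preds(t)), default=0.0)
                start = ready
                for s, f in busy.get(r, []):  # first idle gap long enough for t
                    if start + w <= s:
                        break
                    start = max(start, f)
                if best is None or start + w < best[2]:
                    best = (r, start, start + w)
            result[t] = best
            busy.setdefault(best[0], []).append((best[1], best[2]))
            busy[best[0]].sort()
        return result


ALGORITHMS: Dict[str, SchedulingAlgorithm] = {"heft": Heft()}  # plug-in registry

dag = Dag(succ={"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []},
          comp={"A": {"r1": 2, "r2": 3}, "B": {"r1": 3, "r2": 2},
                "C": {"r1": 4, "r2": 4}, "D": {"r1": 2, "r2": 2}},
          comm={("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1})
plan = ALGORITHMS["heft"].schedule(dag)
print(max(finish for _, _, finish in plan.values()))  # makespan → 8.0
```

Another algorithm would be added by registering a further object with a `schedule` method under a new key, without touching the rest of the engine.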
The Event Generator will coordinate workflow execution by sending events to the Enactment Engine whenever necessary. For instance, rescheduling may be needed when resources go down or become heavily loaded, or when another relevant event occurs in the system.
The scheduling process starts when the Enactment Engine sends a scheduling request containing a workflow description. The workflow consists of nodes representing Activity Types, control and data links between the nodes, and a specification of the inputs and outputs (i.e., all the information available in the AGWL input). The Workflow Converter attempts to predict all ambiguous control flows in the workflow and simplifies the workflow to a DAG. The Scheduling Engine receives the DAG and tries to produce an optimal schedule for it. It queries GridARM for the Activity Deployments (activities deployed on individual resources) that can be mapped to the Activity Types specified in the workflow. From all possible mappings, the Scheduler has to choose the set that produces a concrete workflow that is optimal with respect to the assumed goal function. If the optimization goal is execution time, the Scheduler also queries the Performance Predictor for predicted execution times and data transfer times. One of the available scheduling algorithms is then applied to map the workflow nodes to appropriate Activity Deployments. Once created, the mapped workflow is sent to the Enactment Engine for execution.
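The matching step can be illustrated with a small Python sketch. `query_deployments` is a hypothetical stand-in for a GridARM query (the real service interface is not described here): it filters registered Activity Deployments by Activity Type and by resource constraints, where the `cpus` and `memory_gb` attributes are likewise assumptions. Multiplying the number of candidates per node shows how quickly the space of concrete workflows grows, which is why a heuristic such as HEFT is used instead of enumerating every mapping.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Deployment:
    activity_type: str
    resource: str
    cpus: int         # hypothetical resource attributes usable in constraints
    memory_gb: float


def query_deployments(registry: List[Deployment], activity_type: str,
                      min_cpus: int = 1, min_memory_gb: float = 0.0) -> List[Deployment]:
    """Hypothetical stand-in for a GridARM query: deployments of the given
    Activity Type on resources that satisfy the constraints."""
    return [d for d in registry
            if d.activity_type == activity_type
            and d.cpus >= min_cpus and d.memory_gb >= min_memory_gb]


def candidate_mappings(nodes: Dict[str, str], registry: List[Deployment],
                       constraints: Dict[str, dict]) -> Dict[str, List[Deployment]]:
    """For each DAG node (node id -> Activity Type), list the matching deployments."""
    result = {}
    for node, activity_type in nodes.items():
        matches = query_deployments(registry, activity_type,
                                    **constraints.get(activity_type, {}))
        if not matches:
            raise LookupError(f"no deployment satisfies the constraints for {node!r}")
        result[node] = matches
    return result


def mapping_space_size(candidates: Dict[str, List[Deployment]]) -> int:
    """Number of distinct concrete workflows (one deployment chosen per node)."""
    return math.prod(len(ds) for ds in candidates.values())


registry = [Deployment("render", "cluster1", 8, 16), Deployment("render", "cluster2", 4, 8),
            Deployment("render", "cluster3", 2, 4), Deployment("convert", "cluster1", 8, 16),
            Deployment("convert", "cluster2", 4, 8)]
nodes = {"n1": "render", "n2": "render", "n3": "convert"}
cands = candidate_mappings(nodes, registry, {"render": {"min_memory_gb": 8}})
print(mapping_space_size(cands))  # → 8
```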
Execution continues until it finishes or an interrupting event occurs in the system. In the latter case, the Enactment Engine sends a rescheduling event that forces the Scheduler to reschedule the workflow (and possibly to make new predictions). For instance, a rescheduling event must be raised if any of the assumptions made by the Workflow Converter turns out to be wrong. The Scheduler can also adopt other rescheduling strategies, for instance periodic rescheduling.
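The rescheduling triggers described above can be sketched as a small Python policy object. The event names, the `ReschedulingPolicy` class and the `loop_assumption_holds` check are hypothetical; the check shows how one Workflow Converter assumption (a predicted number of while-loop iterations) can be detected as wrong at run time.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event names for the triggers described above.
RESCHEDULING_EVENTS = {"assumption_failed", "resource_down", "resource_overloaded"}


@dataclass
class ReschedulingPolicy:
    period: Optional[float] = None  # seconds between periodic reschedulings; None = off
    last_scheduled: float = 0.0

    def should_reschedule(self, now: float, event: Optional[str] = None) -> bool:
        """Reschedule on a relevant event, or when the period has elapsed."""
        if event in RESCHEDULING_EVENTS:
            return True
        return self.period is not None and now - self.last_scheduled >= self.period

    def mark_scheduled(self, now: float) -> None:
        self.last_scheduled = now


def loop_assumption_holds(predicted: int, completed: int, condition_true: bool) -> bool:
    """Check a Workflow Converter assumption after `completed` iterations of a
    while loop that was unrolled `predicted` times."""
    if condition_true:               # the loop wants another iteration
        return completed < predicted
    return completed == predicted    # the loop ends: it must end exactly as predicted


policy = ReschedulingPolicy(period=60.0)
print(policy.should_reschedule(10.0, "resource_down"))  # → True
print(policy.should_reschedule(10.0))                   # → False
print(loop_assumption_holds(3, 3, True))                # → False: a 4th iteration is needed
```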
