Grid Scheduling and Information Services | 7

ABSTRACT

Scheduling and information services are two grid components that strongly influence the overall performance of an application running on the grid. The information service complements the grid scheduling system by reporting the status and availability of resources in the grid. A resource can be physical, such as processors, memory, and network bandwidth, or it can be a service offered by a node in the grid. Scheduling a job on a node requires answering two questions. First, does the resource fulfill the minimum requirements, and any specific QoS requirements, for executing the job? Second, is the resource available to serve the job? The grid information service supplies the information needed to answer both. In practice, however, a scheduling decision is rarely that simple, and several cases complicate it. A task may be composed of several sub-tasks that execute on different nodes and depend on one another in their order of execution; a scheduling algorithm must respect such dependencies. As another example, consider a job with a very large input file. Here the choice of node should not be made independently of the data location, because transferring the data to the node executing the task may incur significant communication overhead. Finally, a node in the grid might fail because of a hardware or network failure. In that case the grid scheduler must reschedule the task onto a different node, a decision it makes by consulting the grid information service. In this chapter, we cover the scheduling aspects of these examples as workflow scheduling, data-intensive service scheduling, and fault tolerance, respectively.
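To make the two scheduling questions and the data-location concern concrete, the following is a minimal sketch in Python, not an implementation taken from any particular grid middleware. The record types `NodeInfo` and `Job`, their fields, and the functions `eligible`, `transfer_seconds`, and `select_node` are all hypothetical names introduced here for illustration; the list of `NodeInfo` records stands in for whatever a grid information service would return. Rescheduling after a failure is modeled simply by marking the failed node unavailable and selecting again.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    """Hypothetical record, as a grid information service might report it."""
    name: str
    cpus: int
    memory_gb: float
    available: bool = True
    # Assumed field: bandwidth in Mbit/s from a named data site to this node.
    bandwidth_mbps: dict = field(default_factory=dict)


@dataclass
class Job:
    """Hypothetical job description with minimum requirements."""
    min_cpus: int
    min_memory_gb: float
    input_size_mb: float
    data_site: str  # name of the node that currently holds the input file


def eligible(job, node):
    # Question 1: does the node meet the minimum requirements?
    # Question 2: is the node currently available?
    return (node.available
            and node.cpus >= job.min_cpus
            and node.memory_gb >= job.min_memory_gb)


def transfer_seconds(job, node):
    # Estimated time to move the input file to the node (megabytes -> megabits).
    if node.name == job.data_site:
        return 0.0
    bandwidth = node.bandwidth_mbps.get(job.data_site, 0)
    if bandwidth <= 0:
        return float("inf")
    return job.input_size_mb * 8 / bandwidth


def select_node(job, nodes):
    # Keep only eligible nodes, then prefer the cheapest data transfer.
    candidates = [n for n in nodes if eligible(job, n)]
    if not candidates:
        return None
    return min(candidates, key=lambda n: transfer_seconds(job, n))


nodes = [
    NodeInfo("n1", cpus=8, memory_gb=32, bandwidth_mbps={"n2": 100}),
    NodeInfo("n2", cpus=8, memory_gb=32),  # holds the input data
    NodeInfo("n3", cpus=2, memory_gb=4, bandwidth_mbps={"n2": 1000}),  # too small
]
job = Job(min_cpus=4, min_memory_gb=16, input_size_mb=10_000, data_site="n2")

first_choice = select_node(job, nodes)

# Simulate a failure of the chosen node and reschedule.
first_choice.available = False
second_choice = select_node(job, nodes)
```

In this sketch the job is first placed on the node that already holds its input, and after that node is marked as failed the scheduler falls back to the remaining eligible node even though the input must then be transferred. Sub-task dependencies are deliberately left out; handling them would require ordering the sub-tasks before calling a selection routine of this kind.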