ABSTRACT

With the advent of grid technologies, scientists and engineers are building complex and sophisticated applications to manage and process large datasets and execute scientific experiments on distributed grid resources [1]. Building complex workflows requires means for composing and executing distributed applications. A workflow expresses an automation of procedures wherein files and data are passed between procedural applications, according to a defined set of rules, to achieve an overall goal [2]. A workflow management system defines, manages, and executes workflows on computing resources. The use of the workflow paradigm for application composition on grids offers several advantages [3], such as:

• Ability to build dynamic applications and orchestrate the use of distributed resources
• Utilization of resources that are located in a suitable domain to increase throughput or reduce execution costs
• Execution spanning multiple administrative domains to obtain specific processing capabilities
• Integration of multiple teams involved in managing different parts of the experiment workflow, thus promoting interorganizational collaborations

Executing a grid workflow application is a complex endeavor. Workflow tasks are expected to be executed on heterogeneous resources that may be geographically distributed. Different resources may be involved in the execution of one workflow. For example, in a scientific experiment, one needs to acquire data from an instrument, and analyze it on resources owned by other organizations, in sequence or in parallel with other tasks. Therefore, the discovery and selection of resources for executing workflow tasks could be quite complicated. In addition, a large number of tasks may be required to be executed and monitored in parallel and the location of intermediate data may be known only at runtime.
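To make the notion of tasks running "in sequence or in parallel" concrete, the following is a minimal sketch rather than the design of any particular workflow management system. It assumes a workflow can be modeled as a directed acyclic graph of task dependencies; the task names (acquire, analyze_a, analyze_b, merge) are hypothetical, and resource discovery, scheduling, and data movement are deliberately left out. The sketch groups tasks into stages so that all tasks within a stage are independent of one another and could, in principle, be dispatched to different resources at the same time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_stages(dependencies):
    """Group workflow tasks into stages.

    `dependencies` maps each task name to the set of tasks whose output
    it needs. Tasks in the same stage do not depend on each other, so
    they are candidates for parallel execution.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        # All tasks whose prerequisites have already completed.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        # Mark the whole stage as finished to unlock its successors.
        sorter.done(*ready)
    return stages


# Hypothetical experiment: acquire data from an instrument, run two
# independent analyses on it, then merge the results.
workflow = {
    "acquire": set(),
    "analyze_a": {"acquire"},
    "analyze_b": {"acquire"},
    "merge": {"analyze_a", "analyze_b"},
}

stages = execution_stages(workflow)
for number, tasks in enumerate(stages, start=1):
    print(f"stage {number}: {tasks}")
```

Here the two analysis tasks land in the same stage, while acquisition and merging must run before and after them respectively. A real grid workflow engine would additionally have to select a resource for each task and locate intermediate data at runtime, which this sketch does not attempt.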