Parallel system


Overview

The most obvious use case for a "computational grid" is to run many jobs simultaneously on remote resources, specifically on multiple clusters of identical computers. Logistically, doing so is a tedious problem that is solved by so-called application schedulers that execute individual programs on behalf of the user. GridWizard is an application scheduler being developed at UCSD and differs from existing application schedulers in several key areas. We expect that this software will make many other computational tasks substantially easier, including the batch submission of large numbers of jobs from Slicer3.


Background

Existing application scheduler software programs fall into one of two categories: static schedulers or dynamic schedulers. Static schedulers compute the ordering and mapping of jobs onto resources before any of the actual workload runs. Such a strategy has an obvious weakness: any system-level failure (a compute node crashes, a disk share becomes unavailable) affects more tasks than necessary. Dynamic schedulers, on the other hand, maintain a list of tasks (or jobs) that need to be performed and map the next task from the list onto the next available compute resource. In this case the mapping is not computed explicitly in advance; it emerges only by the end of the application's execution.
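The dynamic strategy can be sketched in a few lines. The following is a minimal illustration (not GridWizard code): tasks sit in a shared queue, and each resource claims the next task as soon as it is free, so the task-to-resource mapping is only known after the run completes. The `run` callable and the resource names are placeholders.

```python
import queue
import threading
import time

def dynamic_schedule(tasks, resources, run):
    """Dynamic scheduling sketch: map each task to the next available
    resource. The mapping is not fixed up front; it emerges as
    resources become free and claim work."""
    work = queue.Queue()
    for t in tasks:
        work.put(t)
    mapping = []            # record of (task, resource) pairs, filled as we go
    lock = threading.Lock()

    def worker(resource):
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return      # no work left for this resource
            run(task, resource)
            with lock:
                mapping.append((task, resource))

    threads = [threading.Thread(target=worker, args=(r,)) for r in resources]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return mapping

# Toy demonstration: "running" a task just sleeps briefly.
result = dynamic_schedule(
    tasks=range(6),
    resources=["node1", "node2"],
    run=lambda task, res: time.sleep(0.01),
)
```

A static scheduler would instead compute the full `mapping` list before calling `run` at all, which is what makes its behavior predictable but also brittle under node failure.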

Another criterion by which to classify application schedulers is their deployment architecture. For example, the APST scheduler is a pull scheduler: it reserves time on each compute resource and employs a reverse-access shell on each resource that polls for the next available job. In push scheduling, as performed by applications such as GridSAM, the application scheduler creates a manifest of jobs to be done by a particular resource and explicitly submits that manifest through the local scheduling mechanism on each resource (e.g., via qsub on a PBS-managed batch system).
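To make the push style concrete, here is a minimal, hypothetical sketch (again, not GridWizard or GridSAM code): the scheduler partitions the job list into one manifest per resource ahead of time, and each manifest would then be handed to that resource's batch system, e.g. via qsub on PBS. The round-robin partitioning is an illustrative choice, not a claim about either tool's algorithm.

```python
def push_manifests(jobs, resources):
    """Push-scheduling sketch: build one manifest of jobs per resource
    up front. Each manifest would then be submitted to that resource's
    local batch system (e.g. 'qsub manifest.pbs' on a PBS cluster)."""
    manifests = {r: [] for r in resources}
    for i, job in enumerate(jobs):
        # Simple round-robin assignment for illustration.
        manifests[resources[i % len(resources)]].append(job)
    return manifests

# Five jobs pushed onto two (hypothetical) cluster head nodes.
m = push_manifests(jobs=list(range(5)), resources=["nodeA", "nodeB"])
```

In the pull style, by contrast, no manifest exists: the per-resource shell repeatedly asks the central scheduler "what should I run next?", so adding a new resource requires no re-partitioning.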

Though not immediately obvious, different scheduling styles and deployment architectures are desirable at different stages of the research process. Because a significant amount of software debugging and testing takes place in the early stages of research, a static push scheduler is often preferable even though such a strategy has been shown to result in suboptimal execution times. The increased execution time can be offset if debugging the application becomes substantially easier because of a static push scheduler's predictable nature---here, one frequently prefers an application scheduler that parallelizes an application in a predictable and intuitively understandable manner, rather than according to a heuristically optimal schedule that may be less predictable. In other contexts, however, a dynamic pull scheduler may be more appropriate. For example, if a researcher provides (and supports) an online website for other members of the community to analyze data, it may become necessary to add new compute resources without disrupting existing application schedules; this is much easier with a dynamic scheduler, and nearly trivial with pull-based scheduling. However, in our experience with the one available dynamic pull-based scheduler (APST), pull-based protocols can deadlock when a node fails, stalling the application.

Because an application scheduler is merely another research tool used in the context of some larger scientific inquiry, it stands to reason that an ideal application scheduler would provide mechanisms that fit the overall research process by letting the user select: (a) either static or dynamic scheduling; (b) one of several scheduling algorithms, depending on the maturity of the scheduled code; (c) execution logging and restarting; (d) integration with standard data grid tools and access protocols (e.g., sftp, SRB, GSI); (e) integration with standard resource schedulers (e.g., SGE, Condor). To our knowledge, no such application has been reported in the literature, and it is this set of requirements that we aim to meet with GridWizard.
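The requirements (a)-(e) above amount to a small set of user-facing knobs. As a purely hypothetical sketch of what such a configuration surface might look like (GridWizard's actual interface is not described here; every field name below is an assumption), one could imagine:

```python
from dataclasses import dataclass

@dataclass
class SchedulerConfig:
    """Hypothetical configuration covering requirements (a)-(e)."""
    # (a) static vs. dynamic scheduling
    mode: str = "dynamic"                     # "static" or "dynamic"
    # (b) scheduling algorithm, chosen to match code maturity
    algorithm: str = "round_robin"            # e.g. predictable vs. heuristic
    # (c) execution logging and restarting
    log_path: str = "run.log"
    restart_from_log: bool = False
    # (d) data grid tools and access protocols
    data_protocols: tuple = ("sftp", "SRB", "GSI")
    # (e) resource schedulers to integrate with
    resource_managers: tuple = ("SGE", "Condor")

# Early-stage debugging run: predictable static scheduling.
cfg = SchedulerConfig(mode="static", algorithm="round_robin")
```

The point of the sketch is only that each requirement maps to an independent, user-selectable setting rather than a hard-wired design choice.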

Status

Completed:

In Progress:

  • Design of better scheduling algorithms to take advantage of data locality
  • Testing
  • Limit detection: how many jobs *can* we schedule, exactly?