Slicer3:Large scale experiment control brainstorming

From NAMIC Wiki

Goal

To provide Slicer3 with a mechanism for submitting, monitoring, and summarizing large scale experiments that utilize Slicer3 modules, particularly the Command Line Modules. This page summarizes our thoughts, requirements, and experiments to date.

There are two introductory use cases that we wish to support:

  1. Slicer3 is used interactively to select a set of parameters for an algorithm or workflow on a single dataset. Then, these parameters are applied to N datasets non-interactively.
  2. Slicer3 is used interactively to select a subset of parameters for an algorithm or workflow on a single dataset. Then, the remaining parameter space is searched non-interactively. Various parameter space evaluation techniques could be employed, the simplest of which is to sample the space of (param1, param2, param3).
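The second use case reduces to enumerating the remaining parameter space and submitting one non-interactive job per sample. A minimal sketch of the simplest sampling strategy, a full grid over (param1, param2, param3); the function name and dictionary layout are illustrative, not part of Slicer3:

```python
import itertools

def parameter_grid(param_space):
    """Enumerate every combination of the given parameter values.

    param_space maps a parameter name to a list of candidate values.
    Yields one dict per combination, each suitable for launching one
    non-interactive job over the staged data.
    """
    names = sorted(param_space)
    for values in itertools.product(*(param_space[n] for n in names)):
        yield dict(zip(names, values))

# Example: sample the space of (param1, param2, param3).
jobs = list(parameter_grid({
    "param1": [0.1, 0.5],
    "param2": [10, 20],
    "param3": ["rigid", "affine"],
}))
# 2 * 2 * 2 = 8 parameter combinations, one job each.
```

More sophisticated search strategies (random sampling, adaptive refinement) would plug in at the same point: each produces a stream of parameter dictionaries to be turned into jobs.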

Note that with these two use cases we are only addressing large scale experiment control from the standpoint of what it means to Slicer3. We are not trying to solve the general case of large scale experiment control.

Assumptions and restrictions

  1. Computing configuration.
    We shall support a variety of computing infrastructures which include
    1. single computer systems,
    2. clusters,
    3. grids (optional)
  2. Access to compute nodes.
    We shall have no direct access to the compute nodes. All jobs shall be submitted through a submit node of some sort. An exception may be made when operating on a single computer system configuration.
  3. Staged data
    The compute nodes shall mount a filesystem outside of the node on which data is staged. We are not providing Slicer3 with the mechanisms to stage data. We assume that all data is staged outside of Slicer3.
  4. Staged programs
    The compute nodes shall have access to the Slicer3 processing modules. Like the case for data, the processing modules are staged outside of the Slicer3 environment.
  5. Experiment scheduling
    A given experiment shall result in one or more processing jobs being submitted to the computing resources.
  6. Job submission
    Submitting a job to the computing infrastructure shall return a job submission token through which the job can be
    1. monitored for status: scheduled, running, completed
    2. terminated
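For the single computer system configuration, the job submission token might amount to little more than a wrapper around a process handle. A minimal sketch (the class and method names are illustrative; a cluster or grid back end would wrap a scheduler job id instead and could also report a "scheduled" state before dispatch):

```python
import subprocess
import sys

class JobToken:
    """Hypothetical job submission token for the single-computer case.

    Wraps a subprocess handle so the job can be monitored and
    terminated, per the requirements above.
    """

    def __init__(self, command):
        self._proc = subprocess.Popen(command)

    def status(self):
        # A local process is "running" until it exits; a queued
        # scheduler job would additionally report "scheduled".
        return "completed" if self._proc.poll() is not None else "running"

    def wait(self):
        self._proc.wait()

    def terminate(self):
        self._proc.terminate()

# Submit a trivial job and watch it complete.
token = JobToken([sys.executable, "-c", "pass"])
token.wait()
print(token.status())  # completed
```

The point of the abstraction is that experiment-level control (monitor, interrupt, resume) only ever talks to tokens, never to a specific batch system.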
  7. Experiment control
    We shall be able to monitor an experiment to see its status.
    We shall be able to interrupt an experiment.
    We shall be able to resume an experiment without re-running the entire experiment.
    We shall be able to rerun an experiment, overwriting previous results.
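One simple way to satisfy both the resume and rerun requirements is to make each job's output file double as its completion marker, in the spirit of a Makefile dependency check. A sketch, assuming one output file per job (the function name and the (command, output_path) pairing are illustrative):

```python
import os

def jobs_to_run(jobs, overwrite=False):
    """Select which jobs still need to be submitted.

    Each job is a (command, output_path) pair:
    - resume:  skip jobs whose output already exists on the
               staged filesystem,
    - rerun:   overwrite=True submits everything again.
    """
    if overwrite:
        return list(jobs)
    return [(cmd, out) for cmd, out in jobs if not os.path.exists(out)]
```

An interrupted experiment is then resumed by simply re-submitting `jobs_to_run(jobs)`; only the jobs whose outputs are missing run again.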

Components

(Steve, fill this section in from the whiteboard.)

Thought experiments

Below are a few thought experiments that explore how the above requirements might be addressed.

Makefiles + the loopy launcher

BatchMake