Slicer3:Large scale experiment control brainstorming

From NAMIC Wiki

Goal

To provide Slicer3 with a mechanism for submitting, monitoring, and summarizing large scale experiments that utilize Slicer3 modules, particularly the Command Line Modules. This page summarizes our thoughts, requirements, and experiments to date.

There are two introductory use cases that we wish to support:

  1. Slicer3 is used interactively to select a set of parameters for an algorithm or workflow on a single dataset. Then, these parameters are applied to N datasets non-interactively.
  2. Slicer3 is used interactively to select a subset of parameters for an algorithm or workflow on a single dataset. Then, the remaining parameter space is searched non-interactively. Various parameter space evaluation techniques could be employed, the simplest of which is to sample the space of (param1, param2, param3).
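
As a rough illustration of how the two use cases expand into individual jobs, here is a minimal Python sketch. The function names, the `input` key, and the parameter names are hypothetical and are not part of Slicer3; the sketch only shows that use case 1 is a loop over datasets and use case 2 is, in its simplest form, a full grid over the unselected parameters.

```python
import itertools


def expand_over_datasets(fixed_params, datasets):
    """Use case 1: apply one interactively chosen parameter set to N datasets."""
    # "input" is a hypothetical key naming the dataset for each job.
    return [dict(fixed_params, input=dataset) for dataset in datasets]


def expand_over_grid(fixed_params, search_space):
    """Use case 2: sample every combination of the remaining parameters."""
    names = sorted(search_space)  # fixed ordering so the job list is reproducible
    jobs = []
    for values in itertools.product(*(search_space[name] for name in names)):
        job = dict(fixed_params)       # parameters chosen interactively
        job.update(zip(names, values)) # one sample of the searched parameters
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    for job in expand_over_grid({"input": "a.nrrd"},
                                {"param1": [1, 2], "param2": [0.1, 0.2, 0.3]}):
        print(job)
```

A full grid grows multiplicatively with the number of parameters, so other sampling strategies may be preferable for larger spaces; the grid is simply the easiest case to reason about.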

Note that with the above two use cases, we are only trying to address large scale experiment control from the standpoint of what it means to Slicer3. We are not trying to solve the general case of large scale experiment control.

Assumptions and restrictions

  1. Computing configuration.
    We shall support a variety of computing infrastructures which include
    1. single computer systems,
    2. clusters,
    3. grids (optional)
  2. Access to compute nodes.
    We shall have no direct access to the compute nodes. All job submissions shall be to some sort of submit node. An exception may be made when operating on a single computer system configuration.
  3. Staged data
    The compute nodes shall mount a filesystem outside of the node on which data is staged. We are not providing Slicer3 with the mechanisms to stage data. We assume that all data is staged outside of Slicer3.
  4. Staged programs
    The compute nodes shall have access to the Slicer3 processing modules. As with the data, the processing modules are staged outside of the Slicer3 environment.
  5. Experiment scheduling
    A given experiment shall result in one or more processing jobs being submitted to the computing resources.
  6. Job submission
    Submitting a job to the computing infrastructure shall result in a job submission token such that the job can be
    1. monitored for status: scheduled, running, completed
    2. terminated
  7. Experiment control
    We shall be able to monitor an experiment to see its status.
    We shall be able to interrupt an experiment. This may involve removing jobs from the queue and terminating jobs in process.
    We shall be able to resume an experiment without re-running the entire experiment. Previously terminated jobs will be resubmitted. Previously completed jobs will not be rerun.
    We shall be able to rerun an experiment, overwriting previous results.
  8. Job execution robustness
    Jobs terminating unsuccessfully shall be automatically resubmitted to the computing environment at the experiment designer's request. Jobs may be resubmitted zero times, K times, or until successful.
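
To make requirements 6 through 8 more concrete, the following Python sketch models the bookkeeping they imply. Everything here is an assumption made for illustration: the `Experiment` class, the `submit` and `terminate` callables (stand-ins for whatever the submit node provides), and the `FAILED` and `TERMINATED` states, which extend the three states listed above. It is not a proposed Slicer3 design.

```python
from enum import Enum


class Status(Enum):
    SCHEDULED = "scheduled"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"          # assumption: needed to express requirement 8
    TERMINATED = "terminated"  # assumption: needed to express interruption


class Experiment:
    def __init__(self, job_names, submit, max_retries=0):
        self.submit = submit            # hypothetical callable: job name -> token
        self.max_retries = max_retries  # 0, K, or None for "until successful"
        self.status = {name: None for name in job_names}
        self.attempts = {name: 0 for name in job_names}
        self.tokens = {}

    def _launch(self, name):
        self.tokens[name] = self.submit(name)
        self.status[name] = Status.SCHEDULED
        self.attempts[name] += 1

    def resume(self):
        """Submit jobs that are neither completed nor currently queued or running."""
        for name in list(self.status):
            if self.status[name] in (None, Status.TERMINATED, Status.FAILED):
                self.attempts[name] = 0
                self._launch(name)

    def rerun(self):
        """Resubmit every job; previous results are expected to be overwritten."""
        for name in list(self.status):
            self.attempts[name] = 0
            self._launch(name)

    def interrupt(self, terminate):
        """Terminate queued and running jobs via a hypothetical terminate(token)."""
        for name in list(self.status):
            if self.status[name] in (Status.SCHEDULED, Status.RUNNING):
                terminate(self.tokens[name])
                self.status[name] = Status.TERMINATED

    def report(self, name, status):
        """Record an observed status; resubmit failed jobs per the retry policy."""
        self.status[name] = status
        if status is Status.FAILED:
            retries_used = self.attempts[name] - 1
            if self.max_retries is None or retries_used < self.max_retries:
                self._launch(name)
```

In this sketch the first call to `resume()` also serves as the initial submission, since a job that has never been submitted is treated the same as one that was terminated.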

Components

(Steve, fill this section in from the whiteboard.)

Thought experiments

Below are a few thought experiments, used to see how the needs described above can be addressed.

Makefiles + the loopy launcher

BatchMake

BatchMake allows large scale experiments to be designed using a scripting language similar to CMake scripts. BatchMake provides a number of looping constructs which can be used to design experiments and parameter searches:

  • foreach
  • sequence
  • randomize
  • fornfold

Here is a BatchMake script to search the parameter space of a median filter:

SetApp(median @'Median Filter')
SetAppOption(median.inputVolume 'c:/projects/I2/Insight/Testing/Data/Input/cthead1.png')

Set(kernels '1,1,1' '2,2,1' '3,3,1' '4,4,1' '5,5,1')
Set(outVolumePrefix 'c:/projects/Temp/Slicer3/median')

foreach(kernel ${kernels})
  RegEx(kernelText ${kernel} ',' REPLACE '_')
  SetAppOption(median.outputVolume ${outVolumePrefix}${kernelText}.png)
  SetAppOption(median.neighborhood ${kernel})

  Run(output ${median})

endforeach(kernel)
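
For readers unfamiliar with BatchMake, the loop above can be mirrored in a few lines of Python to show which jobs it is meant to produce. This is only a sketch of the script's intent, not of how BatchMake evaluates it: the `median_jobs` function is hypothetical, and the dictionary keys simply reuse the option names from the script.

```python
import re

input_volume = "c:/projects/I2/Insight/Testing/Data/Input/cthead1.png"
out_volume_prefix = "c:/projects/Temp/Slicer3/median"
kernels = ["1,1,1", "2,2,1", "3,3,1", "4,4,1", "5,5,1"]


def median_jobs(kernels, input_volume, out_volume_prefix):
    """Build one job description per kernel size (hypothetical helper)."""
    jobs = []
    for kernel in kernels:
        # Same substitution as the RegEx(... ',' REPLACE '_') line in the script.
        kernel_text = re.sub(",", "_", kernel)
        jobs.append({
            "inputVolume": input_volume,
            "outputVolume": out_volume_prefix + kernel_text + ".png",
            "neighborhood": kernel,
        })
    return jobs


jobs = median_jobs(kernels, input_volume, out_volume_prefix)
for job in jobs:
    print(job["neighborhood"], "->", job["outputVolume"])
```

Each kernel yields its own output file, so the five runs do not overwrite one another; the first run, for example, writes to a file whose name ends in `median1_1_1.png`.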