<big>'''Note:''' We are migrating this content to the slicer.org domain - <font color="orange">The newer page is [https://www.slicer.org/wiki/Slicer3:Large_scale_experiment_control_brainstorming#BatchMake_and_Experiment_Control here]</font></big>

== Goal ==
To provide Slicer3 with a mechanism for submitting, monitoring, and summarizing large-scale experiments that use Slicer3 modules, particularly the Command Line Modules. This page summarizes our thoughts, requirements, and experiments to date.

There are two introductory use cases that we wish to support (a sketch of both follows the list):
# Slicer3 is used interactively to select a set of parameters for an algorithm or workflow on a single dataset. Then, these parameters are applied to N datasets non-interactively.
# Slicer3 is used interactively to select a subset of parameters for an algorithm or workflow on a single dataset. Then, the remaining parameter space is searched non-interactively. Various parameter space evaluation techniques could be employed, the simplest of which is to sample the space of (param1, param2, param3).
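As a rough illustration of these two use cases only (this is not Slicer3's actual batch interface), a driver script could invoke a command-line module executable once per staged dataset, or once per sampled point of the parameter grid. The module path, parameter names, and file locations below are made-up placeholders. The same two loops could equally hand each invocation to a submit node as a separate job rather than running the module locally, which is where the requirements in the next section come in.

<pre>
import itertools
import subprocess
from pathlib import Path

# Placeholder module path and file layout; substitute a real Slicer3
# command-line module executable and your own staged data.
MODULE = "/path/to/Slicer3/Plugins/SomeCLIModule"

def run_module(input_file, output_file, params):
    """Run one non-interactive job; parameters are passed as --name value flags."""
    cmd = [MODULE]
    for name, value in params.items():
        cmd += ["--" + name, str(value)]
    cmd += [str(input_file), str(output_file)]
    return subprocess.call(cmd)

# Use case 1: a parameter set chosen interactively in Slicer3 is applied,
# unchanged, to N staged datasets.
chosen = {"sigma": 2.0, "iterations": 5}
for dataset in sorted(Path("/data/study").glob("case*.nrrd")):
    run_module(dataset, Path("/results") / (dataset.stem + "_out.nrrd"), chosen)

# Use case 2: the remaining parameter space is searched non-interactively,
# here with the simplest technique: sampling a grid over (param1, param2, param3).
grid = {"param1": [1.0, 2.0, 4.0], "param2": [3, 5, 7], "param3": [0.1, 0.5]}
fixed_input = Path("/data/study/case01.nrrd")
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    tag = "_".join(str(v) for v in values)
    run_module(fixed_input, Path(f"/results/case01_{tag}.nrrd"), params)
</pre>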
Note that with the above two use cases, we are only trying to address large-scale experiment control from the standpoint of what it means to Slicer3. We are '''not''' trying to solve the general case of large-scale experiment control.

== Assumptions and restrictions ==
# Computing configuration.
#: We shall support a variety of computing infrastructures, including
#:# single computer systems,
#:# clusters,
#:# grids (optional)
# Access to compute nodes.
#: We shall have no direct access to the compute nodes. All '''job submissions''' shall be to some sort of submit node. An exception may be when operating on a single computer system configuration.
# Staged data
#: The compute nodes shall mount a filesystem outside of the node on which data is staged. We are not providing Slicer3 with the mechanisms to stage data. We assume that all data is staged outside of Slicer3.
# Staged programs
#: The compute nodes shall have access to the Slicer3 processing modules. As with the data, the processing modules are staged outside of the Slicer3 environment.
# Experiment scheduling
#: A given experiment shall result in one or more processing jobs being submitted to the computing resources.
# Job submission
#: Submitting a job to the computing infrastructure shall result in a job submission token such that the job can be
#:# monitored for status: scheduled, running, completed
#:# terminated
#: (A sketch of such a token interface, together with the experiment control behaviors below, follows this list.)
# Experiment control
#: We shall be able to monitor an experiment to see its status.
#: We shall be able to interrupt an experiment. This may involve removing jobs from the queue and terminating jobs in process.
#: We shall be able to resume an experiment without re-running the entire experiment. Previously terminated jobs will be resubmitted. Previously completed jobs will not be rerun.
#: We shall be able to rerun an experiment, overwriting previous results.
# Job execution robustness
#: Jobs terminating unsuccessfully shall be automatically resubmitted to the computing environment at the experiment designer's request. Jobs may be resubmitted zero times, K times, or until successful.
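To make the job submission and experiment control requirements above concrete, here is a minimal sketch, assuming a Python control layer and a local-process backend that stands in for a real submit node (a cluster or grid scheduler). None of the class or method names are Slicer3 or scheduler APIs; they only illustrate the token / status / terminate / resume / retry behavior listed above.

<pre>
import subprocess
from enum import Enum

class Status(Enum):
    SCHEDULED = "scheduled"   # a real queueing system would report this state
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

class LocalSubmitNode:
    """Stand-in for a submit node: submitting a job returns an opaque token
    that can later be used to query the job's status or to terminate it."""

    def __init__(self):
        self._jobs = {}    # token -> Popen handle
        self._count = 0

    def submit(self, command):
        token = "job-%d" % self._count
        self._count += 1
        self._jobs[token] = subprocess.Popen(command)
        return token

    def status(self, token):
        returncode = self._jobs[token].poll()
        if returncode is None:
            return Status.RUNNING
        return Status.COMPLETED if returncode == 0 else Status.FAILED

    def terminate(self, token):
        self._jobs[token].terminate()

class Experiment:
    """One experiment = one or more jobs. It can be monitored, interrupted,
    resumed without re-running completed jobs, or rerun from scratch."""

    def __init__(self, node, commands, max_retries=1):
        self.node = node
        self.commands = commands               # job id -> command list
        self.max_retries = max_retries         # 0, K, or a large value for "until successful"
        self.tokens = {}                       # job id -> submission token
        self.completed = set()                 # job ids that finished successfully
        self.retries = {job_id: 0 for job_id in commands}

    def start(self):
        """Submit every job that has not already completed."""
        for job_id, command in self.commands.items():
            if job_id not in self.completed and job_id not in self.tokens:
                self.tokens[job_id] = self.node.submit(command)

    def poll(self):
        """Record finished jobs; resubmit failures up to max_retries times."""
        for job_id, token in list(self.tokens.items()):
            state = self.node.status(token)
            if state is Status.COMPLETED:
                self.completed.add(job_id)
                del self.tokens[job_id]
            elif state is Status.FAILED:
                del self.tokens[job_id]
                if self.retries[job_id] < self.max_retries:
                    self.retries[job_id] += 1
                    self.tokens[job_id] = self.node.submit(self.commands[job_id])

    def interrupt(self):
        """Terminate all in-flight jobs; completed results are kept."""
        for token in self.tokens.values():
            self.node.terminate(token)
        self.tokens.clear()

    def resume(self):
        """Resubmit only the jobs that never completed."""
        self.start()

    def rerun(self):
        """Forget previous results and run the whole experiment again."""
        self.interrupt()
        self.completed.clear()
        self.retries = {job_id: 0 for job_id in self.commands}
        self.start()
</pre>

Keeping the submission token opaque is what would let the same experiment-level logic sit on top of a single workstation, a cluster queue, or a grid, as long as each backend can submit a job, report its status, and terminate it.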
== Components ==

(Steve, fill this section in from the whiteboard.)
== Thought experiments ==

Below are a few thought experiments that address the above needs; they will be used to see how those needs can be met.
=== Makefiles + the loopy launcher ===

=== BatchMake ===