Slicer Grid Computing

Items discussed

  • Stephen Aylward (Kitware) presented the BatchMake interface to Condor
  • Neil Jones (BIRN) presented GridWizard
  • Group discussed different data storage technologies: MIDAS, BIRN Federated Data/XNAT/HID, etc.

Where we are

Both BatchMake and GridWizard have (partially working) demos of launching multiple processes on remote machines. However, there's still a set of questions as to how these systems will interface with the Slicer environment to meet community needs. To that end, it's worth figuring out the answers to a few questions:

  • What is the main community we should target? Will there be different modes of interaction for algorithm developers and domain scientists?
  • What are two specific use cases of a Slicer user interacting with batch processing facilities through Slicer? Where does data come from, where does it need to end up? What data needs to get passed back to Slicer itself?
  • What are two specific use cases of a NAMIC user interacting with batch processing facilities, perhaps outside of Slicer?
  • Stephen (Aylward) presented a soup-to-nuts backend for managing "experiments" and data, and for displaying metrics computed from images in graphical form. Neil (Jones) discussed the infrastructure for managing the running of large numbers of jobs and the portal dashboard used to track them.
    • What facilities within Slicer or within other environments (e.g. a portal or stand-alone applications) are necessary to track and manage these jobs?
  • What look-and-feel of modules in Slicer is required? Do we need special "grid"-type parameters in the execution model, or should we just adhere to the "min parameter", "max parameter", "step" notion (see the sketch after this list)?
    • Can we handle configuration of either tool out of band from Slicer?
    • Can we handle monitoring out of band from Slicer?
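
To make the "min/max/step" question above concrete, here is a rough sketch of what such a parameter description could expand into: each (min, max, step) triple yields a list of values, and a batch system would launch one job per combination. The parameter names and the expansion function are illustrative only; they are not part of the Slicer execution model or of either tool.

    # Illustrative only: expand (min, max, step) parameter descriptions into a
    # task list, one entry per combination of values.
    import itertools

    def expand(minimum, maximum, step):
        """Return the list of values implied by a (min, max, step) triple."""
        values, v = [], minimum
        while v <= maximum + 1e-9:
            values.append(round(v, 6))
            v += step
        return values

    sweep = {
        "smoothingIterations": expand(5, 25, 5),      # hypothetical parameter
        "classPriorWeight":    expand(0.1, 0.5, 0.1), # hypothetical parameter
    }

    # One batch job per combination of parameter values.
    tasks = [dict(zip(sweep.keys(), combo))
             for combo in itertools.product(*sweep.values())]
    print(len(tasks), "jobs, e.g.", tasks[0])

Whether such sweeps are expanded inside Slicer, inside BatchMake/GridWizard, or out of band is exactly the open question above.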

Until now, we have been focused on infrastructure development: building tools to perform generic batch processing. The tools are now at a stage where progress can be made by applying them to specific research needs, rather than by continuing to improve them without deploying them.

Introduction

NA-MIC and BIRN, in many respects, have similar problems but in different domains. At BIRN, we need to launch lots of shape analysis tools from a portal environment. At NAMIC, we would like to launch lots of shape analysis tools from a rich client application (Slicer). While the "environment" is different, the constant is that we both need to launch large, computationally intensive jobs on distributed clusters. The logistical problem is straightforward to see: how can we launch and manage many jobs in a repeatable and standardized fashion, without tailoring the solution to a particular cluster? One might think that these problems are trivially solved; after all, haven't lots of people faced this problem before? They have, and there are many solutions available, but the fact that nobody has been able to get these things to work in our respective contexts indicates that there *is* a problem, even if nobody can really explain *why* the problem exists in the first place. But we have this shared problem, and we have similar infrastructural needs, so the two projects have collaborated.

It's not always clear to people in the respective projects (aside from upper management, that is) what the benefit of the NAMIC-BIRN collaboration is, beyond NAMIC using BIRN's SRB system as a distributed file system mainly "because it's there", since setting up a whole new AFS system would probably be more annoying than just relying on SRB. But this, I claim, is a short-sighted complaint. NAMIC's value-add to BIRN is that the specific usage requirements and the software distribution channel available through NAMIC will help improve the GridWizard software by making clear which features need to be improved and which are window dressing; it is precisely this feedback that has been missing from most other grid-scheduling projects, which explains why the road to hell is paved with useless Globus software. The value-add to NAMIC of relying on BIRN's job-launching infrastructure is that computational jobs can be handled faster and started from less powerful machines (i.e., you work on a laptop, launch a complicated analysis, and pick it up later), and that it opens up a new use for Slicer as an exploratory data analysis tool.

Since we have similar problems, we've developed similar frameworks. BIRN has GridWizard, a project that's been in development for the better part of a year. Slicer can accept plugins from Kitware that work through BatchMake. Still, none of these have been terribly helpful to actual researchers yet, since they have been toolkits under development rather than toolkits with specific use cases. Enough with the toolkit development already---time to actually generate some results!

Use Cases

Here are two base use cases that we can consider as an initial step in coordinating the two projects, such that each project benefits. These two use cases are "deliverables" from BIRN to NAMIC, but on different time frames: the first should be completed, documented, demonstrated, and in the hands of users by early August; the second by mid-to-late September.

EM Segmentation

The EM (Expectation Maximization) Segmenter is an algorithm that performs a two-step iterative optimization procedure to separate "stuff" from "non-stuff" (where usually "stuff" is "white matter" and "non-stuff" is everything that isn't). The algorithm itself is slow, but it frequently needs to be run on many data sets as an initial step in some larger scientific analysis problem. One of the main problems with EM segmentation is that configuring a set of parameters for a segmentation run is, to put it mildly, a bit of an art rather than a specific procedure. We propose that Slicer's main function here is as an exploratory data analysis platform:

  1. User loads an image into Slicer
  2. User goes through a several step process in configuring and initializing the EM algorithm
  3. User performs EM segmentation on a subsection of a single volume
  4. User goes back to step 2, until satisfied that the EM iterations are converging to a useful (though local) optimum.
  5. User saves a "parameter file" [need more info here about what this is] and then performs the EM segmentation algorithm on the whole volume.
  6. User looks wistfully at a directory, and wishes that she could just "run it on all those files", rather than repeating the process from step 1.

There are a couple of problems with these last two steps, including the general warning from Slicer 101: don't run this on a machine with less than 2 GB of RAM, and even that might not be enough sometimes. This is an obvious case where we can benefit from performing an initial single-volume analysis on a cluster, and then performing a batch analysis "in the large" on the same cluster.
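
For orientation, here is a deliberately minimal sketch of the "two-step iterative optimization procedure" described above, applied to a two-class Gaussian mixture over voxel intensities. This is plain EM for illustration only, not the actual Slicer EMSegmenter (which uses atlases, spatial priors, and a multi-step configuration); all names here are ours.

    # Minimal two-class EM over voxel intensities (illustration only).
    import numpy as np

    def em_segment(intensities, n_iter=50):
        x = intensities.ravel().astype(float)
        # Crude initialization: low/high intensity classes ("non-stuff" vs "stuff").
        mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
        sigma = np.array([x.std(), x.std()]) + 1e-6
        mix = np.array([0.5, 0.5])
        for _ in range(n_iter):
            # E-step: posterior probability of each class for every voxel.
            lik = np.stack([mix[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
                            / (sigma[k] * np.sqrt(2 * np.pi)) for k in range(2)])
            resp = lik / lik.sum(axis=0, keepdims=True)
            # M-step: re-estimate class means, variances, and mixing weights.
            for k in range(2):
                w = resp[k]
                mu[k] = (w * x).sum() / w.sum()
                sigma[k] = np.sqrt((w * (x - mu[k]) ** 2).sum() / w.sum()) + 1e-6
                mix[k] = w.mean()
        # Hard labels: 1 where the "stuff" class is more probable.
        return (resp[1] > resp[0]).reshape(intensities.shape).astype(np.uint8)

The configuration "art" referred to above corresponds to everything this sketch glosses over: the number of classes, their priors, and how the iteration is initialized.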

What we propose as a software deliverable comes in four parts:

  1. A configured GridWizard system for the NAMIC cluster that a user can install into his or her own workspace (users can reconfigure for other clusters, but this may not be terribly friendly)
  2. An EM Segmenter Slicer command-line module that allows a user to perform a single segmentation on a cluster, or a batch segmentation on a cluster
  3. Documentation on how to use both of the above
  4. A roll, RPM, or tarball of EMSegmenter for 32-bit and 64-bit Linux clusters so that it can easily be installed on many compute nodes simultaneously

In these use cases, we presume that the user is, virtually speaking, inside the SPL proper and does not need to contend with the gatekeeper's S/Key password system. We also assume the following "preamble" to each use case:

  1. User loads an image into Slicer
  2. User configures EM Segmenter
  3. User performs small segmentation locally

The usage we propose for EMSegmenter comes in two modes: simple and batch ("mega-mode" below). "Batch" here simply means processing multiple files simultaneously; in either mode, the job physically runs on a batch-oriented processing system.

  • Simple mode
  1. Preamble...
  2. User launches Segmentation on a cluster (specifically, the NAMIC cluster)
  3. Window pops up with the task to be performed
  4. User reviews the "task list" and clicks "run"
  5. Job starts
  6. Job output ends up in an sshfs-mounted directory
  7. User can reload the results into Slicer and check them for accuracy.
  • Mega-mode
  1. Preamble...
  2. User chooses a directory
  3. User enters a file glob (i.e., a filter) into a parameter box in Slicer; this file glob applies to files in the chosen data directory but is not recursive, and it may not adhere to strict POSIX globbing (? or [] might not be implemented); see the sketch after this list
  4. User chooses "launch"
  5. Window pops up with the list of tasks that need to be performed
  6. User reviews the task list, and clicks "run"
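
As a rough illustration of step 3 in the mega-mode list, the sketch below builds the task list by applying a non-recursive glob to the chosen directory and emitting one command line per match. The executable name, flag names, and file names are placeholders, and handing the resulting tasks to GridWizard or BatchMake is not shown.

    # Illustration of mega-mode task-list construction: one task per file that
    # matches a non-recursive glob.  (fnmatch happens to support ? and [];
    # the Slicer parameter box may not.)
    import fnmatch
    import os

    def build_task_list(data_dir, pattern, parameter_file):
        tasks = []
        for name in sorted(os.listdir(data_dir)):        # non-recursive on purpose
            path = os.path.join(data_dir, name)
            if os.path.isfile(path) and fnmatch.fnmatch(name, pattern):
                tasks.append(["EMSegmentCommandLine",     # placeholder executable
                              "--parameters", parameter_file,
                              "--input", path,
                              "--output", path + ".seg.nrrd"])
        return tasks

    # Everything the user would review in the pop-up window before clicking "run".
    for t in build_task_list("/data/study01", "case*.nrrd", "params.mrml"):
        print(" ".join(t))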

In either case, a future version should include some additional features in the "job manager" window that pops up when the user wants to run batch jobs:

  1. Ability to monitor jobs
  2. Ability to inspect job outputs (PBS stderr streams, etc.)
  3. Ability to inspect the job artifacts (parallelization scripts)
  4. Ability to choose the scheduling algorithm (and all that implies): assign everything statically up front, or hand out tasks on a first-come-first-served basis (see the sketch after this list)
  5. Ability to select multiple clusters, connected remotely
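
To make item 4 concrete, here is a toy contrast between the two scheduling styles: a purely static split, where the task list is divided among nodes up front, versus first-come-first-served, where whichever node frees up first takes the next task. This is a conceptual sketch only, not GridWizard's (or Condor's/PBS's) actual scheduler.

    # Toy contrast between static and first-come-first-served scheduling.
    from collections import deque

    def schedule_static(tasks, nodes):
        """Divide the task list among nodes up front (round-robin)."""
        plan = {n: [] for n in nodes}
        for i, task in enumerate(tasks):
            plan[nodes[i % len(nodes)]].append(task)
        return plan

    def schedule_fcfs(tasks, nodes, runtime):
        """Whichever node becomes free first takes the next task off the queue."""
        queue = deque(tasks)
        free_at = {n: 0.0 for n in nodes}        # simulated clock per node
        plan = {n: [] for n in nodes}
        while queue:
            node = min(free_at, key=free_at.get) # node that frees up soonest
            task = queue.popleft()
            plan[node].append(task)
            free_at[node] += runtime(task)
        return plan

    # With uneven per-volume runtimes, FCFS balances the load; a static split does not.
    tasks = ["case%02d" % i for i in range(8)]
    print(schedule_static(tasks, ["node1", "node2"]))
    print(schedule_fcfs(tasks, ["node1", "node2"], runtime=lambda t: 1 + int(t[-1]) % 3))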

Many of these features are already present in GridWizard, but validation that they work with the above use case needs to be done _after_ the software is successfully demonstrated.

SPHARM: Spherical Harmonic-based Brain Shape Analysis

In a similar vein to LDDMM, SPHARM is a numerical algorithm aimed at analyzing shape differences between medical volumes. [this description is probably wrong --- need to read the SPHARM paper, obviously] The overall procedure involves taking a volume and essentially performing a Fourier-like analysis in which the volume is decomposed into spherical harmonics (the spherical harmonics here taking the place of the sinusoidal functions in a standard Fourier transform). If you have two volumes and you perform this mapping, you can examine the coefficients of corresponding harmonics to see where bulges or shrinkage occur, etc.
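
For orientation, the "Fourier-like analysis" above has the usual spherical-harmonic form; the exact quantity SPHARM expands (for instance, the three surface coordinate functions rather than a single scalar) may differ from this simplified statement:

    f(\theta, \varphi) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} c_{l}^{m} \, Y_{l}^{m}(\theta, \varphi),
    \qquad
    c_{l}^{m} = \int_{S^2} f(\theta, \varphi)\, \overline{Y_{l}^{m}(\theta, \varphi)}\, d\Omega

Comparing the coefficients c_l^m of corresponding harmonics across two mapped volumes is what reveals local bulges or shrinkage, in the same way that comparing Fourier coefficients reveals differences between signals.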

A key step in the SPHARM processing is the initial mapping of an arbitrary volume onto spherical coordinates. The software that performs this is part of the =XXXX= (little help here?) toolkit, using the command =XXXX= (help here too!). We would like to be able to take a large list of volumes and process them through this first step in parallel. The time savings are obvious: 60 volumes acquired throughout a day can be processed in parallel in under half an hour on a moderate cluster; on a single processor, it could take half a week or longer.

The EMSegmenter use case is bologna software: it's cold and best thinly Sliced. SPHARM, and the way it is used in dtMRI imaging, is different. Here, we can rely on two things:

  1. The user does not need to start slicer in order to start a job
  2. The user is comfortable with command-line applications

This is an ideal situation for =gwiz-run=, a command-line application scheduler.

The software we propose for this use case comes in three parts:

  1. A configured GridWizard for the NAMIC cluster, which can be installed by a user into his or her own workspace
  2. A software package (RPM, tarball, or Rocks roll) for 32- and 64-bit clusters to install the SPHARM software
  3. A set of "template" commands that the user can apply to process large numbers of images

As in the EMSegmenter use case above, we assume that the user is virtually inside the SPL and does not have to contend with the gatekeeper login system, or is using the BIRN cluster. Similarly, the cluster will need to have an =sshfs= filesystem mounted to some remote data repository (or the cluster needs to be the repository itself). The usage we propose for SPHARM is as follows:

  1. User gathers a large set (>= 25, <= 500) of files to process
  2. User picks a set of configuration parameters
  3. User goes to a NAMIC wiki page listing a "template command"
  4. User runs the =gwiz-gui= application
  5. User cut-and-pastes the template command into the GUI and edits it as appropriate
  6. User reviews task list, optionally chooses to change the scheduling algorithm's parameters, clicks "Go"

The "nice-to-have's" from EMSegmenter apply here as well.

The Plan