'''[[Slicer3:Developers#Slicer_3_Projects | Back to Slicer3 Projects List ]]'''
<big>'''Note:''' We are migrating this content to the slicer.org domain - <font color="orange">The newer page is [https://www.slicer.org/wiki/Slicer3:Remote_Data_Handling  here]</font></big>
 
 
= Current status of Slicer's (local) data handling =
 
Currently, MRML files, Xcede catalog files, XNAT archives and individual datasets are all loaded from local disk. All remote datasets are downloaded (via web interface or command line) outside of Slicer:
 
 
 
[[image:DataLoadingCurrent.png]]
 
 
 
= Goal for how Slicer would upload/download from remote data stores =
 
Eventually, we would like to download data from remote or local uris within the application itself, and have the option of uploading to remote stores as well (here's a sketch -- does this look right?):
 
 
 
[[image:DataLoadingTarget.png]]
 
 
 
= Suggested first pass implementation driven by two use cases =
 
For BIRN, we'd like to demonstrate two use cases:
 
 
 
* First is loading a combined FIPS/FreeSurfer analysis, specified in an Xcede catalog (.xcat) file, and viewing it with Slicer's QueryAtlas.
 
* Second is running a batch job in Slicer that processes a set of remotely held datasets. Each iteration would take as arguments the XML file parameterizing the EMSegmenter, the uri of the remote dataset, and a uri for storing back the segmentation results (a sketch of one such iteration follows below).
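
A minimal sketch of what one batch iteration might look like, assuming the EMSegmenter is driven as a command line executable; the executable name and the --paramsXML/--inputURI/--outputURI flags are hypothetical placeholders, not the module's actual interface:

<pre>
// Hypothetical batch driver: one EMSegmenter run per remote dataset.
// The executable name and flags below are placeholders.
#include <cstdlib>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main()
{
  // One (input uri, output uri) pair per subject; locations are made up.
  std::vector< std::pair<std::string, std::string> > jobs;
  jobs.push_back(std::make_pair(
    std::string("srb://birn/subject01/brain.mgz"),
    std::string("srb://birn/subject01/segmentation.mgz")));

  const std::string paramsXML = "emsegment-parameters.xml";
  for (size_t i = 0; i < jobs.size(); ++i)
    {
    // Each iteration passes the XML parameterization, the remote input
    // uri, and a uri for storing back the segmentation results.
    std::string cmd = "EMSegmentCommandLine --paramsXML " + paramsXML
                    + " --inputURI " + jobs[i].first
                    + " --outputURI " + jobs[i].second;
    std::cout << "running: " << cmd << std::endl;
    if (std::system(cmd.c_str()) != 0)
      {
      std::cerr << "iteration " << i << " failed" << std::endl;
      }
    }
  return 0;
}
</pre>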
 
 
The subset of functionality we'd need is shown below:
 
 
 
[[image:DataLoadingStartPlan.png]]
 
 
 
...
 
Our approach for the first use case would be to manually upload a test Xcede catalog file and its constituent datasets to some place on the SRB. We'll keep a copy of the catalog file locally, read it, and query the SRB for each uri in the .xcat file.
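
Here is a rough sketch of that flow. The line-per-uri catalog "parsing" is a placeholder for a real Xcede XML parser, and fetchFromSRB() is a hypothetical stub standing in for the actual SRB query and transfer:

<pre>
// Sketch of use case one: read a local copy of the .xcat catalog and
// fetch each uri from the SRB. The catalog "parser" here (one uri per
// line) is a placeholder for a real Xcede XML parser.
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical stub: query the SRB for one uri, download it into the
// local cache, and return the local path of the copy.
std::string fetchFromSRB(const std::string& uri)
{
  std::string::size_type slash = uri.rfind('/');
  std::string localPath = "/tmp/cache/" + uri.substr(slash + 1);
  // ... a real implementation would perform the SRB transfer here ...
  return localPath;
}

int main()
{
  std::ifstream xcat("/tmp/test-catalog.xcat");  // manually uploaded copy
  std::string uri;
  while (std::getline(xcat, uri))
    {
    if (uri.empty()) { continue; }
    std::cout << uri << " -> " << fetchFromSRB(uri) << std::endl;
    }
  return 0;
}
</pre>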
 
 
 
= Slicer-side things to implement =
 
== ITK-based mechanism for handling remote data (for command line modules, batch processing, and grid processing) (Nicole) ==
 
 
 
This one is tentatively on hold for now.
 
 
 
== vtkMRMLStorageNode methods for handling remote data (for loading and saving data for interactive use) (Nicole) ==
 
 
 
The first goal is to figure out what workflows to support, and a good implementation approach.
 
 
 
Currently, the '''Load Scene''', '''Import Scene''', and '''Add Data''' options in Slicer all encapsulate two steps:
 
* locating a dataset, usually accomplished through a file browser, and
 
* selecting a dataset to initiate loading.
 
Then MRML files, Xcede catalog files, or individual datasets are loaded from local disk.
 
 
 
For loading remote datasets, the following options are available:
 
* break these two steps apart explicitly (easiest option),
 
* bind them together under the hood,
 
* or support both of these paradigms.
 
 
 
===Breaking apart "find data" and "load data":===
 
 
 
'''Possible workflow A'''
 
* User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
 
* From the Load Scene file browser, the user selects the .xcat or .xml archive. If no locally cached versions exist, each remote file listed in the archive is downloaded to the /tmp directory (always locally cached) by the Download Manager, and then loaded into Slicer via a vtkMRMLStorageNode method when the download is complete.
 
 
 
'''Possible workflow B'''
 
* User downloads .xcat or .xml (MRML) file to disk using the HID or an XNAT web interface
 
* From the Load Scene file browser, the user selects the .xcat or .xml archive. If no locally cached versions exist, each remote file in the archive is downloaded to /tmp (only if a flag is set) by the Download Manager, and loaded directly into Slicer via a vtkMRMLStorageNode method when the download is complete. (How does loading work if we don't save to disk first?)
 
 
 
'''Possible workflow C'''
 
* User locates a MRML file, .xcat archive, or individual dataset on the HID or an XNAT web interface
 
* User types the uri into the ''Load Scene'', ''Import Scene'', or ''Add Data'' interfaces.
 
* If no locally cached versions exist, each remote file in the archive is cached to /tmp by the Download Manager, and loaded directly into Slicer via a vtkMRMLStorageNode method when the download is complete.
 
 
 
In each workflow, the data gets saved to disk first and then loaded into Slicer. Here's a first pass at how things might work -- we can discuss at the meeting:
 
 
 
[[image:DataLoadingSketch.png]]
 
 
 
===Or, bundling together "find data" and "load data":===
 
 
 
'''Possible workflow D'''
 
 
 
In this workflow, Slicer would make calls to HID or XNAT webservices to determine what data of interest is available... Questions:
 
* How might this work?
 
* Do we really want to re-implement functionality in the HID web interface?
 
* Maybe Slicer can implement a workflow (A-C) but also offer a simplified BIRN query interface that has functionality like the following (a possible interface is sketched after this list):
 
** Request BIRNIDs for all subjects who have a complete FIPS/FreeSurfer analysis
 
** Request an xcat for one of these BIRNIDs
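
To make the idea concrete, here is a hypothetical shape for such a query client; neither the class nor its methods exist in the HID webservices today:

<pre>
// Hypothetical interface for the simplified BIRN query idea above;
// nothing like this exists in the HID webservices yet.
#include <string>
#include <vector>

class BIRNQueryClient
{
public:
  virtual ~BIRNQueryClient() {}

  // BIRNIDs of all subjects with a complete FIPS/FreeSurfer analysis.
  virtual std::vector<std::string> GetCompleteAnalysisSubjects() = 0;

  // Request an .xcat catalog for one of those BIRNIDs; returns the
  // uri of the generated catalog.
  virtual std::string RequestXcatForSubject(const std::string& birnID) = 0;
};
</pre>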
 
 
 
=== Saving Data back to remote site ===
 
* Since we have no plan for where to save MRML files on the HID, can we have a webservices function we can call from Slicer that writes a file to /dev/null on the HID in the meantime?
 
 
 
=== What data do we need in an .xcat file? ===
 
For the fBIRN QueryAtlas use case, we need a combination of a '''FreeSurfer morphology analysis''' and a '''FIPS analysis''' of the same subject. With the combined data in Slicer, we can view activation overlays co-registered to and overlaid onto the high-resolution structural MRI using the FIPS analysis, and determine the names of brain regions where activations occur using the co-registered morphology analysis.
 
 
 
The required analyses, including all derived data, are in two standard directory structures on local disk, and ''hopefully'' somewhere on the HID within a standard structure (check with Burak). These directory trees contain a LOT of files we don't need... Below are the files we ''do'' need for the fBIRN QueryAtlas use case.
 
 
 
====FIPS analysis (.feat) directory and required data====
 
For instance, the FIPS output directory in our example dataset from Doug Greve at MGH is called sirp-hp65-stc-to7-gam.feat. Under this directory, QueryAtlas needs the following datasets:
 
* sirp-hp65-stc-to7-gam.feat/reg/example_func.nii
 
* sirp-hp65-stc-to7-gam.feat/reg/freesurfer/anat2exf.register.dat
 
* sirp-hp65-stc-to7-gam.feat/stats/(all statistics files of interest)
 
* sirp-hp65-stc-to7-gam.feat/design.gif (this image relates statistics files to experimental conditions)
 
 
 
====FreeSurfer analysis directory, and required data ====
 
For instance, the FreeSurfer morphology analysis directory in our example dataset from Doug Greve at MGH is called fbph2-000670986943. Under this directory, QueryAtlas needs the following datasets:
 
 
 
* fbph2-000670986943/mri/brain.mgz
 
* fbph2-000670986943/mri/aparc+aseg.mgz
 
* fbph2-000670986943/surf/lh.pial
 
* fbph2-000670986943/surf/rh.pial
 
* fbph2-000670986943/label/lh.aparc.annot
 
* fbph2-000670986943/label/rh.aparc.annot
 
 
 
=== What do we want HID webservices to provide? ===
 
 
 
* Question: are FIPS and FreeSurfer analyses (including the QueryAtlas-required files listed above) for subjects available on the HID yet?
 
 
 
* The BIRN HID webservices shouldn't really need to know the subset of data that QueryAtlas needs... maybe the web interface can take a BIRN ID, gather all uris (http://....) in the FIPS and FreeSurfer directories, and package these into an Xcede catalog.
 
 
 
* The catalog could be requested and downloaded from the HID web GUI, with a name like .xcat or .xcat.gzip or whatever. QueryAtlas could then open this file (or unzip and open it) and filter for the relevant uris for an fBIRN or Qdec QueryAtlas session (a filtering sketch follows below).
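
A sketch of that filtering step, keyed off the FIPS and FreeSurfer filenames listed in the sections above (the statistics files under stats/ are omitted since their exact names vary); the helper names are illustrative:

<pre>
// Keep only the uris QueryAtlas needs, matching against the FIPS and
// FreeSurfer filenames listed in the sections above.
#include <string>
#include <vector>

static bool EndsWith(const std::string& s, const std::string& suffix)
{
  return s.size() >= suffix.size() &&
    s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

std::vector<std::string> FilterForQueryAtlas(
  const std::vector<std::string>& uris)
{
  const char* wanted[] = {
    "example_func.nii", "anat2exf.register.dat", "design.gif",
    "brain.mgz", "aparc+aseg.mgz",
    "lh.pial", "rh.pial", "lh.aparc.annot", "rh.aparc.annot" };
  const size_t nWanted = sizeof(wanted) / sizeof(wanted[0]);

  std::vector<std::string> keep;
  for (size_t i = 0; i < uris.size(); ++i)
    {
    for (size_t j = 0; j < nWanted; ++j)
      {
      if (EndsWith(uris[i], wanted[j]))
        {
        keep.push_back(uris[i]);
        break;
        }
      }
    }
  return keep;
}
</pre>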
 
 
 
* Maybe later, this catalog could be requested programmatically from a Slicer webservices client that gives a particular BIRN ID. (For now, it's reasonable to go through the HID GUI.)
 
 
 
* Then, for each uri in a catalog (or .xml MRML file), we'll use curl to download it; so we need all datasets to be publicly readable (a minimal libcurl sketch follows below).
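
A minimal sketch of the per-uri download using libcurl's easy interface (DownloadURI() is our own illustrative wrapper, not an existing Slicer function); since the datasets are assumed publicly readable, no authentication is attempted:

<pre>
// Download one uri to a local cache path; returns true on success.
#include <curl/curl.h>
#include <cstdio>

bool DownloadURI(const char* uri, const char* localPath)
{
  CURL* handle = curl_easy_init();
  if (!handle) { return false; }

  std::FILE* out = std::fopen(localPath, "wb");
  if (!out) { curl_easy_cleanup(handle); return false; }

  curl_easy_setopt(handle, CURLOPT_URL, uri);
  curl_easy_setopt(handle, CURLOPT_WRITEDATA, out);  // default write callback
  curl_easy_setopt(handle, CURLOPT_FOLLOWLOCATION, 1L);

  CURLcode result = curl_easy_perform(handle);

  std::fclose(out);
  curl_easy_cleanup(handle);
  return result == CURLE_OK;
}
</pre>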
 
 
 
* Can we create a directory (even a temporary one) on the HID for Slicer scene uploads?
 
 
 
* We need some kind of upload service: a function call that takes a dataset and a BIRNID and uploads the data to the appropriate directory on the HID (a possible signature is sketched below).
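
A possible shape for that call, purely hypothetical since no such service exists yet:

<pre>
#include <string>

// Hypothetical signature for the requested upload service; returns the
// uri where the dataset was stored on the HID, or "" on failure.
std::string UploadToHID(const std::string& localDatasetPath,
                        const std::string& birnID);
</pre>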
 
 
 
[[ Slicer3:XCEDE_use_cases | See this page for more discussion of QueryAtlas's current use of Xcede catalogs, and assumptions... ]]
 
 
 
== Asynchronous I/O Manager (Wendy) ==
 
 
 
The vtkMRMLStorageNode superclass needs methods that handle remote or local data loading, whether the uris are contained in an .xcat or MRML file. Something like this (a code sketch follows the list):
 
 
 
* Each subclass of vtkMRMLStorageNode will call the superclass method first.
 
* The superclass method will look at the uri and decide if the dataset is local or remote.

* If local, the subclass will load the data and return.

* If remote, the superclass will check to see if the data is cached on disk (in /tmp or wherever).

* If the data is cached, the subclass method will load that dataset from disk and return.

* If the data is not cached, the subclass method will spawn an independent thread of control that will interact with the Asynchronous I/O Manager, passing it the type of storage node required for the dataset:

** the thread will create a new download entry and observe the cancel button,

** it will make whatever call it needs to download (http),

** it will display progress on a progress meter,

** and when complete, it will call a method on the vtkMRMLStorageNode subclass to load the dataset from the local cache.
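
Below is a plain-C++ sketch of this flow. The class and method names mirror the plan but are hypothetical stand-ins for the real vtkMRMLStorageNode subclasses and the Asynchronous I/O Manager:

<pre>
// Hypothetical sketch of the load logic described in the list above.
#include <iostream>
#include <string>

struct StorageNodeSketch
{
  // Superclass step: decide local vs. remote from the uri scheme.
  static bool IsRemote(const std::string& uri)
  {
    return uri.find("://") != std::string::npos &&
      uri.compare(0, 7, "file://") != 0;
  }

  // Superclass step: is a copy already in the local cache (/tmp)?
  bool IsCached(const std::string& uri) const
  {
    (void)uri;  // real code would test for the cached file on disk
    return false;
  }

  // Subclass step: load a dataset from local disk.
  void LoadFromDisk(const std::string& path)
  {
    std::cout << "subclass loads " << path << std::endl;
  }

  // Hand off to the Asynchronous I/O Manager: a separate thread creates
  // a download entry, observes the cancel button, fetches over http
  // while updating the progress meter, and on completion calls
  // LoadFromDisk() on the cached copy.
  void QueueAsyncDownload(const std::string& uri)
  {
    std::cout << "queued async download of " << uri << std::endl;
  }

  void ReadData(const std::string& uri)
  {
    if (!IsRemote(uri)) { LoadFromDisk(uri); return; }
    std::string cached = "/tmp/" + uri.substr(uri.rfind('/') + 1);
    if (IsCached(uri))  { LoadFromDisk(cached); return; }
    QueueAsyncDownload(uri);
  }
};
</pre>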
 
 
 
[[image:DataIOManager.png]]
 
