Mbirn: Data provenance:Tcons:20061116AgendaAndMinutes



Kickoff call for the data provenance specification working group.

Thursday, Nov 16, 2006, 1pm Eastern. Dial 1-218-936-1000 and enter the Conference ID 29497 followed by the # key.

For reference: DP Specification Table, XML Schema


Attendees (please add yourself if you're missing):

  • Nicole Aucoin - moderator
  • Syam Gadde
  • Katie Hayes
  • Mike Mendez
  • Dave Keator
  • Dingying Wei



  • Dave Keator gave an overview of how fBIRN has dealt with DP issues
  • Pipeline Analysis PowerPoint: blue circles with solid outlines are the sub-pipelines, and blue circles with dashed outlines are binaries that are run from the pipelines
  • they define a pipeline and describe it to the database; someone else can then build on it (extend the processing functionality, add new binaries) and define it to the database so that the output can be incorporated
  • if each pipeline were to add provenance tags to the XML that goes along with the derived data, the database could perform an automatic parsing step to store the provenance information
  • consider serial versus parallel execution
  • they're currently parsing out the data provenance using a priori information
  • processing scripts need to output special XML tags to describe the pipelines, rather than assume that each step is serially followed by the next
  • Dingying can show Nicole the database schema she came up with to deal with some of these special cases
  • Need to update the schema at the processStep level
  • fBIRN needs to specify an input file from the SRB, not just its local file system location (they have a workflow where a file is downloaded from SRB, processed locally, and uploaded, so the provenance needs to point to that original file)
    • Dingying uses a script, set process xml file, which has information about the SRB collection and replaces the local file names with the SRB information
  • Dave or Dingying will put their binaries in the tool list
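
A rough sketch of the explicit-ordering idea discussed above, in Python: each step names the steps it follows, so a parser does not have to assume serial execution. Apart from processStep, every element and attribute name here (provenance, program, dependsOn, ref) and the sample program names are assumptions made for illustration, not the fBIRN schema or the DP specification.

```python
import xml.etree.ElementTree as ET


def build_provenance(steps):
    """Build an XML tree from a list of step dicts.

    Each dict has 'id', 'program', and optionally 'after', a list of the
    ids of the steps that must finish first. All tag names other than
    processStep are hypothetical.
    """
    root = ET.Element("provenance")
    for step in steps:
        el = ET.SubElement(root, "processStep", id=step["id"])
        ET.SubElement(el, "program").text = step["program"]
        for predecessor in step.get("after", []):
            ET.SubElement(el, "dependsOn", ref=predecessor)
    return root


def read_dependencies(root):
    """Database-side parse: map each step id to the ids it depends on."""
    return {
        el.get("id"): [dep.get("ref") for dep in el.findall("dependsOn")]
        for el in root.findall("processStep")
    }
```

With this layout, two steps that both list the same predecessor can be recognized as parallel branches, and a step that lists both of them is where the branches join.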


  • XCEDE tools will support DP - Syam will add to the tool list
  • gradient unwarping workflow - Mike M. will add to the tool list
    • is Linux based - 1.0 does not accept all info
    • Nicole will get in touch with Spencer and Karl Helmer to add it in
  • sample DP wrappers for third-party software - do we need to support Windows batch files? For now, no; assume shell scripts can be run in Cygwin on Windows
  • Nicole will add the FreeSurfer and BIRNDUP tools in more detail
  • If you're reading this and have a tool that you use during BIRN processing, feel free to sign up for an account and add it in to the tools list
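
One possible shape for a DP wrapper around a third-party program, sketched in Python rather than shell. The function name run_with_provenance and the record keys (commandLine, exitCode) are hypothetical and were not agreed on in the call; the sketch only shows the wrapper pattern of running the unmodified tool and capturing what was run.

```python
import shlex
import subprocess


def run_with_provenance(argv):
    """Run a third-party command unchanged and return (exit_code, record).

    The record keys are illustrative placeholders, not fields from the
    DP specification.
    """
    result = subprocess.run(argv)
    record = {
        # Keep the command line exactly as it was invoked, shell-quoted.
        "commandLine": shlex.join(argv),
        "exitCode": result.returncode,
    }
    return result.returncode, record
```

The wrapper passes the tool's exit code through so that calling scripts behave the same with or without it.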

Adding fields:

  • Randy Notestine wanted hostname added
  • can one annotate elements in the XML schema? Yes.
  • Dingying: also keep track of the overall package version (required) and command version (optional)
  • separate the command line into inputs and outputs? Yes, have both the original string and the split-out inputs and outputs
  • use hostname to specify the machine uniquely; architecture will specify the machine architecture
    • change: take out machine and replace it with hostname and architecture
  • some indication of where to get the binary? maybe optional url? cvs repository? keep it optional
  • build number? optional extra on program
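
To make the proposed fields concrete, here is a minimal Python sketch of one process step record carrying them. The field list comes from the discussion above, but the tag names and nesting (program, packageVersion, commandVersion, url, build, commandLine, input, output) are assumptions; the sample XML files on the wiki are the reference once they are posted.

```python
import platform
import socket
import xml.etree.ElementTree as ET


def build_process_step(package_version, command_line, inputs, outputs,
                       command_version=None, url=None, build=None,
                       hostname=None, architecture=None):
    """Build one processStep element; tag names below are hypothetical."""
    if not package_version:
        raise ValueError("package version is required")
    step = ET.Element("processStep")
    # hostname and architecture replace the single 'machine' field.
    ET.SubElement(step, "hostname").text = hostname or socket.gethostname()
    ET.SubElement(step, "architecture").text = architecture or platform.machine()
    program = ET.SubElement(step, "program")
    ET.SubElement(program, "packageVersion").text = package_version
    # Optional extras are written only when supplied.
    optional = (("commandVersion", command_version), ("url", url), ("build", build))
    for tag, value in optional:
        if value is not None:
            ET.SubElement(program, tag).text = value
    # Keep the original string as well as the split-out inputs and outputs.
    ET.SubElement(step, "commandLine").text = command_line
    for path in inputs:
        ET.SubElement(step, "input").text = path
    for path in outputs:
        ET.SubElement(step, "output").text = path
    return step
```

Passing hostname and architecture explicitly is useful when the record is written on a different machine from the one that ran the processing.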

Action Items:

  • Updates to the tool list for next call in two weeks, as assigned
  • Nicole will update the wiki pages with new fields and provide sample XML files