This wiki page provides supplemental information, links, and discussion for topics covered at the VIZBI 2010 conference, held March 3-5, 2010 at the EMBL in Heidelberg.
People wishing to add information to this page, please feel free to request an account and mention VIZBI in the account request comments. You will need to provide a valid email address (to keep spammers out of this site).
- 1 VIZBI Links
- 2 Wednesday
- 3 Thursday
- 4 Friday
- 5 Session Chair Summaries
Special Issue of Nature Methods
The speakers collaborated on a set of papers summarizing the current state of bioimaging visualization that were published as a special issue of Nature Methods.
Comments on friendfeed
Community notes are available on friendfeed: http://friendfeed.com/vizbi2010
- Movies, slide shows, and documentation for the Query Atlas project mentioned during the talk can be found on this page.
- Open source software for MRI processing can be found at the 3D Slicer web site.
- The National Alliance for Medical Image Computing provides resources and opportunities for collaboration on image analysis topics.
- Pavel Tomancak
- SPIM information
- embryo movies
- Reconstruction of zebrafish early embryonic development by scanned light sheet microscopy Keller PJ, Schmidt AD, Wittbrodt J, Stelzer EH. Science. 2008 Nov 14;322(5904):1065-9. Epub 2008 Oct 9.
- Mosaicing of Single Plane Illumination Microscopy Images Using Groupwise Registration and Fast Content-Based Image Fusion Stephan Preibisch, Torsten Rohlfing, Michael P. Hasak, Pavel Tomancak, Proceedings of SPIE Medical Imaging 2008: Image Processing, SPIE, Bellingham, Wash., pp. 69140E-1-69140E-8, 2008
Matt Hibbs
Matt gave a beautifully clear intro to expression array analysis. He also discussed his own tool, HIDRA, which enables comparison of several heat maps, each from a different experiment.
From Spectra to Networks - Visualizing Proteomics Data (Oliver Kohlbacher)
Again, a very clear intro to proteomics methodology. Shotgun proteomics means fragmenting proteins using enzymes (e.g., trypsin), then separating the fragments using mass spectrometry. In tandem MS, the first separation is by mass; each peak is then further broken down using collisions (collision-induced dissociation, CID). This enables determination of the sequence.
2D maps are obtained: one dimension is the mass/charge ratio, the other is retention time.
Role of visualization in proteomics: quality control; manual/low-throughput analysis; validation of automatic analyses (this is where the field is heading: more automation).
The primary visualization is of the mass spectra themselves > signal processing reduces them to 'stick' spectra (reducing data size by an order of magnitude).
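As a rough illustration of that reduction step (not the speaker's actual pipeline), the sketch below keeps only the local maxima of a profile spectrum that exceed a threshold. The function name `to_stick_spectrum`, the `min_intensity` parameter, and the sample values are all made up for the example; real peak pickers also smooth the signal, remove the baseline, and compute centroids.

```python
def to_stick_spectrum(mz, intensity, min_intensity=0.0):
    """Reduce a profile spectrum to (m/z, intensity) sticks at local maxima.

    Hypothetical helper for illustration only: a point becomes a stick if it
    is higher than its left neighbour, at least as high as its right
    neighbour, and not below min_intensity.
    """
    sticks = []
    for i in range(1, len(intensity) - 1):
        is_peak = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_peak and intensity[i] >= min_intensity:
            sticks.append((mz[i], intensity[i]))
    return sticks


# Invented profile data: two peaks sampled at 11 points.
mz = [100.0, 100.1, 100.2, 100.3, 100.4, 100.5,
      100.6, 100.7, 100.8, 100.9, 101.0]
intensity = [0, 2, 9, 3, 0, 0, 1, 6, 14, 5, 0]

sticks = to_stick_spectrum(mz, intensity, min_intensity=5)
print(sticks)  # [(100.2, 9), (100.8, 14)]
```

Here 11 sampled points collapse to 2 sticks; on real spectra the ratio depends on peak width and sampling density.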
2D mass spectra - one of the problems is simply getting them into memory: they are up to 200GB.
Question: is that even with the 'stick' spectra?
A key problem is lack of data standards.
One approach to dimension/data volume reduction is to fit the spectra to a mathematical model; the data can then be replaced by the model.
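A minimal sketch of the idea, assuming a single isolated peak that is well described by a Gaussian and sampled on a uniform m/z grid. Moment matching is used here in place of a proper least-squares fit to keep the example dependency-free; the function names and the synthetic peak parameters are invented for illustration and do not come from the talk.

```python
import math


def gaussian(m, center, sigma, amplitude):
    """Gaussian peak model evaluated at m/z value m."""
    return amplitude * math.exp(-0.5 * ((m - center) / sigma) ** 2)


def fit_gaussian_by_moments(mz, intensity):
    """Estimate (center, sigma, amplitude) of one peak from its moments.

    Assumes uniform m/z spacing and a peak that decays to ~0 inside the
    sampled window; this is a simplification, not a general-purpose fitter.
    """
    total = sum(intensity)
    center = sum(m * y for m, y in zip(mz, intensity)) / total
    variance = sum(y * (m - center) ** 2 for m, y in zip(mz, intensity)) / total
    sigma = math.sqrt(variance)
    step = mz[1] - mz[0]
    area = total * step  # Riemann-sum estimate of the peak area
    amplitude = area / (sigma * math.sqrt(2 * math.pi))
    return center, sigma, amplitude


# Synthetic peak: 501 samples generated from known (made-up) parameters.
mz = [500.0 + 0.001 * i for i in range(501)]
intensity = [gaussian(m, 500.25, 0.02, 1000.0) for m in mz]

center, sigma, amplitude = fit_gaussian_by_moments(mz, intensity)
print(len(mz), "samples replaced by 3 parameters:", center, sigma, amplitude)
```

The trade-off is the one raised in these notes: the model is far smaller than the data, but anyone who needs to re-inspect the raw signal still has to go back to the original spectra.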
Retention time and mass (the two primary dimensions) do not have a 'biological' meaning.
Two samples can be compared (e.g., disease vs. healthy tissue) to create expression profiles similar to gene expression profiles.
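One common way to express such a comparison is a log2 fold change per protein. The sketch below assumes each sample has already been reduced to a table of protein name to summed intensity; the protein names, intensities, and the `pseudocount` parameter are hypothetical and only serve the example.

```python
import math


def log2_fold_changes(disease, healthy, pseudocount=1.0):
    """Return {protein: log2(disease / healthy)} for proteins in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for proteins with no detected signal.
    """
    shared = sorted(set(disease) & set(healthy))
    return {
        p: math.log2((disease[p] + pseudocount) / (healthy[p] + pseudocount))
        for p in shared
    }


# Invented intensities, chosen so the ratios are exact powers of two.
healthy = {"protA": 1023.0, "protB": 511.0, "protC": 255.0}
disease = {"protA": 4095.0, "protB": 511.0, "protC": 63.0}

profile = log2_fold_changes(disease, healthy)
print(profile)  # {'protA': 2.0, 'protB': 0.0, 'protC': -2.0}
```

A table of such values across many samples is exactly the kind of matrix that is usually shown as a heat map.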
Key challenges: data volume (hence need data reduction); however, experimentalists always need to go back to the raw data/spectra; integration with other omics data and networks; rapidly changing experimental techniques (difficult to keep up).
Key difference from gene expression profiling: the visualization methods are the same, but with protein expression we need to go back to the raw data.
Uniqueness of sequence fragments: antibodies recognize proteins uniquely with just 9 residues; 8 residues is already sufficient to have on average only one match in human.
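A back-of-the-envelope check of that claim, under two loud assumptions that are not from the talk: residues are drawn uniformly and independently from the 20 standard amino acids, and the human proteome holds roughly 20,000 proteins of about 500 residues each. The constant and function names are invented for the sketch.

```python
ALPHABET_SIZE = 20                 # standard amino acids
PROTEOME_RESIDUES = 20_000 * 500   # assumed rough size of the human proteome


def expected_chance_matches(k, positions=PROTEOME_RESIDUES,
                            alphabet=ALPHABET_SIZE):
    """Expected number of random occurrences of one specific k-residue
    fragment, under the uniform independent-residue model."""
    return positions / alphabet ** k


for k in (4, 6, 8, 9):
    print(k, expected_chance_matches(k))
```

Under this idealized model the expected number of chance matches is already well below one at 8 residues, which is consistent with the statement above. Real proteomes have biased composition, repeats, and paralogs, so the model is optimistic and the numbers should be read as orders of magnitude only.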
"We are back to sending hard disks by mail" - same situation as for image data.
Metabolomics Data (Alexander Goesmann)
They take genomes of organisms (e.g., a bacterial genome), then reassemble pathways using a tool called 'CARMEN'. They visualize in CellDesigner.
They also compare two genomes: first they have metabolic pathways from one organism, then map onto that information about the comparison, typically showing which genes are missing.
"Metabolome is closer to the actual phenotype than other omics data"
Humans have perhaps ~2,500 metabolites, compared with ~1 million proteins and ~150,000 transcripts.
Nice illustration of the need for different experimental approaches: no one approach can find all metabolites.
Typical workflow: raw spectra > stick spectra > table of compounds > heat+dendrogram > network enhancement
Nice spectra of beer :) Certainly makes the work relevant.
Nice PCA plot showing clear separation of the metabolic profiles of normal and diseased patients: this shows the power of the method to find biomarkers.
Rapid Inference and Re-engineering of Biological Circuits (Nitin Baliga)
Really nice 'fitness landscape' pie-plots.
Genotype > phenotype slide: really clear illustration of the elements of systems biology, put things very nicely in place.
'Architecture of an enabling knowledgebase' - very nice concise summary of the processes, and their relationships.
Biochemical Networks (Hiroaki Kitano)
Great point: a circuit diagram allows any engineer to perfectly reproduce the functionality. Clearly the same thing is going on in biology, since cells repeat their function perfectly, so what we need is a visualization of function that has the same property. Hence he points to the inadequacy of the standard pathway representation.
- personal communication: future versions of these tools will include 3D and animation
- BALL Project: http://www.ball-project.org/
- Core library is LGPL
- Python bindings to C++
Chris North's keynote
Required reading for us: Pirolli & Card, PARC, 'Analysts' Process'.
'Foraging' vs. 'sense-making loop': the latter is the one where you tell a story. E.g., in the systems biology review, we first reviewed the 'foraging'; the 'pathway editing' part was then about the sense-making loop, telling the story you found from the foraging. In this case the story is told by creating or editing a pathway.
Sequences and Genomes
David Gordon
Sequencing data is generated faster than it can be written to disk.
Historical perspective was interesting to see how far we have come - screenshots from 1991 look ancient :)
- major step of Phred and Consed: color the regions where errors are more likely
- asking the audience: "what visualization issue/challenge would you like to ask this audience?": good idea to invite speakers to do that :)
- finishing is the process of making the assembly correct
- Monica Zoppe
Alignments and Phylogenies
Session Chair Summaries
- Scale issues: space and time. Extract anatomy - linked back to diagnosis and interventions.
- Issues: segmentation, registration (common across other domains too). Subject data mapped to reference for quantitative observations.
- Interaction: academic software needs more user focus and designers.
- Exploration vs. Presentation - knowing who the user will be and exposing the correct visualization for the circumstances.
- Sharing and linking data across modalities - becoming more critical.
- Common software, hardware, and libraries across application domains
- Representing uncertainty - beautiful drawings can be dangerous (can't see missing pieces when showing over-smoothed data).
- Not just visualization - also interaction
- Multi-dimensional time-evolving data needs more attention (3D+Time, derived data and features over time...)
- Quantification of images should use similar strategies to systems biology (similar high throughput requirements)
- Links back to raw data (particularly for images, but maybe not for mass spec or other datatypes).
- Generally want to be able to browse from raw data to conclusion and tap in at any level, but this is not actually possible for many datatypes.
- Systems Biology
- Many challenges in 'seeing the big picture' of *omics
- Suffering from scaling issues
- some technologies generate way more data than before
- current tools do not scale up well (1 or 2 orders of magnitude)
- Possible cross-linkages across domains for dealing with large data issues
- Reuse of components (like heatmaps) would save time and improve quality
- Well designed
- Need to be easy to connect to new tools (often this is hard today)
- Open Source helps enable reuse
- Sequences and Genomes
- Clustering: used in microscopy, microarrays, systems biology... but there are misleading aspects (e.g. mobile solution from Bang Wong)
- Uncertainty: will be represented in different ways depending on datatype, but good examples will be welcome
- Comparison: tools to overlay, differentiate, interact...
- Structure of the meeting: synergy and interrelationships exciting, but rigid structure of sessions makes interconnections difficult, so maybe organize around themes like clustering. Perhaps a workshop type meeting.
- Interaction: tools for images and automation dynamically combined.
- Possibly organize the meeting around disease research that draws on all these techniques.
- Middleware for biological visualization?
- Macromolecular Structures
- Avoid 'theory' and stick with practical working issues
- Keynotes were aspirational - many interesting things yet to do
- Large scale of data presents a great opportunity for research
- Alignments and Phylogenies
- Agreement with other ideas already presented
- Visualization for debugging - sometimes not needed once the technique works
- Heatmaps are matrices - lots of clustering and analysis techniques available from many fields
- VIZBI 2011?
- Possibly in USA?
- Provide feedback to EMBO - Fill this out since it helps secure future funding
- Videos will be available 'real soon' (some editing required)