FBIRN:NeuroInfStatsTuesAM2006

From NAMIC Wiki

Data Mining

  • History of Data Mining
    • 1960s: Tukey
    • Predictive Modeling (CART)
    • Machine Learning (CS)
    • Relational Databases
    • Web
    • Hardware
  • Convergence of the above technologies has led to data mining
    • There is a lot of hype regarding data mining
  • FBIRN has a small N in the data mining sense
    • Clustering, look for subgroups in the schizophrenia patients
    • fBIRN data has a rich structure
    • Focused on data analysis right now, do algorithm comparisons
    • Make sure that the raw spatial data, timecourse data is available
  • Standardized data for data analyses
    • MRI Machine --> "raw data" --> intermediate data --> Beta-maps, etc
      • inter-subject issues in preprocessing
  • data analysis recommendations
    • provide something standard and consistent across sites
    • should data vetting be performed (runs or subjects tossed based on results of the first-level analysis)?
    • time series of variance, auto-correlation
    • having standard datasets available so other disciplines (e.g., computer scientists, statisticians) can look at the data
    • proposal: FIPS 1.0 produces standard results (beta-maps) on Phase II data, then spend 6 months improving the pipeline
    • make data available after each major preprocessing step
      • after pre-whitening/smoothing
      • after motion correction (would need to modify FIPS/FEAT)
      • after slice timing correction (would need to modify FIPS/FEAT)
      • currently the final FIPS products are the output of FEAT
    • Phase II
      • uncompressed raw data is just over 1 GB per subject visit
      • intermediate time series: 20 GB per subject visit
      • storing all of the intermediate steps may be problematic, due to disk space and download times
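The per-visit sizes above make the storage trade-off easy to estimate. A minimal sketch of that arithmetic, where the subject and visit counts in the example call are hypothetical placeholders, not actual FBIRN numbers:

```python
# Back-of-the-envelope storage estimate for Phase II data products.
# Per-visit sizes are from the notes above; subject/visit counts in the
# example below are hypothetical, not actual FBIRN figures.
RAW_GB_PER_VISIT = 1.0            # uncompressed raw data, ~1 GB/visit
INTERMEDIATE_GB_PER_VISIT = 20.0  # intermediate time series, ~20 GB/visit

def total_storage_gb(n_subjects, visits_per_subject, keep_intermediate=True):
    """Total disk needed if every visit's products are retained."""
    per_visit = RAW_GB_PER_VISIT
    if keep_intermediate:
        per_visit += INTERMEDIATE_GB_PER_VISIT
    return n_subjects * visits_per_subject * per_visit

# e.g. 100 subjects x 2 visits:
# raw only: 200 GB; with all intermediates: 4200 GB
```

Even a modest cohort multiplies the intermediate products into terabytes, which is why keeping only selected intermediate steps was raised as an option.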
  • what can we do for every subject?
  • what can we do on a subset of subjects?
  • FIAC
    • standard data products
    • 1st level
      • raw
      • motion corrected
        • motion parameters
      • slice-time corrected
      • meta-data, details of the paradigm
        • best practices, what we think the design matrix should be
        • what happened at what time, how long the blocks were, etc.
    • 2nd level
      • contrast copes in standard space
      • varcopes in standard space
      • meta-data,
        • what effect
        • degrees of freedom
        • smoothness
        • T image
        • threshold
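The 2nd-level meta-data fields listed above could travel with the cope/varcope images as a small structured record. A sketch of one possible serialization, where all field names and values are illustrative assumptions rather than an FBIRN or FIAC standard:

```python
import json

# Illustrative 2nd-level meta-data record matching the fields listed above.
# Field names, filenames, and values are hypothetical examples only.
second_level_metadata = {
    "effect": "task > baseline",   # what effect the contrast tests
    "degrees_of_freedom": 12,
    "smoothness_fwhm_mm": 8.0,     # estimated spatial smoothness
    "t_image": "tstat1.nii.gz",    # associated T-statistic image
    "threshold": 2.3,              # threshold applied to the T image
}

# Serialize next to the image files so downstream users can interpret them
print(json.dumps(second_level_metadata, indent=2))
```

Keeping such a record machine-readable would let other disciplines (computer scientists, statisticians) pick up the data without guessing at the design.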
  • Standard Data Products Plan
    • Establish a working group
    • Establish the requirements of what is released besides raw data and when
    • QA reports on raw and on intermediate data products
  • Data QA
    • Variance of image over time series
    • Variance of (X_t - X_{t+1}) over time series
      • scale median to 100
    • Good for residuals
      • (Outlier count per image / Expected outlier count ) * 100 for each image yields a time series
      • Normality test at each voxel; count how many p-values are smaller than 0.05
        • (Number of significant voxels / Expected significant voxels) * 100
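The QA metrics above can be sketched directly in NumPy. This is a minimal illustration, assuming a 4-D time series of shape (x, y, z, t); the z = 3 outlier cutoff and its Gaussian tail probability are assumptions, since the notes do not fix an outlier definition:

```python
import numpy as np

def variance_qa_images(ts):
    """Voxelwise variance over time and variance of successive
    differences (X_t - X_{t+1}), each scaled so its median is 100."""
    var_img = ts.var(axis=-1)
    diff_var_img = np.diff(ts, axis=-1).var(axis=-1)
    var_scaled = 100.0 * var_img / np.median(var_img)
    diff_var_scaled = 100.0 * diff_var_img / np.median(diff_var_img)
    return var_scaled, diff_var_scaled

def outlier_time_series(ts, z=3.0):
    """(Outlier count per image / expected outlier count) * 100,
    yielding one value per time point. An outlier is a voxel more
    than z standard deviations from its temporal mean (assumed cutoff)."""
    mean = ts.mean(axis=-1, keepdims=True)
    std = ts.std(axis=-1, keepdims=True)
    std = np.where(std == 0, 1.0, std)          # guard constant voxels
    n_vox = ts.shape[0] * ts.shape[1] * ts.shape[2]
    flags = np.abs(ts - mean) > z * std         # outlier mask per voxel/time
    observed = flags.reshape(-1, ts.shape[-1]).sum(axis=0)
    expected = n_vox * 0.0027                   # two-tailed Gaussian, |z| > 3
    return 100.0 * observed / expected
```

Run on a residual time series, values far from 100 in either output flag runs or subjects worth a closer look, which is the vetting question raised earlier in these notes.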