<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.na-mic.org/w/index.php?action=history&amp;feed=atom&amp;title=FBIRN%3ANeuroInfStatsTuesAM2006</id>
	<title>FBIRN:NeuroInfStatsTuesAM2006 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.na-mic.org/w/index.php?action=history&amp;feed=atom&amp;title=FBIRN%3ANeuroInfStatsTuesAM2006"/>
	<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=FBIRN:NeuroInfStatsTuesAM2006&amp;action=history"/>
	<updated>2026-04-17T12:12:05Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.33.0</generator>
	<entry>
		<id>https://www.na-mic.org/w/index.php?title=FBIRN:NeuroInfStatsTuesAM2006&amp;diff=3626&amp;oldid=prev</id>
		<title>Andy: Update from Wiki</title>
		<link rel="alternate" type="text/html" href="https://www.na-mic.org/w/index.php?title=FBIRN:NeuroInfStatsTuesAM2006&amp;diff=3626&amp;oldid=prev"/>
		<updated>2006-12-18T13:26:28Z</updated>

		<summary type="html">&lt;p&gt;Update from Wiki&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= Data Mining =&lt;br /&gt;
&lt;br /&gt;
* History of Data Mining&lt;br /&gt;
** 1960s: Tukey (exploratory data analysis)&lt;br /&gt;
** Predictive Modeling (CART)&lt;br /&gt;
** Machine Learning (CS)&lt;br /&gt;
** Relational Databases&lt;br /&gt;
** Web&lt;br /&gt;
** Hardware&lt;br /&gt;
* Convergence of the above technologies has led to data mining&lt;br /&gt;
** There is a lot of hype regarding data mining&lt;br /&gt;
* FBIRN has a small N in the data mining sense&lt;br /&gt;
** Clustering: look for subgroups among the schizophrenia patients&lt;br /&gt;
** fBIRN data has a rich structure&lt;br /&gt;
** Focused on data analysis right now; do algorithm comparisons&lt;br /&gt;
** Make sure that the raw spatial data and timecourse data are available&lt;br /&gt;
* Standardized data for data analyses&lt;br /&gt;
** MRI Machine --&amp;gt; &amp;quot;raw data&amp;quot; --&amp;gt; intermediate data --&amp;gt; Beta-maps, etc&lt;br /&gt;
*** inter-subject issues in preprocessing&lt;br /&gt;
* data analysis recommendations&lt;br /&gt;
** give something standard and consistent across sites&lt;br /&gt;
** should data vetting be performed, i.e., runs or subjects tossed based on the results of the first-level analysis?&lt;br /&gt;
** time series of variance, auto-correlation&lt;br /&gt;
** having standard datasets available so other disciplines (e.g., computer scientists, statisticians) can look at the data&lt;br /&gt;
** proposal: FIPS 1.0 produces standard results (beta-maps) on Phase II, then spend 6 months improving the pipeline&lt;br /&gt;
** make data available after each major preprocessing step&lt;br /&gt;
*** after pre-whitening/smoothing&lt;br /&gt;
*** after motion correction (would need to modify FIPS/FEAT)&lt;br /&gt;
*** after slice timing correction (would need to modify FIPS/FEAT)&lt;br /&gt;
*** currently the final FIPS products are the FEAT outputs&lt;br /&gt;
** Phase II&lt;br /&gt;
*** uncompressed raw data is just over 1 GB/subject visit&lt;br /&gt;
*** intermediate time series are 20 GB/subject visit&lt;br /&gt;
*** may be problematic to store all of the intermediate steps, due to disk space and download times (see the rough estimate below)&lt;br /&gt;
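A back-of-the-envelope check on those sizes (a Python sketch; the per-visit sizes come from the notes above, while the cohort and visit counts are hypothetical placeholders):&lt;br /&gt;
&lt;pre&gt;
# Rough Phase II storage estimate. Per-visit sizes are from the
# notes; subject and visit counts are hypothetical placeholders.
RAW_GB_PER_VISIT = 1            # uncompressed raw data
INTERMEDIATE_GB_PER_VISIT = 20  # intermediate time series

n_subjects = 100  # hypothetical cohort size
n_visits = 2      # hypothetical visits per subject

total_gb = n_subjects * n_visits * (RAW_GB_PER_VISIT + INTERMEDIATE_GB_PER_VISIT)
print(f"total: {total_gb} GB (~{total_gb / 1024:.1f} TB)")
# total: 4200 GB (~4.1 TB)
&lt;/pre&gt;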
* what can we do for every subject?&lt;br /&gt;
* what can we do on a subset of subjects?&lt;br /&gt;
* FIAC&lt;br /&gt;
** standard data products&lt;br /&gt;
** 1st level&lt;br /&gt;
*** raw&lt;br /&gt;
*** motion corrected&lt;br /&gt;
**** motion parameters&lt;br /&gt;
*** slice-time corrected&lt;br /&gt;
*** meta-data, details of the paradigm&lt;br /&gt;
**** best practices, what we think the design matrix should be (a sketch follows this list)&lt;br /&gt;
**** what happened at what time, how long the blocks were, etc.&lt;br /&gt;
** 2nd level&lt;br /&gt;
*** contrast copes in standard space&lt;br /&gt;
*** varcopes in standard space&lt;br /&gt;
*** meta-data,&lt;br /&gt;
**** what effect&lt;br /&gt;
**** degrees of freedom&lt;br /&gt;
**** smoothness&lt;br /&gt;
**** T image&lt;br /&gt;
**** threshold&lt;br /&gt;
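A minimal sketch of the design-matrix best practice above, assuming a simple block paradigm (the TR, onsets, durations, and double-gamma HRF parameters here are illustrative, not the actual fBIRN paradigm):&lt;br /&gt;
&lt;pre&gt;
import numpy as np
from scipy.stats import gamma

TR = 3.0      # seconds per volume (illustrative)
n_vols = 140  # volumes in the run (illustrative)

# Paradigm meta-data: what happened at what time, and how long
# the blocks were (illustrative values).
onsets = [15.0, 75.0, 135.0, 195.0, 255.0, 315.0]  # seconds
duration = 30.0                                    # block length, seconds

# Boxcar regressor: 1 while the condition is on, sampled per TR.
t = np.arange(n_vols) * TR
boxcar = np.zeros(n_vols)
for onset in onsets:
    boxcar[(t &gt;= onset) &amp; (t &lt; onset + duration)] = 1.0

# Canonical double-gamma HRF (SPM-style shape parameters).
ht = np.arange(0.0, 32.0, TR)
hrf = gamma.pdf(ht, 6) - gamma.pdf(ht, 16) / 6.0
hrf /= hrf.sum()

# Convolve, trim to run length, and add an intercept column.
regressor = np.convolve(boxcar, hrf)[:n_vols]
X = np.column_stack([np.ones(n_vols), regressor])  # design matrix
&lt;/pre&gt;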
* Standard Data Products Plan&lt;br /&gt;
** Establish a working group&lt;br /&gt;
** Establish the requirements of what is released besides raw data and when&lt;br /&gt;
** QA reports on raw and on intermediate data products&lt;br /&gt;
* Data QA&lt;br /&gt;
** Variance of image over time series&lt;br /&gt;
** Variance of (Xt - Xt+1) over time series&lt;br /&gt;
*** scale median to 100&lt;br /&gt;
** Metrics good for residuals&lt;br /&gt;
*** (Outlier count per image / Expected outlier count) * 100 for each image yields a time series&lt;br /&gt;
*** Normality test at each voxel; look at how many p-values are smaller than 0.05&lt;br /&gt;
**** (Number of significant voxels / Expected significant voxels) * 100&lt;br /&gt;
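A minimal sketch of these QA metrics in Python (assuming a 4-D numpy array of shape x, y, z, t; the z-score outlier cutoff and the use of scipy's normaltest are illustrative choices, not fBIRN-mandated):&lt;br /&gt;
&lt;pre&gt;
import numpy as np
from scipy import stats

def qa_metrics(data, p_thresh=0.05, outlier_z=3.0):
    # data: 4-D array (x, y, z, t), e.g. a run of residual images.
    n_t = data.shape[-1]
    vox = data.reshape(-1, n_t).astype(float)  # voxels x time

    # Variance of each image over the time series.
    image_var = vox.var(axis=0)

    # Variance of (Xt - Xt+1) per image, median scaled to 100.
    diff_var = np.diff(vox, axis=1).var(axis=0)
    diff_var *= 100.0 / np.median(diff_var)

    # (Outlier count per image / expected outlier count) * 100,
    # which yields a time series; an outlier is |z| above outlier_z.
    z = stats.zscore(vox, axis=1)
    outliers = (np.abs(z) &gt; outlier_z).sum(axis=0)
    expected = vox.shape[0] * 2.0 * stats.norm.sf(outlier_z)
    outlier_pct = 100.0 * outliers / expected

    # Normality test at each voxel:
    # (number of significant voxels / expected significant voxels) * 100.
    _, p = stats.normaltest(vox, axis=1)
    normality_pct = 100.0 * (p &lt; p_thresh).mean() / p_thresh

    return image_var, diff_var, outlier_pct, normality_pct
&lt;/pre&gt;&lt;/div&gt;</summary>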
		<author><name>Andy</name></author>
		
	</entry>
</feed>