ITK Registration Optimization
From NAMIC Wiki
Revision as of 14:32, 30 March 2007
Goals
There are two components to this research:
- Identify registration algorithms that are suitable for the non-rigid registration problems endemic to NA-MIC
- Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware.
Algorithmic Requirements and Use Cases
- Requirements
- relatively robust, with few parameters to tweak
- runs on grey scale images
- has already been published
- relatively fast (ideally a few minutes per volume-to-volume registration)
- not patented
- can be implemented in ITK and parallelized.
- Use-cases
- Intersubject mapping
- Example data set (Kilian)
- fMRI to hi-res brain morphology mapping
- Example data set (Steve Pieper)
- DTI: components of the diffusion tensor
- Example data (Sylvain)
Hardware Platform Requirements and Use Cases
- Requirements
- Shared memory
- Single and multi-core machines
- Single and multi-processor machines
- AMD and Intel - Windows, Linux, and SunOS
- Use-cases
- Intel Core2Duo
- Intel quad-core Xeon processors (?)
- 6 CPU Sun, Solaris 8 (SPL: vision)
- 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
- 16 core Opteron (SPL: john, ringo, paul, george)
- 16 core, Sun Fire, AMD Opteron (UNC: Styner)
Data
Workplan
Establish testing and reporting infrastructure
- Identify timing tools
- Cross platform and multi-threaded
- Timing and profiling
- Status
- Instrumenting modular tests
- Extending ITK's cross-platform high-precision timer
- Adding thread affinity to ensure valid timings
- Adding method for increasing process priority
- Profiling complete registration solutions for use cases
- Using CacheGrind on single and multi-core Linux systems
- Develop performance dashboard for collecting results
- Each test will report time and accuracy to a central server
- The performance of a test, over time, for a given platform can be viewed on one page
- The performance of a set of tests, at one point in time, for all platforms can be viewed on one page
- Status
- BatchMake database communication code being isolated
- Performance dashboard web pages being designed
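The timing side of this infrastructure can be illustrated with a minimal sketch. It uses the standard C++ `std::chrono::steady_clock` rather than ITK's own timer, and the names `TestReport` and `TimeTest` are hypothetical, chosen only for this example. It does not set thread affinity or process priority, which are platform-specific; it only shows the "time plus error" pair that each test is meant to report.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical record of what one test would report to the dashboard:
// elapsed wall-clock time and a normalized error value.
struct TestReport {
  double seconds;
  double error;
};

// Runs `work` `repeats` times and keeps the fastest run, which reduces
// noise from other processes. `work` must return the test's error value.
template <typename Work>
TestReport TimeTest(Work work, int repeats) {
  assert(repeats > 0);
  using Clock = std::chrono::steady_clock;  // monotonic and portable
  TestReport best{std::numeric_limits<double>::max(), 0.0};
  for (int i = 0; i < repeats; ++i) {
    const auto start = Clock::now();
    const double error = work();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    if (elapsed.count() < best.seconds) {
      best.seconds = elapsed.count();
      best.error = error;
    }
  }
  return best;
}
```

A caller would wrap the code under test in a lambda that returns its error, for example `TimeTest([&] { return RunMyTest(); }, 5)`, where `RunMyTest` stands in for any test body.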
Develop tests
- Develop modular tests
- Status
- Developed itkCheckerboardImageSource so no IO required
- Developing tests as listed in the Benchmark section below
- Develop C-style tests
- Tests should represent the non-ITK way of doing image analysis
- Use standard C/C++ arrays and pointers to access blocks of memory as images
- Develop complete registration solutions for use cases
- Status
- Centralized data and provided easy access
- Identified relevant registration algorithms
- rigid, affine, bspline, multi-level bspline, and Demons
- normalized mutual information, mean squared difference, and cross correlation
- Developing traditional ITK-style implementations
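As an illustration of what a C-style test might look like, the following sketch resamples a cubic volume with trilinear interpolation using only raw arrays and pointer arithmetic. The function names, the x-fastest memory layout, and the convention that corner voxels of input and output coincide are all assumptions made for this example; they are not taken from the project's code.

```cpp
#include <cassert>
#include <cstddef>

// Split a continuous input coordinate into a base index and a fraction,
// clamping so that base + 1 stays inside the volume.
static void SplitCoordinate(float pos, int inSize, int* base, float* frac) {
  int b = static_cast<int>(pos);
  if (b > inSize - 2) b = inSize - 2;
  *base = b;
  *frac = pos - static_cast<float>(b);
}

// Trilinearly resample a cubic volume of side `inSize` (raw float array,
// x varies fastest) into a cubic volume of side `outSize`.
void ResampleLinear(const float* in, int inSize, float* out, int outSize) {
  assert(in != nullptr && out != nullptr && inSize > 1 && outSize > 1);
  const float scale =
      static_cast<float>(inSize - 1) / static_cast<float>(outSize - 1);
  const std::size_t row = static_cast<std::size_t>(inSize);
  const std::size_t slice = row * row;
  float* dst = out;
  for (int z = 0; z < outSize; ++z) {
    int z0; float dz;
    SplitCoordinate(z * scale, inSize, &z0, &dz);
    for (int y = 0; y < outSize; ++y) {
      int y0; float dy;
      SplitCoordinate(y * scale, inSize, &y0, &dy);
      for (int x = 0; x < outSize; ++x) {
        int x0; float dx;
        SplitCoordinate(x * scale, inSize, &x0, &dx);
        // Pointer to the lower corner of the enclosing 2x2x2 cell.
        const float* p = in + z0 * slice + y0 * row + x0;
        // Interpolate along x on the four cell edges.
        const float c00 = p[0] + dx * (p[1] - p[0]);
        const float c10 = p[row] + dx * (p[row + 1] - p[row]);
        const float c01 = p[slice] + dx * (p[slice + 1] - p[slice]);
        const float c11 =
            p[slice + row] + dx * (p[slice + row + 1] - p[slice + row]);
        // Then along y, then along z.
        const float c0 = c00 + dy * (c10 - c00);
        const float c1 = c01 + dy * (c11 - c01);
        *dst++ = c0 + dz * (c1 - c0);
      }
    }
  }
}
```

Because it has no iterator, image, or interpolator objects, a routine like this gives a baseline against which the ITK-style implementation of the same operation can be timed.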
Compute performance on target platforms
- Ongoing
Optimize bottlenecks
- Target bottlenecks
- Use random, sub-sampling iterator in mean squared difference and cross correlation
- Multi-thread metric calculation
- Integrate metrics with transforms and interpolators for tailored performance
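The random sub-sampling idea can be sketched on raw buffers as follows. This is not ITK's iterator API: `FullMeanSquares` and `SampledMeanSquares` are hypothetical names, the images are assumed to be already resampled onto the same grid, and sampling is with replacement for simplicity. The sampled version visits only `numSamples` voxels, trading some accuracy in the metric value for a proportional reduction in work.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Mean squared difference over every voxel of two equally sized buffers.
double FullMeanSquares(const float* fixedImg, const float* movingImg,
                       std::size_t numVoxels) {
  assert(numVoxels > 0);
  double sum = 0.0;
  for (std::size_t i = 0; i < numVoxels; ++i) {
    const double d = static_cast<double>(fixedImg[i]) - movingImg[i];
    sum += d * d;
  }
  return sum / static_cast<double>(numVoxels);
}

// Estimate of the same metric from `numSamples` voxels chosen uniformly
// at random (with replacement). A fixed seed makes the result repeatable.
double SampledMeanSquares(const float* fixedImg, const float* movingImg,
                          std::size_t numVoxels, std::size_t numSamples,
                          unsigned seed) {
  assert(numVoxels > 0 && numSamples > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, numVoxels - 1);
  double sum = 0.0;
  for (std::size_t s = 0; s < numSamples; ++s) {
    const std::size_t i = pick(rng);
    const double d = static_cast<double>(fixedImg[i]) - movingImg[i];
    sum += d * d;
  }
  return sum / static_cast<double>(numSamples);
}
```

Comparing the two functions on the same image pair gives both a speed-up figure and an estimate of the error introduced by sub-sampling.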
Modular tests
All tests send two values to performance dashboards
- the time required
- a measure of the error (0 = no error; 1 = 100% error)
Tests being developed and their parameter spaces
- LinearInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- NumThreads = 1, 2, 4, and #OfCoresIf>4
- DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
- Factor = 1.5, 2, 3 (i.e., producing up to 600^3 images)
- = 24 tests (approx time on dual-core for all tests = 1.5 minutes)
- BSplineInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
- DimSize = 100, 200 (meaning: 100^3 and 200^3 images)
- Factor = 1.5, 2, 3 (thereby producing up to 600^3 images)
- = 24 tests (approx time on dual-core for all tests = ??)
- SincInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
- BSplineTransformLinearInterpTest <numThreads> <dimSize> <numNodesPerDim> [<outputImage>]
- MeanReciprocalSquaredDifferenceMetricTest
- MeanSquaresMetricTest
- NormalizedCorrelationMetricTest
- GradientDifferenceMetricTest
- MattesMutualInformationMetricTest
- MutualInformationMetricTest
- NormalizedMutualInformationMetricTest
- MutualInformationHistogramMetricTest
- NormalizedMutualInformationHistogramMetricTest
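The parameter space for the interpolation tests can be enumerated mechanically. The sketch below builds the command lines for one test from the values listed above; the helper names `ThreadCounts` and `InterpTestCommands` are hypothetical, and the optional output image argument is left out. With more than four cores there are 4 thread counts x 2 sizes x 3 factors = 24 runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Thread counts from the list above: 1, 2, 4, plus the core count when
// the machine has more than 4 cores.
std::vector<int> ThreadCounts(int numCores) {
  assert(numCores > 0);
  std::vector<int> counts = {1, 2, 4};
  if (numCores > 4) counts.push_back(numCores);
  return counts;
}

// Builds "<testName> <numThreads> <dimSize> <factor>" for every
// combination of the listed parameter values.
std::vector<std::string> InterpTestCommands(const std::string& testName,
                                            int numCores) {
  const int dimSizes[] = {100, 200};
  const char* factors[] = {"1.5", "2", "3"};
  std::vector<std::string> commands;
  for (int threads : ThreadCounts(numCores)) {
    for (int dim : dimSizes) {
      for (const char* factor : factors) {
        commands.push_back(testName + " " + std::to_string(threads) + " " +
                           std::to_string(dim) + " " + factor);
      }
    }
  }
  return commands;
}
```

For example, `InterpTestCommands("LinearInterpTest", 8)` yields 24 command lines, the first being `LinearInterpTest 1 100 1.5`.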
Related Pages
Performance Measurement
- Intel's VTune for Linux ($)
- TAU
- Threadmon: Thread usage/blockage
- TotalView ($)
- PerfSuite (POSIX Threads)
- GProf work-around for multi-threaded apps
- References on multi-threaded profiling and code optimization