ITK Registration Optimization

From NAMIC Wiki
Revision as of 17:01, 30 March 2007 by Aylward

Goals

There are two components to this research:

  1. Identify registration algorithms that are suitable for the non-rigid registration problems that are endemic to NA-MIC
  2. Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware.

Algorithmic Requirements and Use Cases

  • Requirements
    1. relatively robust, with few parameters to tweak
    2. runs on grey scale images
    3. has already been published
    4. relatively fast (ideally a few minutes for volume-to-volume registration)
    5. not patented
    6. can be implemented in ITK and parallelized.

Hardware Platform Requirements and Use Cases

  • Requirements
    1. Shared memory
    2. Single and multi-core machines
    3. Single and multi-processor machines
    4. AMD and Intel - Windows, Linux, and SunOS
  • Use-cases
    1. Intel Core2Duo
    2. Intel quad-core Xeon processors (?)
    3. 6 CPU Sun, Solaris 8 (SPL: vision)
    4. 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
    5. 16 core Opteron (SPL: john, ringo, paul, george)
    6. 16 core, Sun Fire, AMD Opteron (UNC: Styner)

Data

Workplan

Establish testing and reporting infrastructure

  1. Identify timing tools
    1. Cross platform and multi-threaded
    2. Timing and profiling
    • Status
      1. Instrumenting modular tests
        • Extending itk's cross-platform high precision timer
        • Adding thread affinity to ensure valid timings
        • Adding method for increasing process priority
      2. Profiling complete registration solutions for use cases
        • Using CacheGrind on single and multi-core linux systems
  2. Develop performance dashboard for collecting results
    1. Each test will report time and accuracy to a central server
    2. The performance of a test, over time, for a given platform can be viewed on one page
    3. The performance of a set of tests, at one point in time, for all platforms can be viewed on one page
    • Status
      1. BatchMake database communication code being isolated
      2. Performance dashboard web pages being designed
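
The timing step above can be illustrated with a minimal sketch. It uses `std::chrono::steady_clock` as a stand-in for ITK's cross-platform timer, and the names `TimedResult` and `timeBestOf` are hypothetical, not part of ITK or BatchMake. Thread affinity and process priority are platform specific and are not shown. Keeping the fastest of several runs is one common way to reduce scheduling noise; it is an assumption here, not the project's stated method.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical record of what one timed run would hand to a dashboard.
struct TimedResult {
  double seconds;   // fastest wall-clock time over all repetitions
  int repetitions;  // number of times the workload was executed
};

// Runs `workload` `repetitions` times and keeps the fastest run.
// steady_clock is monotonic, so clock adjustments cannot skew the result.
TimedResult timeBestOf(const std::function<void()>& workload, int repetitions) {
  assert(repetitions > 0);
  double best = std::numeric_limits<double>::max();
  for (int i = 0; i < repetitions; ++i) {
    const auto start = std::chrono::steady_clock::now();
    workload();
    const auto stop = std::chrono::steady_clock::now();
    const double elapsed = std::chrono::duration<double>(stop - start).count();
    if (elapsed < best) best = elapsed;
  }
  return {best, repetitions};
}
```

A test would wrap the operation under study in a lambda, for example `timeBestOf([&]() { runInterpolation(); }, 5)`, where `runInterpolation` is a placeholder for the code being measured.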

Develop tests

  1. Develop modular tests
    • Status
      1. Developed itkCheckerboardImageSource so no IO required
      2. Developing tests as listed in the "Modular Tests" section below
  2. Develop C-style tests
    1. Tests should represent the non-ITK way of doing image analysis
      • Use standard C/C++ arrays and pointers to access blocks of memory as images
  3. Develop complete registration solutions for use cases
    • Status
      1. Centralized data and provide easy access
      2. Identified relevant registration algorithms
        • rigid, affine, bspline, multi-level bspline, and Demons
        • normalized mutual information, mean squared difference, and cross correlation
      3. Developing traditional ITK-style implementations
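
A C-style test of the kind described above might look like the following sketch, which treats a raw block of memory as a volume and needs no file IO. The functions `fillCheckerboard` and `meanSquaredDifference` are hypothetical illustrations. They do not reproduce itkCheckerboardImageSource, and the x-fastest memory layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Fills a dim^3 volume (x varies fastest) with a checkerboard of cubes
// that are `block` voxels wide, alternating between `low` and `high`.
void fillCheckerboard(float* image, std::size_t dim, std::size_t block,
                      float low, float high) {
  assert(image != nullptr && block > 0);
  float* p = image;
  for (std::size_t z = 0; z < dim; ++z)
    for (std::size_t y = 0; y < dim; ++y)
      for (std::size_t x = 0; x < dim; ++x) {
        const std::size_t parity = (x / block + y / block + z / block) % 2;
        *p++ = parity ? high : low;
      }
}

// Mean squared difference between two buffers of n voxels,
// walking both buffers with plain pointers.
double meanSquaredDifference(const float* a, const float* b, std::size_t n) {
  assert(n > 0);
  double sum = 0.0;
  for (const float* end = a + n; a != end; ++a, ++b) {
    const double d = static_cast<double>(*a) - static_cast<double>(*b);
    sum += d * d;
  }
  return sum / static_cast<double>(n);
}
```

Comparing a checkerboard with its inverse (swapping `low` and `high`) gives a known, non-zero metric value, which makes the pair convenient as a self-checking input.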

Compute performance on target platforms

  • Ongoing

Optimize bottlenecks

  • Target bottlenecks
    • Use random, sub-sampling iterator in mean squared difference and cross correlation
    • Multi-thread metric calculation
    • Integrate metrics with transforms and interpolators for tailored performance
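
The first target above, random sub-sampling in the mean squared difference metric, can be sketched as follows. This is an illustration on raw buffers, not ITK's iterator or metric API. The function name, the fixed seed parameter, and sampling with replacement are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Estimates the mean squared difference from `numSamples` voxels drawn
// uniformly at random (with replacement) instead of visiting all n voxels.
// A caller-supplied seed keeps repeated runs reproducible.
double sampledMeanSquaredDifference(const float* a, const float* b,
                                    std::size_t n, std::size_t numSamples,
                                    unsigned seed) {
  assert(n > 0 && numSamples > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, n - 1);
  double sum = 0.0;
  for (std::size_t s = 0; s < numSamples; ++s) {
    const std::size_t i = pick(rng);
    const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    sum += d * d;
  }
  return sum / static_cast<double>(numSamples);
}
```

The cost now scales with `numSamples` rather than with the volume size, at the price of a noisier metric value. How many samples are enough is something the performance tests would have to establish.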

Modular tests

All tests send two values to the performance dashboard:

  • the time required
  • a measure of the error (0 = no error; 1 = 100% error)
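
The page does not say how the error value is computed. One possible definition, shown purely as an assumption, is the RMS difference from a reference image divided by the reference's intensity range and clamped to [0, 1]. The function name `normalizedError` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical error measure in [0, 1]: RMS difference between `result`
// and `reference`, divided by the reference's intensity range.
double normalizedError(const float* result, const float* reference,
                       std::size_t n) {
  assert(n > 0);
  float lo = reference[0];
  float hi = reference[0];
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    lo = std::min(lo, reference[i]);
    hi = std::max(hi, reference[i]);
    const double d =
        static_cast<double>(result[i]) - static_cast<double>(reference[i]);
    sum += d * d;
  }
  const double rms = std::sqrt(sum / static_cast<double>(n));
  const double range = static_cast<double>(hi) - static_cast<double>(lo);
  // A flat reference has no range: treat any difference as full error.
  if (range <= 0.0) return rms > 0.0 ? 1.0 : 0.0;
  return std::min(1.0, rms / range);
}
```

Under this definition an identical image scores 0, and a two-level image compared with its inverse scores 1.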

Tests being developed and their parameter spaces

  1. LinearInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • = 16 tests (approx time on Core2Duo for these tests = 1 minute)
  2. BSplineInterpTest <numThreads> <dimSize> <factor> <orderOfSpline> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • Order = 3
    • = 16 tests (approx time on Core2Duo for these tests = 10 minutes)
  3. SincInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • = 16 tests (approx time on Core2Duo for these tests = 30 minutes)
  4. BSplineTransformLinearInterpTest <numThreads> <dimSize> <numNodesPerDim> [<outputImage>]
  5. MeanReciprocalSquaredDifferenceMetricTest
  6. MeanSquaresMetricTest
  7. NormalizedCorrelationMetricTest
  8. GradientDifferenceMetricTest
  9. MattesMutualInformationMetricTest
  10. MutualInformationMetricTest
  11. NormalizedMutualInformationMetricTest
  12. MutualInformationHistogramMetricTest
  13. NormalizedMutualInformationHistogramMetricTest
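
The parameter spaces listed for the three interpolation tests can be enumerated with a short sketch. It reads "#OfCoresIf>4" as "the machine's core count, added only when it exceeds 4", which is an interpretation. `InterpTestConfig`, `threadCounts`, and `enumerateInterpTests` are hypothetical names.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// One point in the <numThreads> <dimSize> <factor> parameter space.
struct InterpTestConfig {
  unsigned numThreads;
  unsigned dimSize;
  unsigned factor;
};

// Thread counts 1, 2, 4, plus the core count when it exceeds 4.
std::vector<unsigned> threadCounts(unsigned numCores) {
  std::vector<unsigned> counts = {1, 2, 4};
  if (numCores > 4) counts.push_back(numCores);
  return counts;
}

// Cartesian product of thread counts, image sizes, and scale factors.
std::vector<InterpTestConfig> enumerateInterpTests(unsigned numCores) {
  std::vector<InterpTestConfig> configs;
  for (unsigned threads : threadCounts(numCores))
    for (unsigned dim : {100u, 200u})
      for (unsigned factor : {2u, 3u})
        configs.push_back({threads, dim, factor});
  return configs;
}
```

On a machine with more than 4 cores this yields the 16 configurations quoted above. Under the same reading, a machine with 4 or fewer cores would run 12.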

Related Pages

Performance Measurement