ITK Registration Optimization
Revision as of 14:32, 30 March 2007


Goals

There are two components to this research:

  1. Identify registration algorithms suited to the non-rigid registration problems endemic to NA-MIC.
  2. Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware.

Algorithmic Requirements and Use Cases

  • Requirements
    1. relatively robust, with few parameters to tweak
    2. runs on grey scale images
    3. has already been published
    4. relatively fast (ideally, a few minutes for volume-to-volume registration).
    5. not patented
    6. can be implemented in ITK and parallelized.

Hardware Platform Requirements and Use Cases

  • Requirements
    1. Shared memory
    2. Single and multi-core machines
    3. Single and multi-processor machines
    4. AMD and Intel - Windows, Linux, and SunOS
  • Use-cases
    1. Intel Core2Duo
    2. Intel quad-core Xeon processors (?)
    3. 6 CPU Sun, Solaris 8 (SPL: vision)
    4. 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
    5. 16 core Opteron (SPL: john, ringo, paul, george)
    6. 16 core, Sun Fire, AMD Opteron (UNC: Styner)

Data

Workplan

Establish testing and reporting infrastructure

  1. Identify timing tools
    1. Cross platform and multi-threaded
    2. Timing and profiling
    • Status
      1. Instrumenting modular tests
        • Extending ITK's cross-platform, high-precision timer
        • Adding thread affinity to ensure valid timings
        • Adding method for increasing process priority
      2. Profiling complete registration solutions for use cases
        • Using CacheGrind on single and multi-core linux systems
  2. Develop performance dashboard for collecting results
    1. Each test will report time and accuracy to a central server
    2. The performance of a test, over time, for a given platform can be viewed on one page
    3. The performance of a set of tests, at one point in time, for all platforms can be viewed on one page
    • Status
      1. BatchMake database communication code being isolated
      2. Performance dashboard web pages being designed

Develop tests

  1. Develop modular tests
    • Status
      1. Developed itkCheckerboardImageSource so no IO required
      2. Developing tests as listed in the Benchmark section below
  2. Develop C-style tests
    1. Tests should represent the non-ITK way of doing image analysis
      • Use standard C/C++ arrays and pointers to access blocks of memory as images
  3. Develop complete registration solutions for use cases
    • Status
      1. Centralized data and provided easy access
      2. Identified relevant registration algorithms
        • rigid, affine, BSpline, multi-level BSpline, and Demons
        • normalized mutual information, mean squared difference, and cross correlation
      3. Developing traditional ITK-style implementations

Compute performance on target platforms

  • Ongoing

Optimize bottlenecks

  • Target bottlenecks
    • Use random, sub-sampling iterator in mean squared difference and cross correlation
    • Multi-thread metric calculation
    • Integrate metrics with transforms and interpolators for tailored performance

Modular tests

All tests send two values to the performance dashboard:

  • the time required
  • a measure of the error (0 = no error; 1 = 100% error)

Tests being developed and their parameter spaces

  1. LinearInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 1.5, 2, 3 (i.e., producing up to 600^3 images)
    • = 24 tests (approx time on dual-core for all tests = 1.5 minutes)
  2. BSplineInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 1.5, 2, 3 (i.e., producing up to 600^3 images)
    • = 24 tests (approx time on dual-core for all tests = ??)
  3. SincInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
  4. BSplineTransformLinearInterpTest <numThreads> <dimSize> <numNodesPerDim> [<outputImage>]
  5. MeanReciprocalSquaredDifferenceMetricTest
  6. MeanSquaresMetricTest
  7. NormalizedCorrelationMetricTest
  8. GradientDifferenceMetricTest
  9. MattesMutualInformationMetricTest
  10. MutualInformationMetricTest
  11. NormalizedMutualInformationMetricTest
  12. MutualInformationHistogramMetricTest
  13. NormalizedMutualInformationHistogramMetricTest

Related Pages

Performance Measurement