Difference between revisions of "ITK Registration Optimization"

From NAMIC Wiki
= Performance Measurement =
* [http://www.lw-tech.com/index.php LTProf - simple profiler for Windows - Shareware]
* [http://www.intel.com/cd/software/products/asmo-na/eng/vtune/vlin/239145.htm Intel's VTune for Linux] ($)
* [http://www.cs.uoregon.edu/research/tau/home.php TAU]

Revision as of 20:45, 1 April 2007


Goals

There are two components to this research:

  1. Identify registration algorithms that are suitable for the non-rigid registration problems that are endemic to NA-MIC
  2. Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware.

Algorithmic Requirements and Use Cases

  • Requirements
    1. relatively robust, with few parameters to tweak
    2. runs on grey scale images
    3. has already been published
    4. relatively fast (ideally a few minutes for volume-to-volume registration)
    5. not patented
    6. can be implemented in ITK and parallelized.

Hardware Platform Requirements and Use Cases

  • Requirements
    1. Shared memory
    2. Single and multi-core machines
    3. Single and multi-processor machines
    4. AMD and Intel - Windows, Linux, and SunOS
  • Use-cases
    1. Intel Core2Duo
    2. Intel quad-core Xeon processors (?)
    3. 6 CPU Sun, Solaris 8 (SPL: vision)
    4. 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
    5. 16 core Opteron (SPL: john, ringo, paul, george)
    6. 16 core, Sun Fire, AMD Opteron (UNC: Styner)

Data

Workplan

Establish testing and reporting infrastructure

  1. Identify timing tools
    1. Cross platform and multi-threaded
    2. Timing and profiling
    • Status
      1. Instrumenting modular tests
        • Extending itk's cross-platform high precision timer
        • Adding thread affinity to ensure valid timings
        • Adding method for increasing process priority
      2. Profiling complete registration solutions for use cases
        • Using CacheGrind on single and multi-core linux systems
  2. Develop performance dashboard for collecting results
    1. Each test will report time and accuracy to a central server
    2. The performance of a test, over time, for a given platform can be viewed on one page
    3. The performance of a set of tests, at one point in time, for all platforms can be viewed on one page
    • Status
      1. BatchMake database communication code being isolated
      2. Performance dashboard web pages being designed
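
The timing items above might be prototyped as follows. This is a minimal sketch only: it uses the standard `std::chrono::steady_clock` as a stand-in for itk's high-precision timer, and the helper name `BestTimeInSeconds` is hypothetical. It does not set thread affinity or process priority, and keeping the fastest of several repeats is an assumption of this sketch, not something the workplan specifies.

```cpp
#include <chrono>

// Hypothetical helper (not part of ITK): runs `work` `repeats` times and
// returns the fastest wall-clock time in seconds, measured with the
// monotonic std::chrono::steady_clock. Returns -1.0 if repeats <= 0.
template <typename Work>
double BestTimeInSeconds(Work work, int repeats)
{
  double best = -1.0;
  for (int i = 0; i < repeats; ++i)
  {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    const double elapsed =
      std::chrono::duration<double>(stop - start).count();
    // Keep the smallest observed time (assumption: the minimum is least
    // disturbed by other processes on the machine).
    if (best < 0.0 || elapsed < best)
    {
      best = elapsed;
    }
  }
  return best;
}
```

A test would wrap the operation under study in a lambda, for example `BestTimeInSeconds([&]() { filter->Update(); }, 5)`, where `filter` is whatever object the test exercises.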

Develop tests

  1. Develop modular tests
    • Status
      1. Developed itkCheckerboardImageSource so no IO required
      2. Developing tests as listed in the "Modular Tests" section below
  2. Develop C-style tests
    1. Tests should represent the non-ITK way of doing image analysis
      • Use standard C/C++ arrays and pointers to access blocks of memory as images
  3. Develop complete registration solutions for use cases
    • Status
      1. Centralized data and provide easy access
      2. Identified relevant registration algorithms
        • rigid, affine, bspline, multi-level bspline, and Demons
        • normalized mutual information, mean squared difference, and cross correlation
      3. Developing traditional ITK-style implementations
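
As an illustration of the C-style tests described above, the sketch below performs trilinear interpolation directly on a raw block of memory. The names `RawVolume`, `VoxelAt`, and `InterpolateLinear` are hypothetical, and the memory layout (x varies fastest) and the requirement that every dimension holds at least two voxels are assumptions of this sketch.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical C-style volume: a raw pointer plus dimensions.
// Assumed layout: x varies fastest, then y, then z.
struct RawVolume
{
  const float* data;
  int nx, ny, nz;
};

// Reads one voxel by computing its linear offset into the raw buffer.
inline float VoxelAt(const RawVolume& v, int x, int y, int z)
{
  const std::size_t offset =
    static_cast<std::size_t>(x) +
    static_cast<std::size_t>(v.nx) *
      (static_cast<std::size_t>(y) +
       static_cast<std::size_t>(v.ny) * static_cast<std::size_t>(z));
  return v.data[offset];
}

// Lower corner of the interpolation cell along one axis, clamped so that
// the corner and its +1 neighbor are both valid indices (needs n >= 2).
inline int LowerCorner(double coord, int n)
{
  int c = static_cast<int>(std::floor(coord));
  if (c < 0) c = 0;
  if (c > n - 2) c = n - 2;
  return c;
}

// Trilinear interpolation at the continuous index (x, y, z).
// The caller is expected to keep 0 <= x <= nx-1 (likewise for y and z).
float InterpolateLinear(const RawVolume& v, double x, double y, double z)
{
  const int x0 = LowerCorner(x, v.nx);
  const int y0 = LowerCorner(y, v.ny);
  const int z0 = LowerCorner(z, v.nz);
  const double fx = x - x0;
  const double fy = y - y0;
  const double fz = z - z0;

  // Interpolate along x on the four edges of the cell.
  const double c00 = VoxelAt(v, x0, y0, z0) * (1.0 - fx) + VoxelAt(v, x0 + 1, y0, z0) * fx;
  const double c10 = VoxelAt(v, x0, y0 + 1, z0) * (1.0 - fx) + VoxelAt(v, x0 + 1, y0 + 1, z0) * fx;
  const double c01 = VoxelAt(v, x0, y0, z0 + 1) * (1.0 - fx) + VoxelAt(v, x0 + 1, y0, z0 + 1) * fx;
  const double c11 = VoxelAt(v, x0, y0 + 1, z0 + 1) * (1.0 - fx) + VoxelAt(v, x0 + 1, y0 + 1, z0 + 1) * fx;

  // Then along y, then along z.
  const double c0 = c00 * (1.0 - fy) + c10 * fy;
  const double c1 = c01 * (1.0 - fy) + c11 * fy;
  return static_cast<float>(c0 * (1.0 - fz) + c1 * fz);
}
```

Comparing the timing of such a routine with the equivalent ITK interpolator would indicate how much overhead the ITK abstractions add for the same arithmetic.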

Compute performance on target platforms

  • Ongoing

Optimize bottlenecks

  • Target bottlenecks
    • Use random, sub-sampling iterator in mean squared difference and cross correlation
    • Multi-thread metric calculation
    • Integrate metrics with transforms and interpolators for tailored performance
    1. MattesMutualInformationImageToImageMetric

      Time in self  Time in subfuncs  Function
      0.00          60.54             __tmainCRTStartup
      0.00          34.04             main
      0.00          21.39             itk::CheckerBoardImageSource<itk::Image<float,3> >::GenerateData
      16.52         16.57             floor ?
      0.00          13.55             itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetDerivative
      0.00          12.95             itk::ImageSource<itk::Image<float,3> >::ThreaderCallback
      0.00          12.95             itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetDerivative
      0.30          11.45             itk::CentralDifferenceImageFunction<itk::Image<float,3>,double>::Evaluate
      8.71          8.73              itk::CentralDifferenceImageFunction<itk::Image<float,3>,double>::EvaluateAtIndex
      2.70          8.43              itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValueAndDerivative
      7.51          7.53              itk::BSplineKernelFunction<3>::Evaluate
      3.30          7.53              itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValueAndDerivative
      0.00          6.63              endthreadex ?
      6.61          6.63              itk::StatisticsImageFilter<itk::Image<float,3> >::ThreadedGenerateData
      3.30          4.82              itk::CheckerBoardSpatialFunction<double,3,itk::Point<double,3> >::Evaluate
      4.50          4.52              itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::ComputePDFDerivatives
      3.90          3.92              itk::NearestNeighborInterpolateImageFunction<itk::Image<float,3>,double>::EvaluateAtContinuousIndex
      3.60          3.61              _ftol2_pentium4
      3.60          3.61              itk::BSplineKernelFunction<2>::Evaluate [1]
      1.80          3.01              itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValue
      0.90          2.41              itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValue
      2.40          2.41              itk::ProgressReporter::CompletedPixel
      2.40          2.41              itk::ShiftScaleImageFilter<itk::Image<float,3>,itk::Image<float,3> >::ThreadedGenerateData
      2.10          2.11              itk::ImageFunction<itk::Image<float,3>,double,double>::IsInsideBuffer
      1.80          1.81              itk::BSplineDerivativeKernelFunction<3>::Evaluate
      1.20          1.81              itk::ImageFunction<itk::Image<float,3>,itk::CovariantVector<double,3>,double>::ConvertContinuousIndexToNearestIndex
      1.50          1.51              itk::BSplineKernelFunction<2>::Evaluate
      0.00          1.51              thunk@40316b ?
      1.20          1.20              itk::InterpolateImageFunction<itk::Image<float,3>,double>::Evaluate
      0.90          1.20              itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::Initialize
      0.60          1.20              itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::Initialize
      0.90          0.90              itk::ImageBase<3>::GetSpacing
      0.90          0.90              itk::ImageFunction<itk::Image<float,3>,double,double>::ConvertContinuousIndexToNearestIndex
      0.90          0.90              itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::ComputePDFDerivatives
      0.90          0.90              itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::TransformPoint
      0.90          0.90              itk::OptMattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::TransformPoint
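
The first target bottleneck listed above, random sub-sampling in the mean squared difference metric, can be sketched as follows. This is an illustrative stand-in rather than the ITK implementation: the function name `SampledMeanSquaredDifference` is hypothetical, the two images are assumed to be already resampled onto the same grid (no transform or interpolator is applied), and samples are drawn with replacement.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch (not ITK code): estimates the mean squared
// difference between two equally sized images from `numSamples` randomly
// chosen voxels instead of visiting every voxel.
// Returns 0.0 for empty, mismatched, or zero-sample input.
double SampledMeanSquaredDifference(const std::vector<float>& fixedImage,
                                    const std::vector<float>& movingImage,
                                    std::size_t numSamples,
                                    unsigned int seed)
{
  if (fixedImage.empty() || fixedImage.size() != movingImage.size() ||
      numSamples == 0)
  {
    return 0.0;
  }

  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, fixedImage.size() - 1);

  double sum = 0.0;
  for (std::size_t i = 0; i < numSamples; ++i)
  {
    const std::size_t idx = pick(rng);  // voxel chosen uniformly at random
    const double diff =
      static_cast<double>(fixedImage[idx]) - movingImage[idx];
    sum += diff * diff;
  }
  return sum / static_cast<double>(numSamples);
}
```

The cost of one evaluation then scales with `numSamples` rather than with the image size, at the price of a noisier metric value; fixing the seed keeps repeated timings comparable.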


Modular tests

All tests send two values to the performance dashboard:

  • the time required
  • a measure of the error (0 = no error; 1 = 100% error)
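
The page does not define how the error value is computed. One possible definition, shown purely as an assumption, is the mean absolute difference between the expected and the computed image, divided by the intensity range of the expected image and clamped to [0, 1]. The names `TestResult` and `NormalizedError` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record of the two values a test reports.
struct TestResult
{
  double seconds;  // time required
  double error;    // 0 = no error, 1 = 100% error
};

// Assumed error definition (not taken from the original page): mean
// absolute difference divided by the intensity range of `expected`,
// clamped to 1. Empty or mismatched inputs count as 100% error.
double NormalizedError(const std::vector<float>& expected,
                       const std::vector<float>& actual)
{
  if (expected.empty() || expected.size() != actual.size())
  {
    return 1.0;
  }

  const auto minMax = std::minmax_element(expected.begin(), expected.end());
  const double range =
    static_cast<double>(*minMax.second) - static_cast<double>(*minMax.first);

  double sumAbs = 0.0;
  for (std::size_t i = 0; i < expected.size(); ++i)
  {
    sumAbs += std::fabs(static_cast<double>(expected[i]) - actual[i]);
  }
  const double meanAbs = sumAbs / static_cast<double>(expected.size());

  if (range <= 0.0)
  {
    // Constant expected image: any deviation counts as full error.
    return meanAbs == 0.0 ? 0.0 : 1.0;
  }
  return std::min(1.0, meanAbs / range);
}
```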

Tests being developed and their parameter spaces

  1. LinearInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • = 16 tests (approx time on Core2Duo for these tests = 1 minute)
  2. BSplineInterpTest <numThreads> <dimSize> <factor> <bSplineOrder> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • bSplineOrder = 3
    • = 16 tests (approx time on Core2Duo for these tests = 10 minutes)
  3. SincInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • Uses the Welch window function
    • NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • = 16 tests (approx time on Core2Duo for these tests = 30 minutes)
  4. BSplineTransformLinearInterpTest <numThreads> <dimSize> <numNodesPerDim> <bSplineOrder> [<outputImage>]
    • 3 nodes are also added outside of the image for interpolation
  1. MeanReciprocalSquaredDifferenceMetricTest
  2. MeanSquaresMetricTest
  3. NormalizedCorrelationMetricTest
  4. GradientDifferenceMetricTest
  5. MattesMutualInformationMetricTest
  6. MutualInformationMetricTest
  7. NormalizedMutualInformationMetricTest
  8. MutualInformationHistogramMetricTest
  9. NormalizedMutualInformationHistogramMetricTest
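
The parameter spaces listed for the interpolation tests can be enumerated mechanically. The sketch below is a hypothetical driver helper (`InterpTestParams` and `EnumerateInterpTests` are names invented here), covering the `numThreads`, `dimSize`, and `factor` arguments shared by the interpolation tests.

```cpp
#include <vector>

// Hypothetical parameter record for one interpolation test run.
struct InterpTestParams
{
  int numThreads;
  int dimSize;
  int factor;
};

// Builds every combination of NumThreads = 1, 2, 4 (plus the core count
// when it exceeds 4), DimSize = 100, 200, and Factor = 2, 3.
std::vector<InterpTestParams> EnumerateInterpTests(int numCores)
{
  std::vector<int> threadCounts = {1, 2, 4};
  if (numCores > 4)
  {
    threadCounts.push_back(numCores);
  }
  const int dimSizes[] = {100, 200};
  const int factors[] = {2, 3};

  std::vector<InterpTestParams> tests;
  for (int t : threadCounts)
  {
    for (int d : dimSizes)
    {
      for (int f : factors)
      {
        tests.push_back({t, d, f});
      }
    }
  }
  return tests;
}
```

With four thread counts this gives 4 x 2 x 2 = 16 combinations, matching the counts quoted above; on a machine with four or fewer cores the same enumeration gives 3 x 2 x 2 = 12.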

Notes

  • MattesMutualInformationMetric defaults to a BSpline interpolator; the tests above override this default and use nearest-neighbor interpolation instead

Related Pages

Performance Measurement