ITK Registration Optimization

From NAMIC Wiki
Revision as of 00:20, 2 April 2007

Goals

There are two components to this research:

  1. Identify registration algorithms that are suitable for the non-rigid registration problems that are endemic to NA-MIC.
  2. Develop implementations of those algorithms that take advantage of multi-core and multi-processor hardware.

Algorithmic Requirements and Use Cases

  • Requirements
    1. Relatively robust, with few parameters to tweak
    2. Runs on grey-scale images
    3. Has already been published
    4. Relatively fast (ideally a few minutes for a volume-to-volume registration)
    5. Not patented
    6. Can be implemented in ITK and parallelized

Hardware Platform Requirements and Use Cases

  • Requirements
    1. Shared memory
    2. Single and multi-core machines
    3. Single and multi-processor machines
    4. AMD and Intel - Windows, Linux, and SunOS
  • Use-cases
    1. Intel Core2Duo
    2. Intel quad-core Xeon processors (?)
    3. 6 CPU Sun, Solaris 8 (SPL: vision)
    4. 12 CPU Sun, Solaris 8 (SPL: forest and ocean)
    5. 16 core Opteron (SPL: john, ringo, paul, george)
    6. 16 core, Sun Fire, AMD Opteron (UNC: Styner)

Data

Workplan

Establish testing and reporting infrastructure

  1. Identify timing tools
    1. Cross platform and multi-threaded
    2. Timing and profiling
    • Status
      1. Instrumenting modular tests
        • Extending ITK's cross-platform high precision timer
        • Adding thread affinity to ensure valid timings
        • Adding method for increasing process priority
      2. Profiling complete registration solutions for use cases
        • Using CacheGrind on single and multi-core Linux systems
  2. Develop performance dashboard for collecting results
    1. Each test will report time and accuracy to a central server
    2. The performance of a test, over time, for a given platform can be viewed on one page
    3. The performance of a set of tests, at one point in time, for all platforms can be viewed on one page
    • Status
      1. BatchMake database communication code being isolated
      2. Performance dashboard web pages being designed
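
The timing side of the infrastructure above can be sketched in portable C++. This is a minimal sketch only: it uses std::chrono::steady_clock from the standard library, not ITK's own timer, and the helper name BestTimeInSeconds is hypothetical. Thread affinity and process priority are platform-specific and are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper (not part of ITK): runs `work` `repeats` times and
// returns the fastest wall-clock time in seconds. Taking the minimum reduces
// the influence of scheduling noise on a shared machine.
template <typename Work>
double BestTimeInSeconds(Work work, int repeats)
{
  assert(repeats > 0);
  using Clock = std::chrono::steady_clock; // monotonic, unaffected by clock changes
  double best = std::numeric_limits<double>::max();
  for (int i = 0; i < repeats; ++i)
  {
    const Clock::time_point start = Clock::now();
    work();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    best = std::min(best, elapsed.count());
  }
  return best;
}
```

A test would wrap the code under measurement in a callable, for example BestTimeInSeconds([&] { filter->Update(); }, 5), where filter stands for whatever pipeline object the test exercises.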

Develop tests

  1. Develop modular tests
    • Status
      1. Developed itkCheckerboardImageSource so no IO required
      2. Developing tests as listed in the "Modular Tests" section below
  2. Develop C-style tests
    1. Tests should represent the non-ITK way of doing image analysis
      • Use standard C/C++ arrays and pointers to access blocks of memory as images
  3. Develop complete registration solutions for use cases
    • Status
      1. Centralized data and provide easy access
      2. Identified relevant registration algorithms
        • rigid, affine, bspline, multi-level bspline, and Demons
        • normalized mutual information, mean squared difference, and cross correlation
      3. Developing traditional ITK-style implementations
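
As an illustration of the C-style tests described above, the following sketch fills a raw buffer with a synthetic checkerboard (so no IO is needed) and computes a mean squared difference with plain pointers. It is an assumption about what such a test could look like, not the project's code; the function names FillCheckerboard and MeanSquaredDifference and the x-fastest memory layout are chosen here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a dim^3 volume, stored with x varying fastest, with a checkerboard
// whose cubes are `cell` voxels wide and alternate between `low` and `high`.
void FillCheckerboard(float* image, std::size_t dim, std::size_t cell,
                      float low, float high)
{
  assert(image != nullptr && cell > 0);
  float* p = image;
  for (std::size_t z = 0; z < dim; ++z)
  {
    for (std::size_t y = 0; y < dim; ++y)
    {
      for (std::size_t x = 0; x < dim; ++x)
      {
        const std::size_t parity = (x / cell + y / cell + z / cell) % 2;
        *p++ = parity ? high : low;
      }
    }
  }
}

// Mean squared difference of two buffers of n voxels, walked with raw pointers.
double MeanSquaredDifference(const float* a, const float* b, std::size_t n)
{
  assert(a != nullptr && b != nullptr && n > 0);
  double sum = 0.0;
  for (const float* end = a + n; a != end; ++a, ++b)
  {
    const double d = static_cast<double>(*a) - static_cast<double>(*b);
    sum += d * d;
  }
  return sum / static_cast<double>(n);
}
```

Timing this loop against the equivalent ITK iterator-based metric gives a baseline for how much overhead the ITK abstractions add.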

Compute performance on target platforms

  • Ongoing

Optimize bottlenecks

  • Target bottlenecks
    • Use random, sub-sampling iterator in mean squared difference and cross correlation
    • Multi-thread metric calculation
    • Integrate metrics with transforms and interpolators for tailored performance
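
The first two targets can be combined in a small sketch: each thread draws its own random sample of voxel indices and accumulates a partial sum of squared differences, and the partial sums are merged at the end. This is a sketch under stated assumptions, using std::thread and std::mt19937 on flat buffers; it is not the ITK implementation, and the name SampledMeanSquares and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Mean squared difference estimated from random samples, computed in parallel.
// Each thread uses its own generator (seed + thread index), so no locking is
// needed and results are reproducible for a given seed and thread count.
double SampledMeanSquares(const std::vector<float>& fixedImage,
                          const std::vector<float>& movingImage,
                          std::size_t samplesPerThread,
                          unsigned numThreads,
                          unsigned seed)
{
  assert(fixedImage.size() == movingImage.size() && !fixedImage.empty());
  assert(samplesPerThread > 0 && numThreads > 0);

  std::vector<double> partial(numThreads, 0.0);
  std::vector<std::thread> workers;
  workers.reserve(numThreads);

  for (unsigned t = 0; t < numThreads; ++t)
  {
    workers.emplace_back([&, t] {
      std::mt19937 rng(seed + t);
      std::uniform_int_distribution<std::size_t> pick(0, fixedImage.size() - 1);
      double sum = 0.0;
      for (std::size_t s = 0; s < samplesPerThread; ++s)
      {
        const std::size_t i = pick(rng);
        const double d = static_cast<double>(fixedImage[i]) -
                         static_cast<double>(movingImage[i]);
        sum += d * d;
      }
      partial[t] = sum; // each thread writes only its own slot
    });
  }
  for (std::thread& w : workers)
  {
    w.join();
  }

  double total = 0.0;
  for (double p : partial)
  {
    total += p;
  }
  return total / (static_cast<double>(samplesPerThread) * numThreads);
}
```

Accumulating into a per-thread local variable and writing the result once keeps the threads from contending on shared memory, which matters on the shared-memory machines listed in the hardware use cases.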

MattesMutualInformationImageToImageMetric

Time in Functions

Time in self  Time in subfuncs  Function
        0.00             86.64  __tmainCRTStartup
        0.00             48.47  main
        0.00             37.98  itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetDerivative
        8.49             20.99  itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValueAndDerivative
        0.57             19.27  itk::CentralDifferenceImageFunction<itk::Image<float,3>,double>::Evaluate
       13.40             13.55  itk::CentralDifferenceImageFunction<itk::Image<float,3>,double>::EvaluateAtIndex
       11.70             11.83  itk::BSplineKernelFunction<3>::Evaluate [1]
        9.06              9.16  itk::BSplineKernelFunction<2>::Evaluate [1]
        3.21              8.40  itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::GetValue
        8.11              8.21  floor ?
        0.00              7.25  itk::CheckerBoardImageSource<itk::Image<float,3> >::GenerateData
        0.00              4.20  itk::ImageSource<itk::Image<float,3> >::ThreaderCallback
        3.77              3.82  itk::NearestNeighborInterpolateImageFunction<itk::Image<float,3>,double>::EvaluateAtContinuousIndex
        3.21              3.24  itk::StatisticsImageFilter<itk::Image<float,3> >::ThreadedGenerateData
        3.02              3.05  itk::ImageFunction<itk::Image<float,3>,double,double>::IsInsideBuffer
        2.83              2.86  itk::BSplineDerivativeKernelFunction<3>::Evaluate
        2.26              2.29  itk::InterpolateImageFunction<itk::Image<float,3>,double>::Evaluate
        0.00              2.10  endthreadex ?
        2.08              2.10  itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::TransformPoint
        2.08              2.10  thunk@403355 ?
        1.89              1.91  _ftol2_pentium4
        1.70              1.72  itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::ComputePDFDerivatives
        1.70              1.72  thunk@402f54 ?
        1.51              1.53  itk::BSplineKernelFunction<2>::Evaluate
        1.51              1.53  itk::ImageBase<3>::GetSpacing
        1.13              1.53  itk::ImageFunction<itk::Image<float,3>,double,double>::ConvertContinuousIndexToNearestIndex
        0.94              1.34  itk::CheckerBoardSpatialFunction<double,3,itk::Point<double,3> >::Evaluate
        1.13              1.15  itk::ImageFunction<itk::Image<float,3>,double,double>::IsInsideBuffer [1]
        0.94              0.95  itk::BSplineKernelFunction<3>::Evaluate
        0.75              0.76  itk::ImageBase<3>::GetBufferedRegion
        0.75              0.76  itk::Point<double,3>::operator+
        0.75              0.76  thunk@4036d4 ?
        0.75              0.76  thunk@403cec ?
        0.19              0.57  itk::ImageFunction<itk::Image<float,3>,itk::CovariantVector<double,3>,double>::ConvertContinuousIndexToNearestIndex
        0.57              0.57  itk::MattesMutualInformationImageToImageMetric<itk::Image<float,3>,itk::Image<float,3> >::ComputeImageDerivatives
        0.57              0.57  itk::Point<double,3>::operator=
        0.57              0.57  itk::ShiftScaleImageFilter<itk::Image<float,3>,itk::Image<float,3> >::ThreadedGenerateData
        0.57              0.57  itk::TranslationTransform<double,3>::TransformPoint

Time in files

Time in self  Time in subfuncs  File
       16.42             72.33  itkmattesmutualinformationimagetoimagemetric.txx
        0.00             48.47  mattesmutualinformationimagetoimagemetrictest.cxx
       23.21             23.47  itkbsplinekernelfunction.h
        0.57             19.27  itkcentraldifferenceimagefunction.h
       13.40             13.55  itkcentraldifferenceimagefunction.txx
        0.00              7.25  itkcheckerboardimagesource.txx
        5.66              6.49  itkimagefunction.h
        0.00              4.20  itkimagesource.txx
        3.77              3.82  itknearestneighborinterpolateimagefunction.h
        3.21              3.24  itkstatisticsimagefilter.txx
        2.83              2.86  itkbsplinederivativekernelfunction.h
        2.83              2.86  itkimagebase.h
        2.26              2.29  itkinterpolateimagefunction.h
        0.94              1.34  itkcheckerboardspatialfunction.txx
        1.32              1.34  itkpoint.txx
        0.75              0.76  itktranslationtransform.txx
        0.57              0.57  itkshiftscaleimagefilter.txx
        0.38              0.38  vnl_matrix.txx
        0.00              0.19  itkbsplinedeformabletransform.txx
        0.19              0.19  itkfixedarray.txx
        0.19              0.19  itkimageregionconstiterator.txx
        0.19              0.19  itkobject.cxx
        0.19              0.19  vnl_vector.txx
        0.19              0.19  vector
        0.19              0.19  secchk.c

Modular tests

All tests send two values to the performance dashboard:

  • the time required
  • a measure of the error (0 = no error; 1 = 100% error)
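
The page does not say how the error is normalized to the 0 to 1 range. One possible choice, shown here purely as an assumption, is the fraction of voxels whose value differs from a reference result by more than a tolerance. The names ErrorFraction and FormatResult and the one-line report format are hypothetical and are not the BatchMake or dashboard protocol.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Assumed error definition (not from the source): the fraction of voxels whose
// absolute difference from the reference exceeds `tolerance`.
// Returns 0 when every voxel matches and 1 when every voxel is wrong.
double ErrorFraction(const std::vector<float>& output,
                     const std::vector<float>& reference,
                     float tolerance)
{
  assert(output.size() == reference.size() && !output.empty());
  std::size_t bad = 0;
  for (std::size_t i = 0; i < output.size(); ++i)
  {
    if (std::fabs(output[i] - reference[i]) > tolerance)
    {
      ++bad;
    }
  }
  return static_cast<double>(bad) / static_cast<double>(output.size());
}

// Hypothetical one-line record carrying the two values each test reports.
std::string FormatResult(const std::string& testName, double seconds, double error)
{
  char buffer[256];
  std::snprintf(buffer, sizeof(buffer), "%s time=%.3f error=%.3f",
                testName.c_str(), seconds, error);
  return std::string(buffer);
}
```

Keeping the error dimensionless lets results from different image sizes and metrics share one dashboard axis.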

Tests being developed and their parameter spaces

  1. LinearInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • = 16 tests (approx time on Core2Duo for these tests = 1 minute)
  2. BSplineInterpTest <numThreads> <dimSize> <factor> <bSplineOrder> [<outputImage>]
    • NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • bSplineOrder = 3
    • = 16 tests (approx time on Core2Duo for these tests = 10 minutes)
  3. SincInterpTest <numThreads> <dimSize> <factor> [<outputImage>]
    • Uses the Welch window function
    • NumThreads = 1, 2, 4, and #OfCoresIf>4 (for every platform)
    • DimSize = 100, 200 (i.e., 100^3 and 200^3 images)
    • Factor = 2, 3 (i.e., producing up to 600^3 images)
    • = 16 tests (approx time on Core2Duo for these tests = 30 minutes)
  4. BSplineTransformLinearInterpTest <numThreads> <dimSize> <numNodesPerDim> <bSplineOrder> [<outputImage>]
    • 3 nodes are also added outside of the image for interpolation
  5. MeanReciprocalSquaredDifferenceMetricTest
  6. MeanSquaresMetricTest
  7. NormalizedCorrelationMetricTest
  8. GradientDifferenceMetricTest
  9. MattesMutualInformationMetricTest
  10. MutualInformationMetricTest
  11. NormalizedMutualInformationMetricTest
  12. MutualInformationHistogramMetricTest
  13. NormalizedMutualInformationHistogramMetricTest
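
The parameter spaces listed for the interpolation tests can be enumerated mechanically. The sketch below builds the LinearInterpTest command lines, reading "#OfCoresIf>4" as "also run with the machine's core count when it exceeds 4", which is an interpretation rather than something the page spells out. The function names are hypothetical, and the optional output image argument is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Thread counts to test: 1, 2, 4, plus the core count when it exceeds 4.
std::vector<unsigned> ThreadCounts(unsigned numCores)
{
  assert(numCores > 0);
  std::vector<unsigned> counts = {1, 2, 4};
  if (numCores > 4)
  {
    counts.push_back(numCores);
  }
  return counts;
}

// One command line per combination of numThreads, dimSize, and factor.
std::vector<std::string> LinearInterpCommands(unsigned numCores)
{
  const unsigned dimSizes[] = {100, 200}; // 100^3 and 200^3 input images
  const unsigned factors[] = {2, 3};      // upsampling factors
  std::vector<std::string> commands;
  for (unsigned threads : ThreadCounts(numCores))
  {
    for (unsigned dim : dimSizes)
    {
      for (unsigned factor : factors)
      {
        commands.push_back("LinearInterpTest " + std::to_string(threads) + " " +
                           std::to_string(dim) + " " + std::to_string(factor));
      }
    }
  }
  return commands;
}
```

Under this reading, a machine with more than 4 cores yields 4 x 2 x 2 = 16 runs, matching the count given above, while a machine with 4 or fewer cores yields 12.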

Notes

  • MattesMutualInformationMetric defaults to a BSpline interpolator; the tests above override this to use nearest neighbor interpolation instead

Related Pages

Performance Measurement