Affine shape adaptation

<h2 id="affine-adapted-interest-point-operators">Affine-adapted interest point operators</h2>
The interest points obtained from the scale-adapted Laplacian <a href="/facts/Blob_detection/F5yCn3eQ">blob detector</a> or the multi-scale Harris <a href="/facts/Corner_detection/19pSbRDe">corner detector</a> with automatic scale selection are invariant to translations, rotations and uniform rescalings in the spatial domain. The images that constitute the input to a computer vision system are, however, also subject to perspective distortions. To obtain interest points that are more robust to perspective transformations, a natural approach is to devise a feature detector that is invariant to affine transformations.
Affine invariance can be accomplished from measurements of the same multi-scale windowed second moment matrix \(\mu\) as is used in the multi-scale Harris operator, provided that we extend the regular <a href="/facts/Scale_space/XJNvTH1N">scale space</a> concept obtained by <a href="/facts/Convolution/PVPrdz9J">convolution</a> with rotationally symmetric Gaussian kernels to an affine Gaussian scale-space obtained by shape-adapted Gaussian kernels (Lindeberg 1994, section 15.3; Lindeberg & Garding 1997). For a two-dimensional image \(I\), let \(\bar{x} = (x, y)^{T}\) and let \(\Sigma_{t}\) be a positive definite 2×2 matrix. Then, a non-uniform Gaussian kernel can be defined as

\[ g(\bar{x}; \Sigma_{t}) = \frac{1}{2\pi \sqrt{\det \Sigma_{t}}} \, e^{-\bar{x}^{T} \Sigma_{t}^{-1} \bar{x}/2} \]

and given any input image \(I_{L}\) the affine Gaussian scale-space is the three-parameter scale-space defined as

\[ L(\bar{x}; \Sigma_{t}) = \int_{\bar{\xi}} I_{L}(\bar{x} - \bar{\xi}) \, g(\bar{\xi}; \Sigma_{t}) \, d\bar{\xi}. \]

Next, introduce an affine transformation \(\bar{\eta} = B\bar{\xi}\), where \(B\) is a 2×2 matrix, and define a transformed image \(I_{R}\) by \(I_{L}(\bar{\xi}) = I_{R}(\bar{\eta})\).
Then, the affine scale-space representations \(L\) and \(R\) of \(I_{L}\) and \(I_{R}\), respectively, are related according to

\[ L(\bar{\xi}; \Sigma_{L}) = R(\bar{\eta}; \Sigma_{R}) \]

provided that the affine shape matrices \(\Sigma_{L}\) and \(\Sigma_{R}\) are related according to \(\Sigma_{R} = B \Sigma_{L} B^{T}\).
Setting aside the mathematical details, which become somewhat technical in a precise treatment, the important message is that the affine Gaussian scale-space is closed under affine transformations.
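The closure property can be checked numerically at the level of the kernels. The following is a minimal sketch, assuming NumPy; the helper name `affine_gaussian` and the sample matrices are illustrative choices, not taken from the cited works. With \(\Sigma_{R} = B \Sigma_{L} B^{T}\), the kernel values satisfy \(g(\bar{\xi}; \Sigma_{L}) = |\det B| \, g(B\bar{\xi}; \Sigma_{R})\), and the factor \(|\det B|\) is absorbed by the change of variables in the convolution integral.

```python
import numpy as np

def affine_gaussian(x, sigma):
    """Evaluate the affine Gaussian kernel g(x; sigma) at a 2-D point x."""
    x = np.asarray(x, dtype=float)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * x @ np.linalg.solve(sigma, x)) / norm

# Illustrative values (assumptions, not from the source).
B = np.array([[1.5, 0.4],
              [-0.2, 0.8]])          # invertible 2x2 matrix
sigma_L = np.array([[3.0, 1.0],
                    [1.0, 2.0]])     # positive definite shape matrix
sigma_R = B @ sigma_L @ B.T          # transformed shape matrix

xi = np.array([0.7, -1.1])           # point in the left domain
eta = B @ xi                         # corresponding point in the right domain

lhs = affine_gaussian(xi, sigma_L)
rhs = abs(np.linalg.det(B)) * affine_gaussian(eta, sigma_R)
print(np.isclose(lhs, rhs))
```

The same check can be repeated for other points and other invertible matrices; it only exercises the kernel identity, not a full image convolution.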
If we, given the notation \(\nabla L = (L_{x}, L_{y})^{T}\) as well as a local shape matrix \(\Sigma_{t}\) and an integration shape matrix \(\Sigma_{s}\), introduce an affine-adapted multi-scale second-moment matrix according to

\[ \mu_{L}(\bar{x}; \Sigma_{t}, \Sigma_{s}) = \int_{\bar{\xi}} g(\bar{x} - \bar{\xi}; \Sigma_{s}) \left( \nabla L(\bar{\xi}; \Sigma_{t}) \, \nabla L^{T}(\bar{\xi}; \Sigma_{t}) \right) d\bar{\xi} \]

it can be shown that under any affine transformation \(\bar{q} = B\bar{p}\) the affine-adapted multi-scale second-moment matrix transforms according to

\[ \mu_{L}(\bar{p}; \Sigma_{t}, \Sigma_{s}) = B^{T} \mu_{R}(\bar{q}; B \Sigma_{t} B^{T}, B \Sigma_{s} B^{T}) B. \]
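The main step behind this relation is the chain rule. The following is a sketch of the argument rather than a full proof. It assumes that \(B\) is invertible, that the second-moment matrix is read as a Gaussian-windowed integral over \(\bar{\xi}\), and that the scale-space relations \(L(\bar{\xi}; \Sigma_{t}) = R(B\bar{\xi}; B \Sigma_{t} B^{T})\) and \(g(\bar{\xi}; \Sigma_{s}) = |\det B| \, g(B\bar{\xi}; B \Sigma_{s} B^{T})\) hold:

```latex
\begin{aligned}
\nabla L(\bar{\xi}; \Sigma_t)
  &= B^{T} \, \nabla R(\bar{\eta}; B \Sigma_t B^{T}),
  \qquad \bar{\eta} = B \bar{\xi}, \\
\nabla L \, \nabla L^{T}
  &= B^{T} \left( \nabla R \, \nabla R^{T} \right) B, \\
\mu_L(\bar{p}; \Sigma_t, \Sigma_s)
  &= \int g(\bar{p} - \bar{\xi}; \Sigma_s) \,
     B^{T} \left( \nabla R \, \nabla R^{T} \right) B \, d\bar{\xi} \\
  &= B^{T} \left[ \int g(\bar{q} - \bar{\eta}; B \Sigma_s B^{T}) \,
     \nabla R \, \nabla R^{T} \, d\bar{\eta} \right] B \\
  &= B^{T} \, \mu_R(\bar{q}; B \Sigma_t B^{T}, B \Sigma_s B^{T}) \, B .
\end{aligned}
```

In the change of variables, the factor \(|\det B|\) from the kernel cancels against \(d\bar{\xi} = d\bar{\eta} / |\det B|\).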
Again, disregarding somewhat messy technical details, the important message here is that given a correspondence between the image points \(\bar{p}\) and \(\bar{q}\), the affine transformation \(B\) can be estimated from measurements of the multi-scale second-moment matrices \(\mu_{L}\) and \(\mu_{R}\) in the two domains.
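A small numerical illustration of how the two matrices constrain \(B\): from \(\mu_{L} = B^{T} \mu_{R} B\) it follows that \(W = \mu_{R}^{1/2} B \mu_{L}^{-1/2}\) is orthogonal, so the second-moment matrices by themselves determine \(B\) up to a rotation or reflection \(W\). The sketch below assumes NumPy; `sqrtm_spd` is a hypothetical helper and the matrices are made-up test values.

```python
import numpy as np

def sqrtm_spd(m):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(vals)) @ vecs.T

# Made-up ground truth (assumptions for illustration only).
B_true = np.array([[1.2, 0.5],
                   [-0.3, 0.9]])
mu_R = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
mu_L = B_true.T @ mu_R @ B_true      # transformation rule for mu

# The residual factor W is orthogonal: W^T W = I.
W = sqrtm_spd(mu_R) @ B_true @ np.linalg.inv(sqrtm_spd(mu_L))
print(np.allclose(W.T @ W, np.eye(2)))

# Choosing W = I gives another matrix that satisfies the same rule,
# which shows the remaining rotational ambiguity.
B_alt = np.linalg.inv(sqrtm_spd(mu_R)) @ sqrtm_spd(mu_L)
print(np.allclose(B_alt.T @ mu_R @ B_alt, mu_L))
```

Resolving the remaining factor \(W\) requires additional information beyond the two second-moment matrices, for example a dominant gradient orientation measured in each normalized frame.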
An important consequence of this study is that if we can find an affine transformation \(B\) such that \(\mu_{R}\) is a constant times the unit matrix, then we obtain a fixed point that is invariant to affine transformations (Lindeberg 1994, section 15.4; Lindeberg & Garding 1997). For the purpose of practical implementation, this property can often be reached in either of two main ways. The first approach is based on transformations of the smoothing filters and consists of:

<ul><li>estimating the second-moment matrix \(\mu\) in the image domain,</li>
<li>determining a new adapted smoothing kernel with covariance matrix proportional to \(\mu^{-1}\),</li>
<li>smoothing the original image by the shape-adapted smoothing kernel, and</li>
<li>repeating this operation until the difference between two successive second-moment matrices is sufficiently small.</li></ul>
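The filter-based steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation from the cited papers: smoothing is done in the Fourier domain (which implies periodic boundaries), the kernel shape is normalized to unit determinant so that only its anisotropy changes, and the function names, the scale values `t` and `s`, and the synthetic test blob are all hypothetical choices.

```python
import numpy as np

def affine_gaussian_smooth(image, sigma):
    """Smooth image with a Gaussian of 2x2 covariance sigma (x, y order).

    Uses the Fourier transform of the kernel, exp(-w^T sigma w / 2),
    so boundaries are treated as periodic.
    """
    rows, cols = image.shape
    wy = 2.0 * np.pi * np.fft.fftfreq(rows)[:, None]
    wx = 2.0 * np.pi * np.fft.fftfreq(cols)[None, :]
    quad = sigma[0, 0] * wx**2 + 2.0 * sigma[0, 1] * wx * wy + sigma[1, 1] * wy**2
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.exp(-0.5 * quad)))

def second_moment(image, sigma_t, sigma_s, row, col):
    """Second-moment matrix at pixel (row, col)."""
    L = affine_gaussian_smooth(image, sigma_t)
    Ly, Lx = np.gradient(L)  # axis 0 is y (rows), axis 1 is x (columns)
    mxx = affine_gaussian_smooth(Lx * Lx, sigma_s)[row, col]
    mxy = affine_gaussian_smooth(Lx * Ly, sigma_s)[row, col]
    myy = affine_gaussian_smooth(Ly * Ly, sigma_s)[row, col]
    return np.array([[mxx, mxy], [mxy, myy]])

def adapt_shape(image, row, col, t=4.0, s=16.0, max_iter=20, tol=1e-3):
    """Iterate: kernel shape proportional to inverse(mu), until mu settles."""
    shape = np.eye(2)
    mu = second_moment(image, t * shape, s * shape, row, col)
    for _ in range(max_iter):
        shape = np.linalg.inv(mu)
        shape /= np.sqrt(np.linalg.det(shape))  # unit determinant
        mu_new = second_moment(image, t * shape, s * shape, row, col)
        change = np.linalg.norm(mu_new / np.trace(mu_new) - mu / np.trace(mu))
        mu = mu_new
        if change < tol:
            break
    return shape, mu

# Synthetic test image: Gaussian blob with variance 36 along x and 9 along y.
yy, xx = np.mgrid[0:128, 0:128] - 64.0
blob = np.exp(-(xx**2 / (2 * 36.0) + yy**2 / (2 * 9.0)))
shape, mu = adapt_shape(blob, 64, 64)
print(shape)  # close to diagonal and elongated along x, like the blob
```

For this blob, the adapted kernel shape should end up elongated along the same axis as the blob itself, since a kernel proportional to the blob's own covariance is a fixed point of the iteration.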
The second approach is based on warpings in the image domain and implies:

<ul><li>estimating \(\mu\) in the image domain,</li>
<li>estimating a local affine transformation proportional to \(\hat{B} = \mu^{1/2}\), where \(\mu^{1/2}\) denotes the square root matrix of \(\mu\),</li>
<li>warping the input image by the affine transformation \(\hat{B}^{-1}\), and</li>
<li>repeating this operation until \(\mu\) is sufficiently close to a constant times the unit matrix.</li></ul>
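The normalizing effect of the choice \(\hat{B} = \mu^{1/2}\) can be verified algebraically. The sketch below assumes NumPy and the transformation rule \(\mu_{L} = B^{T} \mu_{R} B\) stated earlier, idealized so that the shape matrices transform along with the image. The helper `sqrtm_spd` and the sample matrix are illustrative. It checks only the linear algebra and does not resample an actual image.

```python
import numpy as np

def sqrtm_spd(m):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(vals)) @ vecs.T

# Made-up measured second-moment matrix (assumption for illustration).
mu = np.array([[4.0, 1.2],
               [1.2, 1.0]])

B_hat = sqrtm_spd(mu)
B_hat_inv = np.linalg.inv(B_hat)

# Normalized coordinates are eta = B_hat @ xi, so the image is resampled
# through B_hat^{-1}. Setting B = B_hat in mu_L = B^T mu_R B gives the
# second-moment matrix in the normalized frame:
mu_normalized = B_hat_inv.T @ mu @ B_hat_inv
print(np.allclose(mu_normalized, np.eye(2)))
```

In a real implementation the measured \(\mu\) changes after each warp because the smoothing kernels are not transformed along with the image, which is why the procedure is iterated.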
This overall process is referred to as affine shape adaptation (Lindeberg & Garding 1997; Baumberg 2000; Mikolajczyk & Schmid 2004; Tuytelaars & van Gool 2004; Ravela 2004; Lindeberg 2008). In the ideal continuous case, the two approaches are mathematically equivalent. In practical implementations, however, the first filter-based approach is usually more accurate in the presence of noise while the second warping-based approach is usually faster.
In practice, the affine shape adaptation process described here is often combined with interest point detection with automatic scale selection as described in the articles on <a href="/facts/Blob_detection/F5yCn3eQ">blob detection</a> and <a href="/facts/Corner_detection/19pSbRDe">corner detection</a>, to obtain interest points that are invariant to the full affine group, including scale changes. Besides the commonly used multi-scale Harris operator, this affine shape adaptation can also be applied to other types of interest point operators such as the Laplacian/Difference of Gaussian blob operator and the determinant of the Hessian (Lindeberg 2008). Affine shape adaptation can also be used for affine invariant texture recognition and affine invariant texture segmentation.
Closely related to the notion of affine shape adaptation is the notion of affine normalization, which defines an affine invariant reference frame as further described in Lindeberg (2013a,b, 2021:Appendix I.3), such that any image measurement performed in the affine invariant reference frame is affine invariant.

<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Blob_detection/F5yCn3eQ">Blob detection</a></li>
<li><a href="/facts/Corner_detection/19pSbRDe">Corner detection</a></li>
<li><a href="/facts/Gaussian_function/5bG8Kpp9">Gaussian function</a></li>
<li><a href="/facts/Harris-Affine/jdr1EUQD">Harris affine region detector</a></li>
<li><a href="/facts/Hessian_Affine_region_detector/Pn3khQgG">Hessian affine region detector</a></li>
<li><a href="/facts/Scale_space/XJNvTH1N">Scale space</a></li></ul>

<h2 id="references">References</h2>

<ul><li>Baumberg, A. (2000). "Reliable feature matching across widely separated views". Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. pp. I:1774–1781. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1109%2FCVPR.2000.855899">10.1109/CVPR.2000.855899</a>.</li>
<li>Lindeberg, T. (1994). <a href="http://www.csc.kth.se/~tony/book.html">Scale-Space Theory in Computer Vision</a>. Springer. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 0-7923-9418-6.</li>
<li>Lindeberg, T.; Garding, J. (1997). <a href="http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A472972&dswid=5025">"Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure"</a>. Image and Vision Computing. 15 (6): 415–434. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1016%2FS0262-8856%2897%2901144-X">10.1016/S0262-8856(97)01144-X</a>.</li>
<li>Lindeberg, T. (2008). <a href="http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A441147&dswid=9229">"Scale-space"</a>. Encyclopedia of Computer Science and Engineering (<a href="/facts/Benjamin_Wah/FHLwTd61">Benjamin Wah</a>, ed), John Wiley and Sons. Vol. IV. pp. 2495–2504. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1002%2F9780470050118.ecse609">10.1002/9780470050118.ecse609</a>. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-0470050118.</li>
<li>Lindeberg, T. (2013a). <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716821">"Invariance of visual operations at the level of receptive fields"</a>. PLOS ONE. 8 (7): e66990:1–33. <a href="/facts/ArXiv_(identifier)/H6EtgnBe">arXiv</a>:<a href="https://arxiv.org/abs/1210.0754">1210.0754</a>. <a href="/facts/Bibcode_(identifier)/9HtdQSGB">Bibcode</a>:<a href="https://ui.adsabs.harvard.edu/abs/2013PLoSO...866990L">2013PLoSO...866990L</a>. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1371%2Fjournal.pone.0066990">10.1371/journal.pone.0066990</a>. <a href="/facts/PMC_(identifier)/dX1zMt71">PMC</a> <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716821">3716821</a>. <a href="/facts/PMID_(identifier)/JlHAvMHt">PMID</a> <a href="https://pubmed.ncbi.nlm.nih.gov/23894283">23894283</a>.</li>
<li>Lindeberg, T. (2013b). <a href="http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A607456&dswid=-5433">"Generalized axiomatic scale-space theory"</a>. Advances in Imaging and Electron Physics. 178 (7): 1–96. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1016%2FB978-0-12-407701-0.00001-7">10.1016/B978-0-12-407701-0.00001-7</a>. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 9780124077010.</li>
<li>Lindeberg, T. (2021). <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820928">"Normative theory of visual receptive fields"</a>. Heliyon. 7 (1): e05897. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1016%2Fj.heliyon.2021.e05897">10.1016/j.heliyon.2021.e05897</a>. <a href="/facts/PMC_(identifier)/dX1zMt71">PMC</a> <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820928">7820928</a>. <a href="/facts/PMID_(identifier)/JlHAvMHt">PMID</a> <a href="https://pubmed.ncbi.nlm.nih.gov/33521348">33521348</a>.</li>
<li>Mikolajczyk, K.; Schmid, C. (2004). <a href="http://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/mikolajczyk_ijcv2004.pdf">"Scale and affine invariant interest point detectors"</a> (PDF). International Journal of Computer Vision. 60 (1): 63–86. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1023%2FB%3AVISI.0000027790.02288.f2">10.1023/B:VISI.0000027790.02288.f2</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:1704741">1704741</a>. Integration of the multi-scale Harris operator with the methodology for automatic scale selection as well as with affine shape adaptation.</li>
<li>Tuytelaars, T.; van Gool, L. (2004). <a href="https://web.archive.org/web/20100612233617/http://vis.uky.edu/~dnister/Teaching/CS684Fall2005/tuytelaars_ijcv2004.pdf">"Matching Widely Separated Views Based on Affine Invariant Regions"</a> (PDF). International Journal of Computer Vision. 59 (1): 63–86. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1023%2FB%3AVISI.0000020671.28016.e8">10.1023/B:VISI.0000020671.28016.e8</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:5107897">5107897</a>. Archived from <a href="http://www.vis.uky.edu/~dnister/Teaching/CS684Fall2005/tuytelaars_ijcv2004.pdf">the original</a> (PDF) on 2010-06-12.</li>
<li>Ravela, S. (2004). "Shaping receptive fields for affine invariance". Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004. Vol. 2. pp. 725–730. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1109%2FCVPR.2004.1315236">10.1109/CVPR.2004.1315236</a>. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 0-7695-2158-4.</li></ul>
