Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another, usually from adjacent frames in a video sequence. The motion vectors may relate to the whole image (global motion estimation) or to specific parts, such as rectangular blocks, arbitrarily shaped patches or even individual pixels. The motion vectors may be represented by a translational model or by other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom.
Applying the motion vectors to an image to predict its transformation into another image, caused by a moving camera or by moving objects in the scene, is called motion compensation. The combination of motion estimation and motion compensation is a key part of video compression as used by MPEG-1, MPEG-2 and MPEG-4, as well as many other video codecs.
Motion-estimation-based video compression saves bits by sending encoded difference images, which inherently have lower entropy than a fully coded frame. However, motion estimation is the most computationally expensive and resource-intensive operation in the entire compression process. Hence, fast and computationally inexpensive motion estimation algorithms are essential for video compression.
Matching a macroblock with another block is based on the output of a cost function. The cost functions most popular for their low computational expense are:
Mean Difference or Mean Absolute Difference (MAD) = $\frac{1}{N^{2}}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}|C_{ij}-R_{ij}|$
Mean Squared Error (MSE) = $\frac{1}{N^{2}}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}(C_{ij}-R_{ij})^{2}$
where N is the side length of the N × N macroblock, and $C_{ij}$ and $R_{ij}$ are the pixels being compared in the current macroblock and the reference macroblock, respectively.
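As a minimal sketch, both cost functions can be written directly from the formulas above, assuming each block is given as an N × N list of lists of pixel values (the function names `mad` and `mse` are our own):

```python
def mad(current, reference):
    """Mean Absolute Difference between two N x N blocks (lists of lists)."""
    n = len(current)
    total = sum(abs(c - r)
                for c_row, r_row in zip(current, reference)
                for c, r in zip(c_row, r_row))
    return total / (n * n)


def mse(current, reference):
    """Mean Squared Error between two N x N blocks (lists of lists)."""
    n = len(current)
    total = sum((c - r) ** 2
                for c_row, r_row in zip(current, reference)
                for c, r in zip(c_row, r_row))
    return total / (n * n)


C = [[1, 2], [3, 4]]
R = [[2, 2], [1, 4]]
print(mad(C, R))  # 0.75
print(mse(C, R))  # 1.25
```

MAD needs only subtractions and absolute values, whereas MSE also squares every difference, which is why MAD is usually the cheaper choice in hardware and software encoders.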
The motion-compensated image that is created using the motion vectors and macroblocks from the reference frame is characterized by its peak signal-to-noise ratio (PSNR):
$\text{PSNR} = 10\log_{10}\frac{(\text{peak-to-peak value of original data})^{2}}{\text{MSE}}$
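The PSNR formula can be computed as in the sketch below, assuming 8-bit samples so that the peak-to-peak value is 255; the `psnr` helper and its `peak` parameter are our own naming, and the MSE is taken over the whole image rather than a single macroblock:

```python
import math


def psnr(original, compensated, peak=255):
    """PSNR in dB between two equally sized images (lists of lists).

    `peak` is the peak-to-peak value of the original data; 255 is an
    assumption that fits 8-bit samples.
    """
    height, width = len(original), len(original[0])
    error = sum((o - c) ** 2
                for o_row, c_row in zip(original, compensated)
                for o, c in zip(o_row, c_row)) / (height * width)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / error)


original = [[10, 20], [30, 40]]
compensated = [[11, 19], [30, 40]]
print(round(psnr(original, compensated), 2))  # 51.14
```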
Block-matching algorithms have been researched since the mid-1980s. Many algorithms have been developed, but only some of the most basic or commonly used ones are described below.
Exhaustive Search (ES) calculates the cost function at each possible location in the search window. This yields the best possible match between the macroblock in the current frame and a block in the reference frame, so the resulting motion-compensated image has the highest peak signal-to-noise ratio of any block-matching algorithm. However, ES is also the most computationally expensive block-matching algorithm, and a larger search window requires a greater number of computations.
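A minimal sketch of exhaustive search, assuming frames are lists of lists of 8-bit luma values indexed `[row][column]` and using MAD as the cost. The helpers `block_cost` and `exhaustive_search` and the synthetic `texture` frames are hypothetical, made up for this illustration:

```python
import math


def block_cost(current, reference, bx, by, n):
    """Return cost(dx, dy): the MAD between the n x n block whose top-left
    corner is (bx, by) in `current` and the block displaced by (dx, dy) in
    `reference`. Candidates that leave the reference frame cost infinity."""
    height, width = len(reference), len(reference[0])

    def cost(dx, dy):
        x0, y0 = bx + dx, by + dy
        if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
            return math.inf
        total = 0
        for i in range(n):          # rows of the block
            for j in range(n):      # columns of the block
                total += abs(current[by + i][bx + j] - reference[y0 + i][x0 + j])
        return total / (n * n)

    return cost


def exhaustive_search(cost, p=7):
    """Evaluate every displacement in [-p, p] x [-p, p]; return the best
    motion vector and the number of cost evaluations."""
    best_cost, best_mv, evaluations = math.inf, (0, 0), 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            c = cost(dx, dy)
            evaluations += 1
            if c < best_cost:
                best_cost, best_mv = c, (dx, dy)
    return best_mv, evaluations


# Hypothetical synthetic frames: `current` shows the content of `reference`
# displaced so that the true motion vector is (3, -2).
def texture(x, y):
    return (37 * x + 91 * y + 13 * x * y) % 256


reference = [[texture(x, y) for x in range(32)] for y in range(32)]
current = [[texture(x + 3, y - 2) for x in range(32)] for y in range(32)]

cost = block_cost(current, reference, bx=12, by=12, n=8)
print(exhaustive_search(cost, p=7))  # ((3, -2), 225)
```

Keeping the cost function separate from the search loop means the same `cost(dx, dy)` callable could drive any of the fast algorithms described below, since they differ only in which displacements they evaluate.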
The optimized hierarchical block matching (OHBM) algorithm speeds up exhaustive search by using optimized image pyramids.[3]
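Three Step Search (TSS) is one of the earliest fast block-matching algorithms. It starts with the search origin at the center of the search window and an initial step size S (S = 4 for a search parameter p = 7). At each step it evaluates the cost function at the origin and at the eight locations ±S pixels around it, moves the origin to the location with the minimum cost, and sets S = S/2.
The resulting location for S = 1 is the one with the minimum cost function, and the macroblock at this location is the best match.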
This reduces computation by a factor of 9: for p = 7, ES evaluates the cost for 225 candidate blocks, whereas TSS evaluates only 25 (9 in the first step and 8 new ones in each of the two following steps).
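The TSS procedure can be sketched as follows. This is a minimal illustration, not a reference implementation: `cost(dx, dy)` is an assumed callable returning the matching cost (for example MAD) of the candidate displaced by (dx, dy), and the initial step `(p + 1) // 2` is our assumption, which gives S = 4 for p = 7. Distinct evaluations are counted so the 25-versus-225 comparison can be checked:

```python
def three_step_search(cost, p=7):
    """Three Step Search over displacements in [-p, p] x [-p, p].

    Returns (motion_vector, number_of_distinct_cost_evaluations).
    """
    cache = {}

    def evaluate(dx, dy):
        # Each distinct location is evaluated once, so len(cache) counts
        # the candidate blocks actually examined.
        if (dx, dy) not in cache:
            cache[(dx, dy)] = cost(dx, dy)
        return cache[(dx, dy)]

    cx, cy = 0, 0                  # search origin starts at the centre
    step = max(1, (p + 1) // 2)    # assumption: S = 4 for p = 7
    while step >= 1:
        # The origin first, then the 8 locations +/- step around it;
        # min() keeps the first minimum, so ties favour the origin.
        candidates = [(cx, cy)] + [
            (cx + ox, cy + oy)
            for ox in (-step, 0, step)
            for oy in (-step, 0, step)
            if (ox, oy) != (0, 0) and abs(cx + ox) <= p and abs(cy + oy) <= p
        ]
        cx, cy = min(candidates, key=lambda pt: evaluate(*pt))
        step //= 2                 # S = S / 2, stopping after S = 1
    return (cx, cy), len(cache)


def bowl(dx, dy):
    """Hypothetical unimodal error surface with its minimum at (3, -1)."""
    return (dx - 3) ** 2 + (dy + 1) ** 2


print(three_step_search(bowl))  # ((3, -1), 25)
```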
Two Dimensional Logarithmic Search (TDLS) is closely related to TSS; however, it is more accurate for estimating motion vectors when the search window is large.
TSS uses a uniformly allocated checking pattern and is prone to missing small motions. New Three Step Search (NTSS)[4] improves on TSS by providing a center-biased search scheme with provisions to stop halfway, which reduces the computational cost. It was one of the first widely accepted fast algorithms and was frequently used to implement earlier standards such as MPEG-1 and H.261.
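NTSS checks as few as 17 points per macroblock, and its worst case involves checking 33 locations. Because that worst case exceeds the 25 locations TSS always checks, NTSS is faster than TSS on average rather than in every case, relying on most macroblocks having small motion that triggers an early stop.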
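A sketch of NTSS follows. The step details are our reading of the commonly cited form of the algorithm in [4], not taken from this text: the first step evaluates the 9 TSS points together with the 8 neighbours of the centre (17 points); if the centre wins, the search stops; if one of those 8 neighbours wins, only its 3 × 3 neighbourhood is checked before stopping; otherwise the search continues as TSS. `cost(dx, dy)` is an assumed callable returning the matching cost of the candidate displaced by (dx, dy):

```python
def new_three_step_search(cost, p=7):
    """NTSS sketch; returns (motion_vector, distinct_cost_evaluations)."""
    cache = {}

    def evaluate(dx, dy):
        if (dx, dy) not in cache:
            cache[(dx, dy)] = cost(dx, dy)
        return cache[(dx, dy)]

    def best_of(points):
        # min() returns the first minimum, so listing the centre first
        # gives the search its centre bias on ties.
        return min(points, key=lambda pt: evaluate(*pt))

    def square(cx, cy, step):
        # The centre followed by the 8 locations +/- step around it.
        return [(cx, cy)] + [
            (cx + ox, cy + oy)
            for ox in (-step, 0, step)
            for oy in (-step, 0, step)
            if (ox, oy) != (0, 0) and abs(cx + ox) <= p and abs(cy + oy) <= p
        ]

    step = max(1, (p + 1) // 2)   # assumption: S = 4 for p = 7
    # Step 1: the 9 TSS points plus the 8 neighbours of the centre.
    best = best_of(square(0, 0, step) + square(0, 0, 1))
    if best == (0, 0):
        return best, len(cache)                        # first-step stop
    if max(abs(best[0]), abs(best[1])) == 1:
        return best_of(square(*best, 1)), len(cache)   # second-step stop
    # Otherwise continue like TSS from the best outer point.
    cx, cy = best
    step //= 2
    while step >= 1:
        cx, cy = best_of(square(cx, cy, step))
        step //= 2
    return (cx, cy), len(cache)


def bowl(dx, dy):
    """Hypothetical unimodal error surface with its minimum at (3, -1)."""
    return (dx - 3) ** 2 + (dy + 1) ** 2


print(new_three_step_search(lambda dx, dy: dx * dx + dy * dy))  # ((0, 0), 17)
print(new_three_step_search(bowl))                              # ((3, -1), 33)
```

The two calls show the range quoted above: a surface minimized at the centre stops after 17 evaluations, while a minimum far from the centre costs the full 33.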
The assumption behind TSS is that the error surface due to motion in every macroblock is unimodal. A unimodal surface is a bowl-shaped surface such that the weights generated by the cost function increase monotonically from the global minimum. Moreover, a unimodal surface cannot have two minima in opposite directions, so the fixed 8-point search pattern of TSS can be modified to exploit this and save computations. Simple and Efficient Search (SES)[5] is the extension of TSS that incorporates this assumption.
SES improves upon TSS by dividing each search step into two phases:
• First Phase:
• Second Phase:
• Set the new step size as S = S/2
• Repeat the SES search procedure until S=1
• Select the location with the lowest weight as the motion vector
SES is computationally very efficient compared to TSS. However, the peak signal-to-noise ratio it achieves is lower than that of TSS, because real error surfaces are not strictly unimodal.
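A simplified SES sketch follows. Because the contents of the two phases are not spelled out above, the point set is an assumption: the first phase compares the origin A with B (S to the right) and C (S below) to pick the quadrant that a unimodal surface allows, and the second phase evaluates the remaining corner and axis points of that quadrant. The exact points checked in [5] may differ; `cost(dx, dy)` is an assumed callable returning the matching cost:

```python
import math


def simple_efficient_search(cost, p=7):
    """Simplified SES sketch; returns (motion_vector, evaluations)."""
    cache = {}

    def evaluate(dx, dy):
        if abs(dx) > p or abs(dy) > p:
            return math.inf            # outside the search window
        if (dx, dy) not in cache:
            cache[(dx, dy)] = cost(dx, dy)
        return cache[(dx, dy)]

    cx, cy = 0, 0
    step = max(1, (p + 1) // 2)        # assumption: S = 4 for p = 7
    while step >= 1:
        # First phase: A is the origin, B lies `step` to the right and C
        # lies `step` below. On a unimodal surface, comparing A with B and
        # with C tells us which quadrant can contain the minimum.
        a = evaluate(cx, cy)
        sx = step if evaluate(cx + step, cy) <= a else -step
        sy = step if evaluate(cx, cy + step) <= a else -step
        # Second phase (assumed point set): the origin plus the three
        # other corner/axis points of the selected quadrant.
        quadrant = [(cx, cy), (cx + sx, cy), (cx, cy + sy), (cx + sx, cy + sy)]
        cx, cy = min(quadrant, key=lambda pt: evaluate(*pt))
        step //= 2                     # S = S / 2 until S = 1
    return (cx, cy), len(cache)


def bowl(dx, dy):
    """Hypothetical unimodal error surface with its minimum at (3, -1)."""
    return (dx - 3) ** 2 + (dy + 1) ** 2


print(simple_efficient_search(bowl))  # ((3, -1), 15)
```

On this idealized bowl the sketch needs 15 evaluations instead of the 25 of TSS; on real, non-unimodal surfaces the quadrant choice can be wrong, which is the PSNR loss described above.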
Four Step Search (FSS)[6] improves on TSS with a lower computational cost and a better peak signal-to-noise ratio. Like NTSS, FSS employs center-biased searching and has a halfway-stop provision.
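The steps of FSS are not listed here, so the following sketch assumes the commonly described form of [6]: up to three steps with a 9-point pattern of step size 2 (a 5 × 5 window), stopping early as soon as the centre is the best point, followed by a final 9-point pattern of step size 1. `cost(dx, dy)` is an assumed callable returning the matching cost:

```python
import math


def four_step_search(cost, p=7):
    """FSS sketch; returns (motion_vector, distinct_cost_evaluations)."""
    cache = {}

    def evaluate(dx, dy):
        if abs(dx) > p or abs(dy) > p:
            return math.inf            # outside the search window
        if (dx, dy) not in cache:
            cache[(dx, dy)] = cost(dx, dy)
        return cache[(dx, dy)]

    def best_of(cx, cy, step):
        # Centre first (centre bias on ties), then the 8 locations
        # +/- step around it.
        points = [(cx, cy)] + [
            (cx + ox, cy + oy)
            for ox in (-step, 0, step)
            for oy in (-step, 0, step)
            if (ox, oy) != (0, 0)
        ]
        return min(points, key=lambda pt: evaluate(*pt))

    cx, cy = 0, 0
    # Steps 1-3: 9-point pattern with step size 2 (a 5 x 5 window).
    for _ in range(3):
        nx, ny = best_of(cx, cy, 2)
        if (nx, ny) == (cx, cy):
            break  # halfway stop: the centre is already the best point
        cx, cy = nx, ny
    # Step 4: final 9-point pattern with step size 1 (a 3 x 3 window).
    return best_of(cx, cy, 1), len(cache)


def bowl(dx, dy):
    """Hypothetical unimodal error surface with its minimum at (3, -1)."""
    return (dx - 3) ** 2 + (dy + 1) ** 2


print(four_step_search(bowl))  # ((3, -1), 22)
```

Because only the new points of each shifted pattern are evaluated, this sketch examines between 17 and 27 distinct locations for p = 7.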
The Diamond Search (DS)[7] algorithm uses a diamond-shaped search point pattern and otherwise runs exactly like FSS. However, there is no limit on the number of steps the algorithm can take.
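Two different types of fixed patterns are used for the search: the Large Diamond Search Pattern (LDSP), which consists of the center, the four points at distance 2 along the axes and the four diagonal neighbors, and the Small Diamond Search Pattern (SDSP), which consists of the center and its four axial neighbors. LDSP is repeated until the minimum lies at its center, after which a single SDSP step gives the final motion vector.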
Because its search pattern is neither too big nor too small, this algorithm finds the global minimum accurately in practice. Diamond Search achieves a peak signal-to-noise ratio close to that of Exhaustive Search at a significantly lower computational expense.
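A sketch of Diamond Search follows, assuming the LDSP is the centre plus (0, ±2), (±2, 0) and (±1, ±1), and the SDSP is the centre plus (0, ±1) and (±1, 0), both listed centre first so that ties favour the centre. `cost(dx, dy)` is an assumed callable returning the matching cost, and the window limit `p` is our addition:

```python
import math

# Large and small diamond search patterns as (dx, dy) offsets, centre first.
LDSP = [(0, 0), (0, -2), (-1, -1), (1, -1), (-2, 0),
        (2, 0), (-1, 1), (1, 1), (0, 2)]
SDSP = [(0, 0), (0, -1), (-1, 0), (1, 0), (0, 1)]


def diamond_search(cost, p=7):
    """DS sketch; returns (motion_vector, distinct_cost_evaluations)."""
    cache = {}

    def evaluate(dx, dy):
        if abs(dx) > p or abs(dy) > p:
            return math.inf            # outside the search window
        if (dx, dy) not in cache:
            cache[(dx, dy)] = cost(dx, dy)
        return cache[(dx, dy)]

    def best_of(cx, cy, pattern):
        return min(((cx + ox, cy + oy) for ox, oy in pattern),
                   key=lambda pt: evaluate(*pt))

    cx, cy = 0, 0
    # Repeat LDSP until the minimum lies at the centre of the pattern.
    # A move only happens on a strictly lower cost, so this terminates.
    while True:
        nx, ny = best_of(cx, cy, LDSP)
        if (nx, ny) == (cx, cy):
            break
        cx, cy = nx, ny
    # A single SDSP step then gives the final motion vector.
    return best_of(cx, cy, SDSP), len(cache)


def bowl(dx, dy):
    """Hypothetical unimodal error surface with its minimum at (3, -1)."""
    return (dx - 3) ** 2 + (dy + 1) ** 2


print(diamond_search(bowl))  # ((3, -1), 21)
```

Consecutive LDSP positions overlap heavily, so each move adds only three or five new evaluations, which is where much of the saving over exhaustive search comes from.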
Adaptive Rood Pattern Search (ARPS)[8] exploits the fact that motion within a frame is usually coherent: if the macroblocks around the current macroblock moved in a particular direction, there is a high probability that the current macroblock has a similar motion vector. The algorithm uses the motion vector of the macroblock immediately to its left to predict its own motion vector.
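ARPS first evaluates a rood (cross-shaped) pattern of four points around the search origin, whose arm length equals the larger absolute component of the predicted motion vector, together with the predicted location itself. It then refines around the best point with SDSP until the minimum lies at the center.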
The rood pattern places the search directly in an area with a high probability of containing a good matching block. The main advantage of ARPS over DS is that if the predicted motion vector is (0, 0), ARPS does not waste computation on LDSP but starts directly with SDSP. Furthermore, if the predicted motion vector is far from the center, ARPS again saves computation by jumping directly to that vicinity and using SDSP, whereas DS must spend several LDSP steps to get there.
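A sketch of ARPS for a single macroblock. Assumptions: `cost(dx, dy)` is a callable returning the matching cost; `predicted` is the motion vector of the left neighbour; and a macroblock with no left neighbour uses a fixed arm length of 2, a choice labelled as ours in the code:

```python
import math

SDSP = [(0, 0), (0, -1), (-1, 0), (1, 0), (0, 1)]  # small diamond, centre first


def adaptive_rood_pattern_search(cost, predicted=None, p=7):
    """ARPS sketch; returns (motion_vector, distinct_cost_evaluations)."""
    cache = {}

    def evaluate(dx, dy):
        if abs(dx) > p or abs(dy) > p:
            return math.inf            # outside the search window
        if (dx, dy) not in cache:
            cache[(dx, dy)] = cost(dx, dy)
        return cache[(dx, dy)]

    def best_of(points):
        return min(points, key=lambda pt: evaluate(*pt))

    if predicted is None:
        arm, extra = 2, []             # assumption: no left neighbour
    else:
        arm = max(abs(predicted[0]), abs(predicted[1]))
        extra = [tuple(predicted)]     # also test the predicted location
    # Rood pattern: the centre, four arm ends, and the predicted vector.
    # With a (0, 0) prediction the arm length is 0, so this collapses to
    # the centre and the search goes straight to SDSP.
    rood = [(0, 0), (arm, 0), (-arm, 0), (0, arm), (0, -arm)] + extra
    cx, cy = best_of(rood)
    # Refine with SDSP until the centre of the pattern is the best point.
    while True:
        nx, ny = best_of([(cx + ox, cy + oy) for ox, oy in SDSP])
        if (nx, ny) == (cx, cy):
            return (cx, cy), len(cache)
        cx, cy = nx, ny


def bowl(dx, dy):
    """Hypothetical unimodal error surface with its minimum at (3, -1)."""
    return (dx - 3) ** 2 + (dy + 1) ** 2


# The left neighbour moved by (3, -1), so the rood jumps straight there.
print(adaptive_rood_pattern_search(bowl, predicted=(3, -1)))  # ((3, -1), 9)
```

With an accurate prediction the sketch needs only 9 evaluations here, compared with 21 for the Diamond Search walk on the same surface when starting from the centre.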
http://www.mathworks.com/matlabcentral/fileexchange/8761-block-matching-algorithms-for-motion-estimation
https://www.ece.cmu.edu/~ee899/project/deepak_mid.htm
[1] Dabov, Kostadin; Foi, Alessandro; Katkovnik, Vladimir; Egiazarian, Karen (16 July 2007). "Image denoising by sparse 3D transform-domain collaborative filtering". IEEE Transactions on Image Processing. 16 (8): 2080–2095. Bibcode:2007ITIP...16.2080D. CiteSeerX 10.1.1.219.5398. doi:10.1109/TIP.2007.901238. PMID 17688213. S2CID 1475121.
[2] Danielyan, Aram; Katkovnik, Vladimir; Egiazarian, Karen (30 June 2011). "BM3D Frames and Variational Image Deblurring". IEEE Transactions on Image Processing. 21 (4): 1715–1728. arXiv:1106.6180. Bibcode:2012ITIP...21.1715D. doi:10.1109/TIP.2011.2176954. PMID 22128008. S2CID 11204616.
[3] Je, Changsoo; Park, Hyung-Min (2013). "Optimized hierarchical block matching for fast and accurate image registration". Signal Processing: Image Communication. 28 (7): 779–791. doi:10.1016/j.image.2013.04.002.
[4] Li, Renxiang; Zeng, Bing; Liou, Ming (August 1994). "A New Three-Step Search Algorithm for Block Motion Estimation". IEEE Transactions on Circuits and Systems for Video Technology. 4 (4): 438–442. doi:10.1109/76.313138.
[5] Lu, Jianhua; Liou, Ming (April 1997). "A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation". IEEE Transactions on Circuits and Systems for Video Technology. 7 (2): 429–433. doi:10.1109/76.564122.
[6] Po, Lai-Man; Ma, Wing-Chung (June 1996). "A Novel Four-Step Search Algorithm for Fast Block Motion Estimation". IEEE Transactions on Circuits and Systems for Video Technology. 6 (3): 313–317. doi:10.1109/76.499840.
[7] Zhu, Shan; Ma, Kai-Kuang (February 2000). "A New Diamond Search Algorithm for Fast Block-Matching Motion Estimation". IEEE Transactions on Image Processing. 9 (2): 287–290. Bibcode:2000ITIP....9..287Z. doi:10.1109/83.821744. PMID 18255398.
[8] Nie, Yao; Ma, Kai-Kuang (December 2002). "Adaptive Rood Pattern Search for Fast Block-Matching Motion Estimation" (PDF). IEEE Transactions on Image Processing. 11 (12): 1442–1448. Bibcode:2002ITIP...11.1442N. doi:10.1109/TIP.2002.806251. PMID 18249712. http://www3.ntu.edu.sg/home/ekkma/1_Publications_files/Adaptive%20rood%20pattern%20search%20for%20fast%20block-matching%20motion%20estimation%20%28IEEE%20TIP%20Dec%202002%29.pdf