Randomized algorithms that solve the problem in linear time are known, in Euclidean spaces whose dimension is treated as a constant for the purposes of asymptotic analysis.[2][3][4] This is significantly faster than the O(n²) time (expressed here in big O notation) that would be needed by a naive algorithm that computes the distances between all pairs of points and selects the smallest.
It is also possible to solve the problem without randomization, in random-access machine models of computation with unlimited memory that allow the use of the floor function, in near-linear O(n log log n) time.[5] In even more restricted models of computation, such as the algebraic decision tree, the problem can be solved in the somewhat slower O(n log n) time bound,[6] and this is optimal for this model, by a reduction from the element uniqueness problem. Both sweep line algorithms and divide-and-conquer algorithms with this slower time bound are commonly taught as examples of these algorithm design techniques.[7][8]
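For comparison, the naive quadratic algorithm mentioned above is straightforward; a minimal Python sketch for the two-dimensional case (the function name is illustrative):

```python
import itertools
import math

def closest_pair_brute_force(points):
    """Return the closest pair of 2-D points by checking all pairs.

    Runs in O(n^2) time; this is the baseline that the algorithms
    discussed in this section improve on.
    """
    return min(itertools.combinations(points, 2),
               key=lambda pq: math.dist(*pq))
```

Each of the n(n−1)/2 pairs is examined once, which is exactly the quadratic cost the faster algorithms avoid.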
A linear expected time randomized algorithm of Rabin (1976), modified slightly by Richard Lipton to make its analysis easier, proceeds as follows, on an input set S consisting of n points in a k-dimensional Euclidean space:
The algorithm will always correctly determine the closest pair, because it maps any pair closer than distance d to the same grid point or to adjacent grid points. The uniform sampling of pairs in the first step of the algorithm (compared to a different method of Rabin for sampling a similar number of pairs) simplifies the proof that the expected number of distances computed by the algorithm is linear.[9]
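A minimal Python sketch of an algorithm in this style, shown in two dimensions; the specific choices here (sampling n pairs, a grid of cell side d, comparing the 3×3 neighborhood of cells) are assumptions consistent with the analysis above, not Rabin's exact formulation:

```python
import math
import random
from collections import defaultdict

def closest_pair_distance_rabin(points):
    """Rabin-style randomized closest-pair distance (2-D sketch)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Step 1 (assumed): sample n pairs uniformly at random; let d be
    # the smallest sampled distance.
    d = min(math.dist(*random.sample(points, 2)) for _ in range(n))
    if d == 0:
        return 0.0  # duplicate points: the closest distance is zero
    # Step 2 (assumed): round each point to a grid of cell side d.
    grid = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / d), math.floor(p[1] / d))].append(p)
    # Step 3: any pair closer than d falls in the same cell or in
    # adjacent cells, so comparing only those pairs is exact.
    best = d
    for (cx, cy), cell in grid.items():
        # pairs within the same cell
        for i in range(len(cell)):
            for j in range(i + 1, len(cell)):
                best = min(best, math.dist(cell[i], cell[j]))
        # "forward" neighbor cells only, so each cross-cell pair of
        # adjacent cells is visited exactly once
        for dx, dy in ((1, 0), (0, 1), (1, 1), (1, -1)):
            for p in cell:
                for q in grid.get((cx + dx, cy + dy), ()):
                    best = min(best, math.dist(p, q))
    return best
```

The result is always exact regardless of the random sample, because the sampled d only controls the grid granularity; randomness affects the running time, not correctness.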
Instead, a different algorithm of Khuller & Matias (1995) goes through two phases: a random iterated filtering process that approximates the closest distance to within an approximation ratio of 2√k, together with a finishing step that turns this approximate distance into the exact closest distance. The filtering process repeats the following steps, until S becomes empty:
The approximate distance found by this filtering process is the final value of d, computed in the step before S becomes empty. Each step removes all points whose closest neighbor is at distance d or greater, which in expectation is at least half of the points, from which it follows that the total expected time for filtering is linear. Once an approximate value of d is known, it can be used for the final steps of Rabin's algorithm; in these steps each grid point has a constant number of input points rounded to it, so again the time is linear.[10]
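The filtering phase can be sketched as follows in Python (two-dimensional case). The step structure shown is a reconstruction from the description above, not the authors' exact procedure, and the grid-based constant-time neighborhood test of the actual algorithm is replaced by a naive quadratic scan for clarity:

```python
import math
import random

def approximate_closest_distance(points):
    """Khuller–Matias-style filtering sketch (2-D).

    Assumed steps per round: pick a random point p, let d be the
    distance from p to its nearest neighbor, then discard every
    point whose own nearest neighbor is at distance d or greater.
    Returns the last computed d when S runs out of points.
    """
    S = list(points)
    d = math.inf
    while len(S) > 1:
        p = random.choice(S)
        # distance from p to its nearest neighbor in S
        d = min(math.dist(p, q) for q in S if q is not p)
        if d == 0:
            return 0.0  # duplicate points: closest distance is zero
        # remove points with no neighbor strictly closer than d
        # (naive O(n^2) test; the real algorithm uses a grid to do
        # this in linear time)
        S = [q for q in S
             if any(r != q and math.dist(q, r) < d for r in S)]
    return d
```

Note that the point p itself is always removed (its nearest neighbor is at distance exactly d), so the loop terminates, and every d computed is a nearest-neighbor distance within a subset of the input, so the returned value is always at least the true closest distance.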
The dynamic version of the closest-pair problem is stated as follows: given a set of points that changes over time through insertions and deletions, maintain a data structure from which the current closest pair can be reported efficiently after each update.
If the bounding box for all points is known in advance and the constant-time floor function is available, then a data structure using expected O(n) space has been suggested that supports expected O(log n)-time insertions and deletions and constant-time queries. When modified for the algebraic decision tree model, insertions and deletions would require expected O(log² n) time.[11] The complexity of this dynamic closest-pair data structure is exponential in the dimension k, and therefore such an algorithm becomes less suitable for high-dimensional problems.
An algorithm for the dynamic closest-pair problem in k-dimensional space was developed by Sergey Bespamyatnikh in 1998.[12] Points can be inserted and deleted in O(log n) worst-case time per point.
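To make the dynamic problem statement concrete, here is a naive Python baseline for the interface (class and method names are illustrative); it does not achieve the polylogarithmic update bounds cited above:

```python
import math

class NaiveDynamicClosestPair:
    """Naive baseline for the dynamic closest-pair interface.

    Insertion compares the new point against all stored points
    (O(n)); deletion recomputes the closest pair from scratch
    (O(n^2)); queries are O(1).  Shown only to illustrate the
    operations a real dynamic structure must support.
    """

    def __init__(self):
        self.points = []
        self.best = (math.inf, None, None)

    def insert(self, p):
        # compare the new point against every existing point: O(n)
        for q in self.points:
            d = math.dist(p, q)
            if d < self.best[0]:
                self.best = (d, p, q)
        self.points.append(p)

    def delete(self, p):
        self.points.remove(p)
        # recompute the closest pair from scratch: O(n^2)
        self.best = (math.inf, None, None)
        pts = self.points
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                d = math.dist(pts[i], pts[j])
                if d < self.best[0]:
                    self.best = (d, pts[i], pts[j])

    def closest_pair(self):
        # constant-time query of the maintained answer
        return self.best
```

The data structures discussed above replace these linear and quadratic update costs with (poly)logarithmic ones while keeping the constant-time query.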
Shamos, Michael Ian; Hoey, Dan (1975). "Closest-point problems". 16th Annual Symposium on Foundations of Computer Science, Berkeley, California, USA, October 13–15, 1975. IEEE Computer Society. pp. 151–162. doi:10.1109/SFCS.1975.8.
Rabin, M. (1976). "Probabilistic algorithms". Algorithms and Complexity: Recent Results and New Directions. Academic Press. pp. 21–39. As cited by Khuller & Matias (1995).
Khuller, Samir; Matias, Yossi (1995). "A simple randomized sieve algorithm for the closest-pair problem". Information and Computation. 118 (1): 34–37. doi:10.1006/inco.1995.1049. MR 1329236. S2CID 206566076.
Lipton, Richard (24 September 2011). "Rabin Flips a Coin". Gödel's Lost Letter and P=NP.
Fortune, Steve; Hopcroft, John (1979). "A note on Rabin's nearest-neighbor algorithm". Information Processing Letters. 8 (1): 20–23. doi:10.1016/0020-0190(79)90085-1. hdl:1813/7460. MR 0515507.
Clarkson, Kenneth L. (1983). "Fast algorithms for the all nearest neighbors problem". 24th Annual Symposium on Foundations of Computer Science, Tucson, Arizona, USA, 7–9 November 1983. IEEE Computer Society. pp. 226–232. doi:10.1109/SFCS.1983.16. ISBN 0-8186-0508-1.
Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. "33.4: Finding the closest pair of points". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 957–961. ISBN 0-262-03293-7.
Kleinberg, Jon M.; Tardos, Éva (2006). "5.4 Finding the closest pair of points". Algorithm Design. Addison-Wesley. pp. 225–231. ISBN 978-0-321-37291-8.
Golin, Mordecai; Raman, Rajeev; Schwarz, Christian; Smid, Michiel (1998). "Randomized data structures for the dynamic closest-pair problem" (PDF). SIAM Journal on Computing. 27 (4): 1036–1072. doi:10.1137/S0097539794277718. MR 1622005. S2CID 1242364.
Bespamyatnikh, S. N. (1998). "An optimal algorithm for closest-pair maintenance". Discrete & Computational Geometry. 19 (2): 175–195. doi:10.1007/PL00009340. MR 1600047.