The inventors of this data structure offer the following iterative explanation of its operation:[7]
1. For constants $w$ and $t$ (to be defined later), independently choose $d = 2t + 1$ random hash functions $h_1, \dots, h_d$ and $s_1, \dots, s_d$ such that $h_i : [n] \to [w]$ and $s_i : [n] \to \{\pm 1\}$. It is necessary that the hash families from which $h_i$ and $s_i$ are chosen be pairwise independent.
2. For each item $q_i$ in the stream, add $s_j(q_i)$ to the $h_j(q_i)$-th bucket of the $j$-th hash.
At the end of this process, one has $wd$ sums $(C_{ij})$, where
$C_{i,j} = \sum_{k \,:\, h_i(q_k) = j} s_i(q_k).$
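The construction so far can be sketched in a few lines of Python. The class below is illustrative only: the class and method names are invented for the example, and the hash functions are simulated with Python's built-in `hash` over seeded tuples rather than drawn from true pairwise-independent families.

```python
import random

class CountSketch:
    """Illustrative Count-Sketch: d rows of w counters each."""

    def __init__(self, w, d, seed=0):
        self.w, self.d = w, d
        rng = random.Random(seed)
        # One seed pair per row, standing in for (h_i, s_i); a real
        # implementation would use pairwise-independent hash families.
        self.seeds = [(rng.getrandbits(64), rng.getrandbits(64))
                      for _ in range(d)]
        self.C = [[0] * w for _ in range(d)]  # the w*d sums C_ij

    def _h(self, i, q):
        return hash((self.seeds[i][0], q)) % self.w       # h_i(q) in [w]

    def _s(self, i, q):
        return 1 - 2 * (hash((self.seeds[i][1], q)) % 2)  # s_i(q) in {+1, -1}

    def update(self, q):
        # Step 2: add s_j(q) to bucket h_j(q) of every row j.
        for j in range(self.d):
            self.C[j][self._h(j, q)] += self._s(j, q)
```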
To estimate the count of $q$'s, one computes the following value:
$r_q = \operatorname{median}_i \; s_i(q) \cdot C_{i,\,h_i(q)}.$
The values $s_i(q) \cdot C_{i,\,h_i(q)}$ are unbiased estimates of how many times $q$ has appeared in the stream.
The estimate $r_q$ has variance $O(\min\{m_1^2/w^2,\, m_2^2/w\})$, where $m_1$ is the length of the stream and $m_2^2 = \sum_q \left(\sum_i [q_i = q]\right)^2$.[8]
Furthermore, $r_q$ is guaranteed never to be more than $2m_2/\sqrt{w}$ off from the true value, with probability $1 - e^{-O(t)}$.
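Continuing the illustrative class above, a point query takes the median of the $d$ signed bucket values. The small demo at the end is a plausibility check under the same assumptions, not a verification of the bound:

```python
from statistics import median

def estimate(sk, q):
    # Each s_i(q) * C[i][h_i(q)] is an unbiased estimate of q's count;
    # the median over the d = 2t + 1 rows gives the estimate r_q.
    return median(sk._s(i, q) * sk.C[i][sk._h(i, q)] for i in range(sk.d))

# Hypothetical usage: stream a few items, then query one of them.
sk = CountSketch(w=64, d=5)
for item in ["a", "b", "a", "c", "a", "b"]:
    sk.update(item)
print(estimate(sk, "a"))  # close to 3 with high probability
```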
Alternatively, Count-Sketch can be seen as a linear mapping with a non-linear reconstruction function. Let $M^{(i)} \in \{-1,0,1\}^{w \times n}$, for $i \in [d]$, be a collection of $d = 2t + 1$ matrices, defined by
$M^{(i)}_{h_i(j),\,j} = s_i(j)$
for $j \in [n]$ and 0 everywhere else.
Then a vector $v \in \mathbb{R}^n$ is sketched by $C^{(i)} = M^{(i)} v \in \mathbb{R}^w$. To reconstruct $v$, we take $v_j^* = \operatorname{median}_i \; s_i(j) \, C^{(i)}_{h_i(j)}$. This gives the same guarantees as stated above if we take $m_1 = \|v\|_1$ and $m_2 = \|v\|_2$.
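The linear-map view is easy to check numerically. A minimal NumPy sketch (assuming explicit lookup tables for $h_i$ and $s_i$ in place of hash functions; sizes are arbitrary):

```python
import numpy as np

n, w, d = 10, 32, 5
rng = np.random.default_rng(0)
h = rng.integers(0, w, size=(d, n))   # h_i : [n] -> [w]
s = rng.choice([-1, 1], size=(d, n))  # s_i : [n] -> {+1, -1}

# Build M^(i) with M[h_i(j), j] = s_i(j) and zeros elsewhere.
M = np.zeros((d, w, n))
for i in range(d):
    M[i, h[i], np.arange(n)] = s[i]

v = rng.normal(size=n)
C = M @ v  # the d sketches C^(i) = M^(i) v, shape (d, w)

# Reconstruction: v*_j = median_i s_i(j) * C^(i)_{h_i(j)}.
# Accuracy improves as w grows relative to the mass of v.
v_star = np.median(s * C[np.arange(d)[:, None], h], axis=0)
```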
The count sketch projection of the outer product of two vectors is equivalent to the convolution of two component count sketches.
The count sketch computes a vector convolution
$C^{(1)}x \ast C^{(2)}x$, where $C^{(1)}$ and $C^{(2)}$ are independent count sketch matrices.
Pham and Pagh[9] show that this equals $C(x \otimes x^T)$, a count sketch $C$ of the outer product of vectors, where $\otimes$ denotes the Kronecker product.
The fast Fourier transform can be used to do fast convolution of count sketches. By using the face-splitting product,[10][11][12] such structures can be computed much faster than normal matrices.
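A small numerical sketch of the Pham–Pagh identity (assuming NumPy; a single sketch row and illustrative sizes). It checks that the circular convolution of the two component sketches, computed via the FFT, matches a count sketch of the outer product built directly from the combined hashes $h(i,j) = (h_1(i) + h_2(j)) \bmod w$ and signs $s(i,j) = s_1(i)\,s_2(j)$:

```python
import numpy as np

n, w = 8, 16
rng = np.random.default_rng(1)
h1, h2 = rng.integers(0, w, n), rng.integers(0, w, n)
s1, s2 = rng.choice([-1, 1], n), rng.choice([-1, 1], n)
x = rng.normal(size=n)

def count_sketch(vec, h, s):
    # One sketch row: bucket h[j] accumulates s[j] * vec[j].
    C = np.zeros(w)
    np.add.at(C, h, s * vec)
    return C

# Circular convolution of the two component sketches via FFT.
conv = np.fft.irfft(np.fft.rfft(count_sketch(x, h1, s1)) *
                    np.fft.rfft(count_sketch(x, h2, s2)), w)

# Direct count sketch of the outer product x x^T using combined hashes.
direct = np.zeros(w)
for i in range(n):
    for j in range(n):
        direct[(h1[i] + h2[j]) % w] += s1[i] * s2[j] * x[i] * x[j]

assert np.allclose(conv, direct)  # the two computations agree
```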
Faisal M. Algashaam; Kien Nguyen; Mohamed Alkanhal; Vinod Chandran; Wageeh Boles (2017). "Multispectral Periocular Classification With Multimodal Compact Multi-Linear Pooling". IEEE Access. 5.
Ahle, Thomas; Knudsen, Jakob (2019-09-03). "Almost Optimal Tensor Sketch". ResearchGate. Retrieved 2020-07-11. https://www.researchgate.net/publication/335617805
Charikar, Moses; Chen, Kevin; Farach-Colton, Martin (2004). "Finding frequent items in data streams" (PDF). Theoretical Computer Science. 312 (1). Elsevier BV: 3–15. doi:10.1016/s0304-3975(03)00400-6. ISSN 0304-3975. https://people.cs.rutgers.edu/~farach/pubs/FrequentStream.pdf
Alon, Noga; Matias, Yossi; Szegedy, Mario (1999). "The space complexity of approximating the frequency moments". Journal of Computer and System Sciences. 58 (1): 137–147.
Moody, John (1989). "Fast learning in multi-resolution hierarchies". Advances in Neural Information Processing Systems.
Woodruff, David P. (2014). "Sketching as a Tool for Numerical Linear Algebra". Foundations and Trends in Theoretical Computer Science. 10 (1–2): 1–157.
Larsen, Kasper Green; Pagh, Rasmus; Tětek, Jakub (2021). "CountSketches, Feature Hashing and the Median of Three". International Conference on Machine Learning. PMLR.
Pham, Ninh; Pagh, Rasmus (2013). "Fast and scalable polynomial kernels via explicit feature maps". SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. doi:10.1145/2487575.2487591.
Slyusar, V. I. (1998). "End products in matrices in radar applications" (PDF). Radioelectronics and Communications Systems. 41 (3): 50–53. http://slyusar.kiev.ua/en/IZV_1998_3.pdf
Slyusar, V. I. (1997-05-20). "Analytical model of the digital antenna array on a basis of face-splitting matrix products" (PDF). Proc. ICATT-97, Kyiv: 108–109. http://slyusar.kiev.ua/ICATT97.pdf
Slyusar, V. I. (1999). "A Family of Face Products of Matrices and its Properties" (PDF). Cybernetics and Systems Analysis. 35 (3): 379–384. doi:10.1007/BF02733426. S2CID 119661450. http://slyusar.kiev.ua/FACE.pdf