Following [5] and [6], diffusion maps can be defined in four steps.
Diffusion maps exploit the relationship between heat diffusion and random walk Markov chains. The basic observation is that if we take a random walk on the data, walking to a nearby data point is more likely than walking to another that is far away. Let $(X, \mathcal{A}, \mu)$ be a measure space, where $X$ is the data set and $\mu$ represents the distribution of the points on $X$.
Based on this, the connectivity $k$ between two data points, $x$ and $y$, can be defined as the probability of walking from $x$ to $y$ in one step of the random walk. Usually, this probability is specified in terms of a kernel function of the two points, $k : X \times X \to \mathbb{R}$. For example, the popular Gaussian kernel:

$$k(x,y) = \exp\left(-\frac{\|x-y\|^2}{\epsilon}\right)$$
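On a finite data set, this kernel is evaluated as a pairwise matrix. The following is a minimal NumPy sketch; the function name `gaussian_kernel` and the scale parameter `epsilon` are illustrative choices, not part of any standard API.

```python
import numpy as np

def gaussian_kernel(X, epsilon):
    """Pairwise Gaussian kernel k(x, y) = exp(-||x - y||^2 / epsilon).

    X is an (n, d) array of n data points; epsilon sets the scale
    within which points count as strongly connected.
    """
    # Squared Euclidean distances between all pairs of rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / epsilon)

# Nearby points get kernel values close to 1, distant points close to 0.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
K = gaussian_kernel(X, epsilon=1.0)
```

Note that the resulting matrix is symmetric and non-negative, matching the kernel properties listed below.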
More generally, the kernel function has the following properties:

$$k(x,y) = k(y,x)$$ ($k$ is symmetric)

$$k(x,y) \geq 0$$ ($k$ is positivity preserving).
The kernel constitutes the prior definition of the local geometry of the data set. Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application one has in mind. This is a major difference from methods such as principal component analysis, where correlations between all data points are taken into account at once.
Given $(X, k)$, we can then construct a reversible discrete-time Markov chain on $X$ (a process known as the normalized graph Laplacian construction):

$$d(x) = \int_X k(x,y)\,d\mu(y)$$

and define:

$$p(x,y) = \frac{k(x,y)}{d(x)}.$$
Although the new normalized kernel does not inherit the symmetric property, it does inherit the positivity-preserving property and gains a conservation property:

$$\int_X p(x,y)\,d\mu(y) = 1.$$
From $p(x,y)$ we can construct a transition matrix of a Markov chain ($M$) on $X$. In other words, $p(x,y)$ represents the one-step transition probability from $x$ to $y$, and $M^t$ gives the $t$-step transition matrix.
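On a finite sample, the construction above amounts to dividing each row of the kernel matrix by its degree $d(x)$, so each row of $M$ sums to one and powers of $M$ give multi-step transition probabilities. A NumPy sketch (the helper name `transition_matrix` is illustrative):

```python
import numpy as np

def transition_matrix(K):
    """Row-normalise a kernel matrix into a Markov transition matrix.

    The row sums play the role of d(x); dividing by them gives p(x, y),
    whose conservation property means every row of M sums to one.
    """
    d = K.sum(axis=1)
    return K / d[:, None]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-sq)                      # Gaussian kernel with epsilon = 1
M = transition_matrix(K)
M3 = np.linalg.matrix_power(M, 3)    # three-step transition probabilities
```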
We define the diffusion matrix $L$ (which is also a version of the graph Laplacian matrix)

$$L_{i,j} = k(x_i, x_j).$$

We then define the new kernel

$$L^{(\alpha)}_{i,j} = k^{(\alpha)}(x_i, x_j) = \frac{L_{i,j}}{\left(d(x_i)\,d(x_j)\right)^{\alpha}}$$

or, equivalently,

$$L^{(\alpha)} = D^{-\alpha} L D^{-\alpha}$$

where $D$ is a diagonal matrix with $D_{i,i} = \sum_j L_{i,j}$.

We apply the graph Laplacian normalization to this new kernel:

$$M = (D^{(\alpha)})^{-1} L^{(\alpha)},$$

where $D^{(\alpha)}$ is a diagonal matrix with $D^{(\alpha)}_{i,i} = \sum_j L^{(\alpha)}_{i,j}$.
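In matrix form, the two normalisations above can be sketched as follows. This is a NumPy illustration; `alpha_normalize` is a hypothetical helper name, and the diagonal matrices are represented by their diagonal vectors.

```python
import numpy as np

def alpha_normalize(L, alpha):
    """Apply L^(alpha) = D^-alpha L D^-alpha, then the Markov
    normalisation M = (D^(alpha))^-1 L^(alpha)."""
    d = L.sum(axis=1)                     # D_ii = sum_j L_ij
    L_alpha = L / np.outer(d, d) ** alpha
    d_alpha = L_alpha.sum(axis=1)         # D^(alpha)_ii
    return L_alpha / d_alpha[:, None]

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
L = np.exp(-sq)
M0 = alpha_normalize(L, alpha=0.0)   # classical graph Laplacian normalisation
M1 = alpha_normalize(L, alpha=1.0)   # approximates Laplace-Beltrami
```

Whatever the value of $\alpha$, the final row normalisation makes $M$ a stochastic matrix.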
One of the main ideas of the diffusion framework is that running the chain forward in time (taking larger and larger powers of $M$) reveals the geometric structure of $X$ at larger and larger scales (the diffusion process). Specifically, the notion of a cluster in the data set is quantified as a region in which the probability of escaping this region is low (within a certain time $t$). Therefore, $t$ not only serves as a time parameter but also has the dual role of a scale parameter.
The eigendecomposition of the matrix $M^t$ yields

$$M^t_{x,y} = \sum_l \lambda_l^t \psi_l(x) \phi_l(y),$$

where $\{\lambda_l\}$ is the sequence of eigenvalues of $M$ and $\{\psi_l\}$ and $\{\phi_l\}$ are the biorthogonal right and left eigenvectors, respectively. Due to the spectral decay of the eigenvalues, only a few terms are necessary to achieve a given relative accuracy in this sum.
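The decay can be checked numerically: after diagonalising $M$ through its symmetric conjugate, keeping only a few leading terms of the spectral sum already reconstructs $M^t$ to high accuracy. A NumPy sketch (the data set, kernel width, and cutoff $k$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-sq / 4.0)              # fairly wide kernel -> fast spectral decay
d = K.sum(axis=1)
M = K / d[:, None]                 # row-stochastic transition matrix

# M = D^-1 K is similar to the symmetric S = D^-1/2 K D^-1/2, so a stable
# symmetric eigensolver can be used and then mapped back to M's eigenvectors.
S = K / np.sqrt(np.outer(d, d))
vals, U = np.linalg.eigh(S)
order = np.argsort(-vals)
vals, U = vals[order], U[:, order]
psi = U / np.sqrt(d)[:, None]      # right eigenvectors of M
phi = U * np.sqrt(d)[:, None]      # biorthogonal left eigenvectors

t, k = 8, 10
# Truncated spectral sum: keep only the k leading terms of
# M^t = sum_l vals_l^t psi_l phi_l^T.
Mt_trunc = (psi[:, :k] * vals[:k] ** t) @ phi[:, :k].T
err = np.max(np.abs(Mt_trunc - np.linalg.matrix_power(M, t)))
```

The leading eigenvalue is always 1 (rows of $M$ sum to one), and the discarded terms are suppressed by the factor $\lambda_l^t$.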
The reason to introduce the normalization step involving $\alpha$ is to tune the influence of the data point density on the infinitesimal transition of the diffusion. In some applications, the sampling of the data is generally not related to the geometry of the manifold we are interested in describing. In this case, we can set $\alpha = 1$ and the diffusion operator approximates the Laplace–Beltrami operator. We then recover the Riemannian geometry of the data set regardless of the distribution of the points. To describe the long-term behavior of the point distribution of a system of stochastic differential equations, we can use $\alpha = 0.5$, and the resulting Markov chain approximates the Fokker–Planck diffusion. With $\alpha = 0$, it reduces to the classical graph Laplacian normalization.
The diffusion distance at time $t$ between two points measures the similarity of the two points in the observation space through the connectivity between them. It is given by

$$D_t(x,y)^2 = \sum_{z \in X} \frac{\left(M^t_{x,z} - M^t_{y,z}\right)^2}{\phi_0(z)},$$

where $\phi_0(y)$ is the stationary distribution of the Markov chain, given by the first left eigenvector of $M$. Explicitly:

$$\phi_0(y) = \frac{d(y)}{\sum_{z \in X} d(z)}.$$
Intuitively, $D_t(x_i, x_j)$ is small if there is a large number of short paths connecting $x_i$ and $x_j$. Several interesting features are associated with the diffusion distance, based on our previous discussion that $t$ also serves as a scale parameter.
The diffusion distance can be calculated using the eigenvectors by

$$D_t(x_i, x_j)^2 = \sum_l \lambda_l^{2t} \left(\psi_l(x_i) - \psi_l(x_j)\right)^2.$$

So the eigenvectors can be used as a new set of coordinates for the data. The diffusion map is defined as:

$$\Psi_t(x) = \left(\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \ldots, \lambda_k^t \psi_k(x)\right).$$
Because of the spectral decay, it is sufficient to use only the first $k$ eigenvalues and eigenvectors. Thus we obtain the diffusion map from the original data to a $k$-dimensional space which is embedded in the original space.
In [8] it is proved that

$$\left\|\Psi_t(x_i) - \Psi_t(x_j)\right\|^2 \approx D_t(x_i, x_j)^2,$$

so the Euclidean distance in the diffusion coordinates approximates the diffusion distance.
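This relationship is easy to verify numerically on a small sample: compute the diffusion distance directly from the $t$-step transition probabilities and compare it with the spectral formula. A NumPy sketch with $\alpha = 0$ (the eigenvectors must be normalised in $L^2(\phi_0)$ for the identity to hold, which the scaling by $\sqrt{\sum d}$ below provides):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-sq)
d = K.sum(axis=1)
M = K / d[:, None]                       # Markov transition matrix

# Diagonalise the symmetric conjugate S = D^-1/2 K D^-1/2 of M.
S = K / np.sqrt(np.outer(d, d))
lam, U = np.linalg.eigh(S)
order = np.argsort(-lam)
lam, U = lam[order], U[:, order]

phi0 = d / d.sum()                       # stationary distribution of M
# Right eigenvectors of M, scaled so the spectral distance formula holds.
psi = U / np.sqrt(d)[:, None] * np.sqrt(d.sum())

t, i, j = 3, 0, 1
Mt = np.linalg.matrix_power(M, t)
# Diffusion distance from its definition ...
D_direct = np.sqrt(np.sum((Mt[i] - Mt[j]) ** 2 / phi0))
# ... and from the eigenvector formula (the constant psi_0 contributes 0).
D_spectral = np.sqrt(np.sum(lam[1:] ** (2 * t) * (psi[i, 1:] - psi[j, 1:]) ** 2))
```

The two quantities agree up to floating-point error; the first eigenvector is constant, so it drops out of the distance.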
The basic algorithmic framework of the diffusion map is as follows:
Step 1. Given the similarity matrix $L$.
Step 2. Normalize the matrix according to parameter $\alpha$: $L^{(\alpha)} = D^{-\alpha} L D^{-\alpha}$.
Step 3. Form the normalized matrix $M = (D^{(\alpha)})^{-1} L^{(\alpha)}$.
Step 4. Compute the $k$ largest eigenvalues of $M^t$ and the corresponding eigenvectors.
Step 5. Use the diffusion map to obtain the embedding $\Psi_t$.
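Putting the five steps together, a compact NumPy sketch of the whole pipeline might look as follows. The function `diffusion_map` and its parameters `epsilon`, `alpha`, `t`, and `k` are illustrative choices, not a standard API.

```python
import numpy as np

def diffusion_map(X, epsilon=1.0, alpha=0.5, t=1, k=2):
    """Steps 1-5: similarity matrix, alpha-normalisation, Markov
    normalisation, eigendecomposition, and the k-dimensional embedding."""
    # Step 1: Gaussian similarity matrix L.
    sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    L = np.exp(-sq / epsilon)
    # Step 2: L^(alpha) = D^-alpha L D^-alpha.
    d = L.sum(axis=1)
    L_alpha = L / np.outer(d, d) ** alpha
    # Step 3: M = (D^(alpha))^-1 L^(alpha).
    d_alpha = L_alpha.sum(axis=1)
    # Step 4: leading eigenpairs of M. M is similar to a symmetric matrix,
    # so diagonalise the symmetric conjugate for numerical stability.
    S = L_alpha / np.sqrt(np.outer(d_alpha, d_alpha))
    lam, U = np.linalg.eigh(S)
    order = np.argsort(-lam)
    lam, U = lam[order], U[:, order]
    psi = U / np.sqrt(d_alpha)[:, None]      # right eigenvectors of M
    # Step 5: embedding Psi_t, skipping the trivial constant eigenvector.
    return lam[1:k + 1] ** t * psi[:, 1:k + 1]

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
emb = diffusion_map(X, epsilon=2.0, alpha=0.5, t=2, k=2)
```

Each row of `emb` is the diffusion-coordinate representation of the corresponding data point, so Euclidean distances between rows approximate diffusion distances.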
In [9], Nadler et al. showed how to design a kernel that reproduces the diffusion induced by a Fokker–Planck equation. They also explained that, when the data approximate a manifold, one can recover the geometry of this manifold by computing an approximation of the Laplace–Beltrami operator. This computation is completely insensitive to the distribution of the points and therefore provides a separation of the statistics and the geometry of the data. Since diffusion maps give a global description of the data set, they can measure the distances between pairs of sample points in the manifold in which the data is embedded. Applications based on diffusion maps include face recognition,[10] spectral clustering, low-dimensional representation of images, image segmentation,[11] 3D model segmentation,[12] speaker verification[13] and identification,[14] sampling on manifolds, anomaly detection,[15][16] image inpainting,[17] revealing the organization of brain resting-state networks,[18] and so on.
Furthermore, the diffusion maps framework has been productively extended to complex networks,[19] revealing a functional organisation of networks which differs from the purely topological or structural one.
Coifman, R.R.; Lafon, S.; Lee, A.B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S.W. (2005). "Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps". PNAS. 102 (21): 7426–7431. doi:10.1073/pnas.0500334102. PMC 1140422. PMID 15899970.
Coifman, R.R.; Lafon, S.; Lee, A.B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S.W. (2005). "Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods". PNAS. 102 (21): 7432–7437. doi:10.1073/pnas.0500896102. PMC 1140426. PMID 15899969.
Coifman, R.R.; Lafon, S. (2006). "Diffusion maps". Applied and Computational Harmonic Analysis. 21: 5–30. doi:10.1016/j.acha.2006.04.006.
Lafon, S.S. (2004). Diffusion Maps and Geometric Harmonics (PhD thesis). Yale University. https://690cd646-a-62cb3a1a-s-sites.googlegroups.com/site/stefansresearchpapers/home/dissertation.pdf
De la Porte, J.; Herbst, B.M.; Hereman, W.; Van der Walt, S.J. (2008). "An Introduction to Diffusion Maps". Proceedings of the Nineteenth Annual Symposium of the Pattern Recognition Association of South Africa (PRASA). CiteSeerX 10.1.1.309.674.
Nadler, Boaz; Lafon, Stéphane; Coifman, Ronald R.; Kevrekidis, Ioannis G. (2005). "Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker–Planck Operators". Advances in Neural Information Processing Systems. 18. arXiv:math/0506090. http://www.wisdom.weizmann.ac.il/~nadler/Publications/dm_nips05.pdf
Barkan, Oren; Weill, Jonathan; Wolf, Lior; Aronowitz, Hagai (2013). "Fast High Dimensional Vector Multiplication Face Recognition". Proceedings of the IEEE International Conference on Computer Vision: 1960–1967. http://www.cv-foundation.org/openaccess/content_iccv_2013/papers/Barkan_Fast_High_Dimensional_2013_ICCV_paper.pdf
Farbman, Zeev; Fattal, Raanan; Lischinski, Dani (2010). "Diffusion maps for edge-aware image editing". ACM Transactions on Graphics. 29 (6): 145:1–145:10. doi:10.1145/1882261.1866171.
Sidi, Oana; van Kaick, Oliver; Kleiman, Yanir; Zhang, Hao; Cohen-Or, Daniel (2011). "Unsupervised Co-Segmentation of a Set of Shapes via Descriptor-Space Spectral Clustering". ACM Transactions on Graphics. https://www.cs.sfu.ca/~haoz/pubs/sidi_siga11_coseg.pdf
Barkan, Oren; Aronowitz, Hagai (2013). "Diffusion maps for PLDA-based speaker verification". Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing: 7639–7643. http://smartfp7.eu/sites/default/files/field/files/page/DIFFUSION%20MAPS%20FOR%20PLDA-BASED%20SPEAKER%20VERIFICATION.pdf
Michalevsky, Yan; Talmon, Ronen; Cohen, Israel (2011). "Speaker Identification Using Diffusion Maps". Proceedings of the European Signal Processing Conference. http://www.eurasip.org/Proceedings/Eusipco/Eusipco2011/papers/1569414913.pdf
Mishne, Gal; Cohen, Israel (2013). "Multiscale Anomaly Detection Using Diffusion Maps". IEEE Journal of Selected Topics in Signal Processing. 7 (1): 111–123. doi:10.1109/jstsp.2012.2232279.
Shabat, Gil; Segev, David; Averbuch, Amir (2018). "Uncovering Unknown Unknowns in Financial Services Big Data by Unsupervised Methodologies: Present and Future Trends". KDD 2017 Workshop on Anomaly Detection in Finance. 71: 8–19. http://proceedings.mlr.press/v71/shabat18a.html
Gepshtein, Shai; Keller, Yosi (2013). "Image Completion by Diffusion Maps and Spectral Relaxation". IEEE Transactions on Image Processing. 22 (8): 2983–2994. doi:10.1109/tip.2013.2237916. PMID 23322762.
Margulies, Daniel S.; Ghosh, Satrajit S.; Goulas, Alexandros; Falkiewicz, Marcel; Huntenburg, Julia M.; Langs, Georg; Bezgin, Gleb; Eickhoff, Simon B.; Castellanos, F. Xavier; Petrides, Michael; Jefferies, Elizabeth; Smallwood, Jonathan (2016). "Situating the default-mode network along a principal gradient of macroscale cortical organization". Proceedings of the National Academy of Sciences. 113 (44): 12574–12579. doi:10.1073/pnas.1608282113. PMC 5098630. PMID 27791099.
De Domenico, Manlio (2017). "Diffusion geometry unravels the emergence of functional clusters in collective phenomena". Physical Review Letters. 118 (16): 168301. arXiv:1704.07068. doi:10.1103/PhysRevLett.118.168301. PMID 28474920.