Manifold alignment assumes that disparate data sets produced by similar generating processes will share a similar underlying manifold representation. By learning projections from each original space to the shared manifold, correspondences are recovered and knowledge from one domain can be transferred to another. Most manifold alignment techniques consider only two data sets, but the concept extends to arbitrarily many initial data sets.
Consider the case of aligning two data sets, $X$ and $Y$, with $X_i \in \mathbb{R}^m$ and $Y_i \in \mathbb{R}^n$.
Manifold alignment algorithms attempt to project both $X$ and $Y$ into a new $d$-dimensional space such that the projections both minimize the distance between corresponding points and preserve the local manifold structure of the original data. The projection functions are denoted:
$$\phi_X : \mathbb{R}^m \rightarrow \mathbb{R}^d$$
$$\phi_Y : \mathbb{R}^n \rightarrow \mathbb{R}^d$$
Let $W$ represent the binary correspondence matrix between points in $X$ and $Y$:
$$W_{i,j} = \begin{cases} 1 & \text{if } X_i \leftrightarrow Y_j \\ 0 & \text{otherwise} \end{cases}$$
Let $S_X$ and $S_Y$ represent pointwise similarities within each data set. These are usually encoded as the heat kernel of the adjacency matrix of a $k$-nearest neighbor graph.
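As a minimal sketch of how such a similarity matrix might be built, the following computes a symmetrized $k$-nearest neighbor graph and weights its edges with a heat kernel. The function name `knn_heat_similarity`, the defaults for `k` and `t`, and the kernel form $\exp(-\lVert x_i - x_j \rVert^2 / t)$ are assumptions chosen for illustration; other bandwidth conventions are equally common.

```python
import numpy as np

def knn_heat_similarity(X, k=5, t=1.0):
    """Symmetric k-NN similarity matrix with heat-kernel edge weights.

    X is an (n_points, n_features) array; k must be smaller than n_points.
    k and t are illustrative defaults, not values taken from the article.
    """
    n = X.shape[0]
    # Squared Euclidean distance between every pair of rows.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Exclude each point from its own neighbour list.
    masked = sq_dists.copy()
    np.fill_diagonal(masked, np.inf)
    neighbours = np.argsort(masked, axis=1)[:, :k]

    # Boolean adjacency of the k-NN graph, symmetrised so that an edge
    # exists if either endpoint lists the other as a neighbour.
    adjacency = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    adjacency[rows, neighbours.ravel()] = True
    adjacency |= adjacency.T

    # Heat-kernel weight on edges, zero elsewhere.
    return np.where(adjacency, np.exp(-sq_dists / t), 0.0)
```

Calling `knn_heat_similarity(X)` and `knn_heat_similarity(Y)` separately would give candidate matrices for $S_X$ and $S_Y$.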
Finally, introduce a coefficient $0 \leq \mu \leq 1$, which can be tuned to adjust the weight of the 'preserve manifold structure' goal versus the 'minimize corresponding point distances' goal.
With these definitions in place, the loss function for manifold alignment can be written:
$$\arg\min_{\phi_X,\phi_Y} \; \mu \sum_{i,j} \left\Vert \phi_X(X_i) - \phi_X(X_j) \right\Vert^2 S_{X,i,j} + \mu \sum_{i,j} \left\Vert \phi_Y(Y_i) - \phi_Y(Y_j) \right\Vert^2 S_{Y,i,j} + (1-\mu) \sum_{i,j} \left\Vert \phi_X(X_i) - \phi_Y(Y_j) \right\Vert^2 W_{i,j}$$
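To make the three terms concrete, the sketch below evaluates the loss for embeddings that have already been computed. The function name `alignment_loss` and the argument layout (embedded points stored as rows of `FX` and `FY`) are assumptions for illustration only.

```python
import numpy as np

def alignment_loss(FX, FY, SX, SY, W, mu):
    """Evaluate the manifold alignment loss for given embeddings.

    FX: (nx, d) embedded points of X, FY: (ny, d) embedded points of Y,
    SX: (nx, nx) and SY: (ny, ny) similarity matrices,
    W: (nx, ny) binary correspondence matrix, mu: trade-off in [0, 1].
    """
    def sq_dists(A, B):
        # Squared Euclidean distance between every row of A and every row of B.
        return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)

    structure_x = np.sum(sq_dists(FX, FX) * SX)     # first term
    structure_y = np.sum(sq_dists(FY, FY) * SY)     # second term
    correspondence = np.sum(sq_dists(FX, FY) * W)   # third term
    return mu * (structure_x + structure_y) + (1.0 - mu) * correspondence
```

With `mu = 0` only the distances between corresponding points are penalized, and with `mu = 1` only the within-set structure terms remain.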
Solving this optimization problem is equivalent to solving a generalized eigenvalue problem using the graph Laplacian [3] of the joint matrix, $G$:
$$G = \begin{bmatrix} \mu S_X & (1-\mu) W \\ (1-\mu) W^{T} & \mu S_Y \end{bmatrix}$$
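A minimal sketch of the instance-level solve, in the style of Laplacian eigenmaps, is given below. It assumes the joint Laplacian $L = D - G$ with $D$ the diagonal degree matrix of $G$, the scale constraint $F^{T} D F = I$, and that every point has nonzero degree so that $D$ is positive definite. The function name `align_manifolds` and the threshold used to discard trivial eigenvectors are illustrative choices. Summing $G_{a,b} \lVert F_a - F_b \rVert^2$ over the joint graph reproduces the three terms of the loss, with each correspondence edge counted twice because $W$ appears in both off-diagonal blocks.

```python
import numpy as np
from scipy.linalg import eigh

def align_manifolds(SX, SY, W, mu=0.5, d=2):
    """Joint d-dimensional embedding of two data sets (instance level).

    SX: (nx, nx), SY: (ny, ny) similarity matrices; W: (nx, ny)
    correspondence matrix. Returns (FX, FY), the embedded points as rows.
    Assumes every node of the joint graph has positive degree.
    """
    nx = SX.shape[0]
    # Joint weight matrix G as defined above.
    G = np.block([[mu * SX, (1.0 - mu) * W],
                  [(1.0 - mu) * W.T, mu * SY]])
    D = np.diag(G.sum(axis=1))   # degree matrix
    L = D - G                    # graph Laplacian of the joint graph

    # Generalized eigenproblem L f = lambda D f, eigenvalues ascending.
    eigvals, eigvecs = eigh(L, D)

    # Drop trivial (constant) solutions with eigenvalue ~ 0, then keep
    # the d eigenvectors with the smallest remaining eigenvalues.
    keep = eigvals > 1e-10
    F = eigvecs[:, keep][:, :d]
    return F[:nx], F[nx:]
```

Rows `FX[i]` and `FY[j]` can then be compared directly, for example by nearest-neighbor search, to recover correspondences that were not given in $W$.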
The algorithm described above requires full pairwise correspondence information between the input data sets, which places it in a supervised learning paradigm. However, this information is usually difficult or impossible to obtain in real-world applications. Recent work has extended the core manifold alignment algorithm to semi-supervised [4], unsupervised [5], and multiple-instance [6] settings.
The algorithm described above performs a "one-step" alignment, finding embeddings for both data sets at the same time. A similar effect can also be achieved with "two-step" alignments [7][8], following a slightly modified procedure:
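One possible reading of a two-step alignment is sketched below, and it is an assumption rather than the procedure of the cited works: each data set is first embedded independently into $d$ dimensions by any dimensionality reduction method, and the second embedding is then mapped onto the first with a Procrustes fit (translation, rotation and isotropic scaling) estimated from known corresponding points. The function name `procrustes_align` and the `pairs` argument are hypothetical.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_align(EX, EY, pairs):
    """Map embedding EY into the coordinate frame of embedding EX.

    EX: (nx, d) and EY: (ny, d) are embeddings computed independently
    (step one). pairs is a list of (i, j) with EX[i] <-> EY[j].
    Returns the transformed copy of EY (step two).
    """
    idx_x, idx_y = np.array(pairs).T
    A = EY[idx_y]                 # points to be moved
    B = EX[idx_x]                 # their fixed targets

    # Remove translation by centring both point sets.
    mean_a, mean_b = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - mean_a, B - mean_b

    # Orthogonal matrix R minimising ||A0 @ R - B0||_F.
    R, sv_sum = orthogonal_procrustes(A0, B0)
    # Optimal isotropic scale for that R.
    scale = sv_sum / np.sum(A0 ** 2)

    return scale * (EY - mean_a) @ R + mean_b
```

Only the paired rows are used to fit the transform, but it is applied to every row of `EY`, so unpaired points are carried into the shared frame as well.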
Manifold alignment can be used to find linear (feature-level) projections, or nonlinear (instance-level) embeddings. While the instance-level version generally produces more accurate alignments, it sacrifices a great degree of flexibility as the learned embedding is often difficult to parameterize. Feature-level projections allow any new instances to be easily embedded in the manifold space, and projections may be combined to form direct mappings between the original data representations. These properties are especially important for knowledge-transfer applications.
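The feature-level variant can be sketched by restricting the embedding to linear maps, $\phi_X(x) = P_X^{T} x$ and $\phi_Y(y) = P_Y^{T} y$, and solving the same generalized eigenproblem in feature space. The formulation below is an assumption for illustration: the function name `linear_alignment`, the small `ridge` term added for numerical stability, and the eigenvalue threshold are not taken from the article. `L` and `D` are the Laplacian and degree matrix of the joint graph.

```python
import numpy as np
from scipy.linalg import eigh, block_diag

def linear_alignment(X, Y, L, D, d=2, ridge=1e-8):
    """Linear (feature-level) projections into a shared d-dim space.

    X: (nx, m), Y: (ny, n) data matrices with points as rows.
    L, D: (nx+ny, nx+ny) joint graph Laplacian and degree matrix.
    Returns PX: (m, d) and PY: (n, d).
    """
    m = X.shape[1]
    # Block-diagonal data matrix: the joint embedding is F = Z @ P.
    Z = block_diag(X, Y)                      # shape (nx+ny, m+n)

    A = Z.T @ L @ Z
    # The ridge keeps B positive definite if features are collinear.
    B = Z.T @ D @ Z + ridge * np.eye(Z.shape[1])

    # Generalized eigenproblem A p = lambda B p, eigenvalues ascending.
    eigvals, eigvecs = eigh(A, B)
    keep = eigvals > 1e-10                    # drop degenerate directions
    P = eigvecs[:, keep][:, :d]
    return P[:m], P[m:]
```

A new instance `x_new` from the first domain would then be embedded as `x_new @ PX` without re-solving the problem, and a product such as `PX @ np.linalg.pinv(PY)` gives one candidate direct mapping from the feature space of $X$ to that of $Y$.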
Manifold alignment is suited to problems with several corpora that lie on a shared manifold, even when each corpus is of a different dimensionality. Many real-world problems fit this description, but traditional techniques are not able to take advantage of all corpora at the same time. Manifold alignment also facilitates transfer learning, in which knowledge of one domain is used to jump-start learning in correlated domains.
Applications of manifold alignment include:
[1] Ham, Ji Hun; Daniel D. Lee; Lawrence K. Saul (2003). "Learning high dimensional correspondences from low dimensional manifolds" (PDF). Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003). ftp://ftp.cis.upenn.edu/pub/datamining/public_html/ReadingGroup/papers/corr_icmlws03.pdf
[2] Hotelling, H (1936). "Relations between two sets of variates" (PDF). Biometrika. 28 (3–4): 321–377. doi:10.2307/2333955. JSTOR 2333955. http://cbio.ensmp.fr/~jvert/svn/bibli/local/Hotelling1936Relation.pdf
[3] Belkin, M; P Niyogi (2003). "Laplacian eigenmaps for dimensionality reduction and data representation" (PDF). Neural Computation. 15 (6): 1373–1396. CiteSeerX 10.1.1.192.8814. doi:10.1162/089976603321780317. S2CID 14879317. http://www.cse.ohio-state.edu/~mbelkin/papers/LEM_NC_03.pdf
[4] Ham, Ji Hun; Daniel D. Lee; Lawrence K. Saul (2005). "Semisupervised alignment of manifolds" (PDF). Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence. http://cseweb.ucsd.edu/~saul/papers/semi_aistats05.pdf
[5] Wang, Chang; Sridhar Mahadevan (2009). Manifold Alignment without Correspondence (PDF). The 21st International Joint Conference on Artificial Intelligence. http://www.cs.umass.edu/~chwang/papers/IJCAI-2009-MA.pdf
[6] Wang, Chang; Sridhar Mahadevan (2011). Heterogeneous Domain Adaptation using Manifold Alignment (PDF). The 22nd International Joint Conference on Artificial Intelligence. Archived from the original (PDF) on 2012-04-15. Retrieved 2011-12-14. https://web.archive.org/web/20120415030530/http://www-all.cs.umass.edu/~chwang/IJCAI2011-DA.pdf
[7] Lafon, Stephane; Yosi Keller; Ronald R. Coifman (2006). "Data fusion and multicue data matching by diffusion maps" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 28 (11): 1784–1797. CiteSeerX 10.1.1.419.1814. doi:10.1109/tpami.2006.223. PMID 17063683. S2CID 1186335. http://www.eng.biu.ac.il/~kellery1/share/dataanalysis/kernel%20methods/Data%20Fusion%20and%20Multicue%20Data%20Matching%20by%20Diffusion%20Maps.pdf
[8] Wang, Chang; Sridhar Mahadevan (2008). Manifold Alignment using Procrustes Analysis (PDF). The 25th International Conference on Machine Learning. http://www.cs.umass.edu/~chwang/papers/ICML-2008.pdf
[9] Makondo, Ndivhuwo; Benjamin Rosman; Osamu Hasegawa (2015). Knowledge Transfer for Learning Robot Models via Local Procrustes Analysis. The 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids). CiteSeerX 10.1.1.728.8830. doi:10.1109/HUMANOIDS.2015.7363502.