Low-rank matrix approximations

Low-rank matrix approximations are essential tools in the application of kernel methods to large-scale learning problems.
Kernel methods (for instance, <a href="/facts/Support_vector_machine/XobxpdBG">support vector machines</a> or <a href="/facts/Gaussian_process/MrBq7kYW">Gaussian processes</a>) map data points into a high-dimensional or infinite-dimensional <a href="/facts/Feature_vector/nzAhYxfu">feature space</a> and find the optimal separating hyperplane there. In a <a href="/facts/Kernel_method/fYuURIPk">kernel method</a> the data is represented by a kernel matrix (or <a href="/facts/Gramian_matrix/SGb2VGTU">Gram matrix</a>), and many algorithms can solve <a href="/facts/Machine_learning/e0w0XJTu">machine learning</a> problems using only this matrix. The main drawback of <a href="/facts/Kernel_method/fYuURIPk">kernel methods</a> is the high <a href="/facts/Algorithmic_efficiency/VutjvPTd">computational cost</a> associated with kernel matrices. The cost is at least quadratic in the number of training data points, and because most <a href="/facts/Kernel_method/fYuURIPk">kernel methods</a> also involve a <a href="/facts/Invertible_matrix/fPqXk3V8">matrix inversion</a> or an <a href="/facts/Eigendecomposition_of_a_matrix/a2nfF7hJ">eigenvalue decomposition</a>, it becomes cubic in the number of training data points. Large training sets therefore incur large <a href="/facts/Algorithmic_efficiency/VutjvPTd">storage and computational costs</a>. While low-rank decomposition methods (for example, <a href="/facts/Cholesky_decomposition/Q3IjkuTs">Cholesky decomposition</a>) reduce this cost, they still require computing the kernel matrix. One approach to this problem is low-rank matrix approximation. The most popular examples are the Nyström approximation and randomized feature maps. Both have been successfully applied to efficient kernel learning.
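To make the idea concrete, the following is a minimal sketch of a Nyström-style approximation, not a reference implementation. It assumes a Gaussian (RBF) kernel, uniformly sampled landmark points, and NumPy; the names `rbf_kernel`, `nystrom_factor`, `gamma`, and `m` are illustrative choices, not taken from the text above. Only an n × m block of the kernel matrix is ever computed, and the result is a factor F such that F Fᵀ approximates the full n × n kernel matrix.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def nystrom_factor(X, m, gamma=1.0, seed=0):
    """Return F with n rows and at most m columns such that F @ F.T approximates K.

    Assumption: landmarks are sampled uniformly without replacement.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)

    C = rbf_kernel(X, X[idx], gamma)  # n x m block of the kernel matrix
    W = C[idx]                        # m x m kernel among the landmarks

    # Pseudo-inverse square root of W via its eigendecomposition,
    # discarding numerically negligible eigenvalues.
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10 * vals.max()
    inv_sqrt = vecs[:, keep] / np.sqrt(vals[keep])

    return C @ inv_sqrt  # F @ F.T equals C @ pinv(W) @ C.T


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 5))
    F = nystrom_factor(X, m=50, gamma=0.1)

    # The full kernel matrix is built here only to measure the error.
    K = rbf_kernel(X, X, gamma=0.1)
    rel_err = np.linalg.norm(K - F @ F.T) / np.linalg.norm(K)
    print(f"factor shape: {F.shape}, relative error: {rel_err:.3e}")
```

With the factor F in hand, a downstream learner can work with the n × m matrix instead of the n × n kernel matrix, which is where the storage and time savings come from. How well this works in practice depends on how quickly the kernel's eigenvalues decay and on how the landmarks are chosen.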
