When dimensionality increases, data becomes increasingly sparse, and the density of and distance between points, which are critical to clustering and outlier analysis, become less meaningful. Dimensionality reduction helps reduce noise in the data and makes visualization easier; for example, 3-dimensional data can be projected into 2 dimensions to reveal structure that would otherwise remain hidden. One method of dimensionality reduction is the wavelet transform, in which data is transformed so as to preserve the relative distance between objects at different levels of resolution; it is often used for image compression.[3]
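As a concrete sketch of that 3-D-to-2-D projection (a minimal illustration, not taken from the cited sources; the synthetic dataset and the choice of scikit-learn's PCA are assumptions):

```python
# Minimal sketch: project synthetic 3-D data onto 2 dimensions with PCA.
# The dataset is hypothetical; any (n_samples, 3) array would work.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Points that mostly vary within a 2-D plane embedded in 3-D, plus noise.
plane = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.0, 2.0],
                                              [0.0, 1.0, -1.0]])
data_3d = plane + 0.05 * rng.normal(size=(200, 3))

pca = PCA(n_components=2)
data_2d = pca.fit_transform(data_3d)        # shape (200, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1: little variance lost
```

Plotting `data_2d` would reveal the planar structure that is hard to see in the raw 3-D coordinates.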
Numerosity reduction reduces the data volume by choosing alternative, smaller forms of data representation. It can be split into two groups: parametric and non-parametric methods. Parametric methods (regression, for example) assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data, so that the volume of data to be processed shrinks to the small set of values that specify the model. Another example is a log-linear model, which obtains the value at a point in m-dimensional space as the product of values on the appropriate marginal subspaces. Non-parametric methods do not assume a model; examples include histograms, clustering, and sampling.[4]
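The parametric/non-parametric distinction can be made concrete with a short sketch (illustrative only; the synthetic data and the linear model are assumptions, not from the cited sources):

```python
# Parametric numerosity reduction: fit y ~ slope*x + intercept, store the
# two parameters, and discard the 1,000 raw points.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=1000)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=1000)

slope, intercept = np.polyfit(x, y, deg=1)  # estimated model parameters
print(f"stored parameters: slope={slope:.2f}, intercept={intercept:.2f}")

# Non-parametric numerosity reduction: summarize x with a 10-bin histogram.
# The bin counts and edges replace the raw values, with no model assumed.
counts, edges = np.histogram(x, bins=10)
print(counts, edges)
```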
Data reduction can also be achieved by assuming a statistical model for the data. Classical principles of data reduction include sufficiency, likelihood, conditionality, and equivariance.[5]
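A standard worked example of sufficiency (a textbook result of the kind covered by Casella and Berger, sketched here rather than quoted): for an i.i.d. normal sample, the joint density depends on the data only through the sample mean and variance, so those two statistics carry all the information about (μ, σ²) and the n observations can be reduced to two numbers.

```latex
% Factorization showing that (xbar, s^2) is sufficient for (mu, sigma^2):
f(x_1,\dots,x_n \mid \mu, \sigma^2)
  = (2\pi\sigma^2)^{-n/2}
    \exp\!\left(-\frac{n(\bar{x}-\mu)^2 + (n-1)s^2}{2\sigma^2}\right),
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\bar{x})^2 .
```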
"Travel Time Data Collection Handbook" (PDF). Retrieved 6 December 2020. https://www.fhwa.dot.gov/ohim/handbook/chap7.pdf ↩
Iranmanesh, S.; Rodriguez-Villegas, E. (2017). "A 950 nW Analog-Based Data Reduction Chip for Wearable EEG Systems in Epilepsy". IEEE Journal of Solid-State Circuits. 52 (9): 2362–2373. Bibcode:2017IJSSC..52.2362I. doi:10.1109/JSSC.2017.2720636. hdl:10044/1/48764. S2CID 24852887.
Han, J.; Kamber, M.; Pei, J. (2011). "Data Mining: Concepts and Techniques" (3rd ed.) (PDF). http://cis.csuohio.edu/~sschung/CIS660/chapter_3JHan.pdf. Retrieved 6 December 2020.
Casella, George; Berger, Roger L. (2002). Statistical Inference. Australia: Thomson Learning. pp. 271–309. ISBN 0-534-24312-6. OCLC 46538638.