The literature provides several different algorithms for instance selection. They can be distinguished from each other according to several different criteria. Considering this, instance selection algorithms can be grouped in two main classes, according to what instances they select: algorithms that preserve the instances at the boundaries of classes and algorithms that preserve the internal instances of the classes. Within the category of algorithms that select instances at the boundaries it is possible to cite DROP3,2 ICF3 and LSBo.4 On the other hand, within the category of algorithms that select internal instances, it is possible to mention ENN5 and LSSm.6 In general, algorithm such as ENN and LSSm are used for removing harmful (noisy) instances from the dataset. They do not reduce the data as the algorithms that select border instances, but they remove instances at the boundaries that have a negative impact on the data mining task. They can be used by other instance selection algorithms, as a filtering step. For example, the ENN algorithm is used by DROP3 as the first step, and the LSSm algorithm is used by LSBo.
There is also another group of algorithms that adopt different selection criteria. For example, the algorithms LDIS,7 CDIS8 and XLDIS9 select the densest instances in a given arbitrary neighborhood. The selected instances can include both, border and internal instances. The LDIS and CDIS algorithms are very simple and select subsets that are very representative of the original dataset. Besides that, since they search by the representative instances in each class separately, they are faster (in terms of time complexity and effective running time) than other algorithms, such as DROP3 and ICF.
Besides that, there is a third category of algorithms that, instead of selecting actual instances of the dataset, select prototypes (that can be synthetic instances). In this category it is possible to include PSSA,10 PSDSP11 and PSSP.12 The three algorithms adopt the notion of spatial partition (a hyperrectangle) for identifying similar instances and extract prototypes for each set of similar instances. In general, these approaches can also be modified for selecting actual instances of the datasets. The algorithm ISDSP13 adopts a similar approach for selecting actual instances (instead of prototypes).
S. García, J. Luengo, and F. Herrera, Data preprocessing in data mining. Springer, 2015. ↩
D. R. Wilson and T. R. Martinez, Reduction techniques for instance-based learning algorithms, Machine learning, vol. 38, no. 3, pp. 257–286, 2000. ↩
H. Brighton and C. Mellish, Advances in instance selection for instance-based learning algorithms, Data mining and knowledge discovery, vol. 6, no. 2, pp. 153–172, 2002. ↩
E. Leyva, A. González, and R. Pérez, Three new instance selection methods based on local sets: A comparative study with several approaches from a bi-objective perspective, Pattern Recognition, vol. 48, no. 4, pp. 1523–1537, 2015. ↩
D. L. Wilson, “Asymptotic properties of nearest neighbor rules using edited data,” Systems, Man and Cybernetics, IEEE Transactions on, no. 3, pp. 408–421, 1972. ↩
Carbonera, Joel Luis, and Mara Abel. A density-based approach for instance selection. IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), 2015. ↩
Carbonera, Joel Luis, and Mara Abel. A novel density-based approach for instance selection. IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), 2016. ↩
Carbonera, Joel Luís (2017), "An Efficient Approach for Instance Selection", Big Data Analytics and Knowledge Discovery, Lecture Notes in Computer Science, vol. 10440, Springer International Publishing, pp. 228–243, doi:10.1007/978-3-319-64283-3_17, ISBN 9783319642826 9783319642826 ↩
Carbonera, Joel Luís; Abel, Mara (2018), "An Efficient Prototype Selection Algorithm Based on Spatial Abstraction", Big Data Analytics and Knowledge Discovery, Springer International Publishing, pp. 177–192, doi:10.1007/978-3-319-98539-8_14, ISBN 9783319985381 9783319985381 ↩
Carbonera, Joel Luís; Abel, Mara (2018), "An Efficient Prototype Selection Algorithm Based on Dense Spatial Partitions", Artificial Intelligence and Soft Computing, Springer International Publishing, pp. 288–300, doi:10.1007/978-3-319-91262-2_26, ISBN 9783319912615 9783319912615 ↩
Carbonera, Joel Luis; Abel, Mara (November 2017). "Efficient Prototype Selection Supported by Subspace Partitions". 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE. pp. 921–928. doi:10.1109/ictai.2017.00142. ISBN 9781538638767. S2CID 46955571. 9781538638767 ↩