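Must-link and cannot-link constraints each define a relationship between two data instances. A must-link constraint specifies that the two instances should be placed in the same cluster, and a cannot-link constraint specifies that they should be placed in different clusters. Together, these sets of constraints guide a constrained clustering algorithm as it attempts to find chunklets (clusters in the dataset that satisfy the specified constraints).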
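The definitions above can be made concrete with a small check that counts how many constraints a given clustering breaks. The following Python sketch is an illustration only, not the API of any library. The names `count_violations` and `satisfies` are hypothetical, and it assumes that constraints are stored as pairs of instance indices and that a clustering is a list of integer cluster labels, one per instance.

```python
from typing import Sequence, Tuple

# Hypothetical representation: a constraint is a pair (i, j) of instance indices,
# and a clustering is a sequence of integer cluster labels, one per instance.
Pair = Tuple[int, int]


def count_violations(labels: Sequence[int],
                     must_link: Sequence[Pair],
                     cannot_link: Sequence[Pair]) -> int:
    """Return how many constraints the clustering `labels` breaks."""
    # A must-link pair is broken when its two instances are in different clusters.
    broken_ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is broken when its two instances share a cluster.
    broken_cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return broken_ml + broken_cl


def satisfies(labels: Sequence[int],
              must_link: Sequence[Pair],
              cannot_link: Sequence[Pair]) -> bool:
    """True when the clustering breaks none of the constraints."""
    return count_violations(labels, must_link, cannot_link) == 0


if __name__ == "__main__":
    ml, cl = [(0, 1)], [(1, 2)]
    print(satisfies([0, 0, 1], ml, cl))         # True
    print(count_violations([0, 1, 1], ml, cl))  # 2
```

In the second call, instances 0 and 1 are split, which breaks the must-link. Instances 1 and 2 share cluster 1, which breaks the cannot-link.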
Some constrained clustering algorithms abort if no clustering exists that satisfies the specified constraints. Others try to minimize the degree of constraint violation when it is impossible to find a clustering that satisfies all of them. Constraints can also be used to guide the selection of a clustering model among several possible solutions.[1]
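The two behaviours described above can be contrasted in a short Python sketch built on k-means. `assign_hard` follows the greedy assignment idea of COP-KMeans (Wagstaff et al.). Each instance goes to the nearest centre that breaks no constraint with instances already assigned in the current pass, and the run gives up if no such centre exists. `assign_soft` is a simplified penalty-based assignment in the spirit of PCKMeans (Basu et al.). It adds a single uniform cost `weight` per broken constraint to the squared distance, whereas the published method allows a separate weight per constraint. All function names, the uniform weight, the fixed initial centres, and the iteration cap are hypothetical choices for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]
Labels = List[Optional[int]]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _conflicts(i: int, c: int, labels: Labels,
               must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> int:
    """Constraints broken by putting instance i in cluster c, checked only
    against partners that already have a label in the current pass."""
    broken = 0
    for a, b in must_link:
        if i in (a, b):
            other = b if a == i else a
            if labels[other] is not None and labels[other] != c:
                broken += 1
    for a, b in cannot_link:
        if i in (a, b):
            other = b if a == i else a
            if labels[other] == c:
                broken += 1
    return broken


def assign_hard(points, centers, must_link, cannot_link) -> Optional[Labels]:
    """COP-KMeans-style step: nearest centre that breaks no constraint, else abort."""
    labels: Labels = [None] * len(points)
    for i, p in enumerate(points):
        # Try centres from nearest to farthest and keep the first feasible one.
        order = sorted(range(len(centers)), key=lambda c: _dist2(p, centers[c]))
        for c in order:
            if _conflicts(i, c, labels, must_link, cannot_link) == 0:
                labels[i] = c
                break
        else:
            return None  # no feasible cluster for instance i: give up
    return labels


def assign_soft(points, centers, must_link, cannot_link, weight=1.0) -> Labels:
    """Penalty-based step: minimise squared distance + weight * broken constraints."""
    labels: Labels = [None] * len(points)
    for i, p in enumerate(points):
        labels[i] = min(
            range(len(centers)),
            key=lambda c: _dist2(p, centers[c])
            + weight * _conflicts(i, c, labels, must_link, cannot_link),
        )
    return labels


def constrained_kmeans(points, centers, must_link, cannot_link,
                       assign: Callable, iters: int = 20):
    """Alternate an assignment step with centroid updates until labels stop changing."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(iters):
        new = assign(points, centers, must_link, cannot_link)
        if new is None:
            return None, centers  # the hard variant found no feasible clustering
        if new == labels:
            break
        labels = new
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centers


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    init = [(0.0,), (10.0,)]
    ml, cl = [(1, 2)], [(0, 1)]
    print(constrained_kmeans(pts, init, ml, cl, assign_hard)[0])  # [0, 1, 1, 1]
    print(constrained_kmeans(pts, init, ml, cl, assign_soft)[0])  # [0, 0, 1, 1]
```

In the example, the hard step honours both constraints by moving instance 1 away from its nearest centre. With `weight=1.0`, the soft step finds that breaking both constraints is cheaper than that move. Raising `weight` makes violations more costly. Because the hard step visits instances in a fixed order, it can abort even when some constraint-satisfying clustering exists.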
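Examples of constrained clustering algorithms include:
COP K-means[2]
PCKmeans (Pairwise Constrained K-means)[3]
CMWK-Means (Constrained Minkowski Weighted K-Means)[4]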
1. Pourrajabi, M.; Moulavi, D.; Campello, R. J. G. B.; Zimek, A.; Sander, J.; Goebel, R. (2014). "Model Selection for Semi-Supervised Clustering". Proceedings of the 17th International Conference on Extending Database Technology (EDBT). pp. 331–342. doi:10.5441/002/edbt.2014.31.
2. Wagstaff, K.; Cardie, C.; Rogers, S.; Schrödl, S. (2001). "Constrained K-means Clustering with Background Knowledge". Proceedings of the Eighteenth International Conference on Machine Learning. pp. 577–584.
3. Basu, S.; Banerjee, A.; Mooney, R. J. (April 2004). "Active Semi-Supervision for Pairwise Constrained Clustering". Proceedings of the 2004 SIAM International Conference on Data Mining. pp. 333–344.
4. de Amorim, R. C. (2012). "Constrained Clustering with Minkowski Weighted K-Means". Proceedings of the 13th IEEE International Symposium on Computational Intelligence and Informatics. pp. 13–17. doi:10.1109/CINTI.2012.6496753.