In 2006, Google conducted a large-scale evaluation[2] comparing the performance of the Minhash and Simhash[3] algorithms. In 2007, Google reported using Simhash for near-duplicate detection in web crawling[4] and using Minhash and LSH for Google News personalization.[5]
Cyphers, Bennett (2021-03-03). "Google's FLoC Is a Terrible Idea". Electronic Frontier Foundation. Retrieved 2021-04-13. https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea
Henzinger, Monika (2006), "Finding near-duplicate web pages: a large-scale evaluation of algorithms", Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, p. 284, doi:10.1145/1148170.1148222, ISBN 978-1595933690, S2CID 207160068.
Charikar, Moses S. (2002), "Similarity estimation techniques from rounding algorithms", Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 380–388, doi:10.1145/509907.509965, ISBN 978-1581134957, S2CID 4229473.
Manku, Gurmeet Singh; Jain, Arvind; Das Sarma, Anish (2007), "Detecting near-duplicates for web crawling", Proceedings of the 16th International Conference on World Wide Web, p. 141, doi:10.1145/1242572.1242592, ISBN 9781595936547.
Das, Abhinandan S.; Datar, Mayur; Garg, Ashutosh; Rajaram, Shyam; et al. (2007), "Google news personalization: scalable online collaborative filtering", Proceedings of the 16th International Conference on World Wide Web, p. 271, doi:10.1145/1242572.1242610, ISBN 9781595936547, S2CID 207163129.