Semantic similarity is a narrower concept than semantic relatedness: relatedness also covers relations such as antonymy and meronymy, whereas similarity does not. However, much of the literature uses the two terms interchangeably, along with terms like semantic distance. In essence, semantic similarity, semantic distance, and semantic relatedness all ask, "How much does term A have to do with term B?" The answer is usually a number between −1 and 1, or between 0 and 1, where 1 signifies extremely high similarity.
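As a rough illustration of the two score ranges mentioned above, the following sketch computes cosine similarity, which lies between −1 and 1, and rescales it to the interval from 0 to 1. Cosine similarity is only one possible measure and is chosen here as an assumption; the term vectors and the helper name `rescale_to_unit_interval` are invented for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; lies in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def rescale_to_unit_interval(score):
    """Map a score from [-1, 1] onto [0, 1] (hypothetical helper)."""
    return (score + 1) / 2

# Hypothetical term vectors, invented for illustration only.
term_a = [1.0, 2.0, 0.0]
term_b = [2.0, 4.0, 0.0]     # points in the same direction as term_a
term_c = [-1.0, -2.0, 0.0]   # points in the opposite direction

sim_same = cosine_similarity(term_a, term_b)
sim_opposite = cosine_similarity(term_a, term_c)
print(sim_same, sim_opposite, rescale_to_unit_interval(sim_opposite))
```

Vectors pointing in the same direction score at the top of the range and opposite vectors at the bottom; whether a negative score is meaningful depends on how the vectors were built.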
An intuitive way of visualizing the semantic similarity of terms is to group closely related terms together and to space distantly related ones farther apart. This is also common practice in mind maps and concept maps.
A more direct way of visualizing the semantic similarity of two linguistic items is offered by the Semantic Folding approach. In this approach, a linguistic item such as a term or a text is represented by generating a pixel for each of its active semantic features in a grid of, for example, 128 × 128 cells. Two items can then be compared visually by comparing the image representations of their respective feature sets.
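The grid representation described above can be sketched as follows. Each item is a set of active grid cells, rendered as a 128 × 128 array of 0/1 pixels. Scoring the comparison as the share of cells the two items have in common (a Jaccard index) is an assumption made for this sketch, not necessarily the measure used by Semantic Folding, and the feature sets are invented.

```python
GRID_SIZE = 128  # side length of the grid, as in the 128 x 128 example

def render(active_cells, size=GRID_SIZE):
    """Return a size x size grid of 0/1 pixels, one pixel per active feature."""
    grid = [[0] * size for _ in range(size)]
    for row, col in active_cells:
        if not (0 <= row < size and 0 <= col < size):
            raise ValueError(f"cell {(row, col)} lies outside the grid")
        grid[row][col] = 1
    return grid

def overlap(cells_a, cells_b):
    """Share of active cells common to both items (Jaccard index, 0 to 1)."""
    a, b = set(cells_a), set(cells_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical feature sets, invented for illustration only.
item_a = {(0, 0), (0, 1), (5, 7), (100, 64)}
item_b = {(0, 1), (5, 7), (90, 3)}

image_a = render(item_a)
image_b = render(item_b)
score = overlap(item_a, item_b)  # 2 shared cells out of 5 distinct cells
print(score)
```

Keeping the items as sets of cell coordinates makes the comparison cheap; the rendered grids are only needed for display.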
Semantic similarity measures have been applied and developed in biomedical ontologies. They are mainly used to compare genes and proteins based on the similarity of their functions rather than on their sequence similarity, but they are also being extended to other bioentities, such as diseases.
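There are essentially two types of approaches that calculate topological similarity between ontological concepts: edge-based approaches, which use the edges and their types as the data source, and node-based approaches, in which the main data sources are the nodes and their properties.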
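As one illustration of a topological measure, the sketch below counts the edges on the shortest path between two concepts in a small is-a hierarchy and converts that count into a score. The taxonomy, the function names, and the formula 1 / (1 + path length) are assumptions chosen for the example; published measures differ in how they weight edges and depth.

```python
from collections import deque

# Hypothetical is-a taxonomy, invented for illustration: child -> parent.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

def neighbours(concept):
    """Concepts one edge away, treating is-a links as undirected."""
    found = [child for child, parent in PARENT.items() if parent == concept]
    if concept in PARENT:
        found.append(PARENT[concept])
    return found

def path_length(start, goal):
    """Number of edges on the shortest path between two concepts (BFS)."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        concept, dist = queue.popleft()
        if concept == goal:
            return dist
        for nxt in neighbours(concept):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {start!r} and {goal!r}")

def path_similarity(a, b):
    """Map the edge count to (0, 1]; identical concepts score 1."""
    return 1 / (1 + path_length(a, b))

print(path_similarity("dog", "cat"), path_similarity("dog", "sparrow"))
```

In this toy hierarchy "dog" and "cat" are two edges apart while "dog" and "sparrow" are four edges apart, so the first pair receives the higher score.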
Researchers have collected datasets of human similarity judgements on pairs of words, which are used to evaluate the cognitive plausibility of computational measures. The long-standing gold standard is a list of 65 word pairs, dating from 1965, for which human subjects judged word similarity.
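A computational measure is commonly evaluated against such judgements by correlating its scores with the human ratings. The sketch below uses Spearman rank correlation, which is one frequently used choice and an assumption here. The four pairs of scores are invented toy numbers and do not come from any dataset.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(ranks(x), ranks(y))

# Invented toy scores for four word pairs (not taken from any dataset).
human_ratings = [3.9, 3.1, 1.2, 0.4]
measure_scores = [0.95, 0.60, 0.70, 0.10]

rho = spearman(human_ratings, measure_scores)
print(rho)
```

Rank correlation only asks whether the measure orders the word pairs as the human judges did, so the two score scales do not need to match.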