In linguistics, statistical semantics applies the methods of statistics to the problem of determining the meaning of words or phrases, ideally through unsupervised learning, to a degree of precision at least sufficient for the purpose of information retrieval.
History
The term statistical semantics was first used by Warren Weaver in his well-known paper on machine translation.[1] He argued that word-sense disambiguation for machine translation should be based on the co-occurrence frequency of the context words near a given target word. The underlying assumption that "a word is characterized by the company it keeps" was advocated by J. R. Firth.[2] This assumption is known in linguistics as the distributional hypothesis.[3] Emile Delavenay defined statistical semantics as the "statistical study of the meanings of words and their frequency and order of recurrence".[4] Furnas et al. (1983) is frequently cited as a foundational contribution to statistical semantics.[5] An early success in the field was latent semantic analysis.
Applications
Research in statistical semantics has resulted in a wide variety of algorithms that use the distributional hypothesis to discover many aspects of semantics, by applying statistical techniques to large corpora:
- Measuring the similarity in word meanings[6][7][8][9]
- Measuring the similarity in word relations[10]
- Modeling similarity-based generalization[11]
- Discovering words with a given relation[12]
- Classifying relations between words[13]
- Extracting keywords from documents[14][15]
- Measuring the cohesiveness of text[16]
- Discovering the different senses of words[17]
- Distinguishing the different senses of words[18]
- Subcognitive aspects of words[19]
- Distinguishing praise from criticism[20]
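As an illustration of the first application in the list, the following minimal Python sketch represents each word by the counts of the words that occur near it and compares two words by the cosine of their count vectors. It is not taken from any of the cited works: the function names, the window size of two tokens, and the toy corpus are all assumptions made for the example, and practical systems typically use far larger corpora and reweight the raw counts.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented purely for illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks fell on the market",
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_market = cosine(vectors["cat"], vectors["market"])
print(f"cat/dog: {sim_cat_dog:.3f}  cat/market: {sim_cat_market:.3f}")
```

On this toy corpus "cat" and "dog" share more context words than "cat" and "market", so the first similarity comes out higher, which is the behaviour the distributional hypothesis predicts. Much of the remaining similarity is due to the very frequent word "the", which is one reason raw counts are usually reweighted in practice.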
Related fields
Statistical semantics focuses on the meanings of common words and the relations between common words, unlike text mining, which tends to focus on whole documents, document collections, or named entities (names of people, places, and organizations). Statistical semantics is a subfield of computational semantics, which is in turn a subfield of computational linguistics and natural language processing.
Many of the applications of statistical semantics listed above can also be addressed by lexicon-based algorithms instead of the corpus-based algorithms of statistical semantics. One advantage of corpus-based algorithms is that they are typically less labour-intensive than lexicon-based algorithms. Another is that they are usually easier to adapt to new languages, or to newer and noisier text types such as social media text.[21] However, the best performance on an application is often achieved by combining the two approaches.[22]
See also
- Co-occurrence
- Computational linguistics
- Information retrieval
- Latent semantic analysis
- Latent semantic indexing
- Semantic analytics
- Semantic similarity
- Statistical natural language processing
- Text corpus
- Text mining
- Web mining
Sources
- Delavenay, Emile (1960). An Introduction to Machine Translation. New York, NY: Thames and Hudson. OCLC 1001646.
- Firth, John R. (1957). "A synopsis of linguistic theory 1930-1955". Studies in Linguistic Analysis. Oxford: Philological Society: 1–32. Reprinted in Palmer, F.R., ed. (1968). Selected Papers of J.R. Firth 1952-1959. London: Longman. OCLC 123573912.
- Frank, Eibe; Paynter, Gordon W.; Witten, Ian H.; Gutwin, Carl; Nevill-Manning, Craig G. (1999). "Domain-specific keyphrase extraction". Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. IJCAI-99. Vol. 2. California: Morgan Kaufmann. pp. 668–673. CiteSeerX 10.1.1.148.3598. ISBN 1-55860-613-0.
- Furnas, George W.; Landauer, T. K.; Gomez, L. M.; Dumais, S. T. (1983). "Statistical semantics: Analysis of the potential performance of keyword information systems" (PDF). Bell System Technical Journal. 62 (6): 1753–1806. doi:10.1002/j.1538-7305.1983.tb03513.x. S2CID 22483184. Archived from the original (PDF) on 2016-03-04. Retrieved 2012-07-12.
- Hearst, Marti A. (1992). "Automatic Acquisition of Hyponyms from Large Text Corpora" (PDF). Proceedings of the Fourteenth International Conference on Computational Linguistics. COLING '92. Nantes, France. pp. 539–545. CiteSeerX 10.1.1.36.701. doi:10.3115/992133.992154. Archived from the original (PDF) on 2012-05-22. Retrieved 2012-07-12.
- Landauer, Thomas K.; Dumais, Susan T. (1997). "A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge". Psychological Review. 104 (2): 211–240. CiteSeerX 10.1.1.184.4759. doi:10.1037/0033-295x.104.2.211. S2CID 1144461.
- Lund, Kevin; Burgess, Curt; Atchley, Ruth Ann (1995). "Semantic and associative priming in high-dimensional semantic space" (PDF). Proceedings of the 17th Annual Conference of the Cognitive Science Society. Cognitive Science Society. pp. 660–665.
- McDonald, Scott; Ramscar, Michael (2001). "Testing the distributional hypothesis: The influence of context on judgements of semantic similarity". Proceedings of the 23rd Annual Conference of the Cognitive Science Society. pp. 611–616. CiteSeerX 10.1.1.104.7535.
- Pantel, Patrick; Lin, Dekang (2002). "Discovering word senses from text". Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD '02. pp. 613–619. CiteSeerX 10.1.1.12.6771. doi:10.1145/775047.775138. ISBN 1-58113-567-X.
- Sahlgren, Magnus (2008). "The Distributional Hypothesis" (PDF). Rivista di Linguistica. 20 (1): 33–53. Archived from the original (PDF) on 2012-03-15. Retrieved 2012-11-20.
- Sahlgren, Magnus; Karlgren, Jussi (2009). Terminology mining in social media. CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management. doi:10.1145/1645953.1646006.
- Terra, Egidio L.; Clarke, Charles L. A. (2003). "Frequency estimates for statistical word similarity measures" (PDF). Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003. HLT/NAACL 2003. pp. 244–251. CiteSeerX 10.1.1.12.9041. doi:10.3115/1073445.1073477. Archived from the original (PDF) on 2013-11-03. Retrieved 2012-07-12.
- Turney, Peter D. (May 2000). "Learning algorithms for keyphrase extraction". Information Retrieval. 2 (4): 303–336. arXiv:cs/0212020. CiteSeerX 10.1.1.11.1829. doi:10.1023/A:1009976227802. S2CID 7007323.
- Turney, Peter D. (2001). "Answering subcognitive Turing Test questions: A reply to French". Journal of Experimental and Theoretical Artificial Intelligence. 13 (4): 409–419. arXiv:cs/0212015. CiteSeerX 10.1.1.12.8734. doi:10.1080/09528130110100270. S2CID 59099.
- Turney, Peter D. (2003). "Coherent keyphrase extraction via Web mining". Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence. IJCAI-03. Acapulco, Mexico. pp. 434–439. arXiv:cs/0308033. Bibcode:2003cs........8033T. CiteSeerX 10.1.1.100.3751.
- Turney, Peter D. (2004). "Word sense disambiguation by Web mining for word co-occurrence probabilities". Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text. SENSEVAL-3. Barcelona, Spain. pp. 239–242. arXiv:cs/0407065. Bibcode:2004cs........7065T.
- Turney, Peter D. (2006). "Similarity of semantic relations". Computational Linguistics. 32 (3): 379–416. arXiv:cs/0608100. Bibcode:2006cs........8100T. CiteSeerX 10.1.1.75.8007. doi:10.1162/coli.2006.32.3.379. S2CID 2468783.
- Turney, Peter D.; Littman, Michael L. (October 2003). "Measuring praise and criticism: Inference of semantic orientation from association". ACM Transactions on Information Systems. 21 (4): 315–346. arXiv:cs/0309034. Bibcode:2003cs........9034T. CiteSeerX 10.1.1.9.6425. doi:10.1145/944012.944013. S2CID 2024.
- Turney, Peter D.; Littman, Michael L. (2005). "Corpus-based Learning of Analogies and Semantic Relations". Machine Learning. 60 (1–3): 251–278. arXiv:cs/0508103. Bibcode:2005cs........8103T. CiteSeerX 10.1.1.90.9819. doi:10.1007/s10994-005-0913-1. S2CID 9322367.
- Turney, Peter D.; Littman, Michael L.; Bigham, Jeffrey; Shnayder, Victor (2003). "Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems". Proceedings of the International Conference on Recent Advances in Natural Language Processing. RANLP-03. Borovets, Bulgaria. pp. 482–489. arXiv:cs/0309035. Bibcode:2003cs........9035T. CiteSeerX 10.1.1.5.2939.
- Weaver, Warren (1955). "Translation" (PDF). In Locke, W.N.; Booth, D.A. (eds.). Machine Translation of Languages. Cambridge, Massachusetts: MIT Press. pp. 15–23. ISBN 0-8371-8434-7. Archived from the original (PDF) on 2019-01-29. Retrieved 2012-07-12.
- Yarlett, Daniel G. (2008). Language Learning Through Similarity-Based Generalization (PDF) (PhD thesis). Stanford University. Archived from the original (PDF) on 2014-04-19.
References
1. Weaver 1955. https://web.archive.org/web/20190129025829/http://www.mt-archive.info/Weaver-1949.pdf
2. Firth 1957.
3. Sahlgren 2008. https://web.archive.org/web/20120315233953/http://soda.swedish-ict.se/3941/1/sahlgren.distr-hypo.pdf
4. Delavenay 1960. https://search.worldcat.org/oclc/1001646
5. Furnas et al. 1983. https://web.archive.org/web/20160304093738/http://furnas.people.si.umich.edu/Papers/FurnasEtAl1983_BSTJ_p1753.pdf
6. Lund, Burgess & Atchley 1995. http://locutus.ucr.edu/reprintPDFs/lba95csp.pdf
7. Landauer & Dumais 1997. http://lsa.colorado.edu/papers/plato/plato.annote.html
8. McDonald & Ramscar 2001. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.7535
9. Terra & Clarke 2003. https://web.archive.org/web/20131103095022/http://acl.ldc.upenn.edu/N/N03/N03-1032.pdf
10. Turney 2006. http://cogprints.org/5098/
11. Yarlett 2008. https://web.archive.org/web/20140419012951/http://psych.stanford.edu/~michael/papers/Draft_Yarlett_Similarity.pdf
12. Hearst 1992. https://web.archive.org/web/20120522165806/http://acl.ldc.upenn.edu/C/C92/C92-2082.pdf
13. Turney & Littman 2005. http://cogprints.org/4518/
14. Frank et al. 1999. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.148.3598
15. Turney 2000. https://arxiv.org/abs/cs/0212020
16. Turney 2003. https://arxiv.org/abs/cs/0308033
17. Pantel & Lin 2002. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.6771
18. Turney 2004. http://cogprints.org/3732/
19. Turney 2001. https://arxiv.org/abs/cs/0212015
20. Turney & Littman 2003. http://cogprints.org/3164/
21. Sahlgren & Karlgren 2009. https://dl.acm.org/doi/pdf/10.1145/1645953.1646006
22. Turney et al. 2003. http://cogprints.org/3163/