Automated Similarity Judgment Program

<h2 id="history">History</h2>
<h3>Original goals</h3>
<p>ASJP was originally developed as a means for objectively evaluating the similarity of words with the same meaning from different languages, with the ultimate goal of classifying languages computationally, based on the lexical similarities observed. In the first ASJP paper<a class="footnote-ref" id="fnref:9" href="#fn:9"><sup>9</sup></a> two <a href="/facts/Semantics/1L4WwLAP">semantically</a> identical words from compared languages were judged similar if they showed at least two identical sound segments. Similarity between the two languages was calculated as a percentage of the total number of words compared that were judged as similar. This method was applied to 100-item word lists for 250 languages from <a href="/facts/Language_families/IpJv9o5S">language families</a> including <a href="/facts/Austroasiatic/htRcwIvn">Austroasiatic</a>, <a href="/facts/Indo-European_languages/L34TfgSr">Indo-European</a>, <a href="/facts/Mayan_languages/5J0QpcLa">Mayan</a>, and <a href="/facts/Muskogean/wwrc06Wg">Muskogean</a>.
</p>
<h3>ASJP Consortium</h3>
<p>The ASJP Consortium, founded around 2008, came to involve around 25 professional linguists and other interested parties working as volunteer transcribers or supporting the project in other ways. The main driving force behind the founding of the consortium was Cecil H. Brown. <a href="/facts/S%C3%B8ren_Wichmann/6rMSPgzr">Søren Wichmann</a> is the project's day-to-day curator. A third central member of the consortium is Eric W. Holman, who has created most of the software used in the project.
</p>
<h3>Shorter word lists</h3>
<p>The word lists were originally based on the 100-item <a href="/facts/Swadesh_list/f7vsCc2e">Swadesh list</a>, but it was statistically determined that a subset of 40 of the 100 items produced classificatory results just as good as, if not slightly better than, the whole list.<a class="footnote-ref" id="fnref:10" href="#fn:10"><sup>10</sup></a> Word lists gathered since then therefore contain only 40 items (or fewer, when attestations for some are lacking).
</p>
<h3>Levenshtein distance</h3>
<p>In papers published since 2008, ASJP has employed a similarity judgment program based on <a href="/facts/Levenshtein_distance/F0otyNSl">Levenshtein distance</a> (LD). This approach was found to produce better classificatory results, as measured against expert opinion, than the method used initially. LD is defined as the minimum number of successive changes necessary to convert one word into another, where each change is the insertion, deletion, or substitution of a symbol. Within the Levenshtein approach, differences in word length can be corrected for by dividing LD by the number of symbols in the longer of the two compared words. This produces the normalized LD (LDN). A further measure, LDN divided (LDND), is calculated for two languages by dividing the average LDN for all word pairs with the same meaning by the average LDN for all word pairs with different meanings. This second normalization is intended to correct for chance similarity.<a class="footnote-ref" id="fnref:11" href="#fn:11"><sup>11</sup></a>
</p>
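<p>The three quantities defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used by the consortium: the function names <code>levenshtein</code>, <code>ldn</code> and <code>ldnd</code> are hypothetical, each character of a transcription is assumed to count as one symbol, and a word list is assumed to be a mapping from meaning to a single transcribed word.</p>

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def ldn(a, b):
    """LD divided by the length of the longer word (normalized LD)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(list_a, list_b):
    """Mean LDN over same-meaning pairs divided by mean LDN over
    different-meaning pairs. Assumption: list_a and list_b are dicts
    mapping a meaning to one word; only shared meanings are compared."""
    shared = sorted(set(list_a) & set(list_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared meanings")
    same = [ldn(list_a[m], list_b[m]) for m in shared]
    diff = [ldn(list_a[m1], list_b[m2])
            for m1 in shared for m2 in shared if m1 != m2]
    mean_same = sum(same) / len(same)
    mean_diff = sum(diff) / len(diff)
    if mean_diff == 0:
        raise ValueError("different-meaning pairs are all identical")
    return mean_same / mean_diff
```

<p>With the invented toy lists <code>{"one": "ab", "two": "cd"}</code> and <code>{"one": "ab", "two": "ce"}</code>, the same-meaning pairs average an LDN of 0.25 and the different-meaning pairs average 1.0, so <code>ldnd</code> returns 0.25. Lower values indicate more similar word lists.</p>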
<h2 id="word-list">Word list</h2>
<p>The ASJP uses the following 40-word list.<a class="footnote-ref" id="fnref:12" href="#fn:12"><sup>12</sup></a> It is similar to the <a href="/facts/Swadesh_list/f7vsCc2e">Swadesh–Yakhontov list</a>, but has some differences.
</p>

Body parts
<ul><li>eye</li>
<li>ear</li>
<li>nose</li>
<li>tongue</li>
<li>tooth</li>
<li>hand</li>
<li>knee</li>
<li>blood</li>
<li>bone</li>
<li>breast (woman’s)</li>
<li>liver</li>
<li>skin</li></ul>
Animals and plants
<ul><li>louse</li>
<li>dog</li>
<li>fish (noun)</li>
<li>horn (animal part)</li>
<li>tree</li>
<li>leaf</li></ul>
People
<ul><li>person</li>
<li>name (noun)</li></ul>
Nature
<ul><li>sun</li>
<li>star</li>
<li>water</li>
<li>fire</li>
<li>stone</li>
<li>path</li>
<li>mountain</li>
<li>night (dark time)</li></ul>
Verbs and adjectives
<ul><li>drink (verb)</li>
<li>die</li>
<li>see</li>
<li>hear</li>
<li>come</li>
<li>new</li>
<li>full</li></ul>
Numerals and pronouns
<ul><li>one</li>
<li>two</li>
<li>I</li>
<li>you</li>
<li>we</li></ul>

<h2 id="asjpcode">ASJPcode</h2>
<p>The 2016 version of ASJP uses the following symbols to encode <a href="/facts/Phoneme/EXGHlJ7a">phonemes</a>: p b f v m w 8 t d s z c n r l S Z C j T 5 y k g x N q X h 7 L 4 G ! i e E 3 a u o
</p><p>They represent 7 vowels and 34 consonants, all of which can be typed on a standard QWERTY keyboard.
</p>
Sounds represented by ASJPcode<a class="footnote-ref" id="fnref:13" href="#fn:13"><sup>13</sup></a><table><tbody><tr><th>ASJPcode</th><th>Description</th><th>IPA</th></tr>
<tr><td>i</td><td>high front vowel, rounded and unrounded</td><td>i, ɪ, y, ʏ</td></tr>
<tr><td>e</td><td>mid front vowel, rounded and unrounded</td><td>e, ø</td></tr>
<tr><td>E</td><td>low front vowel, rounded and unrounded</td><td>a, æ, ɛ, ɶ, œ, e</td></tr>
<tr><td>3</td><td>high and mid central vowel, rounded and unrounded</td><td>ɨ, ɘ, ə, ɜ, ʉ, ɵ, ɞ</td></tr>
<tr><td>a</td><td>low central vowel, unrounded</td><td>ɐ, ä</td></tr>
<tr><td>u</td><td>high back vowel, rounded and unrounded</td><td>ɯ, u, ʊ</td></tr>
<tr><td>o</td><td>mid and low back vowel, rounded and unrounded</td><td>ɤ, ʌ, ɑ, o, ɔ, ɒ</td></tr>
<tr><td>p</td><td>voiceless bilabial stop and fricative</td><td>p, ɸ</td></tr>
<tr><td>b</td><td>voiced bilabial stop and fricative</td><td>b, β</td></tr>
<tr><td>m</td><td>bilabial nasal</td><td>m</td></tr>
<tr><td>f</td><td>voiceless labiodental fricative</td><td>f</td></tr>
<tr><td>v</td><td>voiced labiodental fricative</td><td>v</td></tr>
<tr><td>8</td><td>voiceless and voiced dental fricative</td><td>θ, ð</td></tr>
<tr><td>4</td><td>dental nasal</td><td>n̪</td></tr>
<tr><td>t</td><td>voiceless alveolar stop</td><td>t</td></tr>
<tr><td>d</td><td>voiced alveolar stop</td><td>d</td></tr>
<tr><td>s</td><td>voiceless alveolar fricative</td><td>s</td></tr>
<tr><td>z</td><td>voiced alveolar fricative</td><td>z</td></tr>
<tr><td>c</td><td>voiceless and voiced alveolar affricate</td><td>t͡s, d͡z</td></tr>
<tr><td>n</td><td>voiceless and voiced alveolar nasal</td><td>n</td></tr>
<tr><td>S</td><td>voiceless postalveolar fricative</td><td>ʃ</td></tr>
<tr><td>Z</td><td>voiced postalveolar fricative</td><td>ʒ</td></tr>
<tr><td>C</td><td>voiceless palato-alveolar affricate</td><td>t͡ʃ</td></tr>
<tr><td>j</td><td>voiced palato-alveolar affricate</td><td>d͡ʒ</td></tr>
<tr><td>T</td><td>voiceless and voiced palatal stop</td><td>c, ɟ</td></tr>
<tr><td>5</td><td>palatal nasal</td><td>ɲ</td></tr>
<tr><td>k</td><td>voiceless velar stop</td><td>k</td></tr>
<tr><td>g</td><td>voiced velar stop</td><td>ɡ</td></tr>
<tr><td>x</td><td>voiceless and voiced velar fricative</td><td>x, ɣ</td></tr>
<tr><td>N</td><td>velar nasal</td><td>ŋ</td></tr>
<tr><td>q</td><td>voiceless uvular stop</td><td>q</td></tr>
<tr><td>G</td><td>voiced uvular stop</td><td>ɢ</td></tr>
<tr><td>X</td><td>voiceless and voiced uvular fricative, voiceless and voiced pharyngeal fricative</td><td>χ, ʁ, ħ, ʕ</td></tr>
<tr><td>7</td><td>voiceless glottal stop</td><td>ʔ</td></tr>
<tr><td>h</td><td>voiceless and voiced glottal fricative</td><td>h, ɦ</td></tr>
<tr><td>l</td><td>voiced alveolar lateral approximant</td><td>l</td></tr>
<tr><td>L</td><td>all other laterals</td><td>ʟ, ɭ, ʎ</td></tr>
<tr><td>w</td><td>voiced bilabial-velar approximant</td><td>w</td></tr>
<tr><td>y</td><td>palatal approximant</td><td>j</td></tr>
<tr><td>r</td><td>voiced apico-alveolar trill and all varieties of “r-sounds”</td><td>r, ʀ, etc.</td></tr>
<tr><td>!</td><td>all varieties of “click-sounds”</td><td>ǃ, ǀ, ǁ, ǂ</td></tr></tbody></table>
<p>A ~ mark follows two consonants to indicate that they are treated as occupying the same position.
Thus, kʷat becomes kw~at.
Syllables like kat, wat, kaw and kwi are considered lexically similar to kw~at.
</p><p>Similarly, a $ mark follows three consonants to indicate that they are treated as occupying the same position.
For example, ndy$im is considered similar to nim, dam and yim.
</p><p>A " mark indicates that the preceding consonant is <a href="/facts/Glottalized/216kqedT">glottalized</a>.
</p>
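<p>The grouping marks described above could be handled along the following lines. This is a minimal sketch, not the project's own parser: the function name <code>segment_asjp</code> is hypothetical, only the three marks described here (~, $ and ") are interpreted, and symbols are not validated against the ASJPcode inventory. How grouped positions are then compared in a similarity judgment is not specified in this section, so the sketch only splits a transcription into positions.</p>

```python
def segment_asjp(word):
    """Split an ASJPcode transcription into positions.

    '~' groups the two preceding symbols into one position,
    '$' groups the three preceding symbols into one position,
    '"' attaches to the preceding symbol (glottalization).
    """
    group_sizes = {"~": 2, "$": 3}
    segments = []
    for ch in word:
        if ch in group_sizes:
            n = group_sizes[ch]
            if len(segments) < n:
                raise ValueError(f"{ch!r} needs {n} preceding symbols in {word!r}")
            # Replace the last n positions with a single merged position.
            segments[-n:] = ["".join(segments[-n:])]
        elif ch == '"':
            if not segments:
                raise ValueError(f"glottalization mark has no preceding symbol in {word!r}")
            segments[-1] += ch
        else:
            segments.append(ch)
    return segments
```

<p>Under these assumptions, <code>segment_asjp("kw~at")</code> yields the positions <code>kw</code>, <code>a</code>, <code>t</code>, and <code>segment_asjp("ndy$im")</code> yields <code>ndy</code>, <code>i</code>, <code>m</code>.</p>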
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Historical_linguistics/nj5y6HDR">Historical linguistics</a></li>
<li><a href="/facts/Lexicostatistics/8QCqMrec">Lexicostatistics</a></li>
<li><a href="/facts/Lexibank/FpDduWdr">Lexibank</a></li></ul>

<h2 id="sources">Sources</h2>
<ul><li>Wichmann, Søren, and Jeff Good (eds.). 2014. <a href="https://books.google.com/books?id=NJ6XCgAAQBAJ&dq=ASJP+code&pg=PA203">Quantifying Language Dynamics: On the Cutting Edge of Areal and Phylogenetic Linguistics</a>, p. 203. Leiden: Brill.</li>
<li>Brown, Cecil H., et al. 2008. <a href="https://www.researchgate.net/profile/Soren_Wichmann/publication/40853551_Automated_Classification_of_the_World%27s_Languages_A_Description_of_the_Method_and_Preliminary_Results/links/546373360cf2837efdb30a6e/Automated-Classification-of-the-Worlds-Languages-A-Description-of-the-Method-and-Preliminary-Results.pdf">Automated Classification of the World's Languages: A Description of the Method and Preliminary Results</a>. Language Typology and Universals 61(4). November 2008. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1524%2Fstuf.2008.0026">10.1524/stuf.2008.0026</a></li>
<li>Wichmann, Søren, Eric W. Holman, and Cecil H. Brown (eds.). 2018. <a href="http://asjp.clld.org/">The ASJP Database</a> (version 18).</li></ul>
<h2 id="external-links">External links</h2>
<ul><li><a href="http://asjp.clld.org/">ASJP Database</a> official home page</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>"The ASJP Database". asjp.clld.org. Retrieved February 15, 2024. <a href="https://asjp.clld.org/" target="_blank">https://asjp.clld.org/</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>Brown, Cecil H; Holman, Eric W.; Wichmann, Søren; Velupillai, Viveka (2008). "Automated classification of the world's languages: A description of the method and preliminary results". STUF – Language Typology and Universals. <a href="https://www.researchgate.net/publication/40853551" target="_blank">https://www.researchgate.net/publication/40853551</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>"Automated dating of the world's language families based on lexical similarity" (PDF). pubman.mpdl.mpg.de. 2011. <a href="http://pubman.mpdl.mpg.de/pubman/item/escidoc:2395214/component/escidoc:2432001/shh768.pdf" target="_blank">http://pubman.mpdl.mpg.de/pubman/item/escidoc:2395214/component/escidoc:2432001/shh768.pdf</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
<li id="fn:4"><p>"Homelands of the world's language families: A quantitative approach". www.researchgate.net. 2010. <a href="https://www.researchgate.net/publication/300467626" target="_blank">https://www.researchgate.net/publication/300467626</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></p></li>
<li id="fn:5"><p>Wichmann, Søren; Holman, Eric W.; Brown, Cecil H. (April 2010). "Sound Symbolism in Basic Vocabulary". Entropy. 12 (4): 844–858. doi:10.3390/e12040844. ISSN 1099-4300. <a href="https://doi.org/10.3390%2Fe12040844" target="_blank">https://doi.org/10.3390%2Fe12040844</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></p></li>
<li id="fn:6"><p>Pompei, Simone; Loreto, Vittorio; Tria, Francesca (June 3, 2011). "On the Accuracy of Language Trees". PLOS ONE. 6 (6): e20109. arXiv:1103.4012. Bibcode:2011PLoSO...620109P. doi:10.1371/journal.pone.0020109. ISSN 1932-6203. PMC 3108590. PMID 21674034. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3108590" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3108590</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></p></li>
<li id="fn:7"><p>Cf. comments by Adelaar, Blust and Campbell in Holman, Eric W., et al. (2011) "Automated Dating of the World’s Language Families Based on Lexical Similarity." Current Anthropology, vol. 52, no. 6, pp. 841–875. <a href="#fnref:7" class="footnote-back-ref">↩</a></p></li>
<li id="fn:8"><p>"Cross-Linguistic Linked Data". Retrieved February 22, 2020. <a href="http://clld.org" target="_blank">http://clld.org</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></p></li>
<li id="fn:9"><p>Brown, Cecil H; Holman, Eric W.; Wichmann, Søren; Velupillai, Viveka (2008). "Automated classification of the world's languages: A description of the method and preliminary results". STUF – Language Typology and Universals. <a href="https://www.researchgate.net/publication/40853551" target="_blank">https://www.researchgate.net/publication/40853551</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></p></li>
<li id="fn:10"><p>Holman, Eric W.; Wichmann, Søren; Brown, Cecil H.; Velupillai, Viveka; Müller, André; Bakker, Dik (2008). "Explorations in automated language classification". Folia Linguistica. <a href="https://www.researchgate.net/publication/40853552" target="_blank">https://www.researchgate.net/publication/40853552</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></p></li>
<li id="fn:11"><p>Wichmann, Søren; Holman, Eric W.; Bakker, Dik; Brown, Cecil H. (2010). "Evaluating linguistic distance measures". Physica A. 389: 3632–3639. doi:10.1016/j.physa.2010.05.011. <a href="https://doi.org/10.1016/j.physa.2010.05.011" target="_blank">https://doi.org/10.1016/j.physa.2010.05.011</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></p></li>
<li id="fn:12"><p>"Guidelines" (PDF). asjp.clld.org. <a href="http://asjp.clld.org/static/Guidelines.pdf" target="_blank">http://asjp.clld.org/static/Guidelines.pdf</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></p></li>
<li id="fn:13"><p>Brown, Cecil H; Holman, Eric W.; Wichmann, Søren; Velupillai, Viveka (2008). "Automated classification of the world's languages: A description of the method and preliminary results". STUF – Language Typology and Universals. <a href="https://www.researchgate.net/publication/40853551" target="_blank">https://www.researchgate.net/publication/40853551</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></p></li>
</ol>
