Blumer et al.6 first defined the term Directed Acyclic Word Graph (DAWG) in 1983. Appel and Jacobsen7 used the same name for a different data structure in 1988. Independently of this earlier work, Daciuk et al.8 rediscovered the latter data structure in 2000, calling it a DAFSA.
By allowing the same vertices to be reached by multiple paths, a DAFSA may use significantly fewer vertices than the closely related trie data structure. Consider, for example, the four English words "tap", "taps", "top", and "tops". A trie for these four words would have 12 vertices, one for each string that is a prefix of one of the words or one of the words followed by the end-of-string marker. A DAFSA, however, can represent the same four words using only six vertices vi for 0 ≤ i ≤ 5 and the following edges: an edge from v0 to v1 labeled "t", two edges from v1 to v2 labeled "a" and "o", an edge from v2 to v3 labeled "p", an edge from v3 to v4 labeled "s", and edges from v3 and v4 to v5 labeled with the end-of-string marker. There is a tradeoff between memory and functionality: a standard DAFSA can tell you whether a word exists within it, but it cannot point you to auxiliary information about that word, whereas a trie can.
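To make the example concrete, here is a minimal sketch in Python that hard-codes the six-vertex DAFSA above as an adjacency map and checks membership by following edges. The vertex names v0–v5 and the end-of-string marker follow the example in the text; the representation and the `contains` helper are illustrative choices, not a prescribed implementation.

```python
# Minimal sketch of the six-vertex DAFSA for "tap", "taps", "top", "tops".
END = "$"  # end-of-string marker from the example

# edges[vertex][label] -> next vertex
edges = {
    "v0": {"t": "v1"},
    "v1": {"a": "v2", "o": "v2"},  # "a" and "o" lead to the same vertex
    "v2": {"p": "v3"},
    "v3": {"s": "v4", END: "v5"},
    "v4": {END: "v5"},
    "v5": {},                      # final vertex reached via the end marker
}

def contains(word: str) -> bool:
    """Follow the edges for word + end marker; accept only if the path exists."""
    state = "v0"
    for ch in word + END:
        nxt = edges[state].get(ch)
        if nxt is None:
            return False
        state = nxt
    return state == "v5"

print([w for w in ["tap", "taps", "top", "tops", "ta", "tip"] if contains(w)])
# -> ['tap', 'taps', 'top', 'tops']
```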
The primary difference between a DAFSA and a trie is the elimination of suffix and infix redundancy in storing strings. A trie eliminates prefix redundancy, since all common prefixes are shared between strings; for example, "doctors" and "doctorate" share the prefix "doctor". In a DAFSA, common suffixes are also shared, between words that have the same set of possible suffixes following them. For dictionary sets of common English words, this translates into a major reduction in memory usage, as illustrated by the sketch below.
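The following sketch shows the effect of suffix sharing by building a trie for a word list and then merging subtrees with identical suffix sets, counting vertices before and after. This is a naive build-then-minimize illustration, not the incremental construction algorithm of Daciuk et al.; the helper names are illustrative.

```python
# Compare trie size with minimal-DAFSA size for a small word list.

def build_trie(words):
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}                 # end-of-string marker, as in the example
    return trie

def count_trie_nodes(node):
    return 1 + sum(count_trie_nodes(child) for child in node.values())

def count_dafsa_states(trie):
    """Assign each subtree a signature built from its outgoing labels and the
    signatures of its children; equal signatures mean identical suffix sets,
    so those trie nodes collapse into a single DAFSA state."""
    registry = {}
    def signature(node):
        sig = tuple(sorted((label, signature(child)) for label, child in node.items()))
        return registry.setdefault(sig, len(registry))
    signature(trie)
    return len(registry)

words = ["tap", "taps", "top", "tops"]
trie = build_trie(words)
print(count_trie_nodes(trie))    # 12 vertices in the trie
print(count_dafsa_states(trie))  # 6 vertices in the minimal DAFSA
```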
Because the terminal nodes of a DAFSA can be reached by multiple paths, a DAFSA cannot directly store auxiliary information about each path, e.g. a word's frequency in the English language. However, if for each node we store the number of unique paths through that node, we can use those counts to retrieve the index of a word, or a word given its index.9 The auxiliary information can then be stored in an array indexed by word.
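As an illustration of this path-counting idea, the sketch below stores, for each state of the six-vertex example, the number of accepting paths reachable from it, and uses those counts to map each stored word to a dense index and back. The function names and the edge-label ordering are illustrative assumptions, not the exact construction of the cited paper.

```python
from functools import lru_cache

# The six-vertex DAFSA for "tap", "taps", "top", "tops" from the example above.
END = "$"
FINAL = "v5"
edges = {
    "v0": {"t": "v1"},
    "v1": {"a": "v2", "o": "v2"},
    "v2": {"p": "v3"},
    "v3": {"s": "v4", END: "v5"},
    "v4": {END: "v5"},
    "v5": {},
}

@lru_cache(maxsize=None)
def count_paths(state):
    """Number of accepting paths (i.e. stored words) starting at this state."""
    if state == FINAL:
        return 1
    return sum(count_paths(target) for target in edges[state].values())

def word_to_index(word):
    """Rank of word among the stored words (in edge-label order), or None."""
    index, state = 0, "v0"
    for ch in word + END:
        for label, target in sorted(edges[state].items()):
            if label == ch:
                state = target
                break
            index += count_paths(target)  # skip all words routed through earlier edges
        else:
            return None                   # no edge for this character: word not stored
    return index

def index_to_word(index):
    """Inverse mapping: recover the stored word with the given index."""
    state, chars = "v0", []
    while state != FINAL:
        for label, target in sorted(edges[state].items()):
            if index < count_paths(target):
                chars.append(label)
                state = target
                break
            index -= count_paths(target)
    return "".join(chars).rstrip(END)

for w in ["tap", "taps", "top", "tops"]:
    print(w, word_to_index(w))            # tap 0, taps 1, top 2, tops 3
print(index_to_word(2))                   # top
```

Each stored word then maps to a distinct slot of an ordinary array, which is where per-word data such as frequencies would live.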
Jan Daciuk, Stoyan Mihov, Bruce Watson, and Richard Watson (2000). Incremental construction of minimal acyclic finite state automata. Computational Linguistics 26(1): 3–16. ↩
Andrew Appel and Guy Jacobsen (1988). The World's Fastest Scrabble Program. Communications of the ACM 31(5): 572–578. ↩
Anselm Blumer, Janet Blumer, Andrzej Ehrenfeucht, David Haussler, and Ross M. McConnell (1983). Linear size finite automata for the set of all subwords of a word: an outline of results. Bulletin of the European Association for Theoretical Computer Science 21: 12–20. ↩
T. Kowaltowski and C. L. Lucchesi (1993). Applications of finite automata representing large vocabularies. Software: Practice and Experience 23(1): 15–30. CiteSeerX 10.1.1.56.5272. ↩