TRE (computing)

<h2 id="features">Features</h2>
<p>TRE uses <a href="/facts/Regular_expression/KFL3veHX">extended regular expression</a> syntax, with the addition of "directions" that request approximate matching of the preceding fragment. Each direction specifies how many typos are allowed in that fragment.
</p><p>Approximate matching<a class="footnote-ref" id="fnref:5" href="#fn:5"><sup>5</sup></a> is based on a measure similar to <a href="/facts/Levenshtein_distance/F0otyNSl">Levenshtein distance</a>, so three types of typos are recognized:<a class="footnote-ref" id="fnref:6" href="#fn:6"><sup>6</sup></a>
</p>
<table><tbody><tr><th>Typo</th><th>Example</th><th>Difference from "regular expression"</th></tr><tr><td>insertion of an extra character</td><td>regullar experession</td><td>extra l, extra e</td></tr><tr><td>omission of a character from the pattern</td><td>reglar expession</td><td>missing u, missing r</td></tr><tr><td>replacement of a character</td><td>regolar exprezsion</td><td>u → o, s → z</td></tr></tbody></table>
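<p>The three typo types in the table correspond to the three edit operations of Levenshtein distance. The following is a minimal Python sketch of that distance with a cost of 1 per edit; the function name <i>levenshtein</i> is illustrative and is not part of TRE's API, and the code makes no claim about how TRE computes matches internally.
</p>

```python
def levenshtein(pattern: str, text: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `pattern` into `text` (each edit costs 1)."""
    # previous[j] = distance between the pattern prefix handled so far and text[:j]
    previous = list(range(len(text) + 1))
    for i, p in enumerate(pattern, start=1):
        current = [i]  # turning pattern[:i] into the empty string takes i deletions
        for j, t in enumerate(text, start=1):
            current.append(min(
                previous[j] + 1,             # pattern character missing from text
                current[j - 1] + 1,          # extra character inserted in text
                previous[j - 1] + (p != t),  # replacement (free if characters agree)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # The three example strings from the table above.
    for typo in ("regullar experession", "reglar expession", "regolar exprezsion"):
        print(typo, levenshtein("regular expression", typo))
```

<p>Each of the three example strings in the table is two edits away from "regular expression".
</p>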
<p>TRE allows a <i>cost</i> to be specified independently for each of the three typo types.
</p><p>The project comes with a command-line utility, a reimplementation of <a href="/facts/Agrep/QWARW3jS">agrep</a>.
</p><p>Although approximate matching requires a syntax extension, TRE behaves like most other regular expression engines when this feature is not used. This means that
</p>
<ul><li>it implements ordinary regular expressions written for strict matching;<a class="footnote-ref" id="fnref:7" href="#fn:7"><sup>7</sup></a><a class="footnote-ref" id="fnref:8" href="#fn:8"><sup>8</sup></a></li>
<li>programmers familiar with <a href="/facts/Regular_expression/KFL3veHX">POSIX-style</a> regular expressions<a class="footnote-ref" id="fnref:9" href="#fn:9"><sup>9</sup></a> need little additional study to use TRE.<a class="footnote-ref" id="fnref:10" href="#fn:10"><sup>10</sup></a></li></ul>
<h3>Predictable time and memory consumption</h3>
<p>The library's author states<a class="footnote-ref" id="fnref:11" href="#fn:11"><sup>11</sup></a> that matching time grows linearly with the length of the input text, while memory use stays constant during matching and depends only on the pattern, not on the input.
</p>
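<p>To illustrate what "linear in the text, memory bounded by the pattern" can look like, the sketch below implements Sellers' classic dynamic-programming search for a literal pattern. It is not TRE's algorithm and does not handle regular expressions; the name <i>approx_find</i> and its parameters are hypothetical. It keeps a single column whose size depends only on the pattern and reads the text once, so for a fixed pattern the running time is proportional to the text length.
</p>

```python
def approx_find(pattern: str, text: str, max_errors: int) -> list[int]:
    """Return the end positions in `text` at which some substring matches
    `pattern` with at most `max_errors` unit-cost edits."""
    m = len(pattern)
    # column[i] = fewest edits aligning pattern[:i] with a substring of the
    # text ending at the current position. Its size depends only on the pattern.
    column = list(range(m + 1))
    ends = []
    for pos, ch in enumerate(text):   # single pass over the text
        diagonal = column[0]          # value of column[i - 1] before this step
        # column[0] stays 0 because a match may start anywhere in the text
        for i in range(1, m + 1):
            old = column[i]
            column[i] = min(
                old + 1,                             # text character is extra
                column[i - 1] + 1,                   # pattern character is missing
                diagonal + (pattern[i - 1] != ch),   # match or replacement
            )
            diagonal = old
        if column[m] <= max_errors:
            ends.append(pos)
    return ends


if __name__ == "__main__":
    print(approx_find("regular", "a reglar expression", 1))
```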
<h3>Other</h3>
<p>Other features common to most regular expression engines are listed in the <a href="/facts/Comparison_of_regular_expression_engines/lMuelcvC">regex engines comparison tables</a> or in the list of TRE features on its web page.
</p>
<h2 id="usage-example">Usage example</h2>
<p>Approximate matching directions are written in curly brackets and must be distinguishable from repetition quantifiers (for example, by inserting a space after the opening bracket):
</p>
<ul><li>(regular){~1}\s+(expression){~2} matches variants of the phrase "regular expression" in which "regular" has no more than one typo and "expression" no more than two; as in ordinary regular expressions, "\s+" means one or more whitespace characters. For example, "rogular    ekspression" would pass the test;</li>
<li>(expression){ 5i + 3d + 2s &lt; 11} matches the word "expression" if the total cost of typos is less than 11, with the insertion cost set to 5, the deletion cost to 3 and the substitution cost to 2. For example, "ekspresson" has a cost of 10: one substitution (x → k), one insertion (extra s) and one deletion (missing i), or 2 + 5 + 3.</li></ul>
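<p>The cost arithmetic in the second example can be checked with a weighted variant of edit distance. This is a sketch under the assumption that the total cost is the cheapest combination of insertions, deletions and substitutions; the function <i>weighted_cost</i> and its parameter names are illustrative and do not come from TRE.
</p>

```python
def weighted_cost(pattern: str, text: str,
                  insert: int = 5, delete: int = 3, substitute: int = 2) -> int:
    """Cheapest total cost of edits that turn `pattern` into `text`.

    Default weights mirror the direction { 5i + 3d + 2s < 11 }.
    """
    # Building text[:j] from an empty pattern prefix takes j insertions.
    previous = [j * insert for j in range(len(text) + 1)]
    for i, p in enumerate(pattern, start=1):
        current = [i * delete]  # dropping pattern[:i] entirely takes i deletions
        for j, t in enumerate(text, start=1):
            current.append(min(
                previous[j] + delete,       # pattern character missing from text
                current[j - 1] + insert,    # extra character in text
                previous[j - 1] + (0 if p == t else substitute),
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    cost = weighted_cost("expression", "ekspresson")
    # Report the cost and whether it stays under the limit of 11.
    print(cost, cost < 11)
```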
<h2 id="language-bindings">Language bindings</h2>
<p>Apart from C, TRE is usable through <a href="/facts/Language_binding/V5HW4Wr4">bindings</a> for <a href="/facts/Perl/fx31kjlT">Perl</a>, <a href="/facts/Python_(programming_language)/YbuGqofa">Python</a> and <a href="/facts/Haskell_(programming_language)/4Htv9WLu">Haskell</a>.<a class="footnote-ref" id="fnref:12" href="#fn:12"><sup>12</sup></a> It is the default regular expression engine in <a href="/facts/R_(programming_language)/LSrkr8K8">R</a>.<a class="footnote-ref" id="fnref:13" href="#fn:13"><sup>13</sup></a> However, a <a href="/facts/Cross-platform/16zqdvNR">cross-platform</a> project would need a separate interface for each target platform.
</p>
<h2 id="disadvantages">Disadvantages</h2>
<p>Since other regular expression engines usually do not provide approximate matching, there are almost no competing implementations with which TRE can be compared. However, there are a few features that programmers may wish to see in future releases:<a class="footnote-ref" id="fnref:14" href="#fn:14"><sup>14</sup></a>
</p>
<ul><li>a replacement mechanism for substituting matched text fragments (as in the <a href="/facts/Sed/5pXUwRdD">sed</a> string processor and many modern implementations of regular expressions, including those built into <a href="/facts/Perl/fx31kjlT">Perl</a> and <a href="/facts/Java_(programming_language)/9ScgFyAL">Java</a>);</li>
<li>the option to use an approximate matching algorithm other than <a href="/facts/Levenshtein_distance/F0otyNSl">Levenshtein's</a> for better typo assessment (for example <a href="/facts/Soundex/1EL8PN6b">Soundex</a>), or at least an extension of the current algorithm to allow typos of the "swap" type (see <a href="/facts/Damerau%E2%80%93Levenshtein_distance/Y8zpLKpY">Damerau–Levenshtein distance</a>).</li></ul>
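<p>The "swap" typo mentioned in the last item can be illustrated with the optimal string alignment distance, a restricted form of Damerau–Levenshtein distance that counts an exchange of two adjacent characters as a single edit. The sketch below is illustrative only; <i>osa_distance</i> is a hypothetical name and nothing here is provided by TRE.
</p>

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: unit-cost insertions, deletions,
    substitutions, plus swaps of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent swap: the last two characters appear in reversed order.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(osa_distance("regular", "regualr"))
```

<p>With plain Levenshtein distance the swapped pair in "regualr" counts as two edits, whereas here it counts as one.
</p>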
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Levenshtein_automaton/c4Eqe6hS">Levenshtein automaton</a></li>
<li><a href="/facts/Comparison_of_regular_expression_engines/lMuelcvC">Comparison of regular expression engines</a></li>
<li><a href="/facts/Agrep/QWARW3jS">Agrep</a></li></ul>

<h2 id="external-links">External links</h2>
<ul><li><a href="https://laurikari.net/tre/">TRE - The free and portable approximate regular expression matching library</a></li></ul>
<h2 id="further-reading">Further reading</h2>
<ul><li>Navarro, Gonzalo (March 2001), "A guided tour to approximate string matching", <i>ACM Computing Surveys</i>, 33 (1): 31–88, <a href="/facts/CiteSeerX_(identifier)/SceDmd3c">CiteSeerX</a> <a href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.452.6317">10.1.1.452.6317</a>, <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1145%2F375360.375365">10.1145/375360.375365</a>, <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:207551224">207551224</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>"Tre for Windows". <a href="http://gnuwin32.sourceforge.net/packages/tre.htm" target="_blank">http://gnuwin32.sourceforge.net/packages/tre.htm</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>"Using fuzzy searches with tre-agrep". Linux Magazine. <a href="https://www.linux-magazine.com/Issues/2016/186/Command-Line-tre-agrep/(offset)/3" target="_blank">https://www.linux-magazine.com/Issues/2016/186/Command-Line-tre-agrep/(offset)/3</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>"R: Pattern Matching for Raw Vectors". MIT.edu. <a href="http://web.mit.edu/r/current/arch/i386_linux26/lib/R/library/base/html/grepRaw.html" target="_blank">http://web.mit.edu/r/current/arch/i386_linux26/lib/R/library/base/html/grepRaw.html</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
<li id="fn:4"><p>"tre 0.8.0-6 (x86_64)". July 7, 2020. <a href="https://www.archlinux.org/packages/community/x86_64/tre" target="_blank">https://www.archlinux.org/packages/community/x86_64/tre</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></p></li>
<li id="fn:5"><p>Andoni, Alexandr; Krauthgamer, Robert; Onak, Krzysztof (2010). "Polylogarithmic approximation for edit distance and the asymmetric query complexity". IEEE Symp. Foundations of Computer Science (FOCS). arXiv:1005.4033. Bibcode:2010arXiv1005.4033A. CiteSeerX 10.1.1.208.2079. <a href="#fnref:5" class="footnote-back-ref">↩</a></p></li>
<li id="fn:6"><p>"TRE web-page - Regex Syntax". <a href="https://laurikari.net/tre/documentation/regex-syntax/" target="_blank">https://laurikari.net/tre/documentation/regex-syntax/</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></p></li>
<li id="fn:7"><p>"Using fuzzy searches with tre-agrep". Linux Magazine. <a href="https://www.linux-magazine.com/Issues/2016/186/Command-Line-tre-agrep/(offset)/3" target="_blank">https://www.linux-magazine.com/Issues/2016/186/Command-Line-tre-agrep/(offset)/3</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></p></li>
<li id="fn:8"><p>"Tre-agrep has all of grep's functionality but can also do ambiguous or fuzzy" <a href="#fnref:8" class="footnote-back-ref">↩</a></p></li>
<li id="fn:9"><p>"tre 0.8.0-6 (x86_64)". July 7, 2020. <a href="https://www.archlinux.org/packages/community/x86_64/tre" target="_blank">https://www.archlinux.org/packages/community/x86_64/tre</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></p></li>
<li id="fn:10"><p>"Using fuzzy searches with tre-agrep". Linux Magazine. <a href="https://www.linux-magazine.com/Issues/2016/186/Command-Line-tre-agrep/(offset)/3" target="_blank">https://www.linux-magazine.com/Issues/2016/186/Command-Line-tre-agrep/(offset)/3</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></p></li>
<li id="fn:11"><p>"TRE web-page - About". <a href="https://laurikari.net/tre/about/" target="_blank">https://laurikari.net/tre/about/</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></p></li>
<li id="fn:12"><p>"TRE web-page - FAQ". <a href="https://laurikari.net/tre/faq/" target="_blank">https://laurikari.net/tre/faq/</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></p></li>
<li id="fn:13"><p>"Regular Expressions as used in R". <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html" target="_blank">https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></p></li>
<li id="fn:14"><p>Trofimovich, Ulya (2019). "Tagged Deterministic Finite Automata with Lookahead". arXiv:1907.08837 [cs.FL]. practical improvements .. Lurikari algorithm, notably .. <a href="#fnref:14" class="footnote-back-ref">↩</a></p></li>
</ol>
