In theoretical computer science and formal language theory, a tree transducer (TT) is an abstract machine taking as input a tree, and generating output – generally other trees, but models producing words or other structures exist. Roughly speaking, tree transducers extend tree automata in the same way that word transducers extend word automata.
Manipulating tree structures instead of words enable TT to model syntax-directed transformations of formal or natural languages. However, TT are not as well-behaved as their word counterparts in terms of algorithmic complexity, closure properties, etcetera. In particular, most of the main classes are not closed under composition.
The main classes of tree transducers are:
Top-Down Tree Transducers (TOP)
A TOP T is a tuple (Q, Σ, Γ, I, δ) such that:
- Q is a finite set, the set of states;
- Σ is a finite ranked alphabet, called the input alphabet;
- Γ is a finite ranked alphabet, called the output alphabet;
- I is a subset of Q, the set of initial states; and
- δ is a set of rules of the form q ( f ( x 1 , … , x n ) ) → u {\displaystyle q(f(x_{1},\dots ,x_{n}))\to u} , where f is a symbol of Σ, n is the arity of f, q is a state, and u is a tree on Γ and Q × 1.. n {\displaystyle Q\times 1..n} , such pairs being nullary.
Examples of rules and intuitions on semantics
For instance,
q ( f ( x 1 , … , x 3 ) ) → g ( a , q ′ ( x 1 ) , h ( q ″ ( x 3 ) ) ) {\displaystyle q(f(x_{1},\dots ,x_{3}))\to g(a,q'(x_{1}),h(q''(x_{3})))}is a rule – one customarily writes q ( x i ) {\displaystyle q(x_{i})} instead of the pair ( q , x i ) {\displaystyle (q,x_{i})} – and its intuitive semantics is that, under the action of q, a tree with f at the root and three children is transformed into
g ( a , q ′ ( x 1 ) , h ( q ″ ( x 3 ) ) ) {\displaystyle g(a,q'(x_{1}),h(q''(x_{3})))}where, recursively, q ′ ( x 1 ) {\displaystyle q'(x_{1})} and q ″ ( x 3 ) {\displaystyle q''(x_{3})} are replaced, respectively, with the application of q ′ {\displaystyle q'} on the first child and with the application of q ″ {\displaystyle q''} on the third.
Semantics as term rewriting
The semantics of each state of the transducer T, and of T itself, is a binary relation between input trees (on Σ) and output trees (on Γ).
A way of defining the semantics formally is to see δ {\displaystyle \delta } as a term rewriting system, provided that in the right-hand sides the calls are written in the form q ( x i ) {\displaystyle q(x_{i})} , where states q are unary symbols. Then the semantics [ [ q ] ] {\displaystyle [\![q]\!]} of a state q is given by
[ [ q ] ] = { u ↦ v ∣ u is a tree on Σ , v is a tree on Γ , and q ( u ) → δ ∗ v } . {\displaystyle [\![q]\!]=\{u\mapsto v\mid u{\text{ is a tree on }}\Sigma ,\ v{\text{ is a tree on }}\Gamma {\text{, and }}q(u)\to _{\delta }^{*}v\}.}The semantics of T is then defined as the union of the semantics of its initial states:
[ [ T ] ] = ⋃ q ∈ I [ [ q ] ] . {\displaystyle [\![T]\!]=\bigcup _{q\in I}[\![q]\!].}Determinism and domain
As with tree automata, a TOP is said to be deterministic (abbreviated DTOP) if no two rules of δ share the same left-hand side, and there is at most one initial state. In that case, the semantics of the DTOP is a partial function from input trees (on Σ) to output trees (on Γ), as are the semantics of each of the DTOP's states.
The domain of a transducer is the domain of its semantics. Likewise, the image of a transducer is the image of its semantics.
Properties of DTOP
- DTOP are not closed under union: this is already the case for deterministic word transducers.
- The domain of a DTOP is a regular tree language. Furthermore, the domain is recognisable by a deterministic top-down tree automaton (DTTA) of size at most exponential in that of the initial DTOP.1
- The image of a DTOP is not a regular tree language.
- However, the domain of a DTOP cannot be restricted to a regular tree language. That is to say, given a DTOP T and a language L, one cannot in general build a DTOP T ′ {\displaystyle T'} such that the semantics of T ′ {\displaystyle T'} is that of T, restricted to L.
- DTOP are not closed under composition. However this problem can be solved by the addition of a lookahead: a tree automaton, coupled to the transducer, that can perform tests on the domain which the transducer is incapable of.2
- The typechecking problem—testing whether the image of a regular tree language is included in another regular tree language—is decidable.
- The equivalence problem—testing whether two DTOP define the same functions—is decidable.3
Bottom-Up Tree Transducers (BOT)
As in the simpler case of tree automata, bottom-up tree transducers are defined similarly to their top-down counterparts, but proceed from the leaves of the tree to the root, instead of from the root to the leaves. Thus the main difference is in the form of the rules, which are of the form f ( q 1 ( x 1 ) , … , q n ( x n ) ) → q ( u ) {\displaystyle f(q_{1}(x_{1}),\dots ,q_{n}(x_{n}))\to q(u)} .
- Comon, Hubert; Dauchet, Max; Gilleron, Rémi; Jacquemard, Florent; Lugiez, Denis; Löding, Christof; Tison, Sophie; Tommasi, Marc (November 2008). "Chapter 6: Tree Transducers". Tree Automata Techniques and Applications. Retrieved 11 February 2014.
- Hosoya, Haruo (4 November 2010). Foundations of XML Processing: The Tree-Automata Approach. Cambridge University Press. ISBN 978-1-139-49236-2.
References
Baker, B.S.: Composition of top-down and bottom-up tree transductions. Inf. Control 41(2), 186–213 (1979) ↩
Maneth, Sebastian (December 2015). "A Survey on Decidable Equivalence Problems for Tree Transducers" (PDF). International Journal of Foundations of Computer Science. 26 (8): 1069–1100. doi:10.1142/S0129054115400134. hdl:20.500.11820/2f1acef4-1b06-485f-bfd1-88636c9e2fe6. https://www.pure.ed.ac.uk/ws/files/25073983/survey_1.pdf ↩
"Decidability results concerning tree transducers I". www.inf.u-szeged.hu. http://www.inf.u-szeged.hu/actacybernetica/edb/vol05n1/Esik_1980_ActaCybernetica.xml ↩