Generalized context-free grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context-free composition functions to rewrite rules. Head grammar (and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.
Description
A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form f ( ⟨ x 1 , . . . , x m ⟩ , ⟨ y 1 , . . . , y n ⟩ , . . . ) = γ {\displaystyle f(\langle x_{1},...,x_{m}\rangle ,\langle y_{1},...,y_{n}\rangle ,...)=\gamma } , where γ {\displaystyle \gamma } is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like X → f ( Y , Z , . . . ) {\displaystyle X\to f(Y,Z,...)} , where Y {\displaystyle Y} , Z {\displaystyle Z} , ... are string tuples or non-terminal symbols.
The rewrite semantics of GCFGs is fairly straightforward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, successively reducing the tuples to a single tuple.
Example
A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in (1), which generates the palindrome language { w w R : w ∈ { a , b } ∗ } {\displaystyle \{ww^{R}:w\in \{a,b\}^{*}\}} , where w R {\displaystyle w^{R}} is the string reverse of w {\displaystyle w} , we can define the composition function conc as in (2a) and the rewrite rules as in (2b).
S → ϵ | a S a | b S b {\displaystyle S\to \epsilon ~|~aSa~|~bSb} | 1 |
c o n c ( ⟨ x ⟩ , ⟨ y ⟩ , ⟨ z ⟩ ) = ⟨ x y z ⟩ {\displaystyle conc(\langle x\rangle ,\langle y\rangle ,\langle z\rangle )=\langle xyz\rangle } | 2a |
S → c o n c ( ⟨ ϵ ⟩ , ⟨ ϵ ⟩ , ⟨ ϵ ⟩ ) | c o n c ( ⟨ a ⟩ , S , ⟨ a ⟩ ) | c o n c ( ⟨ b ⟩ , S , ⟨ b ⟩ ) {\displaystyle S\to conc(\langle \epsilon \rangle ,\langle \epsilon \rangle ,\langle \epsilon \rangle )~|~conc(\langle a\rangle ,S,\langle a\rangle )~|~conc(\langle b\rangle ,S,\langle b\rangle )} | 2b |
The CF production of abbbba is
S aSa abSba abbSbba abbbbaand the corresponding GCFG production is
S → c o n c ( ⟨ a ⟩ , S , ⟨ a ⟩ ) {\displaystyle S\to conc(\langle a\rangle ,S,\langle a\rangle )} c o n c ( ⟨ a ⟩ , c o n c ( ⟨ b ⟩ , S , ⟨ b ⟩ ) , ⟨ a ⟩ ) {\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle a\rangle )} c o n c ( ⟨ a ⟩ , c o n c ( ⟨ b ⟩ , c o n c ( ⟨ b ⟩ , S , ⟨ b ⟩ ) , ⟨ b ⟩ ) , ⟨ a ⟩ ) {\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )} c o n c ( ⟨ a ⟩ , c o n c ( ⟨ b ⟩ , c o n c ( ⟨ b ⟩ , c o n c ( ⟨ ϵ ⟩ , ⟨ ϵ ⟩ , ⟨ ϵ ⟩ ) , ⟨ b ⟩ ) , ⟨ b ⟩ ) , ⟨ a ⟩ ) {\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,conc(\langle \epsilon \rangle ,\langle \epsilon \rangle ,\langle \epsilon \rangle ),\langle b\rangle ),\langle b\rangle ),\langle a\rangle )} c o n c ( ⟨ a ⟩ , c o n c ( ⟨ b ⟩ , c o n c ( ⟨ b ⟩ , ⟨ ϵ ⟩ , ⟨ b ⟩ ) , ⟨ b ⟩ ) , ⟨ a ⟩ ) {\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,\langle \epsilon \rangle ,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )} c o n c ( ⟨ a ⟩ , c o n c ( ⟨ b ⟩ , ⟨ b b ⟩ , ⟨ b ⟩ ) , ⟨ a ⟩ ) {\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,\langle bb\rangle ,\langle b\rangle ),\langle a\rangle )} c o n c ( ⟨ a ⟩ , ⟨ b b b b ⟩ , ⟨ a ⟩ ) {\displaystyle conc(\langle a\rangle ,\langle bbbb\rangle ,\langle a\rangle )} ⟨ a b b b b a ⟩ {\displaystyle \langle abbbba\rangle }Linear Context-free Rewriting Systems (LCFRSs)
Weir (1988)2 describes two properties of composition functions, linearity and regularity. A function defined as f ( x 1 , . . . , x n ) = . . . {\displaystyle f(x_{1},...,x_{n})=...} is linear if and only if each variable appears at most once on either side of the =, making f ( x ) = g ( x , y ) {\displaystyle f(x)=g(x,y)} linear but not f ( x ) = g ( x , x ) {\displaystyle f(x)=g(x,x)} . A function defined as f ( x 1 , . . . , x n ) = . . . {\displaystyle f(x_{1},...,x_{n})=...} is regular if the left hand side and right hand side have exactly the same variables, making f ( x , y ) = g ( y , x ) {\displaystyle f(x,y)=g(y,x)} regular but not f ( x ) = g ( x , y ) {\displaystyle f(x)=g(x,y)} or f ( x , y ) = g ( x ) {\displaystyle f(x,y)=g(x)} .
A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS). LCFRS is a proper subclass of the GCFGs, i.e. it has strictly less computational power than the GCFGs as a whole.
On the other hand, LCFRSs are strictly more expressive than linear-indexed grammars and their weakly equivalent variant tree adjoining grammars (TAGs).3 Head grammar is another example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.
LCFRS are weakly equivalent to (set-local) multicomponent TAGs (MCTAGs) and also with multiple context-free grammar (MCFGs [1]).4 and minimalist grammars (MGs). The languages generated by LCFRS (and their weakly equivalents) can be parsed in polynomial time.5
See also
References
Weir, David Jeremy (Sep 1988). Characterizing mildly context-sensitive grammar formalisms (PDF) (Ph.D.). Paper. Vol. AAI8908403. University of Pennsylvania Ann Arbor. http://www.sussex.ac.uk/Users/davidw/resources/papers/dissertation.pdf ↩
Weir, David Jeremy (Sep 1988). Characterizing mildly context-sensitive grammar formalisms (PDF) (Ph.D.). Paper. Vol. AAI8908403. University of Pennsylvania Ann Arbor. http://www.sussex.ac.uk/Users/davidw/resources/papers/dissertation.pdf ↩
Laura Kallmeyer (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 33. ISBN 978-3-642-14846-0. 978-3-642-14846-0 ↩
Laura Kallmeyer (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 35-36. ISBN 978-3-642-14846-0. 978-3-642-14846-0 ↩
Johan F.A.K. van Benthem; Alice ter Meulen (2010). Handbook of Logic and Language (2nd ed.). Elsevier. p. 404. ISBN 978-0-444-53727-0. 978-0-444-53727-0 ↩