Given a natural number k ≥ 0 {\displaystyle k\geq 0} , a context-free grammar G = ( V , Σ , R , S ) {\displaystyle G=(V,\Sigma ,R,S)} is an LL(k) grammar if
there is at most one production rule r ∈ R {\displaystyle r\in R} such that for some terminal symbol strings w 2 , w 3 ∈ Σ ∗ {\displaystyle w_{2},w_{3}\in \Sigma ^{*}} ,
An alternative, but equivalent, formal definition is the following: G = ( V , Σ , R , S ) {\displaystyle G=(V,\Sigma ,R,S)} is an LL(k) grammar if, for arbitrary derivations
when the first k {\displaystyle k} symbols of w 2 w 3 {\displaystyle w_{2}w_{3}} agree with those of w 2 ′ w 3 ′ {\displaystyle w'_{2}w'_{3}} , then ν = ω {\displaystyle \nu =\omega } .23
Informally, when a parser has derived w 1 A w 3 {\displaystyle w_{1}Aw_{3}} , with A {\displaystyle A} its leftmost nonterminal and w 1 {\displaystyle w_{1}} already consumed from the input, then by looking at that w 1 {\displaystyle w_{1}} and peeking at the next k {\displaystyle k} symbols w {\displaystyle w} of the current input, the parser can identify with certainty the production rule r {\displaystyle r} for A {\displaystyle A} .
When rule identification is possible even without considering the past input w 1 {\displaystyle w_{1}} , then the grammar is called a strong LL(k) grammar.4 In the formal definition of a strong LL(k) grammar, the universal quantifier for w 1 {\displaystyle w_{1}} is omitted, and w 1 {\displaystyle w_{1}} is added to the "for some" quantifier for w 2 , w 3 {\displaystyle w_{2},w_{3}} . For every LL(k) grammar, a structurally equivalent strong LL(k) grammar can be constructed.5
The class of LL(k) languages forms a strictly increasing sequence of sets: LL(0) ⊊ LL(1) ⊊ LL(2) ⊊ ….6 It is decidable whether a given grammar G is LL(k), but it is not decidable whether an arbitrary grammar is LL(k) for some k. It is also decidable if a given LR(k) grammar is also an LL(m) grammar for some m.7
Every LL(k) grammar is also an LR(k) grammar. An ε-free LL(1) grammar is also an SLR(1) grammar. An LL(1) grammar with symbols that have both empty and non-empty derivations is also an LALR(1) grammar. An LL(1) grammar with symbols that have only the empty derivation may or may not be LALR(1).8
LL grammars cannot have rules containing left recursion.9 Each LL(k) grammar that is ε-free can be transformed into an equivalent LL(k) grammar in Greibach normal form (which by definition does not have rules with left recursion).10
Let Σ {\displaystyle \Sigma } be a terminal alphabet. A partition π {\displaystyle \pi } of Σ ∗ {\displaystyle \Sigma ^{*}} is called a regular partition if for every R ∈ π {\displaystyle R\in \pi } the language R {\displaystyle R} is regular.
Let G = ( V , Σ , R , S ) {\displaystyle G=(V,\Sigma ,R,S)} be a context free grammar and let π = { R 1 , … , R n } {\displaystyle \pi =\{R_{1},\dotso ,R_{n}\}} be a regular partition of Σ ∗ {\displaystyle \Sigma ^{*}} . We say that G {\displaystyle G} is an LL( π {\displaystyle \pi } ) grammar if, for arbitrary derivations
such that x ≡ y mod π {\displaystyle x\equiv y\mod \pi } it follows that ν = ω {\displaystyle \nu =\omega } .11
A grammar G is said to be LL-regular (LLR) if there exists a regular partition of Σ ∗ {\displaystyle \Sigma ^{*}} such that G is LL( π {\displaystyle \pi } ). A language is LL-regular if it is generated by an LL-regular grammar.
LLR grammars are unambiguous and cannot be left-recursive.
Every LL(k) grammar is LLR. Every LL(k) grammar is deterministic, but there exists a LLR grammar that is not deterministic.12 Hence the class of LLR grammars is strictly larger than the union of LL(k) for each k.
It is decidable whether, given a regular partition π {\displaystyle \pi } , a given grammar is LL( π {\displaystyle \pi } ). It is, however, not decidable whether an arbitrary grammar G is LLR. This is due to the fact that deciding whether a grammar G generates a regular language, which would be necessary to find a regular partition for G, can be reduced to the Post correspondence problem.
Every LLR grammar is LR-regular (LRR, the corresponding[clarify] equivalent for LR(k) grammars), but there exists an LR(1) grammar that is not LLR.13
Historically, LLR grammars followed the invention of the LRR grammars. Given a regular partition a Moore machine can be constructed to transduce the parsing from right to left, identifying instances of regular productions. Once that has been done, an LL(1) parser is sufficient to handle the transduced input in linear time. Thus, LLR parsers can handle a class of grammars strictly larger than LL(k) parsers while being equally efficient. Despite that the theory of LLR does not have any major applications. One possible and very plausible reason is that while there are generative algorithms for LL(k) and LR(k) parsers, the problem of generating an LLR/LRR parser is undecidable unless one has constructed a regular partition upfront. But even the problem of constructing a suitable regular partition given grammar is undecidable.
A context-free grammar is called simple deterministic,14 or just simple,15 if
A set of strings is called a simple deterministic, or just simple, language, if it has a simple deterministic grammar.
The class of languages having an ε-free LL(1) grammar in Greibach normal form equals the class of simple deterministic languages.16 This language class includes the regular sets not containing ε.17 Equivalence is decidable for it, while inclusion is not.18
LL grammars, particularly LL(1) grammars, are of great practical interest, as they are easy to parse, either by LL parsers or by recursive descent parsers, and many computer languages[clarify] are designed to be LL(1) for this reason. Languages based on grammars with a high value of k have traditionally been considered to be difficult to parse, although this is less true now given the availability and widespread use of parser generators supporting LL(k) grammars for arbitrary k.
Rosenkrantz & Stearns (1970, p. 227). Def.1. The authors do not consider the case k=0. - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩
where " ⇒ L {\displaystyle \Rightarrow ^{L}} " denotes derivability by leftmost derivations, and w 1 , w 2 , w 3 , w 2 ′ , w 3 ′ ∈ Σ ∗ {\displaystyle w_{1},w_{2},w_{3},w'_{2},w'_{3}\in \Sigma ^{*}} , A ∈ V {\displaystyle A\in V} , and χ , ν , ω ∈ ( Σ ∪ V ) ∗ {\displaystyle \chi ,\nu ,\omega \in (\Sigma \cup V)^{*}} ↩
Waite & Goos (1984, p. 123) Def. 5.22 - Waite, William M.; Goos, Gerhard (1984). Compiler Construction. Texts and Monographs in Computer Science. Heidelberg: Springer. ISBN 978-3-540-90821-0. ↩
Rosenkrantz & Stearns (1970, p. 235) Def.2 - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩
Rosenkrantz & Stearns (1970, p. 235) Theorem 2 - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩
Rosenkrantz & Stearns (1970, p. 246–247): Using " + {\displaystyle +} " to denote "or", the string set { a n ( b k d + b + c c ) n : n ≥ 1 } {\displaystyle \{a^{n}(b^{k}d+b+cc)^{n}:n\geq 1\}} has an L L ( k + 1 ) {\displaystyle LL(k+1)} , but no ε-free L L ( k ) {\displaystyle LL(k)} grammar, for each k ≥ 1 {\displaystyle k\geq 1} . - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩
Rosenkrantz & Stearns (1970, pp. 254–255) - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩
Beatty (1982) - Beatty, J. C. (1982). "On the relationship between LL(1) and LR(1) grammars" (PDF). Journal of the ACM. 29 (4 (Oct)): 1007–1022. doi:10.1145/322344.322350. S2CID 14700480. https://cs.uwaterloo.ca/research/tr/1979/CS-79-36.pdf ↩
Rosenkrantz & Stearns (1970, pp. 241) Lemma 5 - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩
Rosenkrantz & Stearns (1970, p. 242) Theorem 4 - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩
Poplawski, David (1977). "Properties of LL-Regular Languages". Purdue University. {{cite journal}}: Cite journal requires |journal= (help) /wiki/Template:Cite_journal ↩
David A. Poplawski (Aug 1977). Properties of LL-Regular Languages (Technical Report). Purdue University, Department of Computer Science. https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1176&context=cstech ↩
Korenjak & Hopcroft (1966) - Korenjak, A.J.; Hopcroft, J.E. (1966). "Simple deterministic languages". IEEE Conf. Rec. 7th Ann. Symp. on Switching and Automata Theory (SWAT). IEEE Pub. No. Vol. 16-C-40. pp. 36–46. doi:10.1109/SWAT.1966.22. https://doi.org/10.1109%2FSWAT.1966.22 ↩
Hopcroft & Ullman (1979, p. 229) Exercise 9.3 - Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 978-0-201-02988-8. https://archive.org/details/introductiontoau00hopc ↩
Rosenkrantz & Stearns (1970, p. 243) - Rosenkrantz, D. J.; Stearns, R. E. (1970). "Properties of Deterministic Top Down Grammars". Information and Control. 17 (3): 226–256. doi:10.1016/s0019-9958(70)90446-8. https://doi.org/10.1016%2Fs0019-9958%2870%2990446-8 ↩