lazy

lazy(fn: function, m?: string): Parser

Takes a parser factory function and defers execution of that function until the returned parser is executed. This can be used to define recursive parsers.

First, note that throughout this page, we will be in one of the rare territories where the distinction between parser and parser factory is actually very important. On this page, when parser is used, it actually means parser and not parser factory.

To illustrate why lazy is necessary, let's look at a naive definition of a parser that recursively parses comma-separated lists of words.

const word = many1(letter())
const list = seq(
  word, 
  opt(right(char(','), list)),
)

This looks fine, but it actually is illegal JavaScript. A variable (list in this case) cannot be used until it is fully defined. That means that list cannot be used within the definition of list, so a recursive variable declaration like this is not allowed.

The simple way to fix that is to make list into a function instead (in fact, this function is a parser factory, as it's a function that returns a parser). It's perfectly okay to reference a function from inside its own definition.

function list() {
  return seq(
    word,
    opt(right(char(','), list()))
  )
}

This also looks fine, but it has an even scarier problem. When you try to create a parser by invoking list, an infinite loop is created. list calls list, which calls list, and so on, forever (or at least until the stack overflows). We can see by looking at the definition that list wouldn't need to be called if the input didn't have the optional comma after a word, but there's no way for list to know whether that's the case, since an input hasn't been applied yet.

What we need is a solution like this one, except that it would defer the internal call to list until it was sure that it was actually needed. In other words, it would have to only call list when the parser was executed, not when it was created.

That's what lazy is for. Instead of running the parser factory that's passed to it, it wraps it in a parser which calls that parser factory when it's executed. If that parser is never executed — in this case, because that opt isn't satisfied because there's no comma to parse — then the parser factory doesn't run and the infinite loop never happens.

Here's the final, working solution to this recursive parser factory.

function list() {
  return seq(
    word,
    opt(right(char(','), lazy(list)))
  )
}

list is now a safe parser factory that can be executed like any other parser factory without fear that it will loop infinitely.

Important notes on recursion

There are two important things to know about recursive parsers and how to use lazy to implement them.

The first is that, as with any recursion, there has to be a signal to stop. In recursive functions, there's usually an if statement that stops the function from running itself again once a certain point is reached (for example, a recursive factorial function will have some statement like if (n === 0) return n). That's the same with recursive parsers. lazy will do no good unless there is a case where it's not run, such as (as above) being inside an opt combinator that will eventually not match.

The second is that lazy cannot solve left-recursion. Top-down parser libraries like Kessel simply can't do left-recursive grammars. The example above is an example of right-recursion, which simply means that the recursive term (list in the above case) appears on the right side of a sequence. right is a sequence parser, and list appears on the right side of that sequence. Kessel can parse this without a problem.

Something like this, on the other hand, would make trouble. (This is done in a BNF-like grammar; I am no BNF expert and don't expect you to be either. The meaning should be fairly clear anyway.)

Expression ::= Expression '*' Expression
            || Expression '+' Expression
            || Term
Term ::= Number | Variable

This is not an uncommon sort of thing to want to parse; it's just a way of recursively defining multiplication and addition operations. But in this form, there's no way for Kessel (or any other top-down parser, combinator-based or not) to parse it successfully. The problem is that the recursive term, Expression in this case, is on the left side of a sequence. It's left-recursive.

Fortunately all left-recursive grammars can be transformed via algorithm into right-recursive grammars, at the cost of some complexity. Here's the same grammar, but with no left-recursion.

Expression ::= Multiplication '+' Expression
            || Multiplication
Multiplication ::= Term '*' Term
                || Term
Term ::= Number | Variable

This time, Expression appears on the right side of a sequence. This is right-recursive, and with the help of lazy, a Kessel parser for this grammar is easily constructed.

Algorithms to eliminate left-recursion are beyond the scope of this documentation, but there are a lot of sources available to explain how to do it. A quite academic resource (for those of you that like things like that) is available a the Western University CS department, and a more accessible one (at least in my opinion) can be found at the Worcester Polytechnic Institute CS department. Wikipedia's article on left recursion has a reasonable algorithm; Wikipedia's material on math and on theoretical computing is often extremely dence, but I find this one usable.

If none of those excite you, just do a Google search for "left recursion elimination". You will get a lot of results, and perhaps one of them will work better for you than these three. If not, there are even videos on YouTube on how to do it.

Example

const parser = (function parser() {
  return clean(flat(seq(
    join(many1(letter())),
    opt(right(char(','), lazy(parser))),
  )))
})()

const s = parse(parser, 'alfa,bravo,charlie')
console.log(status(s))  // "ok"
console.log(success(s)) // ["alfa","bravo","charlie"]

const f = parse(parser, '123,456,789')
console.log(status(f))  // "fail"
console.log(failure(f)) // Parse error at (line 1, column 1):
                        //
                        // 123,456,789
                        // ^
                        // Expected a letter

This is a "real" example of the code in the discussion above, in that it accounts for handling joining and flattening of character arrays into strings and for eliminating the null that opt will eventually produce when it doesn't match anymore. That was left out in the text above because it wasn't relevant to that discussion, but if you ran the s example case with the simple code from the discussion, you'd get

[
  ['a', 'l', 'f', 'a'], 
  [
    ['b', 'r', 'a', 'v', 'o'], 
    [
      ['c', 'h', 'a', 'r', 'l', 'i', 'e'],
      [null],
    ]
  ]
]

And that's probably not what you want.

It is also idiomatic in that it's in an IIFE, so parser is an actual parser rather than a parser factory. This matches up with the examples in all of the other API documentation pages. You can, of course, do it how you want.

If this code makes sense to you without taking a few minutes to study it, then you've come far in your knowledge of Kessel.

Parameters

fn: A parser factory function whose invocation will be deferred until the lazy parser is executed.
m: The optional expected error message that will take the place of the default error message.

Success

Succeeds if the parser produced by fn succeeds. Returns that parser's result.

Failure

Fails if the parser produced by fn fails.

Fatal Failure

Fails fatally if the parser produced by fn fails fatally.

Throws

Throws an error if fn is not a function.
Throws an error if m exists and is not a string.
Throws an error if fn does not return a parser.