Skip to content

1. Parsers Parsers Parsers

Most programmers probably have a good intuitive idea of what a parser is: it's some code that reads a piece of a longer string of text. By that reckoning, this might be the world's simplest parser:

const parser = text => {
  const [char, ...rest] = text
  return [char, rest]
}

This simple little function strips the first character off the text and returns it along with the rest of the unparsed text (which can then have the parser called on it again). It matches any character, so it's like Kessel's any parser. This example parser doesn't handle parsing errors (what happens when text is the empty string?) and is pretty simplistic with what it can work with, but it shares a surprising amount with a Kessel parser.

The type of the example parser is (text: string) => [string, string]. It takes a string and returns a tuple of two strings, one for the parsed character and one for the remaining text.

!!! note "Tuples" JavaScript, of course, doesn't have tuples. However, its arrays act very much like tuples in that, unlike most other languages, they can have elements of different types. We consistently use the term tuple when using arrays like this.

The type of a Kessel parser is very similar:

Parser = (ctx: Context) => [Context, Result]

A parser in Kessel is a function which takes a context as an argument. It returns an updated context, along with a result. These types are just enhanced versions of what is in the simple parser; a context is just the input text along with information about the current parsing location within that text, and a result is either the parsed value (if the parser was successful) or error information (if it was not).

!!! note "Types" Kessel is written in JavaScript, which famously doesn't assign types to variables. However, one of the purposes of a type system is to create a common language, and we use that here to talk about parameters and return types. The type system here is sort of a mix of that used by JSDoc and that used by TypeScript, but its only actual function is to make it easy to talk about what parsers expect. You will not use them in code.

Basic parsers

Many of the parsers that Kessel come ready to go without any additional information needed. A good example of this is letter. letter is a function that attempts to match a character in the range 'a'-'z' or 'A'-'Z' in the current location in the context passed into it. If it does, the parser is successful; the returned tuple will have a context updated to show which is the next character to parse, along with a result that has the parsed character as its value.

There are a few of these basic parsers that match certain classes of characters, like digits, whitespace, and different cases of letters. And of course there's any, which like our simple little example above, parses any character at all.

Parser factories

Okay, we're gonna reveal a secret quite early here. letter and any aren't actually parsers at all. They are functions that produce parsers (you can tell because they have to be used like letter() and any()). In fact, Kessel does not provide a single parser but instead a lot of functions that make parsers.

So why not just provide parsers? There's no good reason why the letter that Kessel provides can't be one.

The reason is that letter parser is ready to use right out of the box; it has all of the information that it needs to do its job. But most parsers need a little extra to get started. For instance, char is a parser that parses only a particular character, but it doesn't automatically know which character, so we have to have a way to tell it.

That's done with parameters, and in the case of char, it's just a single parameter that is the character the parser should be trying to parse. char('a') will try to match the letter 'a', while char('Я') will match a Russian capital letter "Ya". In this guide we refer to char as a parser factory.

In fact, letter doesn't need a parameter, but it can take one. If you don't like the error message that letter automatically generates, you can supply your own by passing it directly to the letter function (like this: letter('my error message')). This is the reason why letter is implemented as a parser factory. We'll talk about error messages in a lot more detail later.

There are a small number of Kessel parsers that require no parameters and cannot fail, so it makes no sense to send them alternate error messages. But for the sake of consistency, even these are implemented as parser factories. (See all for an example.)

!!! note "Another way to look at it" The fact is that we continually refer to these parser factories as parsers. But there is good reason for it.

`char` is a parser factory. But we don't ever just use `char`. We use it along with a parameter, as in `char('a')` or `char('Я')`, and these actually *are* parsers. Anything that is the result of an application of a parser factory is a parser, since the factories always return parsers.

If there is a reason why we need to differentiate between parsers and their factories, then we will use the term "parser factory", but for the most part we can just use "parser" without worrying about it causing confusion. So we do.

Combinators

There is a class of parser factories that take other parsers for their arguments. These parsers are called combinators and hold a special place in a library like Kessel because they can do things that no regular parser can do. They deserve several chapters of discussion, and we'll do just that starting in Chapter 4.