The Reader

The code you write begins its life merely as text – a stream of characters with no meaning at all to the computer they lie within (and possibly to the author who wrote them, for that matter). In Clojure, it is the job of the reader to parse that text into meaningful parts.

Imagine writing our own reader for Clojure. That may seem like a pointless thing to do since Clojure already has one (and ours will likely be buggier and less complete), but that hasn't stopped programmers from reinventing the wheel in the past. Our wheel may barely move, but it'll be our barely moveable wheel.

When our reader comes across 42, it should parse it as a single number. When it comes across 1 2 10, it should parse three separate numbers. To do this, our reader makes a rule:

If I come across a digit 0 through 9, keep reading until I find something that isn't a digit.

That works for the simplest cases, but our reader forgot about negative numbers like -42, so we add a rule:

If I come across a hyphen followed by a digit, include it in the number.

That's better, but there are still more things to consider, such as 0.5 and 1/2. After adding new rules to account for decimals and ratios, our reader is getting pretty good at parsing numbers.
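
To make those rules concrete, here is a minimal sketch of a number reader in Clojure itself. The names number-pattern and read-number are made up for this example, and the regular expression is only one way to encode the rules above; Clojure's real reader accepts many more number formats than this.

```clojure
;; Hypothetical pattern for our toy rules: an optional hyphen, one or more
;; digits, then optionally ".digits" (a decimal) or "/digits" (a ratio).
(def number-pattern #"^(-?\d+)(?:\.(\d+)|/(\d+))?")

;; Hypothetical helper: reads one number from the front of `text`.
;; Returns [value remaining-text], or nil if `text` doesn't start with a number.
;; A zero denominator such as "1/0" is not handled and will throw.
(defn read-number [text]
  (when-let [[match whole fraction denominator] (re-find number-pattern text)]
    (let [value (cond
                  fraction    (Double/parseDouble match)
                  denominator (/ (Long/parseLong whole)
                                 (Long/parseLong denominator))
                  :else       (Long/parseLong match))]
      [value (subs text (count match))])))

(read-number "-42 7") ; => [-42 " 7"]
(read-number "1/2")   ; => [1/2 ""]
(read-number "even?") ; => nil
```

Returning the leftover text alongside the value is a design choice for this sketch: it lets a caller keep reading from where the number ended.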

To reiterate, our reader receives these numbers as text (in particular, ASCII / UTF-8 numerals) and turns them into a format understandable by the computer, which we will generically call data. The reader turns text into data.

While 0.5 and 1/2 are different textual representations, once they are parsed by our reader they end up as data representing the same quantity. (Strictly speaking, Clojure's own reader gives them different numeric types, but they compare as numerically equal.) This distinction between the text and the data that it turns into will keep coming up, and is extremely important to keep in mind. Text is for humans, data is for machines (and for humans named Rich Hickey).
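
We can peek at what Clojure's actual reader does with these two pieces of text by using the built-in read-string function. This is only a quick illustration; the names half-as-decimal and half-as-ratio are invented for the example.

```clojure
;; read-string hands a piece of text to the reader and returns the data.
(def half-as-decimal (read-string "0.5"))
(def half-as-ratio   (read-string "1/2"))

(class half-as-decimal) ; => java.lang.Double
(class half-as-ratio)   ; => clojure.lang.Ratio

;; == compares numeric value only; = also cares about the kind of number.
(== half-as-decimal half-as-ratio) ; => true
(=  half-as-decimal half-as-ratio) ; => false
```

Either way, what comes out is no longer a sequence of characters: it is a number the machine can do arithmetic with.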

In addition to numbers, our reader also needs to parse strings. In Clojure they are surrounded by double quotes:

"Hello, world!"

The rule for our reader is easy enough:

If I come across a double quote, keep reading until I come across another double quote.

But we have a problem: With this rule, it's impossible to create a string that contains a double quote. If we want to make a string that contains The " character surrounds strings, we'll get an error, because our reader will stop parsing the string when it gets to the second double quote:

"The " character surrounds strings"

The typical solution is escaping the internal double quote with a backslash character:

"The \" character surrounds strings"

So we modify our rule:

If I come across a double quote, keep reading until I come across another double quote not preceded by a backslash.

Here again we see the all-important distinction between the text written by a programmer and the data that our reader turns it into. The double quotes and escaping are all artifacts of the textual representation – that is, they exist purely to help our reader parse the string from text. The resulting data contains neither the surrounding quotes nor the backslash.
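
The string rule can be sketched in a few lines as well. The function name read-string-literal is hypothetical, and the sketch simplifies escaping: a backslash just means "take the next character as-is", which covers an escaped double quote (and an escaped backslash) but not special escapes like a newline.

```clojure
;; Hypothetical helper: reads one string literal from the front of `text`.
;; Returns [contents remaining-text], or nil if `text` doesn't start with a
;; double quote. Throws if the closing double quote never shows up.
(defn read-string-literal [text]
  (when (= \" (first text))
    (loop [remaining (rest text) ; characters after the opening quote
           out       []]         ; characters collected so far
      (let [c (first remaining)]
        (cond
          (nil? c) (throw (ex-info "Unterminated string" {:text text}))
          ;; An unescaped double quote ends the string.
          (= c \") [(apply str out) (apply str (rest remaining))]
          ;; A backslash: keep the next character literally, drop the backslash.
          (= c \\) (if-let [escaped (second remaining)]
                     (recur (drop 2 remaining) (conj out escaped))
                     (throw (ex-info "Unterminated string" {:text text})))
          :else    (recur (rest remaining) (conj out c)))))))

;; The text contains 36 characters; the data it turns into contains 33,
;; because the two outer quotes and the backslash are gone.
(count (first (read-string-literal
               "\"The \\\" character surrounds strings\""))) ; => 33
```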

The numbers and strings above are called literals because they represent their value directly. It can be extremely useful to represent values indirectly, which in Clojure is done with symbols. Symbols are names that point to some other value.

even? odd? + -

Our reader should parse four separate symbols here. So the rule boils down to...

If I come across a character that doesn't begin a number or a string, keep reading until I reach whitespace.
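
One wrinkle is that a hyphen can begin a number (-42) or be a symbol all by itself (-). A rough sketch of how our reader might tell them apart follows. It assumes the text has already been split on whitespace and only knows about whole numbers; classify-token and classify-tokens are hypothetical names.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: a token is a number only if it is an optional hyphen
;; followed by digits all the way through; anything else counts as a symbol.
(defn classify-token [token]
  (if (re-matches #"-?\d+" token) :number :symbol))

;; Splits text on whitespace and pairs each token with its classification.
(defn classify-tokens [text]
  (mapv (juxt identity classify-token)
        (str/split (str/trim text) #"\s+")))

(classify-tokens "even? odd? + - -42")
;; => [["even?" :symbol] ["odd?" :symbol] ["+" :symbol] ["-" :symbol] ["-42" :number]]
```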

Let's stick with just these three data types: numbers, strings, and symbols. Our reader doesn't know what the symbols point to; figuring that out is the job of the evaluator.
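
To see that the reader really does stop at symbols, we can again ask Clojure's built-in read-string. Wrapping the text in square brackets is just a convenience for this example so that all four forms come back in one collection; the name forms is made up.

```clojure
;; The reader returns four symbols. It has not looked up what they name.
(def forms (read-string "[even? odd? + -]"))

(count forms)            ; => 4
(every? symbol? forms)   ; => true
(= 'even? (first forms)) ; => true
(fn? (first forms))      ; => false, it is a name, not the function itself
```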

Next: The Evaluator