Vectors and Sets

Now that we understand the difference between the reader and the evaluator, let's make our own reader. We'll start by coming up with a line that contains different things we want to parse:

42 0.5 1/2 foo-bar

To begin, we'll store this text in a string and put it in a var:


(def text "42 0.5 1/2 foo-bar")

Our reader will look at each character in that string and try to figure out what to do with it. To do that, we need to break it up into individual characters using the vec function:


(vec text)

You can see that the result is surrounded by square brackets, which means it's a vector. Vectors are a common way of storing a collection of data in Clojure. Let's store this one in a var as well:


(def chars (vec text))

We can pull out individual characters from the vector with get:


(get chars 0)
(get chars 1)
(get chars 2)
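
As a small sketch of how get behaves at the edges, the example below uses its own hypothetical names (sample-text and sample-chars) so it does not interfere with the vars defined above. It shows that get returns nil when the index is out of range, and that an optional third argument supplies a fallback value instead.

```clojure
;; Hypothetical names, used only for this illustration.
(def sample-text "42 0.5 1/2 foo-bar")
(def sample-chars (vec sample-text))

(count sample-chars)          ; => 18, one entry per character
(get sample-chars 0)          ; => \4
(get sample-chars 2)          ; => \space, the blank between 42 and 0.5
(get sample-chars 100)        ; => nil, there is no index 100
(get sample-chars 100 :none)  ; => :none, the fallback value we supplied
```

Because an out-of-range lookup returns nil rather than failing, a reader built this way can use nil as a signal that it has run out of characters.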

To make this a bit easier, we'll focus on the first character. Let's store it in a var:


(def first-char (get chars 0))

We know this character happens to be a digit, but how would we determine that with code? There are only 10 digits, so let's just make a vector that contains all of them. Notice that we are using a special backslash syntax to write each character, instead of surrounding them in double quotes. This is normally how they are written:


(def digits [\1 \2 \3 \4 \5 \6 \7 \8 \9 \0])
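
To make the difference between a character and a string concrete, here is a short sketch. The names digit-char and digit-string are made up for this example.

```clojure
;; Hypothetical names, used only for this illustration.
(def digit-char \4)     ; a single character, written with a backslash
(def digit-string "4")  ; a string that happens to hold one character

(char? digit-char)                   ; => true
(string? digit-string)               ; => true
(= digit-char digit-string)          ; => false, a character is not a string
(= digit-char (get digit-string 0))  ; => true, a string is made of characters
(str digit-char)                     ; => "4", str turns a character into a string
(= \space (get "a b" 1))             ; => true, some characters have names
```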

Now we want to figure out if first-char is inside of digits. One way is to use the .indexOf function, which will return the index of the value if it finds it:


(.indexOf digits first-char)

It returns 3, because the \4 character is at index 3 (counting starts at 0!). If we search for a character that isn't in there, it returns -1:


(.indexOf digits \e)

If the name .indexOf looks odd, that's because it is not part of Clojure; it is a Java method. Any time you see a name starting with a period, the code is reaching down to the underlying platform. Clojure normally tries not to duplicate functionality that already exists underneath.
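
The same leading-period syntax works for other Java methods too. The sketch below assumes Clojure running on the JVM and calls two methods of java.lang.String, plus .indexOf on a small vector. The names upper-cased and text-length are hypothetical.

```clojure
;; Hypothetical names, used only for this illustration.
(def upper-cased (.toUpperCase "foo-bar"))  ; calls String.toUpperCase
(def text-length (.length "foo-bar"))       ; calls String.length

upper-cased               ; => "FOO-BAR"
text-length               ; => 7
(.indexOf [\a \b \c] \b)  ; => 1, vectors implement java.util.List
(.indexOf [\a \b \c] \z)  ; => -1, not found
```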

While the technique above works, it's not very efficient. The .indexOf function works by going through every item in the vector until it finds what it's looking for, so the longer the vector, the longer the search can take. There is a much better way to check if a value is contained within a finite set of things, and it's called a set. Let's redefine digits as a set:
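
To see why searching a vector is slow, here is a rough sketch of that kind of item-by-item scan. The function index-of and the var digit-vector are hypothetical names written for this illustration; this is not how Java implements .indexOf internally, only the same idea. It uses defn to define a function and loop to repeat a step.

```clojure
;; Hypothetical helper, not part of Clojure: checks items one at a time.
(defn index-of [coll target]
  (loop [i 0]
    (cond
      (>= i (count coll)) -1     ; reached the end without a match
      (= (get coll i) target) i  ; found the target at index i
      :else (recur (inc i)))))   ; otherwise move on to the next index

(def digit-vector [\1 \2 \3 \4 \5 \6 \7 \8 \9 \0])

(index-of digit-vector \4)  ; => 3
(index-of digit-vector \0)  ; => 9, found only after checking every item
(index-of digit-vector \e)  ; => -1, every item was checked and none matched
```

The worst cases are the last two: the scan has to look at all ten items before it can answer.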


(def digits #{\1 \2 \3 \4 \5 \6 \7 \8 \9 \0})

We can see that sets are surrounded by #{...} instead of [...]. One interesting thing about them is that they don't maintain order. Check it out:


digits
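
One way to see that order does not matter to a set is to compare two sets written in different orders. This is a small sketch with hypothetical names (set-a and set-b):

```clojure
;; Hypothetical names, used only for this illustration.
(def set-a #{\1 \2 \3})
(def set-b #{\3 \2 \1})

(= set-a set-b)            ; => true, same members, so the sets are equal
(= [\1 \2 \3] [\3 \2 \1])  ; => false, vectors care about order
```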

This is the tradeoff you make between vectors and sets. Vectors maintain order, but searching them with .indexOf is slow. Sets can shuffle their contents into any arbitrary order they want, but they are very fast at checking if they contain something. Instead of .indexOf, we check a set with a Clojure function:


(contains? digits first-char)
(contains? digits \e)

As you can see, contains? just returns true or false. There would be no point to returning an index value like .indexOf does, since sets don't maintain order in the first place. For our purposes, that's ok, since we just want to check if the character is anywhere inside of digits.
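
Here is a short sketch that puts contains? to work on several characters at once, using the hypothetical names digit-set and digit-flags. It also shows a detail worth knowing: on a vector, contains? checks whether an index exists, not whether a value is present, so it is not a drop-in replacement for .indexOf.

```clojure
;; Hypothetical names, used only for this illustration.
(def digit-set #{\0 \1 \2 \3 \4 \5 \6 \7 \8 \9})

(contains? digit-set \4)  ; => true
(contains? digit-set \e)  ; => false

;; Check every character of a short string in one go.
(def digit-flags
  (mapv (fn [c] (contains? digit-set c)) (vec "42 0.5")))

digit-flags  ; => [true true false true false true]

;; On a vector, contains? looks at indexes rather than values.
(contains? [\1 \2 \3] \1)  ; => false, \1 is a value, not an index
(contains? [\1 \2 \3] 0)   ; => true, index 0 exists
```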
