What's the Big Deal with Generators?

Last month the V8 team landed an initial implementation of ES6 Generators. Generators, of course, are nothing new - they first appeared in the CLU programming language back in 1975. Today they're available in several modern programming languages, including Python, C#, and Ruby (see the wiki article for a more complete timeline of language adoption).

Given that Node.js is built on top of V8, the inclusion of generators has, well, generated quite a bit of buzz in the world of Node. This is with good reason, too. In this post I'm going to attempt to explain what exactly generators are and why their addition is such a big deal in JavaScript (and more particularly, Node). Also, a quick hat tip to Andy Wingo and Rick Waldron is due for clearing up some of the finer details for me.

Precursory Concepts

While generators themselves aren't overly complex, some of the concepts relevant to them aren't exactly common knowledge, so bear with me while we set some context.

Concept 1: Iterators

In computer programming, an iterator is an object that enables a programmer to traverse a container.

-- Wikipedia

We'll get back to generators in a minute, but for now it's important to know that when invoked, generator functions return iterators. So, let's start with those. As the name implies, an iterator is not actually the collection that is being iterated, but is rather an object that assists with the iteration. Implementations vary across languages, but they all expose some variant of a .next() interface, enabling code to iterate over the items in a collection (or, more generically, container).

In JavaScript, iterators yield successive nextResult objects in response to .next() calls. These objects contain the yielded value in a value field, and a flag indicating whether or not the iterator is done in the aptly named done field. Let's call that "good enough" on iterators for now.

Please note: the specifications for iterators are currently in draft status, and subject to change.
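To make the "result object" idea concrete, here is a hand-written iterator with no generator involved. It's a minimal sketch that follows the protocol as it was eventually standardized in ES2015, where each result has a `value` and a `done` field. The function name `makeArrayIterator` is hypothetical and purely for illustration.

```javascript
// Hypothetical helper: a hand-written iterator over an array.
// Each .next() call returns a result object with `value` and `done`.
function makeArrayIterator(items) {
  let index = 0; // position of the next item to hand out
  return {
    next: function () {
      if (index < items.length) {
        return { value: items[index++], done: false };
      }
      // Once exhausted, keep reporting done: true.
      return { value: undefined, done: true };
    }
  };
}

const it = makeArrayIterator(['a', 'b']);
console.log(it.next()); // { value: 'a', done: false }
console.log(it.next()); // { value: 'b', done: false }
console.log(it.next()); // { value: undefined, done: true }
```

Notice that the iterator only tracks a position; the array itself is untouched, which is exactly the "assists with the iteration" distinction made above.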

Concept 2: Run-to-completion

Run-to-completion scheduling is a scheduling model in which each task runs until it either finishes, or explicitly yields control back to the scheduler.

-- Wikipedia

Hopefully this is a topic that most Node developers will be familiar with. Just as a recap, though, JavaScript has run-to-completion execution semantics. What this means, more or less, is that once a task begins, it runs until it's complete. Unlike systems that rely on preemptive scheduling (such as threading), there is nothing in the JavaScript runtime that will preemptively pause the execution of a given task, permit some other code to execute for a while, and then resume the original task.

This isn't to say, of course, that JavaScript prohibits multi-tasking. At the heart of the JavaScript runtime is a message queue that contains a list of "messages" (which we'll refer to as tasks) that need to be run. This is where the infamous "event loop" comes into play. In simplest terms, the event loop is nothing more than a "loop" that shifts tasks off of the queue and executes them. This event loop, then, is how we multi-task in JavaScript, and how we achieve non-blocking I/O in Node (see below).

The "run-to-completion" terminology comes from the guarantee that a task will complete without being preempted. Therefore, if we actually do want a task to take a "timeout" and let some other code run for a while, we have to end the current task and queue the remaining work as a new task on the event loop. This type of multi-tasking is said to be "cooperative", as the currently executing task has complete control until it voluntarily gives it up.
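A quick way to see run-to-completion in action is to queue a task with a zero-millisecond `setTimeout()` and then keep working. The sketch below assumes Node; the `order` array is just an illustrative trace.

```javascript
// Records the order in which pieces of code actually run.
const order = [];

// Queue a new task on the event loop. Even with a 0 ms delay,
// it cannot run until the current task has finished.
setTimeout(function () {
  order.push('timer callback');
  console.log(order.join(' -> '));
}, 0);

// The rest of the current task runs to completion first; nothing
// can preempt this loop, however long it takes.
for (let i = 1; i <= 3; i++) {
  order.push('step ' + i);
}
order.push('current task done');
```

When the timer finally fires, it logs `step 1 -> step 2 -> step 3 -> current task done -> timer callback`: the queued task only got its turn because the current one gave up control by finishing.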

Concept 3: Non-blocking I/O

Every block is kinda mean.

-- 2Pac

We've already covered about 90% of what we need to on this one. When we refer to non-blocking I/O, we simply mean that other tasks should be allowed to run whenever we're waiting on I/O. When we don't yield execution, we're blocking. Blocking on slow operations (like fetching data from disk or the network) is detrimental to concurrent operations (like handling multiple requests in a web server). If we're blocked, nothing else can happen until the operation is complete, and everything grinds to a halt. This is why I/O operations in Node use callbacks; the operation is initiated, a callback is registered, and the task "runs to completion" without having to block. Once the result of the operation is ready, the callback is pushed onto the queue and subsequently run by the event loop.
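Here's a small side-by-side of blocking and non-blocking reads using Node's `fs` module. It's a sketch only: the scratch file name `nonblocking-demo.txt` is arbitrary and created up front so the example has something to read.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a small scratch file so the example has something to read.
const file = path.join(os.tmpdir(), 'nonblocking-demo.txt');
fs.writeFileSync(file, 'hello from disk');

const log = [];

// Blocking: the current task waits right here until the read finishes.
const text = fs.readFileSync(file, 'utf8');
log.push('sync read: ' + text);

// Non-blocking: start the read, register a callback, and keep going.
fs.readFile(file, 'utf8', function (err, contents) {
  if (err) throw err;
  // Runs later, as a separate task taken off the queue by the event loop.
  log.push('async read: ' + contents);
  console.log(log);
});

log.push('readFile started; current task continues');
```

The last `log.push()` runs before the asynchronous read's callback, even though it appears after it in the source; that gap is where other tasks (say, other HTTP requests) get to run.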

Can We Actually Talk About Generators Now?

Alright, we definitely took the scenic route getting here, but let's talk about generators!

Introduction to Generators

Let's start with a quick code example, and then we'll talk about what's happening.

function* helloWorldGenerator() {
    yield 'hello';
    yield 'world';
}

var hw = helloWorldGenerator();
console.log(hw.next()); // prints { value: 'hello', done: false }
console.log(hw.next()); // prints { value: 'world', done: false }
console.log(hw.next()); // prints { value: undefined, done: true }

In the above example, helloWorldGenerator is a "generator function". Generator functions are special functions that return iterators and are denoted using the function* syntax. So, the hw variable above is the iterator returned from the helloWorldGenerator generator function.

As we covered earlier, iterators yield successive values in response to .next() calls, until they are exhausted. Here's a quick breakdown of what's happening above:

var hw = helloWorldGenerator();

We invoke our generator, but note that the generator body doesn't actually run at this point. Invoking helloWorldGenerator() simply returns our iterator.

console.log(hw.next());

We invoke .next() on the iterator, which causes the generator body to run up until the first yield expression. This expression "yields" our first value, 'hello', and then suspends execution of the generator body until .next() is called again. Meanwhile, hw.next() evaluates to a nextResult object, which, as we previously discussed, contains the yielded value and a done flag indicating whether or not there are more values.

console.log(hw.next());

We invoke .next() again, which causes execution of the generator body to resume up until the next yield expression. The same yield semantics apply, and we get our next result object with our 'world' value.

console.log(hw.next());

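For the last time, we invoke .next(). Since there are no more yield expressions in the generator, we are simply given a result object with the done flag set to true. This is our sign to stop calling .next(). (In the early implementation available at the time, calling it again raised an error; under the finalized ES2015 specification, further calls simply keep returning { value: undefined, done: true }.)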
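To confirm two details from the breakdown above (the body doesn't start until the first .next(), and an exhausted generator keeps reporting done), here's an instrumented copy of helloWorldGenerator. It assumes a runtime that implements the finalized ES2015 semantics, such as any current Node release; the `trace` array is added purely for illustration.

```javascript
const trace = [];

function* helloWorldGenerator() {
  trace.push('body started'); // only runs on the first .next()
  yield 'hello';
  yield 'world';
}

const hw = helloWorldGenerator();
trace.push('iterator created'); // the body has not run yet

const results = [hw.next(), hw.next(), hw.next(), hw.next()];
console.log(trace); // [ 'iterator created', 'body started' ]
// results holds, in order:
//   { value: 'hello', done: false }
//   { value: 'world', done: false }
//   { value: undefined, done: true }
//   { value: undefined, done: true }  <- extra call after exhaustion
```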

Things Get Interesting: Lazy Evaluation

So far we've seen some fancy new syntax, but nothing that significantly advances our ability to express iteration in JavaScript. Let's look at another example, though:

function* powersOfTwo(maxExponent) {
    var exponent = 0;
    while (exponent <= maxExponent) {
        yield Math.pow(2, exponent);
        exponent++;
    }
}

var it = powersOfTwo(10),
    result = it.next();

while (!result.done) {
    console.log(result.value);
    result = it.next();
}

We're introducing two new concepts here. First, generator functions can take parameters, and those parameters remain in scope for the life of the iterator (maxExponent in the example above). Second, the number of values a generator produces doesn't have to be spelled out in its source: here a single yield expression inside a loop produces as many values as maxExponent dictates. for-of loops, which are currently being standardized, will significantly simplify the above example, but we can already start to see the power of lazy sequence computation: each power of two is only computed when .next() asks for it.
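For reference, here's how the same loop looks with for-of (standardized in ES2015), plus a taste of what laziness buys us: a generator can describe an infinite sequence as long as the consumer only pulls finitely many values. The helpers `naturals` and `take` are hypothetical, written here for illustration only.

```javascript
function* powersOfTwo(maxExponent) {
  var exponent = 0;
  while (exponent <= maxExponent) {
    yield Math.pow(2, exponent);
    exponent++;
  }
}

// for-of calls .next() for us and stops as soon as done is true.
for (const n of powersOfTwo(10)) {
  console.log(n); // logs 1, 2, 4, ... up to 1024, one per line
}

// Hypothetical helper: the infinite sequence 0, 1, 2, ...
function* naturals() {
  let n = 0;
  while (true) yield n++;
}

// Hypothetical helper: yields only the first `count` values of an iterable.
function* take(iterable, count) {
  if (count <= 0) return;
  for (const value of iterable) {
    yield value;
    if (--count === 0) return; // stop pulling; the rest is never computed
  }
}

console.log([...take(naturals(), 5)]); // [ 0, 1, 2, 3, 4 ]
```

`naturals()` never terminates on its own, yet this is perfectly safe: each value is computed only when `take` asks for it.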

Best For Last: Async Awesomeness

You may be wondering why we spent so much time talking about run-to-completion and non-blocking I/O. Well, prior to generators, JavaScript as a language offered exactly one construct for resuming after an asynchronous operation: callbacks. In highly asynchronous applications, this easily turned into deeply nested callbacks, affectionately known as "callback hell" (or, alternatively, the "pyramid of doom"). This gave rise to all sorts of libraries to assist with the code structure, with promises in particular receiving a lot of attention lately.

Well, like I said, this was prior to generators. Let's go back to our helloWorldGenerator() again:

function* helloWorldGenerator() {
    yield 'hello';
    yield 'world';
}

Nothing has changed here, but consider the following modified usage of the iterator:

var hw = helloWorldGenerator();
console.log(hw.next()); // prints { value: 'hello', done: false }
setTimeout(function() {
    console.log(hw.next()); // prints { value: 'world', done: false }
}, 1000);

To reiterate (last pun, I promise [unless that counts too]), we have a full one-thousand beautiful milliseconds between yield 'hello' and yield 'world', and yet those lines of code are written in a very synchronous-looking syntax. This is a big deal: generators finally provide us with a pseudo-synchronous syntax that doesn't break run-to-completion semantics, doesn't require transpiling, and doesn't require callbacks.

Now, the previous code example obviously leaves something to be desired, and even though our generator didn't require a callback, we still had to use one to match the setTimeout() method signature. Fortunately, a new breed of generator-based control-flow libraries is springing up to assist with this.
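The core trick behind these libraries is small: hand the generator a callback that, when invoked, calls .next() (or .throw()) on its iterator. Below is a minimal hand-rolled sketch of that idea. The name `run` is hypothetical, and this is not the implementation of any particular library; it assumes Node's `setImmediate()` and does no error handling beyond rethrowing into the generator.

```javascript
// Hypothetical minimal runner: gives the generator a `resume` callback
// and drives the iterator whenever that callback fires.
function run(genFn) {
  let it;
  function resume(err, value) {
    // Defer to a later task so that a callback invoked synchronously
    // cannot re-enter the generator before it has reached its yield.
    setImmediate(function () {
      if (err) {
        it.throw(err);  // surfaces as an exception at the paused yield
      } else {
        it.next(value); // becomes the value of the yield expression
      }
    });
  }
  it = genFn(resume);
  it.next(); // run the body up to its first yield
}

const log = [];
run(function* (resume) {
  log.push('hello');
  yield setTimeout(resume, 50);
  log.push('world');
});
// Right after run() returns, only 'hello' has been logged;
// 'world' follows roughly 50 ms later.
```

If the generator doesn't catch an error passed to `it.throw()`, it escapes from the `setImmediate()` callback as an uncaught exception; real libraries handle that case far more carefully.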

Harnessing the Power of Generators with Suspend

The moment generators landed in V8, I had to get my hands on them. While using yields to manage asynchronous code had some immediate academic interest, it quickly became apparent that some library support was needed to truly realize the expressiveness that generators enable. So, last month I introduced suspend, a generator-based control-flow utility for Node. Here's our "hello-sleep-world" example re-written with suspend:

suspend(function* (resume) {
    console.log('hello');
    yield setTimeout(resume, 1000);
    console.log('world');
})();

This... this is the future we were promised (does that count as a double-pun? Yeah, I lied about that). As can be seen, suspend considerably reduces the boilerplate code required to use generators, and allows us to move all of our logic into the generator body itself. No need to mess with .next(), .value, .done or the like.

It's also quite useful when dealing with node-style, error-first callbacks:

var fs = require('fs');

suspend(function* (resume) {
    var data = yield fs.readFile(__filename, 'utf8', resume);
    console.log(data);
})();
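Error-first callbacks map naturally onto generators: a runner can pass a successful result in via .next() and an error via .throw(), so failures surface as ordinary exceptions that try/catch can handle. The sketch below drives a generator by hand to show that mapping; the generator name `readOrFallback` and the path `/no/such/file.json` are hypothetical, chosen so the read fails with `ENOENT`.

```javascript
const fs = require('fs');

// A generator that treats a failed read as an ordinary exception.
function* readOrFallback(resume) {
  try {
    // Hypothetical path, chosen so that the read fails with ENOENT.
    const text = yield fs.readFile('/no/such/file.json', 'utf8', resume);
    return text;
  } catch (err) {
    return 'fallback (' + err.code + ')';
  }
}

let outcome;
// Drive it by hand: `err` goes in via it.throw(), data via it.next().
const it = readOrFallback(function resume(err, data) {
  const result = err ? it.throw(err) : it.next(data);
  if (result.done) {
    outcome = result.value;
    console.log(outcome); // fallback (ENOENT)
  }
});
it.next(); // start the body; it suspends at the yield
```

Because `fs.readFile()` always calls back asynchronously, `it` is guaranteed to be initialized by the time `resume` runs.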

As a final example, suspend can work seamlessly with promise-based code as well:

suspend(function* () {
    var user = yield db.users.findWithPromise({ username: 'jmar777' });
    console.log(user.favoriteColor);
})();
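Promise support works the same way, except the runner waits on each yielded promise instead of handing out a callback. Here's a minimal sketch of such a promise-aware runner; `runWithPromises` and `findUser` are hypothetical names, and `findUser` merely stands in for a promise-returning database call like the one above.

```javascript
// Hypothetical promise-aware runner: each yielded value is treated as a
// promise; the generator resumes with its fulfillment value, or has its
// rejection thrown back in at the yield.
function runWithPromises(genFn) {
  return new Promise(function (resolve, reject) {
    const it = genFn();
    function step(method, arg) {
      let result;
      try {
        result = it[method](arg); // method is 'next' or 'throw'
      } catch (err) {
        return reject(err); // the generator threw and didn't catch it
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(
        function (value) { step('next', value); },
        function (err) { step('throw', err); }
      );
    }
    step('next', undefined);
  });
}

// Hypothetical stand-in for a promise-returning database call.
function findUser(username) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ username: username, favoriteColor: 'blue' });
    }, 10);
  });
}

let favorite;
runWithPromises(function* () {
  const user = yield findUser('jmar777');
  return user.favoriteColor;
}).then(function (color) {
  favorite = color;
  console.log(color); // blue
});
```

Wrapping each yielded value in `Promise.resolve()` means plain values are accepted too, and the generator's own return value becomes the fulfillment value of the promise the runner returns.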

The suspend README does a pretty thorough job of explaining how to use it, so I won't spend more time on it here. Additionally, several other generator-based control-flow libraries have appeared as well, and are definitely worth checking out: co, galaxy, genny, and gen-run.

Wrapping Things Up

I'm sorry, I don't know how these things get so long. Massive props are owed to the V8 team for working so quickly to get generators implemented. ES6 and the specs beyond it continue to define awesome new features for us, so a thank you is owed to everyone working on them as well; it's definitely an exciting time to be a JavaScript developer. And, as always, thanks for reading!
