While working on some recent projects, I needed to run some basic dashboard analytics against moderate volumes of machine-generated data. Already having some experience with MongoDB (and being quite the fan of it), I decided to do some research on real-time analytics with MongoDB.
A quick search turns up dozens of articles and presentations on how this can be achieved. However, after reading through quite a few of them, it became clear that most of the existing how-tos on the subject rely on techniques that predate the Aggregation Framework, built largely on MongoDB's atomic upserts combined with the $inc and $set update operators.
These techniques are still useful and power several successful applications. Unfortunately, they tend to fall short on the ad hoc side of things. Once multiple values from distinct events have been aggregated into a single value, the ability to slice and dice the results becomes limited. Additionally, these techniques typically require pre-aggregating at multiple levels to support predetermined aggregation durations, rely on MapReduce, or delegate some of the re-reducing work to the application itself.
Given that we now have the Aggregation Framework available to us (introduced in the MongoDB 2.1 development series and first shipped in the stable 2.2 release), I decided to run some tests to see how feasible it is to achieve real-time, interactive, ad hoc dashboard analytics with MongoDB.
Note that this article is intended to be platform-agnostic, so all tests are implemented as MongoDB shell scripts.