Announcing storm-docker - Making distributed, multiple server Storm setups easy, in Docker

Docker all the things!

At Viki, many decisions are driven by data from the user community. Data plays a critical role everywhere, from high-level business decisions, such as figuring out the best shows to get, to improving our user experience through A/B tests, which cover not only the UI but also things such as recommending the next video you should watch.

However, all of this assumes that we actually receive the data, which requires every part of our infrastructure to be functional. To ensure that, we make use of a variety of services for monitoring the health of our servers and the processes running on them. For many situations, a simple yes/no answer on whether some service is running is sufficient. If something isn't running properly, our engineering team can step in to fix the problem.

What if a simple up/down answer isn't what we want? What if we want almost immediate, real-time information on the traffic our multitude of platforms are receiving? What if we want to correlate spikes and drops in traffic with specific deployments and code changes, and have a way to look at them as they happen? The possibilities are endless.

At Viki, we use Apache Storm to give us real-time updates and alerts. This information ties in with our custom dashboard and reporting tools, allowing us to quickly figure out if something is going wrong on a certain platform, and to narrow down the possible causes.

Introducing storm-docker

Today, I am proud to announce the release of storm-docker, a Docker setup which makes distributed, multiple server Storm setups easy (I hope), or at least much less painful. This project was initiated to improve the fault tolerance of our Storm cluster, and to offer a solution for future scalability requirements.

Once again, storm-docker's github repository:

Most of storm-docker's core scripts are written in Python, and we have made an effort to write good inline documentation, so that someone new to the codebase can jump in and get up to speed quickly. In fact, you are highly encouraged to do so, seeing how we built storm-docker by hacking on another similar project, wurstmeister/storm-docker. We see storm-docker not as THE solution, but as a starting point on which you can build your own Storm cluster.

What's special?

If we perform a search of the word storm on the Docker Registry, we can see several pages of results. What makes our storm-docker project different from other similar offerings?

For one, storm-docker aims to make multiple server Storm setups simpler. We started the project by hacking on wurstmeister/storm-docker, which does not support multiple server Storm topologies (in terms of Storm Supervisor and Zookeeper). This is not to say that wurstmeister's storm-docker project is bad; it just didn't meet our requirements of fault tolerance and scalability, neither of which is possible when everything runs on a single server.

Running every component of Storm on a single machine is trivial. It takes quite a bit of effort to run multiple Storm components over multiple machines; it takes even more effort to do that inside Docker. We have invested a non-trivial amount of time and energy (mostly staring at the Storm UI, SSH'ing into machines to stare at logs and configuration files, reading a lot of documentation, googling a lot, all the while wondering what went wrong) to make storm-docker work for us. Believe me, you may not want to do the same, especially because we have already done the hard work for you.

storm-docker is, as its name suggests, a project based on Docker. This allows you to easily deploy the same Storm setup on a different cluster as long as you edit some details (such as IP addresses) in the configuration files.
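To give a feel for what "editing some details" means, here is a minimal sketch of generating a Storm configuration from a handful of addresses. It is not storm-docker's actual configuration format or code: the function name `render_storm_yaml` and the example IP addresses are made up for illustration. The keys are standard Storm settings (`storm.zookeeper.servers`, `storm.local.hostname`, and `nimbus.host`, which newer Storm releases replace with `nimbus.seeds`).

```python
def render_storm_yaml(nimbus_host, zookeeper_hosts, local_hostname):
    """Return the text of a minimal storm.yaml for one machine in a cluster.

    Hypothetical helper for illustration only; storm-docker's own scripts
    and configuration files may look quite different.
    """
    if not zookeeper_hosts:
        raise ValueError("at least one Zookeeper host is required")

    # Every Storm node needs to know where the Zookeeper ensemble lives.
    lines = ["storm.zookeeper.servers:"]
    lines += ['  - "{}"'.format(host) for host in zookeeper_hosts]

    # Supervisors find Nimbus through this address.
    lines.append('nimbus.host: "{}"'.format(nimbus_host))

    # The address this node advertises to the rest of the cluster; inside
    # Docker this usually has to be the host's address, not the container's.
    lines.append('storm.local.hostname: "{}"'.format(local_hostname))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example addresses are placeholders; substitute your own servers.
    print(render_storm_yaml(
        nimbus_host="10.0.0.1",
        zookeeper_hosts=["10.0.0.2", "10.0.0.3"],
        local_hostname="10.0.0.4",
    ))
```

Moving to a different cluster then comes down to changing the addresses passed in, which is the kind of portability we are after.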

As a final point, I personally think that the code is pretty well documented. OK, not all parts of it, but definitely the parts that matter. There is also a GitHub Pages site for storm-docker, which provides more details not stated in the README.

Final Notes

Storm has proven to be a very powerful component of our data infrastructure, and storm-docker simplifies Storm setup for us tremendously. We hope it will do the same for you.
