Cloud computing solutions have become very popular because they are easy to use, highly scalable, low in cost, and offer high reliability and performance. Here at VIKI, we rely on Heroku's architecture to scale our application.
Paying for the perceived demand
While Heroku has earned a reputation as one of the best Rails cloud hosting solutions, its business model has a few quirks that are not so friendly to customers. For one, it is not a true pay-per-usage solution.
Customers set the number of dynos and workers manually, according to what they "think" is enough to handle their app's traffic. When we set our dynos manually, we pay a constant amount based on our perceived demand, that is, on what we expect the traffic to be.
In reality, an app's traffic varies over time, with unpredictable sharp peaks and troughs. Most often, dynos are set well above what the app actually requires. True pay-per-usage is not possible because we pay for "perceived demand" rather than for the dynos we actually use, and the perceived figure is usually higher than needed. It's a good business model for Heroku, but not a very cost-efficient one for its customers.
A few community solutions, such as heroscale and the autoscale-heroku gem, claim to autoscale dynos. However, none of them seemed to work effectively at the time.
Eventually, we decided to build our own solution to autoscale Heroku dynos, so that we pay only for the dynos our app actually uses.
Here's a technical summary of what we did (it has worked seamlessly for us):
We wrote a Ruby script that pulls capacity metrics at 3-minute intervals from New Relic, the application performance monitoring service we use on Heroku. The busy_percent metric tells us what percentage of the application's combined dynos and workers is busy serving traffic. Since this is the only useful capacity metric New Relic gives us, we had to adjust it so that it makes sense for the dynos alone.
We get the current number of dynos and workers set in Heroku using:
heroku dynos --app <your_app>
From these counts, we estimate how many dynos are actually in use (used_dynos) by apportioning busy_percent according to the ratio of dynos to workers. We then set the number of dynos (should_dynos) so that used_dynos / should_dynos = 80%. In other words, we aim for a modest 80% utilization, which leaves enough headroom for the script to react when a peak arrives. That way, we pay for only a little more than what we use. Simple, right?
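The sizing step above can be sketched in Ruby as follows. This is a minimal sketch, not the script itself: the method names are hypothetical, and it assumes one plausible reading of "apportioning", namely that busy_percent is the busy fraction of all dynos plus workers, and that the dynos' share of the busy capacity matches their share of the total.

```ruby
# Target utilization: aim for used_dynos / should_dynos = 80%.
TARGET_UTILIZATION = 0.8

# Hypothetical helper: estimate how many dynos are busy.
# busy_percent is New Relic's average busy percentage (0-100) across
# dynos + workers; we ASSUME the dynos' share of the busy capacity is
# proportional to their share of the total.
def used_dynos(busy_percent, dynos, workers)
  total = dynos + workers
  busy_units = busy_percent / 100.0 * total
  busy_units * dynos / total
end

# Hypothetical helper: smallest dyno count that keeps utilization at or
# below the target, never dropping below one dyno.
def should_dynos(used, target = TARGET_UTILIZATION)
  [(used / target).ceil, 1].max
end

used = used_dynos(50.0, 20, 4)
puts format("used_dynos=%.1f should_dynos=%d", used, should_dynos(used))
```

With 20 dynos and 4 workers at 50% busy, this prints `used_dynos=10.0 should_dynos=13`, since 10 busy dynos out of 13 is about 77% utilization, just under the 80% target. Rounding up keeps the buffer on the safe side.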
The Pessimism Ratio
However, we ran into a few issues while building this script:
1) busy_percent is an average, so it understates the peak: dyno usage fluctuates within each interval. When a surge comes, the maximum busy_percent better reflects how many dynos are required. We solved this by adding a "projected" 20% to the average to estimate the maximum.
2) If the ratio of workers to dynos gets large, the formula loses sensitivity. For example, suppose we have 10 workers and 1 dyno, the dyno is fully utilized, and no workers are busy. New Relic's busy_percent will show only about 9%, because 1/11 ≈ 9%. No matter how hard that dyno struggles, the script will think everything is fine, even after apportioning. We added a "pessimism" ratio so that we can manually tune the sensitivity of the projected used_dynos according to our current number of workers.
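Both adjustments can be folded into the sizing formula. The Ruby sketch below is hypothetical: it assumes the "projected 20%" means multiplying the average by 1.2, and it models the pessimism ratio as a manual multiplier on the projected used_dynos (the value 10.0 is an arbitrary illustration). It replays the 10-worker, 1-dyno example from point 2.

```ruby
TARGET_UTILIZATION = 0.8
# ASSUMPTION: "adding a projected 20%" means scaling the average by 1.2.
PEAK_PROJECTION = 1.2

# Hypothetical helper: estimate busy dynos from New Relic's average
# busy_percent (0-100) over dynos + workers, projected toward the peak
# and scaled by a manually chosen pessimism ratio (ASSUMED multiplier).
def projected_used_dynos(busy_percent, dynos, workers, pessimism: 1.0)
  total = dynos + workers
  busy_units = busy_percent / 100.0 * total
  average_used = busy_units * dynos / total
  average_used * PEAK_PROJECTION * pessimism
end

# Smallest dyno count that keeps utilization at or below the target.
def should_dynos(used, target = TARGET_UTILIZATION)
  [(used / target).ceil, 1].max
end

# 1 saturated dyno, 10 idle workers: busy_percent is 1/11, about 9.1%.
busy = 100.0 / 11
plain = projected_used_dynos(busy, 1, 10)
pessimistic = projected_used_dynos(busy, 1, 10, pessimism: 10.0)
puts should_dynos(plain)        # stays at 1 despite the saturated dyno
puts should_dynos(pessimistic)  # scales up to 2
```

Without pessimism, the estimate is about 0.11 busy dynos, so the saturated dyno is never scaled up; with a multiplier of 10 the estimate rises to about 1.09 and the sketch asks for two dynos. The right multiplier depends on how many workers you run and has to be tuned by hand.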
The results of this simple script have been very rewarding. Our peak period is between 3am and 4am, when the application uses as many as 36 dynos. Because of this, we used to run 36 dynos 24/7. Now the app averages 20 dynos, sometimes dropping as low as 13. Our bill dropped by about $2000 per month, which is about $24,000 in savings per year. Not bad for 2 days of work.
Here it is. Run it from a cron job every 3 minutes and it should work like a charm if you are also using New Relic! Drop us a line if you find any problems with it.