Mar 12 2013

How to Improve Rails Application Performance on Heroku

There are many options for Ruby on Rails hosting, but Heroku is one that we recommend to clients and use internally. Already using Heroku? There are steps you can take to get more performance out of your application.

Measure

To optimize your application, you first need to measure its performance. There are many ways to do this, including Heroku’s New Relic add-on. Even if you track no other metric, you should at least watch your application’s average response time.

New Relic Avg Response Time

You want this number to be as low as possible. You can monitor this number to see how changes to your code affect the responsiveness of your application. You can also use this number to figure out how many dynos your application will need for a given load.

First, we will need to know how many requests a dyno can serve in a second.

requests_per_second_per_dyno = 1000 / average_response_time_in_milliseconds

Knowing the number of requests per second that a dyno can handle will allow you to figure out how many dynos you will need in order to handle a certain level of traffic. Suppose you know from New Relic that your site gets about 20,300 requests per minute and your average response time is 243 ms.

New Relic Throughput

Doing the math:

requests_per_second_per_dyno = 1000 / 243   # ~4.12 requests per second per dyno
requests_per_minute_per_dyno = 4.12 * 60    # 247.2 requests per minute per dyno
dynos_needed = 20300 / 247.2                # ~82.12 dynos

So if you want to handle 20,300 requests per minute, you’re going to need at least 83 dynos (rounding up).

But let’s say you want to handle twice as many requests per minute. You can’t solve that problem simply by adding more dynos, because Heroku currently limits you to 100 dynos for your web process. Instead, you have to reduce the average response time of your application. If you could cut it roughly in half, from 243 ms down to about 122 ms per request, you’d double your capacity without adding any more dynos.
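
Here is a small Ruby sketch of that capacity math. The method names are just illustrative helpers, not anything Heroku or New Relic provides:

# How many dynos you need for a given load and average response time.
def dynos_needed(requests_per_minute, avg_response_time_ms)
  requests_per_second_per_dyno = 1000.0 / avg_response_time_ms
  requests_per_minute_per_dyno = requests_per_second_per_dyno * 60
  (requests_per_minute / requests_per_minute_per_dyno).ceil
end

# The average response time you would need in order to serve a given
# load with a fixed number of dynos.
def required_response_time_ms(requests_per_minute, dynos)
  requests_per_minute_per_dyno = requests_per_minute.to_f / dynos
  1000 / (requests_per_minute_per_dyno / 60)
end

dynos_needed(20_300, 243)              # => 83
required_response_time_ms(40_600, 83)  # => ~122.7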

So how do you decrease response times? Common methods include:

  • Cache views when possible.
  • Move long-running tasks to a background worker such as Sidekiq, Resque or DelayedJob (see the sketch after this list).
  • Add database indexes for slow queries where possible.
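
For example, work like sending an email or generating a report doesn’t need to block the web request. A minimal sketch using Sidekiq, with made-up class names (ReceiptWorker, ReceiptMailer, Order) purely for illustration:

# app/workers/receipt_worker.rb -- a hypothetical Sidekiq worker.
class ReceiptWorker
  include Sidekiq::Worker

  def perform(order_id)
    order = Order.find(order_id)
    # The slow part now runs outside the request/response cycle.
    ReceiptMailer.receipt(order).deliver
  end
end

# In the controller, enqueue the job instead of doing the work inline:
# ReceiptWorker.perform_async(@order.id)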

However, at some point it will become very hard to shave milliseconds off this number and you may wonder what else you can do (besides leaving Heroku).

Enter Unicorn

The Unicorn HTTP server can help you increase the number of requests per second that a dyno can handle. Unicorn allows you to have multiple workers processing requests on one dyno.

How many workers can one dyno have? It depends on the memory usage of your application. To figure out how many workers your dyno can handle, you need to know how much memory a single worker uses; New Relic’s dyno memory graph will show you this number. Keep in mind that your dyno is limited to 512 MB of memory, so to make use of two workers, each worker would need to use roughly 250 MB or less, leaving a little headroom. The lower your application’s memory usage, the more workers a dyno can handle. If your application can handle 600 requests per minute with one Unicorn worker, it can handle roughly 1,200 requests per minute with two workers, 1,800 with three workers, and so on.

Avg memory usage per dyno
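
As a rough rule of thumb, assuming memory usage scales more or less linearly with the number of workers (the headroom value here is an assumption, not a Heroku requirement):

# Rough rule of thumb: how many Unicorn workers fit in a 512 MB dyno.
DYNO_MEMORY_MB = 512
HEADROOM_MB    = 50   # illustrative safety margin

def max_workers(per_worker_mb)
  ((DYNO_MEMORY_MB - HEADROOM_MB) / per_worker_mb).floor
end

max_workers(150)  # => 3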

Increasing the number of Unicorn workers rather than the number of dynos also mitigates some of the pain associated with Heroku’s random routing, because a randomly routed request has a better chance of finding a free worker on the dyno it lands on.

When configuring Unicorn for Heroku, there are a couple of values you want to pay special attention to:

  • worker_processes – This tells Unicorn how many workers you want to run per dyno. Use your average memory usage per dyno to figure out what number is best for you. If this number is 1, consider using something else besides Unicorn.

  • timeout – Heroku times out requests at 30 seconds, so this value should be 30 at most. If you don’t want your application to spend 30 seconds on a long-running request, you can set it even lower.

  • preload_app – Set this to true so the application code is loaded in the master process before workers are forked. If you are using ActiveRecord, call ActiveRecord::Base.connection.disconnect! in the before_fork block and ActiveRecord::Base.establish_connection in the after_fork block so each worker gets its own database connection.

  • listen – listen takes a port and a configuration hash. One of the options is :backlog, which defaults to 1024. The documentation states:

“If you are running Unicorn on multiple machines, lowering this number can help your load balancer detect when a machine is overloaded and give requests to a different machine.”

This is exactly the case with Heroku. You will need to experiment with this number to see what works best for your application, but we have gotten good results by setting this number in the single digits. If you are using the default setting for the backlog, one slow request could potentially affect more than 1,000 requests lined up in the worker’s queue.
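
Putting those settings together, a minimal config/unicorn.rb sketch might look like this. The worker count, timeout and backlog values below are illustrative, not recommendations; tune them against your own New Relic numbers:

# config/unicorn.rb -- a minimal sketch.
worker_processes 3                 # assumes ~3 workers fit under the 512 MB dyno limit
timeout 29                         # stay under Heroku's 30-second router timeout
preload_app true
listen ENV.fetch("PORT", 3000).to_i, backlog: 8   # single-digit backlog, as discussed above

before_fork do |server, worker|
  # Disconnect so forked workers don't share the master's database connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each worker establishes its own database connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end

Your Procfile would then start Unicorn with this config, for example: web: bundle exec unicorn -c ./config/unicorn.rb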

Conclusion

If you’re looking for ways to improve your application’s performance on Heroku, first make sure you are measuring performance and looking for ways to optimize your application. If your application’s memory footprint allows, consider using Unicorn to double, triple or maybe even quadruple the number of requests your web dynos can handle. If you decide to give Unicorn a try, be sure to dig into the tuning docs so you are sure your unicorns are tuned to 11.

10 Comments

  1. Conor

    We have found that Puma, rather than Unicorn, improves concurrency and increases the number of requests a single dyno handles. As stated, Unicorn close to doubles your memory footprint per worker. We found that 8-12 threads is the ideal range on Puma (for an app with a base 120 MB footprint and an average response time of 340 ms). With Puma we can handle ~40 requests per second. More Puma threads increases thrashing around the 250 requests per second mark.

    Another useful testing tool is Blitz, a free add-on on Heroku that lets you simply rush your server with tons of requests. The basic command for rushing is:
    -p 1-250:60 -H 'Accept-Encoding: gzip, deflate' http://webserver.com/awesomedetails?q=1
    It also integrates with New Relic to complete the testing cycle (however New Relic API access is not allowed on the free New Relic add-on).

    As a side note, listen:1 is the ideal value for Heroku and Unicorn. This way a busy dyno passes requests on to other dynos, or they wait in the router queue where the time shows up in New Relic as queuing. That lets you tell which requests are waiting around, instead of having them sit in your Unicorn backlog where you won’t notice the difference between work time and wait time.

    • Seivan

      I second that, Puma instead of Unicorn.

    • Onno Faber

      Can you explain something about the listen:1 ?
      I don’t see this in any other examples.

      We are running our instances on Unicorn/Heroku; Puma definitely seems like an interesting option. Now that we can see request queue time in New Relic, it seems that this usually takes more time than anything else.

      • Stafford Brooke

        I believe Conor is saying to set the backlog option for listen to 1. This is the number of requests a Unicorn worker will queue up before refusing any more requests. When the routing mesh tries to send a request to a worker that has a full backlog, that worker tells the routing mesh it is overloaded. When this happens, the routing mesh will retry the request on another worker.

        You can read more about this here: http://unicorn.bogomips.org/Unicorn/Configurator.html#method-i-listen

        Since the Heroku routing mesh is handling load balancing you want to set this to a really low number. I recommend trying different single digit values and measuring the results to find the right backlog setting for your application.

  2. Stafford Brooke

    Thanks for your comment Conor. The ideal server and configuration values will vary from application to application, but these are definitely great suggestions!

  3. Tom Fakes

    With the availability of Ruby 2.0 on Heroku, the new GC mechanism should allow more Unicorn workers to be configured per dyno, as more memory should be shared between workers.

    I’ve had problems running apps with Puma in the past, but I haven’t tried this option for a while. Maybe it’s time to try it again.

    A few months ago, I wrote a blog post about the 3 things you need to configure differently to run a Rails app on Heroku with the best performance: http://blog.craz8.com/articles/2012/12/7/heroku-config-for-performace-rails-apps

  4. Michel Pigassou

    What about threadsafe for Puma? Didn’t you have problems with that? Seems a lot of work (for a big app) to ensure the code is threadsafe.

  5. Scott Shea

    For anyone using Unicorn, you may want to check out Unicorn Worker Killer. It kills off Unicorn workers after they have served a set number of requests and/or their memory has exceeded a specified threshold. This is increasingly important on Heroku with the Double Dyno option and Unicorn.

    Here is a shameless plug to my blog post regarding UWK: http://mycatwantstolearnrails.blogspot.com/2013/04/heroku-unicorn-request-backlog.html
