LogoDTreeLabs

Async queries in Rails using load_async

Gaurav SinghBy Gaurav Singh in RailsRails7Async on February 22, 2022

Rails introduces the load_async feature for executing long-running queries in the background. When we have a controller action that needs to execute two or more unrelated queries, we can now do it concurrently through Relation#load_async. Let's consider we are building a dashboard for the blogs app where we display aggregated activity on each post:

  • All the likes on the post(100ms)
  • All the views on a post(200ms)
  • Average time spent by the users on each post (400ms)

It would take around 700ms to run the above SQL queries sequentially. As result of one query is not an input of any other query, these queries are independent of each other. Thus, we can parallelize these queries. It will take around 400ms (maximum time required for any of the query) to run these queries. We can run these queries in parallel using a thread pool.

A thread pool operates on a fixed number of threads. At any point, at most num_threads will be active processing tasks. When all threads are busy new tasks #post to the thread pool are enqueued until a thread becomes available. When a thread crashes for any reason the thread will immediately be removed from the pool and replaced.

We can use concurrent-ruby gem to implement a thread pool in a ruby app. Concurrent::FixedThreadPool provides the interface to implement thread pools in ruby applications.

Let's look at the code snippet below:

  require 'concurrent'
  pool = Concurrent::FixedThreadPool.new(5) # 5 threads
  pool.post do
    # execute the SQL query in the background thread
    post.likes.count
  end
  pool.post do
    # execute the SQL query in the background thread
    post.views.count
  end
  pool.post do
    # execute the SQL query in the background thread
    UserViewTime.group(:post).sum
  end

We are creating a thread pool of 5 threads, posting ActiveRecord operations inside the thread pool to run them in parallel. This solution should also take ~400ms to execute.

It might look like an acceptable solution, there are two issues with this approach:

  • We'll have to manage the lifecycle of the thread pool by ourselves.
  • we'll lose the context of the request in the thread pool since Rails relies on the thread's local state

Rails introduced the load_async to overcome these issues. We can simply append the load_async to the ActiveRecord Relation object and the SQL query will be executed in the background thread.

  likes = post.likes.load_async
  views = post.views.load_async
  viewer_time = UserViewTime.group(:post).load_async

In the above snippet post.likes returns an ActiveRecord relation object. load_async will add the execution of the post.likes to the thread pool. Its result will be saved in the rails cache once the query is executed. When we fetch the value of the post.likes then results will be returned from the cache.

Let's first try to understand, how load_async works behind the scene before moving to the benchmarking section. Concurrent::ThreadPoolExecutor tries to execute the query in the background thread when an ActiveRecord relation is appended with the load_async. If the background thread is completed before the values are fetched in the main thread then results are fetched from the ThreadPool. Otherwise, main thread waits until the query is executed in the background thread.

We've tried to benchmark the difference between the usual ActiveRecord queries and load_async queries. We've added complex_query scope in the rails model, which will delay the SQL output by 2 secs by adding sleep. This will ensure that the SQL query is taking an adequate amount of time while executing.

class User < ApplicationRecord
  scope :complex_query, -> { select('*, sleep(2)') }
end

Let's look at our benchmarking script. I've added 3 secs sleep in the rails script so that the main thread does not finish before the background threads. sleep represents the long running piece of code like API Requests or some other I/O operation.

require 'benchmark'

def sync_action
  users_1 = User.where(id: 10001).complex_query
  users_2 = EncryptedUser.where(id: 10001).complex_query
  sleep(3) # This  represents a long running piece of code which takes more time than the active record queries
  users_1.first
  users_2.first
end

def async_action
  users_1 = User.where(id: 10001).complex_query.load_async
  users_2 = EncryptedUser.where(id: 10001).complex_query.load_async
  sleep(3) # This  represents a long running process which takes more time than the active record queries
  users_1.first
  users_2.first
end

ActiveRecord::Base.logger = nil
puts "== Benchmarking =="
Benchmark.bmbm do |x|
  x.report("sync") { sync_action }
  x.report("async") { async_action }
end

== Benchmarking ==
Rehearsal -----------------------------------------
sync    0.003138   0.003276   0.006414 (  7.194043)
async   0.003342   0.002265   0.005607 (  3.004943)
-------------------------------- total: 0.012021sec

            user     system      total        real
sync    0.003483   0.000175   0.003658 (  7.146778)
async   0.001630   0.000219   0.001849 (  3.002004)

As we can see async_action is completed in ~3 secs since it's running queries in background, while sync_action is running all the tasks sequentially, thus taking more than 7 secs(3 secs(sleep) + 2 secs(User query) + 2secs(EncryptedUser query)).

Rails log indicates how the load_async executes these queries:

def async_action_with_delayed_fetch
  users_1 = User.where(id: 10001).complex_query.load_async
  users_2 = EncryptedUser.where(id: 10001).complex_query.load_async
  users_1.first
  users_2.first
end
 => :async_action_with_delayed_fetch
3.0.3 :084 > async_action_with_delayed_fetch
  ASYNC User Load (2076.0ms) (db time 2076.2ms)  SELECT *, sleep(2) FROM `users` WHERE `users`.`id` = 10001
  ASYNC EncryptedUser Load (0.0ms) (db time 2074.2ms)  SELECT *, sleep(2) FROM `encrypted_users` WHERE `encrypted_users`.`id` = 10001
 =>

When queries have been executed by the background thread, Rails log indicates how much time query took to execute in the DB and how much time have been taken by the main thread. In above example: ASYNC User Load took ~two secs in the DB while ASYNC EncryptedUser Load took zero secs in the main thread and ~two secs in the DB. Which means both of the queries run parallel to each other.

So, load_async has the advantage in terms of performance impact. Now, the question comes can we use it everywhere? The answer to this question is NO!. Let's consider following points:

  • load_async will not have any performance impact if main threads fetches the record before execution in the DB.
  • Multiple open connection can cause DB performance issue if DB is already under a heavy load.

We'll have to be a little careful while using the load_async since this can cause performance degradation. Let's consider following benchmarking script where we are not making the sleep call in the main thread:

def sync_action
  users_1 = User.where(id: 10001).complex_query.first
  users_2 = EncryptedUser.where(id: 10001).complex_query.first
end

def async_action_with_immediate_fetch
  users_1 = User.where(id: 10001).complex_query.load_async.first
  users_2 = EncryptedUser.where(id: 10001).complex_query.load_async.first
end

def async_action_with_delayed_fetch
  users_1 = User.where(id: 10001).complex_query.load_async
  users_2 = EncryptedUser.where(id: 10001).complex_query.load_async
  users_1.first
  users_2.first
end

old_logger = ActiveRecord::Base.logger
ActiveRecord::Base.logger = nil

puts "== Benchmarking =="
Benchmark.bmbm do |x|
  x.report("sync") { sync_action }
  x.report("async_action_with_immediate_fetch") { async_action_with_immediate_fetch }
  x.report("async_action_with_delayed_fetch") { async_action_with_delayed_fetch }
end

Rehearsal ---------------------------------------------------------------------
sync                                0.005048   0.000184   0.005232 (  4.149368)
async_action_with_immediate_fetch   0.008566   0.000792   0.009358 (  4.150932)
async_action_with_delayed_fetch     0.006235   0.001050   0.007285 (  2.083599)
------------------------------------------------------------ total: 0.021875sec

                                        user     system      total        real
sync                                0.005045   0.000190   0.005235 (  4.150031)
async_action_with_immediate_fetch   0.005550   0.000296   0.005846 (  4.150492)
async_action_with_delayed_fetch     0.002709   0.000249   0.002958 (  2.072396)

As we can see here, we are fetching the value of the ActiveRecord relation before the background thread is finished then there is no visible performance advantage of the load_async.

Now, the question comes under what circumstances should we use the load_async? The general rule of thumb of using the load_async should be when the main thread gives a considerable amount of time to the background thread to execute. Let's consider the following approaches for using the load_async:

  • Suppose we have 3 unrelated queries
    • Q1 is taking 100 ms
    • Q2 is taking 200 ms
    • Q3 is taking 300 ms
      Q1.load_async
      Q2.load_async
      Q3.first
      
      If we keep Q1 and Q2 in the background and Q3 in the main thread, all three queries will be executed in parallel.
  • Suppose we have a Rails action where we are fetching data from an external service and running a complex query. In this case, we can use load_async to run the query in the background thread. While I/O operation will happen in the main thread. This will ensure that both operations are executed in parallel.

    Q.load_async # Executes query in the background thread
    external_api.call # Fetches data in the main thread
    

    In this case, when the main thread waits for the API call to finish executing, the SQL query runs in parallel.

Feedback and suggestions are welcome at Twitter