How to avoid N+1 Query
“Why is it so slow???” One of the top reasons for slow performance in backend is N+1 query. It is a query to load the main model and N queries for loading the associated model, inefficient query, and a common performance anti-pattern.
It is much faster to issue 1 query which returns 100 results than to issue 100 queries which each return 1 result.
Let’s take a look at this example with rails active record. Let’s say you have two tables on your DB, cars and variants, where an entity in cars can have many variants.
class Car < ActiveRecord::Base
has_many :variants ...
endclass Variant < ActiveRecord::Base
belongs_to :car ...
end
You want to process all the variants for all of the BMW
brand from your cars, you may be doing something like this:
cars = Car.where(brand: 'BMW')
cars.each do |car|
variants = car.variants
# Process....
end> SELECT `cars`.* FROM `cars` WHERE `cars`.`brand` = 'BMW'
> SELECT `variants`.* FROM `variants` WHERE `variants`.`car_id` = 1 > SELECT `variants`.* FROM `variants` WHERE `variants`.`car_id` = 2
> SELECT `variants`.* FROM `variants` WHERE `variants`.`car_id` = 3
> SELECT `variants`.* FROM `variants` WHERE `variants`.`car_id` = 4
> SELECT `variants`.* FROM `variants` WHERE `variants`.`car_id` = 5
..............
The above program is inefficient and N+1 query. This is still okay if you have a small number of cars but if you have a large number, let’s say 1000 BMW cars (I wish), then you will make 1000 queries to the variants table.
How to fix it
Batching the query
For the above example, you can load all variants in batch instead of querying them for each car.
car_ids = Car.where(brand: 'BMW').pluck(:id)
variants = Variant.where(car_id: car_ids)
....
From the above code, you will always make 2 queries no matter how many cars do you have. It’s better than previous one, it’s no longer n+1 query, but it can still lead to a problem when the size of car_ids is too big. You may want to consider to limit the batch size, for example:
car_ids = Car.where(brand: 'BMW').pluck(:id)
car_ids.in_groups_of(100).each do |grouped_car_ids|
variants = Variant.where(car_id: grouped_car_ids)
....
end
Using include method
ActiveRecord has a method called :includes which loads associated records in advance and limits the number of SQL queries made to the database. Depending on what your query is, :includes
will use either the ActiveRecord method :preload
or :eager_load
.
Preload
Car.includes(:variants).where(brand: 'BMW')> SELECT `cars`.* FROM `cars` WHERE `cars`.`brand` = 'BMW'
> SELECT `variants`.* FROM `variants` WHERE `variants`.`car_id` in (1,2,3,4,5,..)
Yes, it makes the same result with the batching on the above section. Preload always generates two SQLs. It fetches the records for the leading model first (cars) and then load the associated records with the leading model (variants).
If your main model is too large, you can still do batching with include. For example:
Car.includes(:variants).where(brand:'BMW')
.in_batches(of:100) do |grouped_car_ids|
....
end
Eager Load
Car.includes(:variants).where(variants:{type:"vehicle"})
> SELECT ... FROM "cars" LEFT OUTER JOIN "variants" ON "variants"."car_id" = "cars"."id" WHERE (variants.type = "vehicle")
Eager loading loads all associations in a single query using LEFT OUTER JOIN. The include method will be forced to use eager load when the where clause is using an attribute from the associated table.
Highlight N+1 query during development
Code review can help to catch the N+1 issue before it goes to prod but sometimes it’s not straightforward to catch. Thanks to bullet gem, it can help you to track and report inefficient queries.
You can integrate bullet gem with your rails project and enable it in test mode. It will print N+1 query into log/bullet.log
Please check here for more details
Thanks for reading this article, I hope you found it helpful. Here are some great references to read