Working around very complex SQL joins with ActiveRecord

We use Rails at work, and we recently had a problem with an underperforming query. The problem was that the table from which the models were pulled needed to be the absolute last table in the query, and ActiveRecord really wants the table to be the first in the list. No combination of joins(), includes(), where(), or anything else was getting us what we wanted.

The schema below has been changed to protect the innocent.

Let’s say I have a table of Products, each produced by a Company, with a Person who is the CEO, and a Pet that they got from the local pound. Presume that there are millions of products across tens of thousands of companies, millions of people who could be CEO, and hundreds of thousands of pets.

What if I want to find the names of the dogs owned by CEOs of companies that produce televisions?

Naively, that might look something like:

Pet.select(:name).joins(:owner, :company, :product).where(type: :dog, product: {type: :television}).uniq

The query that runs is going to look like:

SELECT DISTINCT pet.name
FROM pet
  INNER JOIN people ON (pet.owner = people.id)
  INNER JOIN company ON (people.id = company.ceo)
  INNER JOIN product ON (company.id = product.company) AND (product.type = 'television')
WHERE (pet.type = 'dog')

Problem being, if you’re running a database that doesn’t correctly optimize this query, this join is awful: it’s going to take every pet and join its owner, then bring in every company for those owners, then bring in every product for every company, and only at the very end will it bother to go back and filter down pets that are docs and products that are televisions.

But if you ran the query in reverse, you’re likely to get significantly better results based on the cardinality of the data: there will be relatively few television products (let’s say thousands), which will be produced by only a few dozen companies and thus with a few dozen CEOs, and by the time you get their dogs you’ve never had more than a few thousand records.

So how do you work around this?

I want to get to this:

Product.where(type: :television).ceo_pets.where(type: :dog)
Product.find(1234).ceo_pets.where(type: dog)

John Lynch has a good blog post that gives the inspiration for our solution:

class Product < ActiveRecord::Base
  def self.ceo_pets
    existing_scope = self.connection.unprepared_statement { self.reorder(nil).select('company').uniq.to_sql }
    Pet.joins( <<-MEGAQUERY
        INNER JOIN (#{existing_scope}) AS product ON (1 = 1)
        INNER JOIN company ON (product.company = company.id)
        INNER JOIN people ON (company.ceo = people.id)
          AND (people.id = pet.owner)
      MEGAQUERY
    ).uniq
  end
  def ceo_pets
    Product.where(id: self.id).ceo_pets
  end

This gives you the best of both worlds: you can use it in a class-level (Arel) context, or in an instance-level context.

To explain some magic:

  • The unprepared_statement line basically just takes the Arel as it is so far, anything to the left of .ceo_pets in the chain, and gets the SQL statement that would fetch those records. Or, more specifically, the company IDs for those products.
  • We then take that SQL and inject it as a subquery in our big join. The (1 = 1) part is meant to work around the fact that ActiveRecord assumes the table for the class is the first table in the join. In this case, the class is Pet, so it transparently makes pet the first table in the join. But we can’t directly relate pet to product at this point in the join, so we use (1 = 1) to fake a cross join and continue on. We could, if we wanted, refactor this a bit to perform the same to_sql trick more than once to use more subqueries and/or really get the order we want, but this is good enough to show the point.
  • The dangling AND at the end finally ties our pets to their owners.
  • And, of course, we uniq so ActiveRecord will add a DISTINCT for pets.
  • The instance version works around the fact that we’re working with non-magical methods instead of magical scopes and relationships, so if we have an instance we need to convert that back to an Arel if we want to do anything interesting with it.

I know some pedant is going to come around and point out that by doing the cross join above we may not have gained anything, but that’s really just an artifact of the very contrived example. The real-world version of this refactoring took a query that was taking 300+ seconds to return and got it returning in 3 seconds at the worst case.