Composable SQL Queries in Rails Using Arel

If you've spent any time working with Rails, you've been using Arel. Arel is a SQL abstraction that ActiveRecord uses to build SQL queries. Arel wraps each component of the SQL query language with Ruby objects and provides an expressive DSL for composing SQL queries. When using Arel, you're mainly interacting with tables (Arel::Table) and nodes (Arel::Nodes::Node subclasses).

In a nutshell, Arel gives you the tools to build SQL queries in Ruby-land. Sounds great, right? Mostly! Documentation for Arel is pretty sparse, so it takes some digging around to figure out what all you can do with it. In this post, I'd like to save you some of that time and trouble! come

How ActiveRecord Uses Arel

The query interface for ActiveRecord is built on top of Arel. Anytime you call things like Model.find_by, Model.where, Model.joins, ActiveRecord is using Arel to construct the SQL query string. Scope chains are basically chained Arel nodes. At any point in the scope chain, you can check the SQL query string that Arel will build by tossing a .to_sql at the end.

Advantages of Arel

The core advantage of Arel is that working in a Ruby abstraction of SQL allows developers to tap into language features of Ruby -- method chaining, inheritance, metaprogramming -- without sacrificing any aspect of the SQL language. If you can express something in SQL, you can do it with Arel.

With Arel, we're able to abstract segments of a SQL query behind expressive, reusable Ruby methods. This makes Arel an excellent tool for situations where you're working with long, complex SQL queries, which can be difficult to read and reason about.

Demonstrating Arel

I could keep talking about why Arel is great, but I'd rather just show you! The remainder of this post will lay out examples to further demonstrate the concepts and advantages discussed.

A Simple Example

Using ActiveRecord, we frequently write query statements like:

Employee.where(first_name: 'Ryan', last_name: 'Stenberg')

If we were to drop down a level and generate the same SQL query from scratch using Arel, here's what it would look like:

table = Employee.arel_table
table
  .project(table[Arel.star])
  .where(table[:first_name].eq('Ryan').and(table[:last_name].eq('Stenberg')))
  .to_sql

Using Arel inside a Ruby object might look like this:

class EmployeeFullNameSearch < Struct.new(:first, :last)
  def sql
    arel_table
      .project(arel_table[Arel.star])
      .where(where_clause(first, last))
      .to_sql
  end
  private
  def arel_table
    @arel_table ||= Employee.arel_table
  end
  def where_clause
    arel_table[:first_name].eq(first).and arel_table[:last_name].eq(last)
  end
end

Arel, when laid out in a similar fashion, is easier to read through and reason about. This becomes exponentially more useful as the complexity of the SQL queries increases.

While Arel seems to make a lot of sense and is pretty intuitive, the project(arel_table[Arel.star]) portion of the sql method might be confusing. Calling project is how you tell Arel what to return in the SELECT statement. The Arel.star bit generates a *, so putting those two together will result in a SQL query fragment like SELECT * FROM employees. It knows the table from arel_table and it knows to select all the things with project(arel_table[Arel.star]).

The above example demonstrated how to build the full SQL query string via the sql method, which you'd have to use with a .find_by_sql:

Employee.find_by_sql(EmployeeFullNameSearch.new('Ryan', 'Stenberg').sql)

This isn't really desirable since .find_by_sql returns an array, so we wouldn't be able to scope chain off of it.

In practice, we'd make the sql method focus on the WHERE conditions and let ActiveRecord do the rest:

class EmployeeFullNameSearch < Struct.new(:first, :last)
  def sql
    arel_table[:first_name].eq(first).and(arel_table[:last_name].eq(last)).to_sql
  end
  def results
    Employee.where(full_name_matches('Ryan', 'Stenberg'))
  end
  # ...
end

..and in use:

EmployeeFullNameSearch.new('Ryan', 'Stenberg').results

With the above code, we'd get an ActiveRecord::Relation back. By default, Employee.where will select all columns and we get something we can scope chain.

While this example is very simple and doesn't necessarily warrant the use of Arel, it illustrates the function and purpose of Arel.

Dynamic SQL Generation

You can dynamically build SQL statements through Arel node chains using Enumerable methods like reduce:

class Conference < ActiveRecord::Base
  DEADLINES = %i(
    speaker_submission_deadline
    early_bird_registration_deadline
    standard_registration_deadline
  )
  scope :with_deadlines_on, -> (date, deadlines = DEADLINES) { where(deadlines_on(date, deadlines)) }
  def self.deadlines_on(date, deadlines = DEADLINES)
    Array(deadlines).reduce(nil) do |query, deadline|
      deadline_check = arel_table[deadline].eq(date)
      if query.nil?
        deadline_check
      else
        query.or(deadline_check)
      end
    end.to_sql
  end
  # Given the default deadlines, we're basically doing this:
  # arel_table[:speaker_submission_deadline].eq(date)
  #   .or(arel_table[:early_bird_registration_deadline].eq(date))
  #   .or(arel_table[:standard_registration_deadline].eq(date))
  #   .to_sql
end

In this example, using Arel frees us from having to maintain the with_deadlines_on scope if we end up adding additional deadlines. Since we're dynamically building the Arel query, we're able to make deadlines an optional parameter so our scope can be used to find a given subset of deadlines as well:

# With given deadlines:
Conference.with_deadlines_on(
  1.month.from_now,
  %i(early_bird_registration_deadline standard_registration_deadline)
)
# Without (matching any deadline):
Conference.with_deadlines_on(1.month.from_now)

Complex Example Demonstrating Composability, Re-usability, and Expressiveness

In this example, let's say we're working on an auditing platform for a client that tracks employment records between companies and employees. The application is built in Rails and uses MySQL as its database.

class Employee < ActiveRecord::Base
  has_many :employments
  has_many :companies, through: :employments
end
class Company < ActiveRecord::Base
  has_many :employments
  has_many :employees, through: :employments
end
class Employment < ActiveRecord::Base
  belongs_to :employee
  belongs_to :company
end

The Problem

Our app needs to allow its users to find employees who've worked at a specific set of companies within a given time period.

There's trickiness anytime you need to search against a record's associated things -- in this case, that's an employee's companies. That information isn't captured in any single field. It's described through database relationships. This complexity results in a complex SQL query:

SELECT employees.*
FROM employees
INNER JOIN (
  SELECT employee_id, GROUP_CONCAT(
        DISTINCT company_id
        ORDER BY company_id
    ) AS search_values
  FROM employments
  WHERE employments.start_date <= ? AND employments.end_date >= ?
  GROUP BY employee_id
) AS associated
ON associated.employee_id = employees.id
WHERE associated.search_values REGEXP '(2,){1}([[:digit:]]+,)*(4,|4$){1}'

oh-no

Stay with me! Basically, we're attaching a serialized list of company IDs to each employee. For each employee, the list contains IDs for companies the employee has worked at within the given time frame.

It's worth noting that MySQL doesn't have array-types like Postgres, which is why we need to utilize serialized lists (created via the MySQLGROUP_CONCAT() function).

At the end of our SQL, we're matching that serialized list of company IDs against a regular expression that catches -- in this case -- employees having worked at companies 2 and 4 within the given time frame.

To implement this, we'd need to build that regex from user input and then stick the whole SQL string inside a scope or class method.. or we could refactor with Arel!

The Solution

Arel lets us do some cool stuff. Let's build a search object that builds the same SQL query using Arel:

class EmploymentHistorySearch
  attr_reader :company_ids, :start_date, :end_date
  def initialize(company_ids, options = {})
    @company_ids = Array(company_ids).uniq
    @start_date  = options[:start_date]
    @end_date    = options[:end_date]
  end
  def results
    Employee.where(sql)
  end
  def sql
    query.to_sql
  end
  private
  def query
    employees[:id].in(matching_result_set)
  end
  def matching_result_set
    employees
      .project(employees[:id])
      .join(join_clause)
      .on(join_condition)
      .where(where_condition)
  end
  def employees
    @employees ||= Employee.arel_table
  end
  def join_clause
    Arel.sql("(#{associated_values.to_sql})").as(derived_table)
  end
  # INNER JOIN (#{associated_values.to_sql}) AS associated
  def associated_values
    employments
      .project(foreign_key, grouped_values)
      .where(within_time_period)
      .group(foreign_key)
  end
  # SELECT employee_id, #{grouped_values}
  # FROM employments
  # WHERE employments.start_date <= ? AND employments.end_date >= ?
  # GROUP BY employee_id
  def employments
    @employments ||= Employment.arel_table
  end
  def foreign_key
    'employee_id'
  end
  def grouped_values
    Arel.sql(values_list).as(values_alias)
  end
  def values_list
    %Q(
      GROUP_CONCAT(
        DISTINCT #{search_column}
        ORDER BY #{search_column}
      )
    )
  end
  def search_column
    'company_id'
  end
  def values_alias
    Arel.sql('search_values')
  end
  def within_time_period
    employments[:start_date].lteq(end_date).and(employments[:end_date].gteq(start_date))
  end
  def join_condition
    fk_with_table.eq employees[:id]
  end
  # ON associated.employee_id = employees.id
  def fk_with_table
    Arel.sql [
      derived_table_name,
      foreign_key
    ].join('.')
  end
  def derived_table_name
    'associated'
  end
  def where_condition
    Arel.sql [
      derived_values,
      values
    ].join(' REGEXP ')
  end
  def derived_values
    Arel.sql [
      derived_table_name,
      values_alias
    ].join('.')
  end
  def derived_table
    Arel.sql derived_table_name
  end
  def values
    @values ||= main_class.sanitize input.sort.map { |value| value_regex(value) }.join(filler_regex)
  end
  # Given [4,2], #values would return '(2,|2$){1}([[:digit:]]+,|[[:digit:]]+$)*(4,|4$){1}'
  def regex_partial(value, frequency = '*')
    "(#{value},|#{value}$)#{frequency}"
  end
  def filler_regex
    regex_partial '[[:digit:]]+'
  end
  def value_regex(value)
    regex_partial value, '{1}'
  end
end

Using Arel, we've created an EmploymentHistorySearch object with a simple API (#sql and #results) that abstracts each component of our query into descriptive, concise, and re-usable methods.

Here's our search object in action:

# Standalone
EmploymentHistorySearch.new(company_ids, 2.months.ago, 1.month.ago).results
# Class Method
def self.with_employment_history(ids, start_date, end_date)
  where(EmploymentHistorySearch.new(ids, start_date, end_date).sql)
end

mmm

This approach really shines when there are a number of complex, mostly similar SQL queries where we can utilize inheritance to override only what's necessary. If, later on, we had to support an employment history search that returned all employees who had terminated employments during a given range and in a particular manner -- whether they quit or were fired, we'd be able to re-use almost all of our previous search object:

class EndedEmploymentHistorySearch < EmploymentHistorySearch
  attr_reader :termination_type
  def initialize(company_ids, options = {})
    @termination_type = options[:termination_type]
    super
  end
  def associated_values
    employments
      .project(foreign_key, grouped_values)
      .where(within_time_period.and(matches_termination_type))
      .group(foreign_key)
  end
  def within_time_period
    employments[:end_date].gteq(start_date).and(employments[:end_date].lteq(end_date))
  end
  def matches_termination_type
    employments[:termination_type].eq(termination_type)
  end
end

mindblown

In Closing

Ruby tends to be pretty expressive and readable -- to which Arel is no exception. Arel is smart -- taking care of SQL subtleties like precedence and correct ordering of SQL clauses for you. For example, when generating a SQL query containing an ORDER BY and a WHERE clause, Arel ensures the WHERE clause comes first.

Simply put, Arel allows Rails developers to focus on the domain in their language. So, next time you find yourself dealing with complex SQL queries, look to Arel for great justice!

Ryan is a developer in Viget's Falls Church, VA, HQ, where he believes in being a liason for both the technical and non-technical. He builds elegant tools for clients such as Bozzuto and Millitello Capital—as well as internal tools that we use at Viget every day.

More posts by Ryan