Materialize Views in the Rails Application Layer
Tony Pitale, Former Viget
In my previous post I talked about how we used a VIEW to act as an auto-updating join between our Group and Entry tables on FeedStitch. After the changes were made there was a concern that as the view grew there would be a significant slowdown in selecting rows.
Identifying our Problem
In general, the results of running EXPLAIN on the VIEW we have set up is that, in order to find a row, the VIEW must be sequentially scanned. This is telling in two ways. First, sequential scans are pretty much the single least efficient way to find a row (unless you're lucky and it's the first row). And second, without indexes, this is the only way to find a row. The primary drawback in using views is that they lack the ability to have indexes, as the data isn't really persisted as we see it.
The answer to our index problem may be solved with something called "View Materialization". Simply, the view, which contains a complex SELECT would be stored in a table which can be indexed as needed. Sounds simple, right?
The Tricky Parts
Unfortunately, as with any caching mechanism, this one must be expired and updated as the data changes, to keep it fresh. To do this at the database level requires functions and triggers, often a complex task.
The Simplest Thing That Works
Fortunately, in Rails, or with any ORM, we have a sensible place to put this sort of logic. And, with a little SQL written, we can refresh our materialized view with a semblance of ease. The two benefits to this are simplicity and ease of maintenance.
It takes a little bit of planning, but the first thing to do is to understand at what points the data in the VIEW will change. In our case, when we run a rolling update to fetch new feeds we can invalidate and re-materialize the entire thing. This is not very efficient in and of itself. However, it is simple (call destroy_all, only one SELECT) and does what it needs to do. Given that this only happens every few hours or so, this is a feasible option.
An even better solution would be to invalidate (delete) a portion of the materialized view and only SELECT a changed subset. This is better because, at least in our case, we never truly need to re-materialize the entire VIEW. In fact, we never really need to delete any records besides the subset we're working with.
A Little Setup
ALTER VIEW group_entries RENAME TO group_entries_view; -- Or recreate the view in MySQL CREATE TABLE group_entries AS SELECT * FROM group_entries_view; ALTER TABLE group_entries ADD PRIMARY KEY (group_id, entry_id);
Updating the Table
We currently have a VIEW which selects a group_id and an entry_id. FeedStitch's updater fetches new entries for a group. Once done, the VIEW will hold an updated set of entry_ids for the group_id we are working with. This must then be materialized into a table. First we delete the rows give the group_id then we INSERT new rows from our VIEW:
GroupEntry.connection.execute <<-EOSQL BEGIN; DELETE FROM group_entries WHERE group_id = 1; INSERT INTO group_entries SELECT * FROM group_entries_view WHERE group_id = 1; COMMIT; EOSQL
Though not always applicable to performance issues, materialized views can be useful way to cache frequently selected data. This simple method by which to materialize a view and update from within our application code gives us this functionality in a more accessible, flexible way.