Regular Expressions in MySQL
Did you know MySQL supports using regular expressions in
SELECT statements? I’m surprised at the number of developers who don’t, despite using SQL and regexes on a daily basis. That’s not to say that putting a regex into your SQL should be a daily occurrence. In fact, it can cause more problems than it solves, but it’s a handy tool to have in your belt under certain circumstances.
Regular expressions in MySQL are invoked with the
REGEXP keyword, aliased to
RLIKE. The most basic usage is a hardcoded regular expression in the right hand side of a conditional clause, e.g.:
SELECT * FROM users WHERE email RLIKE '^[a-c].*[0-9]@';
This SQL would grab every user whose email address begins with ‘a’, ‘b’, or ‘c’ and has a number as the final character of its local portion.
Something More Advanced
The regex used with RLIKE does not need to be hardcoded into the SQL statement, and can in fact be a column in the table being queried. In a recent project, we were tasked with creating an interface for managing redirect rules à la mod_rewrite. We were able to do the entire match in the database, using SQL like this (albeit with a few more joins, groups and orders):
SELECT * FROM redirect_rules WHERE '/news' RLIKE pattern;
In this case, ‘/news’ is the incoming request path and
pattern is the column that stores the regular expression. In our benchmarks, we found this approach to be much faster than doing the regular expression matching in Ruby, mostly because of the lack of ActiveRecord overhead.
Using regular expressions in your SQL has the potential to be slow. These queries can’t use indexes, so a full table scan is required. If you can get away with using
LIKE, which has some regex-like functionality, you should. As always: benchmark, benchmark, benchmark.
Additionally, MySQL supports POSIX regular expressions, not PCRE like Ruby. There are things (like negative lookaheads) that you simply can’t do, though you probably ought not to be doing them in your SQL anyway.
Support for regular expressions in PostgreSQL is similar to that of MySQL, though the syntax is different (e.g.
email ~ '^a' instead of
email RLIKE '^a'). What’s more, Postgres contains some useful functions for working with regular expressions, like
regexp_replace. See the documentation for more information.
In certain circumstances, regular expressions in SQL are a handy technique that can lead to faster, cleaner code. Don't use
LIKE will suffice and be sure to benchmark your queries with datasets similar to the ones you’ll be facing in production.