10 SQL Tricks That I Like

Noah Over, Application Developer

Article Categories: #Code, #Back-end Engineering, #Data & Analytics, #Performance

Posted on

A list of 10 SQL tricks I have picked up over my time working with databases that could help improve your queries

At my previous job, I wrote raw SQL a lot for our massive internal database. Over time, I picked up a few little tricks that allowed me to condense my queries or improve their performance. Recently, I've been getting the itch to revisit some SQL stuff and I'm going to use this blog post as an excuse to write some raw SQL without the help of Ecto or ActiveRecord or anything like that. Hopefully, this exercise of me goofing around will help you learn something. Guess we'll see. While none of these tricks are particularly complicated, developers less familiar with SQL may not be familiar with them.

1. USING #

SELECT friend_id, e.name AS entree, d.name AS dessert
FROM entrees e
  INNER JOIN desserts d USING (friend_id);

USING is helpful for simplifying your join when you are joining two tables on columns with the same name. In the above example, you have two tables which are lists of entrees and desserts and the ID of the friend who knows how to prepare them. If you have offers from multiple friends to come over for dinner, you want to know what possible combinations of entree and dessert each friend can make for you before you decide which offer to accept, so you run this query. USING makes it so you do not have to write out ON e.friend_id = d.friend_id, but what I find particularly helpful is that you no longer have to qualify which friend_id you are referring to. This prevents the ever-frustrating error ERROR: column reference "friend_id" is ambiguous when you forget to put e. or d. in front of friend_id.

2. COALESCE #

SELECT c.id as client_id, COALESCE(c.email, c.phone) AS contact_method
FROM clients c;

COALESCE takes a list of columns (or other info) and returns the first non-NULL one. In the example, you have a table of your business's clients and you want to get a list of them with a singular way to contact them. You prefer to contact them over email, so if they provide you with an email address, return the email address. Otherwise, you will settle for returning the phone number if the email address is not present. Thanks to COALESCE you now just have one column and you do not need to select both columns individually and compare them.

3. CASE #

SELECT CASE
  WHEN c.country = 'US' THEN c.state
  ELSE c.country END AS region
FROM clients c;

CASE operates similarly to if, else if, and else statements. It returns what comes after THEN for the first WHEN statement that is true. If none of the WHEN statements are true, it returns what is under the ELSE statement. In the example, you want to send a gift to each of your clients because of how much you appreciate them, but first, you want to approximate shipping costs so you need to find out where they all live. If they don't live in the United States, you're fine with just calculating the shipping costs based on the country, but otherwise, you want to know which specific state they live in. CASE allows you to check the country condition and return the shipping region based off that check.

4. Semi Joins (EXISTS/IN) #

SELECT u.id as user_id
FROM users u
WHERE EXISTS (
  SELECT 1
  FROM models m
  WHERE m.user_id = u.id
    AND m.updated_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
);

SELECT u.id as user_id
FROM users u
WHERE u.id IN (
  SELECT m.user_id
  FROM models m
  WHERE m.updated_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
);

Semi Joins such as EXISTS and IN allow you to check for the existence of matching rows in other tables without having to join to the table itself. In these two examples, which accomplish the same thing, you want to get a list of your software's users who have worked on one of their models within the last 30 days so you know who to reach out to for feedback on some of the newer features. You could just do an INNER JOIN paired with DISTINCT or GROUP BY to get the same result, but the semi joins provide a more performant solution since they only have to find one matching row before they can return while the INNER JOIN finds all the matching rows which are then filtered down by the DISTINCT or GROUP BY into one row. Worth noting that if you are joining on a unique column, regular joins are equally performant to these semi joins.

5. String Pattern Matching (LIKE/ILIKE/~/~*) #

SELECT b.title, b.author
FROM books b
WHERE b.title LIKE '%Pirate%';

SELECT b.title, b.author
FROM books b
WHERE b.title ~ 'Pirate';

SELECT b.title, b.author
FROM books b
WHERE b.title ILIKE '%pirate%';

SELECT b.title, b.author
FROM books b
WHERE b.title ~* 'pirate';

If you need to pattern match on a string, you are provided with quite a few options. The most performant option available to you is LIKE, which uses the built-in SQL matching including % for 0 or more characters. You also have the Postgres-exclusive ~ which has the power of regex behind it, if you need a more complicated match. Then, you have the case insensitive versions of both, ILIKE and ~* respectively. For the example, we see four versions of trying to find a book that mentions "Pirate" in the title, since you want to read an old school high seas adventure. Personally, I have always loved the ~* for quick queries where I just need to find something in a table quickly and the performance is not much of an issue, but I would recommend using LIKE or ILIKE for production code if possible.

6. UNION (or UNION ALL) #

SELECT s.pricing_id, s.price
FROM snacks s
UNION
SELECT t.pricing_id, t.price
FROM tickets t
UNION
SELECT m.pricing_id, m.price
FROM memberships m
ORDER BY price;

UNION allows you to combine the results of multiple queries into one result set. In the example, we have a theater which has three types of products that they store in separate tables due to the different information required for each product type. They have snacks from the snack bar, tickets for the shows, and memberships that allow you to support the theater while also getting discounted prices on other purchases. The theater wants a list of all the prices of their products along with the ID used by their POS system for record-keeping purposes. UNION allows them to take the results from all three tables and bring the distinct ones together. This works as long as each SELECT returns the same number of columns and they columns have similar types. If you are willing to not worry about making each row distinct, you can use UNION ALL instead.

7. FILTER #

SELECT m.id as member_id, COUNT(*) as member_count, COUNT(*) FILTER(WHERE m.expiration_date > current_date) as active_member_count
FROM members_m;

FILTER gives you the ability to run an aggregate function over a subset of the overall result set. Let's go back to the theater from the previous example for this one. Now, they want to know how many total members they have ever had and how many active members they currently have. To get overall total, you can just run COUNT, but you can run COUNT again with the additional FILTER to only get the members you have not hit their expiration date yet.

8. Windows (OVER) #

WITH info AS (
  SELECT b.label, b.price, b.category,
    SUM(b.price) OVER (PARTITION BY b.category) as total_category_price
  FROM bills b
)
SELECT i.label, i.price, i.category,
  (i.price::float / i.total_category_price) * 100 AS percentage
FROM info i

WITH info AS (
  SELECT b.label, b.price, b.category,
    SUM(b.price) OVER w
  FROM bills b
  WINDOW w AS (PARTITION BY b.category)
)
SELECT i.label, i.price, i.category,
  (i.price::float / i.total_category_price) * 100 AS percentage
FROM info i

Windows provide a variety of new functionalities to you (I'll touch on some of these in the next couple of points), but one of the things they allow you to do is have a column call an aggregate function without having to run a GROUP BY. In these two examples, you see two different ways of defining a window. One is inline and the other is separated out at the bottom of the internal query. In both examples, we are looking at a table of our bills and we want a list of the bills alongside what percentage of our budget for each billing category (Food, Entertainment, etc.) we are spending on that bill so we know what will be most beneficial to cancel. The window allows us to find the total of the category by using PARTITION BY in the same query so we can just easily calculate the percentage without any joins to multiple subqueries. Windows also support using ORDER BY which we will see more examples of coming up.

9. LAG/LEAD #

SELECT us.user_id,
  us.log_in_timestamp - LAG(us.log_out_timestamp, 1) OVER (PARTITION BY us.user_id ORDER BY us.log_in_timestamp)
FROM user_sessions us

LAG gives you the ability to access data from a previous row (as determined by the window) of the result set. LEAD does the opposite, giving you data from an upcoming row. The second parameter is how many rows in the past or future you are traveling to get the information. In the example, you want to find out how much time a user is putting your software down before they log back in again and since you save log in and log out times for some reason, you can do that. You just need to get the log out time from the previous row determined by the ORDER BY in the window and compare it to the current log in time. You also want to make sure you are not getting the times from different users mixed in together so you PARTITION BY the user in the window as well. You could also do the opposite by using LEAD to get the log in timestamp from the next row and comparing it to the log out timestamp from the current row. Overall, I find these helpful when comparing data sequentially.

10. ROW_NUMBER #

SELECT m.*, ROW_NUMBER() OVER (PARTITION BY m.user_id ORDER BY m.updated_at DESC) AS rn
FROM models m
HAVING rn >= 2

ROW_NUMBER does basically what you would expect it to do. It assigns each row in the result set a number based off the window. If you do not define a window, it will just assign the number based off the ordering of the overall query. In this example, we only want our users to be able to have two models so we don't crash our database with too many, so we want to get a list of all the users' models that are not one of their two most recently updated in order to delete them. ROW_NUMBER allows us to PARTITION BY the user and order the numbering based on when the model was updated. Then, we can use the HAVING clause to filter out all the rows that are one of the two most recently edited based on the returned row number.

The End #

I hope you find some of this helpful or learned something by reading this. At the very least, I enjoyed coming up with examples and writing this, so I'll take that as a win.

Noah Over

Noah is a Developer in our Durham, NC office. He’s passionate about writing Ruby and working with databases to overcome problems.

More articles by Noah

Related Articles