Sentry Answers>SQL>

In SQL how do I select only rows with max value on a column?

In SQL how do I select only rows with max value on a column?

Richard C.

The Problem

If you have a SQL table with different versions of content in multiple rows, how do you get the latest version of the content? Or how do you select the row with the maximum value from a group of rows? This is called the greatest-n-per-group problem.

Consider the example below, where user email addresses change over time, and you want to select the most recent address for every user:

Click to Copy
CREATE TABLE EmailAddress ( UserId INT, Email VARCHAR(255), DateCreated DATE ); INSERT INTO EmailAddress(UserId, Email, DateCreated) VALUES (1, 'Amir@example.com', '2020-01-01'), (1, 'Amir2example.com', '2021-01-01'), (1, 'Amir3example.com', '2022-01-01'), (2, 'Sofia@example.com', '2020-01-01'), (2, 'Sofia2@example.com', '2021-01-01'), (2, 'Sofia3@example.com', '2022-01-01');

The Solution

Unfortunately, you cannot return an ungrouped column value in a grouped query. Instead, for each row you have to find the maximum value in the group, return one result set, and then match that value in the original table.

The query below will return the correct result and run in MS SQL Server, MySQL, and PostgreSQL:

Click to Copy
SELECT * FROM EmailAddress as a WHERE DateCreated = (SELECT MAX(DateCreated) FROM EmailAddress as b WHERE a.UserId = b.UserId); -- UserId Email DateCreated -- 1 Amir3example.com 2022-01-01 -- 2 Sofia3@example.com 2022-01-01

For every UserId in the outer select, the subquery with the inner select is run once to find the maximum value. This is called a correlated subquery. The code is short and easy to understand, but the performance might be poor on large tables.

Solution With Simple Join

If your tables are large, and you want to try a faster query, you can revert to using a simple join with the table on itself.

Click to Copy
SELECT a.* FROM EmailAddress AS a JOIN (SELECT UserId, MAX(DateCreated) DateCreated FROM EmailAddress GROUP BY UserId) AS b ON a.UserId = b.UserId AND a.DateCreated = b.DateCreated;

The query consists of two parts:

  • An inner query to get the maximum date for each user.
  • An outer query to select the email address based on the inner query.
  • Syntax.fmListen to the Syntax Podcast
  • Community SeriesIdentify, Trace, and Fix Endpoint Regression Issues
  • Syntax.fm logo
    Listen to the Syntax Podcast

    Tasty treats for web developers brought to you by Sentry. Get tips and tricks from Wes Bos and Scott Tolinski.

    SEE EPISODES

Considered “not bad” by 4 million developers and more than 100,000 organizations worldwide, Sentry provides code-level observability to many of the world’s best-known companies like Disney, Peloton, Cloudflare, Eventbrite, Slack, Supercell, and Rockstar Games. Each month we process billions of exceptions from the most popular products on the internet.

© 2024 • Sentry is a registered Trademark of Functional Software, Inc.