In SQL how to use GROUP BY on multiple columns?

Richard C.
jump to solution

The Problem

In SQL, what exactly does GROUP BY do? How does it work when applied to multiple columns instead of just one?

The Solution

Let’s use an example table of people with names and ages. (All commands in this article work on MySQL, PostgreSQL, and MS SQL Server.)

CREATE TABLE Person
(
    Id   INT PRIMARY KEY,
    Name VARCHAR(255),
    Age  INT
);

INSERT INTO Person (Id, Name, Age)
VALUES (1, 'Amir', 25),
       (2, 'Amir', 25),
       (3, 'Aya', 2),
       (4, 'Aya', 30),
       (5, 'Leila', 8),
       (6, 'Leila', 35),
       (7, 'Ndidi', 12),
       (8, 'Ndidi', 12),
       (9, 'Ndidi', 25),
       (10, 'Sofia', 42),
       (11, 'Mateo', 8),
       (12, 'Mateo', 67),
       (13, 'Yara', 35),
       (14, 'Yara', 35),
       (15, 'Yara', 4),
       (16, 'Yara', 35);

First, let’s select Name from the table:

SELECT Name
FROM Person;

-- Name
-- Amir
-- Amir
-- Aya
-- Aya
-- Leila
-- Leila
-- Ndidi
-- Ndidi
-- Ndidi
-- Sofia
-- Mateo
-- Mateo
-- Yara
-- Yara
-- Yara
-- Yara

Sixteen rows are returned because there are 16 people in the table.

If you want to remove duplicates you can either use DISTINCT or GROUP BY:

SELECT DISTINCT Name
FROM Person;

-- Name
-- Amir
-- Aya
-- Leila
-- Mateo
-- Ndidi
-- Sofia
-- Yara
SELECT Name
FROM Person
GROUP BY Name;

-- Name
-- Amir
-- Aya
-- Leila
-- Mateo
-- Ndidi
-- Sofia
-- Yara

GROUP BY merges all identical values in a column into a single row, effectively making it return unique values for a column. Unlike DISTINCT, GROUP BY allows you to calculate information related to each group. The functions that do this are called aggregate functions. They are COUNT, SUM, AVG, MIN, and MAX. For example:

SELECT Name, COUNT(Name)
FROM Person
GROUP BY Name;

-- Name 	  COUNT(Name)
-- Amir 	  2
-- Aya 	    2
-- Leila 	  2
-- Ndidi 	  3
-- Sofia 	  1
-- Mateo 	  2
-- Yara 	  4

If you use GROUP BY on two or more columns, it will merge all pairs of values that are the same in both columns into a single row. In our example, if we group by both Name and Age, then one row will be returned for every person that has the same name and the same age. Here’s the query:

SELECT Name, Age, COUNT(1)
FROM Person
GROUP BY Name, Age;

-- Name 	  Age 	 COUNT(1)
-- Amir 	  25 	    2
-- Aya 	    2 	    1
-- Aya 	    30 	    1
-- Leila 	  8 	    1
-- Leila 	  35 	    1
-- Mateo 	  8 	    1
-- Mateo 	  67 	    1
-- Ndidi 	  12 	    2
-- Ndidi 	  25 	    1
-- Sofia 	  42 	    1
-- Yara 	  4 	    1
-- Yara 	  35 	    3

You can see that there are seven people with the same name, but there are only three people with the same name and age.

Using GROUP BY Name, Age as opposed to GROUP BY Age, Name will make no difference to the results returned (except for the column order).

Do not write GROUP BY Name AND Age instead of GROUP BY Age, Name. Even if your database server accepts this syntax, you will likely get incorrect results.

Considered "not bad" by 4 million developers and more than 150,000 organizations worldwide, Sentry provides code-level observability to many of the world's best-known companies like Disney, Peloton, Cloudflare, Eventbrite, Slack, Supercell, and Rockstar Games. Each month we process billions of exceptions from the most popular products on the internet.

Sentry