What is the difference between UNION and JOIN in SQL?
5 minute guide to UNION, EXCEPT and INTERSECT in SQL
In this article we dive deep in three often overlooked SQL operators: EXCEPT
, INTERSECT
and UNION
. We will:
- use clear examples and visuals to thoroughly understand the concepts
- understand the ‘rules’ of the operator to really grasp how to use them
- explore a more complex example
- understand the difference with
JOIN
and discuss when to use which
At the end of this article you’ll have some powerful new tools under your SQL belt so let’s code!
Before we begin..
In some databases like SQL Server, PostgreSQL and SQLite we use the EXCEPT
operator. In other databases (MySQL and Oracle e.g.) this operator goes under a different name: MINUS
. Both MINUS
and EXCEPT
work in the exact same way.
TL;DR: EXCEPT == MINUS
For the rest of this article we’ll use SQL Server examples with EXCEPT
; if you use another database, just swap this out for MINUS
where needed.
1. Understanding EXCEPT, INTERSECT and UNION
Let’s start with a very simple example. We’ll first create a table that contains countries: we have an id, a name and a population. Then we’ll insert 10 countries:
CREATE TABLE countries (
id INT,
name VARCHAR(50),
population INTEGER,
);
INSERT INTO countries (id, name, population) VALUES
(1, 'Armenia', 2978763),
(2, 'Belgium', 11730997),
(3, 'Canada', 37742154),
(4, 'Denmark', 5792202),
(5, 'Egypt', 102334404),
(6, 'Ghana', 31072940),
(7, 'Hungary', 9660351),
(8, 'Iraq', 40222493),
(9, 'Kiribati', 119449),
(10, 'Liechtenstein', 38128);
Now we’ll create two queries:
- small countries: select countries with a population of less than 1 million
- c-countries: selects countries whose name contains the letter ‘c’
-- small countries (pop < 1M)
SELECT name FROM countries WHERE population < 1000000
--> Kiribati & Liechtenstein
-- c-countries (name contains a c)
SELECT name FROM countries WHERE NAME LIKE '%c%'
--> Canada and Liechtenstein
Next we’ll demonstrate how to compare and combine the results of these queries into one result set.
1.1 Except
Now we can use EXCEPT
to select all small countries EXCEPT
those that have a ‘c’ in their name:
SELECT name FROM countries WHERE population < 1000000
EXCEPT
SELECT name FROM countries WHERE NAME LIKE '%c%'
--> Kiribati
1.2 Intersect
Use INTERSECT
to find out the overlap: small c-countries:
SELECT name FROM countries WHERE population < 1000000
INTERSECT
SELECT name FROM countries WHERE NAME LIKE '%c%'
--> Liechtenstein
1.3 UNION (and UNION ALL)
Union takes the result of both queries and merges them together in one result set:
SELECT name FROM countries WHERE population < 1000000
UNION
SELECT name FROM countries WHERE NAME LIKE '%c%'
--> Canada, Kiribati, Liechtenstein
SELECT name FROM countries WHERE population < 1000000
UNION ALL
SELECT name FROM countries WHERE NAME LIKE '%c%'
--> Kiribati, Liechtenstein, Canada, Liechtenstein
In the example code above UNION ALL
produces another Liechtenstein! This is because the regular UNION
removes duplicates; UNION ALL
returns all rows from both queries.
2. What determines what overlaps? The rules
SQL analyzes the results of the queries and determines whether there is or isn’t an intersection based on the columns that get returned from either query. Broadly speaking, there are two rules concerning the queries we’re comparing:
- The columns must be comparable (same data type)
- The number of columns must be equal
RULE 1: comparable columns
The code below will fail because we cannot compare id
(integer) and name
(string-type).
-- FAILS: Conversion failed when converting the varchar value 'Canada' to data type int.
SELECT id FROM countries WHERE population < 1000000
EXCEPT
SELECT name FROM countries WHERE NAME LIKE '%c%'
RULE 2: equal number of columns
The code below will fail because we select two columns from query1 and only one from query2:
-- All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
SELECT id, name FROM countries WHERE population < 1000000
EXCEPT
SELECT name FROM countries WHERE NAME LIKE '%c%'
Following the rules
When we adhere to these rules we can also perform queries like the one below: matching all small-country id
's with the population
of ‘c’-countries. Although this is perfectly valid syntax; it will yield no records.
SELECT id FROM countries WHERE population < 1000000
INTERSECT
SELECT population FROM countries WHERE NAME LIKE '%c%'
--> returns no rows (id and population match nowhere)
In the same way we can UNION
the results.
SELECT id FROM countries WHERE population < 1000000
UNION
SELECT population FROM countries WHERE NAME LIKE '%c%'
Will result into:
Also notice that the names of the columns don’t have to match. If the names conflict, the column-name of the first query will be used.
3. EXCEPT, INTERSECT & UNION with different tables
Of course, we don’t have to compare data from the same table, we can use different tables as well, as long as the data structures resulting from the SELECT
statements match. I’ve prepared the following examples with this statement.
1. EXCEPT example:
In this example we compare records from the products
table with records from the sales
table: we want to know if there are any products with a price < 100 cents that have been sold anywhere but the Netherlands:
SELECT id as product_id FROM products WHERE price < 100
EXCEPT
SELECT product_id FROM sales WHERE country = 'Netherlands'
--> Results in product_id 3
2. INTERSECT example
Next up, we’ll select any product with SALES
in the USA that also have returns:
SELECT product_id FROM sales WHERE country = 'USA'
INTERSECT
SELECT product_id from returns
--> Results in product_id 2
3. UNION example
Lastly, we’ll UNION the results of three queries: cheap products, products with any sales in the Netherlands or any product that is returned:
SELECT id as product_id FROM products WHERE price < 100
UNION
SELECT product_id FROM sales WHERE country = 'Netherlands'
UNION
SELECT product_id from returns
--> Returns product_id 1, 2, and 3
4. What is the difference with joins?
I think of JOIN
's as combining columns from two or more tables data sets using a related column between them. The UNION
, INTERSECT
and EXCEPT
operator, on the other hand, combine or compares data from two or more tables and then appends this data.
I like to think of a join operating “horizontally”, “zipping” query results together. The UNION
, INTERSECT
and EXCEPT
operator to work “vertically”, “stacking up” result sets. Let’s demonstrate with some images:
Visualizing a join
Below a visual representation from joining two tables: we take two columns from table1 and combine them with one column from table2 using values in related columns (table1.Id <-> table2.SomeFk):
Think of it this way: using a JOIN
we “zip up” to tables. We make the table wider than it was before: we add to it horizontally.
Visualizing a UNION
Now let’s check out a UNION
statement. Notice that the results are pretty independent of each other. They have no related columns; we just “stack” the result sets on top of each other, expanding the table vertically. We can do this because both queries output the same number of columns and all column data types match.
In this case we use a UNION operator to stack the records on top of each other but INTERSECT
works the same way; only keeping the overlap. In the same way EXCEPT
“subtracts” some records from the “stack”.
Conclusion
I hope to have clarified the workings of these three wonderful operators and that you had fun reading this article.
I hope this article was as clear as I hope it to be but if this is not the case please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics like these:
- Git for absolute beginners: understanding Git with the help of a video game
- Create and publish your own Python package
- Create a fast auto-documented, maintainable, and easy-to-use Python API in 5 lines of code with FastAPI
Happy coding!
— Mike
P.S: like what I’m doing? Follow me!