What is the difference between UNION and JOIN in SQL?

5 minute guide to UNION, EXCEPT and INTERSECT in SQL

What is the difference between UNION and JOIN in SQL?
Union-ing result sets into a single stack of data (image by author)
We’ve got mountains of data to process (image by Simon Berger on Unsplash)

In this article we dive deep in three often overlooked SQL operators: EXCEPT, INTERSECT and UNION. We will:

  1. use clear examples and visuals to thoroughly understand the concepts
  2. understand the ‘rules’ of the operator to really grasp how to use them
  3. explore a more complex example
  4. understand the difference with JOIN and discuss when to use which

At the end of this article you’ll have some powerful new tools under your SQL belt so let’s code!


Before we begin..

In some databases like SQL Server, PostgreSQL and SQLite we use the EXCEPT operator. In other databases (MySQL and Oracle e.g.) this operator goes under a different name: MINUS. Both MINUS and EXCEPT work in the exact same way.

TL;DR: EXCEPT == MINUS

For the rest of this article we’ll use SQL Server examples with EXCEPT; if you use another database, just swap this out for MINUS where needed.

SQL — insert, delete and update in ONE statement: sync your tables with MERGE

1. Understanding EXCEPT, INTERSECT and UNION

Let’s start with a very simple example. We’ll first create a table that contains countries: we have an id, a name and a population. Then we’ll insert 10 countries:

CREATE TABLE countries ( 
  id INT, 
  name VARCHAR(50), 
  population INTEGER, 
); 
 
INSERT INTO countries (id, name, population) VALUES 
    (1, 'Armenia', 2978763), 
    (2, 'Belgium', 11730997), 
    (3, 'Canada', 37742154), 
    (4, 'Denmark', 5792202), 
    (5, 'Egypt', 102334404), 
    (6, 'Ghana', 31072940), 
    (7, 'Hungary', 9660351), 
    (8, 'Iraq', 40222493), 
    (9, 'Kiribati', 119449), 
    (10, 'Liechtenstein', 38128);

Now we’ll create two queries:

  • small countries: select countries with a population of less than 1 million
  • c-countries: selects countries whose name contains the letter ‘c’
-- small countries (pop < 1M) 
SELECT name FROM countries WHERE population < 1000000  
-->  Kiribati & Liechtenstein 
 
 
-- c-countries (name contains a c) 
SELECT name FROM countries WHERE NAME LIKE '%c%'  
--> Canada and Liechtenstein

Next we’ll demonstrate how to compare and combine the results of these queries into one result set.

SQL — Understand how indices work under the hood to speed up your queries.
No more waiting for slow queries to finish

1.1 Except

Now we can use EXCEPT to select all small countries EXCEPT those that have a ‘c’ in their name:

SELECT name FROM countries WHERE population < 1000000  
EXCEPT 
SELECT name FROM countries WHERE NAME LIKE '%c%'  
--> Kiribati
The result of query1 EXCEPT the result of query2 (image by author)

1.2 Intersect

Use INTERSECT to find out the overlap: small c-countries:

SELECT name FROM countries WHERE population < 1000000  
INTERSECT 
SELECT name FROM countries WHERE NAME LIKE '%c%' 
--> Liechtenstein
The intersection between our two queries (Image by author)

1.3 UNION (and UNION ALL)

Union takes the result of both queries and merges them together in one result set:

SELECT name FROM countries WHERE population < 1000000  
UNION 
SELECT name FROM countries WHERE NAME LIKE '%c%' 
--> Canada, Kiribati, Liechtenstein 
 
SELECT name FROM countries WHERE population < 1000000  
UNION ALL 
SELECT name FROM countries WHERE NAME LIKE '%c%' 
--> Kiribati, Liechtenstein, Canada, Liechtenstein

In the example code above UNION ALL produces another Liechtenstein! This is because the regular UNION removes duplicates; UNION ALL returns all rows from both queries.

Returns the outputs of both result sets (image by author)
Python to SQL — UPSERT Safely, Easily and Fast
Lightning-fast insert and/or update with Python

2. What determines what overlaps? The rules

SQL analyzes the results of the queries and determines whether there is or isn’t an intersection based on the columns that get returned from either query. Broadly speaking, there are two rules concerning the queries we’re comparing:

  1. The columns must be comparable (same data type)
  2. The number of columns must be equal

RULE 1: comparable columns

The code below will fail because we cannot compare id (integer) and name (string-type).

-- FAILS: Conversion failed when converting the varchar value 'Canada' to data type int. 
SELECT id FROM countries WHERE population < 1000000  
EXCEPT 
SELECT name FROM countries WHERE NAME LIKE '%c%'

RULE 2: equal number of columns

The code below will fail because we select two columns from query1 and only one from query2:

-- All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists. 
SELECT id, name FROM countries WHERE population < 1000000  
EXCEPT 
SELECT name FROM countries WHERE NAME LIKE '%c%'
SQL — DELETE INTO another table

Following the rules

When we adhere to these rules we can also perform queries like the one below: matching all small-country id's with the population of ‘c’-countries. Although this is perfectly valid syntax; it will yield no records.

SELECT id FROM countries WHERE population < 1000000  
INTERSECT 
SELECT population FROM countries WHERE NAME LIKE '%c%' 
--> returns no rows (id and population match nowhere)

In the same way we can UNION the results.

SELECT id FROM countries WHERE population < 1000000 
UNION 
SELECT population FROM countries WHERE NAME LIKE '%c%'

Will result into:

Also notice that the names of the columns don’t have to match. If the names conflict, the column-name of the first query will be used.


Dramatically improve your database insert speed with a simple upgrade
4 levels of creating blazingly fast database connections in Python

3. EXCEPT, INTERSECT & UNION with different tables

Of course, we don’t have to compare data from the same table, we can use different tables as well, as long as the data structures resulting from the SELECT statements match. I’ve prepared the following examples with this statement.

1. EXCEPT example:

In this example we compare records from the products table with records from the sales table: we want to know if there are any products with a price < 100 cents that have been sold anywhere but the Netherlands:

SELECT id as product_id FROM products WHERE price < 100 
EXCEPT 
SELECT product_id FROM sales WHERE country = 'Netherlands' 
--> Results in product_id 3

2. INTERSECT example

Next up, we’ll select any product with SALES in the USA that also have returns:

SELECT product_id FROM sales WHERE country = 'USA' 
INTERSECT 
SELECT product_id from returns 
--> Results in product_id 2

3. UNION example

Lastly, we’ll UNION the results of three queries: cheap products, products with any sales in the Netherlands or any product that is returned:

SELECT id as product_id FROM products WHERE price < 100 
UNION 
SELECT product_id FROM sales WHERE country = 'Netherlands' 
UNION 
SELECT product_id from returns 
--> Returns product_id 1, 2, and 3
The easiest way to UPSERT with SQLAlchemy
One command to both INSERT new data and UPDATE existing records in your database

4. What is the difference with joins?

I think of JOIN's as combining columns from two or more tables data sets using a related column between them. The UNION, INTERSECT and EXCEPT operator, on the other hand, combine or compares data from two or more tables and then appends this data.

I like to think of a join operating “horizontally”, “zipping” query results together. The UNION, INTERSECT and EXCEPT operator to work “vertically”, “stacking up” result sets. Let’s demonstrate with some images:


Visualizing a join

Below a visual representation from joining two tables: we take two columns from table1 and combine them with one column from table2 using values in related columns (table1.Id <-> table2.SomeFk):

A colorful join (image by author)

Think of it this way: using a JOIN we “zip up” to tables. We make the table wider than it was before: we add to it horizontally.


Visualizing a UNION

Now let’s check out a UNION statement. Notice that the results are pretty independent of each other. They have no related columns; we just “stack” the result sets on top of each other, expanding the table vertically. We can do this because both queries output the same number of columns and all column data types match.

In this case we use a UNION operator to stack the records on top of each other but INTERSECT works the same way; only keeping the overlap. In the same way EXCEPT “subtracts” some records from the “stack”.

No Need to Ever Write SQL Again: SQLAlchemy’s ORM for Absolute Beginners
With this ORM you can create a table, insert, read, delete and update data without writing a single line of SQL

Conclusion

I hope to have clarified the workings of these three wonderful operators and that you had fun reading this article.

I hope this article was as clear as I hope it to be but if this is not the case please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics like these:

Happy coding!

— Mike

P.S: like what I’m doing? Follow me!

Join Medium with my referral link - Mike Huls
Read every story from Mike Huls (and thousands of other writers on Medium). Your membership fee directly supports Mike…