Python to SQL — UPSERT Safely, Easily and Fast

Lightning-fast insert and/or update with Python

Python to SQL — UPSERT Safely, Easily and Fast
Time to update our storage and insert some new things (image by Steve Johnson on Unsplash)

When you upsert data into a table, you update records that already exist and insert new ones. After reading this article, you’ll be able to connect your Python application to a database and upsert data lightning fast. We’ll go through a few practical examples that demonstrate various ways to do this. They are all geared towards handling large sets of data; focusing on high speed and data validity. Let’s code!

P.S. We’ll use SQL Server for this example but the techniques used are applicable to a wide range of databases.


1. Setup and Preparations

We have a website that sells various musical articles. Every hour we receive a file from all of our suppliers with their current inventory; all of the articles they sell plus information about stock and prices.

In this example, we’ll focus on the file of one supplier. Our goal is to upsert; to insert all new articles and update ones that already exist. Maybe they have a new stock value or a different price. As a bonus, we’ll check out how we can delete items that get taken out of stock.

1.1 Database prep

Let’s first create a table in the database that we can upsert into.

Our MusicalArticles table

As you can see, our little webshop is not selling a lot of things. That’s okay, though, it’ll keep this article nice and concise. The most important thing to notice is the SKU column. This is the stock-keeping unit; a number that is unique for every product.

1.2 Python — loading data

Let’s say our suppliers send us new CSV-files with their article updates. We have written a Python program that loads the file as soon as it detects a new one. Now the goal is to load the CSV and then upsert it into the database. Keep in mind that it is not required to read your data from a CSV; any data structure will do. First, we’ll load the CSV:

I’ve chosen to use Pandas to read my CSV but this is not required. You can use any method you like. This produces the following dataframe:

1.3 Python — creating the database connection

We’ll use SQLAlchemy to create a database connection to a SQL Server database that runs on my localhost.

Notice fast_executemany=True. This is a feature especially for SQL Server that makes sure to insert your data lightning fast. More info on this in this article.

Preparations are done, let’s insert some data! (image by Elevate on Pexels)

2. Upserting

Our CSV is loaded, our table is prepared and the database connection is made; let’s UPSERT!

The plan

We’ll first insert all of our data into a temporary table. Then, if this succeeded, we’ll MERGE our tables. This operation allows you to insert, update and delete in one statement! For more information about the MERGE check out this article.

Execution

We’ll walk through the code below.

We perform our upsert in 4 steps:

  1. Create a temporary table called #NewProducts. Nothing too exciting here
  2. Insert new data into the temporary table.
    We use some nice parameters to bulk insert our new values into the temporary table. This insertion also uses the fast_executemany option when inserting to SQL Server; this is superfast.
  3. MERGE temporary table with target table (MusicArticles)
    This is where the magic happens: the merge! We take the SKU to compare products.
    - If they are not in the target (our MusicalArticles table) then we’ll simply insert the new musical instruments.
    - products with SKU in both tables get updated. Also we update the Modified column.
    - products that are in the MusicalArticles but not in the temp table get deleted from MusicalArticles. So actually this statement does not only insert and update, it also deletes! You can optionally remove this part to keep our statement a pure upsert.
  4. Delete our temp table

Safe

Also notice that we’re performing these two steps in a block of code that originates at with dbEngine.begin() as con. This starts a new transaction which ensures that everything within the block either succeeds or gets rolled back. This means that if the MERGE fails, inserting the data and creating the temp table gets undone. More information on transactions in this article.

This is particularly handy if there are more steps, e.g. cleaning the data after insertion or merging with many tables. Optionally you can put the insert-statement in a different block so that large inserts don’t get rolled back and just keep the data stored in the temp table.

Results

Sure this is some nice code but let’s see some results! Here is the MusicalArticles table before and after we’ve executed this python script.

Before the upsert (red will get deleted, orange will get modified)
After the upsert (orange is modified, green is newly inserted)

You’ll notice that SKU 123, 24, 33 and 34 are modified; either the price has changed or the inventory. In addition we have two new instruments SKU 35 and 111. Also, SKU 5577 is gone! Must have sold out. With SKU 1233, nothing changed.


Conclusion

With this article, I hope to have shed some light on how to safely, easily and quickly upsert data. If you have suggestions/clarifications please comment so I can improve this article. In the meantime, check out my other articles on all kinds of programming-related topics like these:

Happy coding!

— Mike

P.S: like what I’m doing? Follow me!