Cython for absolute beginners: 30x faster code in two simple steps
Easy Python code compilation for blazingly fast applications
Python is very easy to work with: its clear syntax, the interpreter and duck typing allow you to develop very quickly. There are some downsides though: because you don’t have to declare types, Python has to do extra work at run time to get your code to run. Some functions execute very slowly as a result, because the same checks are repeated again and again.
Combine the ease and speed of developing with Python with the speed of C: the best of both worlds
In this article we’ll take a “slow” function from a vanilla Python project and make it 30x faster. We do this with a package called Cython, which converts our Python code into a compiled, superfast piece of C code that we can import directly into our project again.
A package called CythonBuilder will automatically Cythonize Python code for us in just 2 steps. With CythonBuilder we’ll Cythonize a function from an example project we’ll define below. Let’s code!
But first..
For those unfamiliar with Cython and CythonBuilder we’ll answer some exploratory questions. Then we’ll define our example project. We’re going to be using the command line, so read up on that first if you’re unfamiliar with it.
What is Cython / why use Cython?
Cython converts Python code to a file that contains instructions for the CPU. The Python interpreter no longer has to perform its checks on this file; it can just run it. This results in a major performance increase. Check the article below for more detailed information on how Python works under the hood and how it compares to C.
When you Cythonize a piece of code you add extra information to it, for example by defining types. Then the code is compiled so that Python doesn’t have to perform the extra checks. Again, check the article above for a more in-depth analysis.
How does Cython work?
Just like you write Python code in a .py file, you write Cython code in a .pyx file. Cython will then convert these files to either a .so file or a .pyd file (depending on your OS). These files can be imported directly into your Python project again, just like a regular module.
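If you’re curious which of the two file types your own machine uses, Python can tell you. The snippet below is a small sketch using the standard library’s importlib.machinery.EXTENSION_SUFFIXES; the helper name compiled_extension is made up for this example.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def compiled_extension():
    """Return the short suffix of compiled modules on this platform."""
    # EXTENSION_SUFFIXES lists every suffix the import system accepts for
    # compiled modules, e.g. a long version-specific one and a short one.
    if any(suffix.endswith(".pyd") for suffix in EXTENSION_SUFFIXES):
        return ".pyd"  # Windows
    return ".so"  # Linux, macOS and other Unix-like systems


if __name__ == "__main__":
    print("All accepted suffixes:", EXTENSION_SUFFIXES)
    print("Compiled modules end in:", compiled_extension())
```

Any file in your project whose name ends in one of these suffixes can be imported like a normal .py module.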
Can all code be optimized by compiling?
Not all code is better off compiled. Awaiting the response of an API is not faster in a C package, for example, because the time is spent waiting rather than calculating. In short: we focus on _CPU-heavy_ tasks that require a lot of calculating. Read more in the article below for a clearer distinction.
CythonBuilder — automating Cythonizing
How do you actually Cythonize your .pyx files? This process is pretty complex: you have to create a setup.py, define all your packages and then run some commands (see the article below). Instead, we’re going to use CythonBuilder, a package that automates everything for us: build your .pyx file in one command!
The Example project
Our project contains a function that, for some reason, calculates a number of primes. This function requires a lot of computations that we can optimize.
Let’s first install CythonBuilder with pip install cythonbuilder and then define the regular prime-calculating function.
Preparation — the vanilla Python prime-calculation function
The function is pretty simple: we’ll pass a number to the function and it returns the number of prime numbers between 0 and the target number:
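A function like this could look like the sketch below. The name count_primes and the limit argument are the ones used later in this article; the body is an assumption: a deliberately naive trial-division loop that counts the target number itself if it is prime.

```python
def count_primes(limit):
    """Return how many primes lie between 0 and limit (inclusive)."""
    prime_count = 0
    for num in range(2, limit + 1):
        for divnum in range(2, num):
            if num % divnum == 0:
                break  # found a divisor, so num is not prime
        else:
            # the inner loop ended without a break: num is prime
            prime_count += 1
    return prime_count


if __name__ == "__main__":
    print(count_primes(100))
```

The for/else construct keeps the code short: the else branch only runs when the inner loop was not interrupted by break.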
This function is pure Python. It could be optimized a bit more, but the goal is to have a function that performs a lot of calculations. Let’s check how long it takes this function to find the number of primes between 0 and 100,000:
PurePython: 29.4883812 seconds
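Timings like the one above can be taken with a few lines of standard-library code. This is a minimal sketch: time_call is a hypothetical helper, and the workload in the usage part is a stand-in that you would swap for a call to count_primes with 100_000.

```python
import time


def time_call(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()  # high-resolution clock meant for timing
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with time_call(count_primes, 100_000)
    result, seconds = time_call(sum, range(1_000_000))
    print(f"PurePython: {seconds:.7f} seconds")
```

Your numbers will differ from the ones in this article, since they depend on your machine and Python version.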
Step 1. — Cythonize
In this part we’ll introduce Cython. We’ll copy the code of our function and save it into a file called cy_count_primes.pyx (notice the .pyx extension).
Next we cd projectfolder and call cythonbuilder build. This will find all of the .pyx files in the project folder and build them. The result is a .pyd file on Windows or a .so file on Linux. This file is a compiled version of our Python function that we can directly import in our project:
from someplace.cy_count_primes import count_primes
print(count_primes(100_000))
Let’s check out how it performs:
PurePython: 29.4883812 seconds
CyPython : 14.0540504 seconds (2.0982 faster than PurePython)
Already over 2 times faster! Notice that we haven’t actually changed anything in our Python code. Let’s optimize the code.
Interfaces:
As you’ll notice, even your IDE can inspect the imported files. It knows which functions are present and which arguments are required even though the file is compiled. This is possible because CythonBuilder also builds .pyi files; these are interface files that provide the IDE with information about the .pyd files.
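To give an idea of what such an interface file contains: a .pyi file holds only signatures, no logic. The sketch below shows what a stub for our function could look like; the exact contents CythonBuilder generates are an assumption here.

```python
# Hypothetical contents of cy_count_primes.pyi (assumed for illustration,
# not CythonBuilder's verbatim output).
def count_primes(limit: int) -> int:
    """Return the number of primes between 0 and limit."""
    ...  # stubs carry no implementation, only the signature
```

The IDE reads the argument name, its type and the return type from this file, so autocomplete works even though the real implementation is compiled.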
Step 2 — add types
In this part we add types to the cy_count_primes.pyx file and then build it again.
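A typed version could look like the sketch below. The .pyx source is an assumption based on the description in this section: a cpdef function that returns an int, an int argument called limit, and three cdef declarations on lines 2 to 4 of the file. It is wrapped in a small, hypothetical helper called write_typed_pyx so the snippet runs as plain Python and creates the file for you.

```python
from pathlib import Path

# Assumed contents of cy_count_primes.pyx: the pure-Python loop plus C types.
# Line 1 is the cpdef signature, lines 2-4 are the cdef declarations.
TYPED_PYX = """\
cpdef int count_primes(int limit):
    cdef int prime_count = 0
    cdef int num
    cdef int divnum
    for num in range(2, limit + 1):
        for divnum in range(2, num):
            if num % divnum == 0:
                break
        else:
            prime_count += 1
    return prime_count
"""


def write_typed_pyx(folder="."):
    """Write the typed source to <folder>/cy_count_primes.pyx."""
    target = Path(folder) / "cy_count_primes.pyx"
    target.write_text(TYPED_PYX, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_typed_pyx()}")
```

Apart from the cpdef signature and the three cdef lines, the loop is the same as in the pure Python version.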
As you can see, we define our function with cpdef (accessible from both C and Python), tell it to return an int (the int before count_primes) and declare that it expects the limit argument to be an int.
Next, on lines 2, 3 and 4, we define the types for some of the variables that are used in the loops; nothing fancy.
Now we can run cythonbuilder build again and time our function again:
PurePython: 29.4883812 seconds
CyPython : 14.0540504 seconds (2.0982 faster than PurePython)
Cy+Types : 1.1600970 seconds (25.419 faster than PurePython)
That’s a very impressive speedup!
The reason why this is so much faster is not within the scope of this article but it has to do with the way Python stores its variables in memory. It’s pretty inefficient compared to C so our C-compiled code can run much faster. Check out this article that dives deep in how Python and C differ from each other under the hood (and why C is so much faster).
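You can get a feel for the difference in storage with the standard library alone. The sketch below compares the size of a Python int object with the size of a plain C int; storage_sizes is a made-up helper name, and the exact numbers depend on your platform and Python build.

```python
import ctypes
import sys


def storage_sizes():
    """Return (bytes used by a Python int object, bytes used by a C int)."""
    py_int_bytes = sys.getsizeof(1)  # a full object: type info, refcount, value
    c_int_bytes = ctypes.sizeof(ctypes.c_int)  # just the raw value
    return py_int_bytes, c_int_bytes


if __name__ == "__main__":
    py_bytes, c_bytes = storage_sizes()
    print(f"Python int: {py_bytes} bytes, C int: {c_bytes} bytes")
```

On a typical 64-bit CPython install the Python int is several times larger than the C int. Every operation on it also has to go through the object’s type information, which is part of the overhead that typed Cython code avoids.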
Bonus — compilation options
We’ve already improved code execution by 25x but I think we can squeeze a bit more out of it. We’ll do this with compiler directives. These take a little bit of explanation:
Because Python is an interpreted language it has to perform a lot of checks at run time, for example whether your program divides by zero. C, a compiled language, does not perform these checks at run time: some errors, like type errors, are spotted when compiling, and others, like division by zero, are simply not checked at all. The advantage is that your program can run more efficiently since it doesn’t have to perform these checks at run time.
With compiler directives we can disable all these checks, but only if we know we don’t need them. In the example below we upgrade our previous code with 4 decorators that:
- prevent checks on ZeroDivisionError
- prevent checks on IndexError (calling myList[5] when the list only contains 3 items)
- prevent checks on None
- prevent wraparound: the extra checks that are required for indexing a list relative to its end, like myList[-5]
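To make the four checks concrete, the sketch below triggers each of them in plain Python. The comments name the Cython directives that are commonly used to switch off each check (cdivision, boundscheck, nonecheck and wraparound); that these are the four decorators meant here is an assumption. The function name run_checked_operations is made up for this example.

```python
def run_checked_operations():
    """Run one operation per check and record how Python responds."""
    items = [1, 2, 3]
    outcomes = {}

    # 1. Division by zero: the check that cdivision(True) removes
    try:
        10 % 0
    except ZeroDivisionError:
        outcomes["zero_division"] = "ZeroDivisionError"

    # 2. Index out of bounds: the check that boundscheck(False) removes
    try:
        items[5]
    except IndexError:
        outcomes["out_of_bounds"] = "IndexError"

    # 3. Using None as if it were a list: related to nonecheck(False)
    maybe_list = None
    try:
        maybe_list.append(4)
    except AttributeError:
        outcomes["none_access"] = "AttributeError"

    # 4. Negative index counted from the end: handled by wraparound
    outcomes["negative_index"] = items[-1]

    return outcomes


if __name__ == "__main__":
    for check, outcome in run_checked_operations().items():
        print(f"{check}: {outcome}")
```

In compiled code with these checks disabled, the same mistakes are no longer caught cleanly, which is why you should only disable them when you are sure your code never hits these cases.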
Let’s re-build our code (cythonbuilder build) again and see how much time skipping all these checks saves:
PurePython: 29.4883812 seconds
CyPython : 14.0540504 seconds (2.0982 faster than PurePython)
Cy+Types : 1.1600970 seconds (25.419 faster than PurePython)
Cy+Options: 0.9562186 seconds (30.838 faster than PurePython)
We’ve shaved off another 0.2 seconds!
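The speedup factors in the overview are simply the pure Python time divided by the time of each variant. The sketch below recomputes them from the measurements above; speedup and report are hypothetical helper names.

```python
# Measured run times in seconds, taken from the overview above
TIMINGS = {
    "PurePython": 29.4883812,
    "CyPython": 14.0540504,
    "Cy+Types": 1.1600970,
    "Cy+Options": 0.9562186,
}


def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is."""
    return baseline_seconds / optimized_seconds


def report(timings, baseline="PurePython"):
    """Build one line per variant with its time and speedup factor."""
    lines = []
    for name, seconds in timings.items():
        factor = speedup(timings[baseline], seconds)
        lines.append(f"{name:<10}: {seconds:.7f} seconds ({factor:.3f}x)")
    return lines


if __name__ == "__main__":
    print("\n".join(report(TIMINGS)))
```

This gives roughly 2.1x for the plain compiled version, 25.4x with types and 30.8x with the compiler directives.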
Even more speed?
It is possible to speed up our code even more by making use of our multiple cores. Check out the article below for a guide on how to apply multi-processing and threading in Python programs. Also check out this article that shows you how to multi-process Cython code and explains Cython’s annotation files: graphical overviews of which parts of your code can be further optimized. Very convenient!
Conclusion
CythonBuilder makes it easy to speed up our Python code using Cython.
As we’ve seen, just copying our Python code and building it doubles the execution speed! The greatest gain comes from adding types, which results in a 25x speed increase relative to vanilla Python.
I hope everything was as clear as I intended it to be; if not, please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics like these:
- Why Python is so slow and how to speed it up
- Git for absolute beginners: understanding Git with the help of a video game
- Docker for absolute beginners: the difference between an image and a container
- Docker for absolute beginners — what is Docker and how to use it (+ examples)
- Virtual environments for absolute beginners — what is it and how to create one (+ examples)
- Create and publish your own Python package
- Create Your Custom, private Python Package That You Can PIP Install From Your Git Repository
- Create a fast auto-documented, maintainable, and easy-to-use Python API in 5 lines of code with FastAPI
- Dramatically improve your database insert speed with a simple upgrade
Happy coding!
— Mike