Create a Python Package with Super-Fast Rust Code in 3 Steps
Extend your Python code with a package containing Rust code for a roughly 150x performance increase!
Python is a pretty easy language to pick up, and compared to some other languages it’s super quick to write code in. All this ease of use comes with a downside: speed is sacrificed. Sometimes Python is just too slow!
To solve this problem we’ll rewrite a bit of our Python code in Rust and import this code, as a Python package, into our original project. We end up with a super-fast Python package that we can import and use like any other package. As a bonus we’ll multi-process our Rusty Python package and end up with a function that is roughly 150x faster. Let’s code!
Overview
A quick summary of what we’re going to do in this article. We’ll tackle the problem in 6 steps (steps 2, 3 and 4 are devoted to actually writing the package):
- Examining our slow function: why is it slow?
- Preparing our project
- Rewriting the function in Rust
- Compiling the Rust code and putting it in a Python package
- Importing the Python package into our project
- Benchmarking the Python function vs the Rust one
We’ll use a Python package called maturin. This package compiles our Rust code and converts it into a Python package. The result behaves like any other Python package (like pandas) that we can import and use.
1. Examining our slow function
First, it’s important to understand why our function is slow. Let’s imagine that our project requires a function that counts the number of primes between two numbers:
def primecounter_py(range_from: int, range_til: int) -> tuple[int, int]:
    """Return (number of primes in [range_from, range_til], number of prime-checks performed)."""
    check_count = 0
    prime_count = 0
    range_from = range_from if range_from >= 2 else 2  # 0 and 1 are not prime
    for num in range(range_from, range_til + 1):
        for divnum in range(2, num):
            check_count += 1
            if (num % divnum) == 0:
                break  # found a divisor: num is not prime
        else:
            # the inner loop finished without a break: num is prime
            prime_count += 1
    return prime_count, check_count
Please note that:
- The number of prime-checks isn’t really necessary for this function, but it allows us to compare Python vs Rust later in this article.
- The Python and Rust code in this article are far from optimized for finding primes. The important thing is to demonstrate that we can optimize small chunks of Python with Rust and compare the performance of these functions.
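The prime test in primecounter_py relies on Python’s for/else construct: the else branch runs only when the inner loop finishes without hitting break. A minimal standalone sketch of that single check (is_prime is a hypothetical helper, not part of the article’s code):

```python
def is_prime(num: int) -> bool:
    """Trial division with for/else, mirroring the inner loop of primecounter_py."""
    if num < 2:
        return False
    for divnum in range(2, num):
        if num % divnum == 0:
            break  # a divisor was found, so the else branch is skipped
    else:
        # the loop ran to completion without a break: num has no divisor
        return True
    return False


print([n for n in range(10, 21) if is_prime(n)])  # [11, 13, 17, 19]
```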
If you call primecounter_py(10, 20), it returns (4, 60): 4 primes (11, 13, 17 and 19) and the 60 prime-checks the function performed to find them. These small ranges are executed very quickly, but when we use larger ranges you’ll see that performance starts to suffer:
| Range | Duration (ms) |
|---|---|
| 1-1K | 4 |
| 1-10K | 310 |
| 1-25K | 1754 |
| 1-50K | 6456 |
| 1-75K | 14019 |
| 1-100K | 24194 |
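As the input size increases ten-fold, the duration increases much more: going from 1-1K to 1-10K, and again from 1-10K to 1-100K, the duration grows roughly 78x (4 ms to 310 ms, and 310 ms to 24,194 ms). In other words, the larger a range becomes, the slower it gets, relatively.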
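You can see why without a stopwatch by counting the checks instead of timing them: in this algorithm every prime p costs p - 2 divisions, so the work grows roughly quadratically with the range (slightly less, since primes thin out). A small sketch, using a compact copy of the article’s algorithm in a hypothetical count_checks helper, that compares the check counts for 1-1K and 1-10K:

```python
def count_checks(range_from: int, range_til: int) -> int:
    """Return only the check_count of the primecounter_py algorithm."""
    checks = 0
    for num in range(max(range_from, 2), range_til + 1):
        for divnum in range(2, num):
            checks += 1
            if num % divnum == 0:
                break  # divisor found: stop checking this number
    return checks


small = count_checks(1, 1_000)
large = count_checks(1, 10_000)
# A 10x larger range needs far more than 10x as many checks
print(small, large, round(large / small, 1))
```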
Why is the primecounter_py function slow?
Code can be slow for many reasons. It can be I/O-bound (like waiting for an API), hardware-related, or caused by the design of Python as a language. In this article it’s the last case. The way Python is designed, for example how it handles variables, makes it very easy to use, but you pay a small speed penalty on every operation that becomes apparent when you have to perform a lot of calculations: CPython executes bytecode one instruction at a time and treats every integer as a full Python object, so even a single num % divnum involves type checks and dispatch. On the bright side, this makes the function very suitable for optimization with Rust.
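One way to see that interpreter overhead is Python’s built-in dis module, which lists the bytecode instructions CPython executes for a function. A small sketch (is_divisible is a hypothetical stand-in for the inner-loop check) showing that even num % divnum == 0 expands into several separately dispatched instructions; the exact opcode names differ between Python versions:

```python
import dis


def is_divisible(num: int, divnum: int) -> bool:
    """The core test from the inner loop of primecounter_py."""
    return num % divnum == 0


# Each instruction below is fetched and dispatched by the interpreter on every call
for instruction in dis.get_instructions(is_divisible):
    print(instruction.opname, instruction.argrepr)
```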
If you are interested in what Python’s limitations are, I recommend reading the article below. It explains the cause and potential solutions for slowness due to how Python is designed.
Is concurrency the problem?
Doing multiple things simultaneously can solve a lot of speed problems. In our case we could use multiple processes to spread the work over multiple cores instead of the default single core. Still, we go for the Rust optimization first, since we can also multi-process the faster function, as you’ll see at the end of this article.
Cases that involve a lot of I/O (like waiting for an API) can often be sped up with threads. Check out this article, or the one below, on how to put multiple CPUs to work to increase execution speed.
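For the I/O case, a minimal sketch with the standard library’s concurrent.futures, where time.sleep stands in for waiting on an API (fake_api_call is hypothetical). Because a sleeping thread releases the GIL, five 0.2-second waits overlap instead of adding up to a full second. For CPU-bound code like primecounter_py, ProcessPoolExecutor is the analogous choice, since on a standard (GIL) CPython build threads don’t run Python computations in parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(request_id: int) -> int:
    """Hypothetical stand-in for an I/O-bound call such as an HTTP request."""
    time.sleep(0.2)  # the thread waits here without holding the GIL
    return request_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fake_api_call, range(5)))
elapsed = time.perf_counter() - start

# Roughly 0.2 s instead of 5 * 0.2 = 1.0 s, because the waits overlap
print(results, f"{elapsed:.2f} s")
```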
2. Preparing our project
This is the part where we install dependencies and create all files and folders we need to write Rust and compile it into a package.
a. Create a venv
Create a virtual environment and activate it. Then install maturin; this package will help us convert our Rust code into a Python package. This assumes you already have the Rust toolchain (cargo) installed, for example via rustup:
python -m venv venv
source venv/bin/activate
pip install maturin
b. Rust files and folders
We’ll create a directory called my_rust_module that will contain our Rust code, and cd into that directory.
mkdir my_rust_module
cd my_rust_module
c. Initializing maturin
Then we call maturin init. It shows you some options; choose pyo3. Maturin now creates some folders and files. Your project should look like this now:
my_folder
|- venv
|- my_rust_module
   |- .github
   |- src
      |- lib.rs
   |- .gitignore
   |- Cargo.toml
   |- pyproject.toml
The most important one is /my_rust_module/src/lib.rs. This file will contain our Rust code that we’re about to turn into a Python package.
Notice that maturin also created a Cargo.toml. This is the configuration file of our Rust project. It also lists all of our dependencies (like requirements.txt). In my case I’ve edited it to look like this:
[package]
name = "my_rust_module"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "my_rust_module"
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.17.3", features = ["extension-module"] }
3. Re-writing the function in Rust
We are now ready to recreate our Python function in Rust. We won’t dive too deep into the Rust syntax but focus more on the way we can make the Rust code work with Python. We’ll first create a pure-Rust function and then put it in a package that we can import and use in Python.
If you’ve never seen Rust code, the code below may be a little confusing. The most important thing is that the primecounter function below is pure Rust; it has nothing to do with Python. Open /my_rust_module/src/lib.rs and give it the following content:
use pyo3::prelude::*;
/// Returns the number of primes between [range_from] and [range_til] (inclusive)
/// and the number of divisibility checks ("prime-checks") that were performed.
#[pyfunction]
fn primecounter(range_from: u64, range_til: u64) -> (u32, u64) {
    let mut prime_count: u32 = 0;
    // u64 instead of u32: on large ranges the number of checks can exceed u32::MAX
    let mut check_count: u64 = 0;
    // 0 and 1 are not prime, so start at 2
    let start: u64 = if range_from < 2 { 2 } else { range_from };
    for num in start..=range_til {
        let mut divisor_found = false;
        for divnum in 2..num {
            check_count += 1;
            if num % divnum == 0 {
                divisor_found = true;
                break;
            }
        }
        if !divisor_found {
            prime_count += 1;
        }
    }
    (prime_count, check_count)
}
/// Put the function in a Python module
#[pymodule]
fn my_rust_module(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(primecounter, m)?)?;
    Ok(())
}
Let’s run through the most important things:
- The primecounter function is pure Rust.
- The primecounter function is decorated with #[pyfunction]. This indicates that we want to transform it into a Python function.
- In the last few lines we build a pymodule. The my_rust_module function packages the Rust code into a Python module. Its name has to match the [lib] name in Cargo.toml, because that’s the name Python looks for when it imports the compiled module.
4. Rust code -> Python package
This part may seem the hardest, but with the help of the maturin package it becomes very easy. Just call maturin build --release from inside the my_rust_module directory.
This compiles all Rust code and wraps it into a Python package (a wheel) that ends up in this directory: your_project_dir/my_rust_module/target/wheels. We install the wheel in the next part.
For Windows users:
In the examples below I work in a Debian environment (via Windows WSL). This makes compiling code with Rust a little easier since the compilers we need are already installed. Building on Windows is possible as well, but you’ll likely receive a message like "Microsoft Visual C++ 14.0 or greater is required". This means you don’t have a compiler. You can solve this by installing the Microsoft C++ Build Tools.
5. Importing our Rusty Python package
We can directly pip install the wheel we’ve created in the previous part (the exact filename depends on your Python version and platform):
pip install target/wheels/my_rust_module-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl
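Because the wheel’s filename encodes the Python version and platform (cp39 and manylinux_2_28_x86_64 above), yours will likely differ. A small sketch that picks the most recently built my_rust_module wheel from target/wheels and installs it with the pip of the interpreter running the script; the helper names newest_wheel and install_wheel are hypothetical:

```python
import subprocess
import sys
from pathlib import Path


def newest_wheel(wheel_dir: Path, package: str = "my_rust_module") -> Path:
    """Return the most recently modified wheel for `package` in `wheel_dir`."""
    wheels = sorted(wheel_dir.glob(f"{package}-*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no {package} wheel found in {wheel_dir}")
    return wheels[-1]


def install_wheel(wheel: Path) -> None:
    """Install the wheel with the pip that belongs to the current interpreter."""
    # --force-reinstall replaces an earlier build that has the same version number
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--force-reinstall", str(wheel)],
        check=True,
    )


# Usage, from inside my_rust_module:
# install_wheel(newest_wheel(Path("target/wheels")))
```

Alternatively, maturin develop builds the package and installs it into the active virtual environment in one step.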
Then it’s just a matter of importing our module and using the function:
import my_rust_module
primecount, eval_count = my_rust_module.primecounter(range_from=0, range_til=500)
# returns 95 22279
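Before benchmarking, it’s worth confirming that the Rust port returns exactly what the Python original returns. A minimal sketch of such a check; compare_implementations is a hypothetical helper that takes the two functions as arguments, so you would pass primecounter_py and my_rust_module.primecounter:

```python
from typing import Callable, Iterable

Counter = Callable[[int, int], tuple[int, int]]


def compare_implementations(
    reference: Counter, candidate: Counter, ranges: Iterable[tuple[int, int]]
) -> list[tuple[int, int]]:
    """Return the (range_from, range_til) pairs on which the two functions disagree."""
    mismatches = []
    for range_from, range_til in ranges:
        if reference(range_from, range_til) != candidate(range_from, range_til):
            mismatches.append((range_from, range_til))
    return mismatches


# Usage (assuming both functions are available in the current script):
# compare_implementations(primecounter_py, my_rust_module.primecounter,
#                         [(0, 500), (10, 20), (1, 10_000)])
# An empty list means both versions agree on every range.
```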
6. Benchmarking Rust vs Python function
Let’s check out how our functions compare. We’ll call both the Python and Rust versions of our primecounter function with several ranges and time them. These are the results:
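| Range | Python (ms) | Python evaluations/sec | Rust (ms) | Rust evaluations/sec |
|---|---|---|---|---|
| 1-1K | 4 | 17.6M | 0.19 | 417M |
| 1-10K | 310 | 18.6M | 12 | 481M |
| 1-25K | 1754 | 18.5M | 66 | 489M |
| 1-50K | 6456 | 18.8M | 248 | 488M |
| 1-75K | 14019 | 18.7M | 519 | 505M |
| 1-100K | 24194 | 18.8M | 937 | 485M |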
Both our Python and Rust functions return the number of primes and the number of prime-checks (evaluations) they performed. In the overview above you see that Rust performs roughly 24 to 27 times as many evaluations per second as Python (485M vs 18.8M for the 1-100K range).
The graph above shows a very clear difference in execution time.
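The article doesn’t show the timing code behind these numbers, so the following is only a sketch of how such columns can be produced: time a call with time.perf_counter and divide the returned check count by the elapsed seconds. The helper name benchmark is hypothetical; because a single run is noisy, it repeats the call and keeps the fastest run.

```python
import time
from typing import Callable


def benchmark(
    func: Callable[[int, int], tuple[int, int]],
    range_from: int,
    range_til: int,
    repeats: int = 3,
) -> tuple[float, float]:
    """Return (best duration in ms, evaluations per second) for func(range_from, range_til)."""
    best = float("inf")
    check_count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        _, check_count = func(range_from, range_til)
        best = min(best, time.perf_counter() - start)
    return best * 1000, check_count / best


# Usage: benchmark(primecounter_py, 1, 10_000)
#        benchmark(my_rust_module.primecounter, 1, 10_000)
```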
7. Bonus: multiprocessing for even more speed
Of course you can multi-process this new Python package! With the code below we divide all numbers that we need to evaluate across all of our cores:
import math
import multiprocessing as mp
import my_rust_module


def primecounter_mp(range_from: int, range_til: int) -> tuple[int, int]:
    """Divide [range_from, range_til] over all CPUs and count primes with the Rust function."""
    # batch size is determined by the (inclusive) range divided over the number of available CPUs
    batch_size = math.ceil((range_til - range_from + 1) / mp.cpu_count())
    # The lines below divide the range into inclusive (from, til) chunks, one per CPU.
    # A range of 0 - 10 will be divided over 4 CPUs like:
    # [(0, 2), (3, 5), (6, 8), (9, 10)]
    number_list_from_til = [
        (start, min(start + batch_size - 1, range_til))
        for start in range(range_from, range_til + 1, batch_size)
    ]
    primecount = 0
    eval_count = 0
    with mp.Pool() as mp_pool:
        results = mp_pool.starmap(my_rust_module.primecounter, number_list_from_til)
    for _count, _evals in results:
        primecount += _count
        eval_count += _evals
    return primecount, eval_count


if __name__ == "__main__":
    # The __main__ guard is needed where worker processes are started with "spawn" (Windows, macOS)
    print(primecounter_mp(0, 100_000))
Let’s try to find all primes between 0 and 100K again. With our current algorithm this means that we have to perform almost half a billion checks. As you see in the overview below, Rust finishes these in 0.88 seconds. With multiprocessing, the process finishes in 0.16 seconds: 5.5 times faster, clocking in at 2.8 billion calculations per second.
| Version | Calculations | Duration | Calculations/sec |
|---|---|---|---|
| Rust | 455.19M | 882.03 ms | 516.1M/sec |
| Rust (multiprocessing) | 455.19M | 160.62 ms | 2.8B/sec |
Compared to our original (single-process) Python function, we’ve increased the number of calculations per second from 18.8M to 2.8 billion. This means that our function is now roughly 150x faster.
Conclusion
As we’ve seen in this article, it’s not all that difficult to extend Python with Rust. If you know when and how to apply this technique you can really improve the execution speed of your program.
I hope this article was as clear as I intended, but if not, please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics like these:
- Git for absolute beginners: understanding Git with the help of a video game
- Create and publish your own Python package
- Coding a Home Intruder System / Motion Detector with Python
Happy coding!
— Mike
P.S. Like what I’m doing? Follow me!