Create a Python Package with Super-Fast Rust Code in 3 Steps

Extend your Python code with a package containing Rust code for a >150x performance increase!

This Python is getting a bit Rusty! (image by Dall-e 2!)

Python is a pretty easy language to pick up and it’s super quick to write some code in, compared to some other languages. All this ease-of-use comes with a downside: speed is sacrificed. Sometimes Python is just too slow!

To solve this problem we’ll re-write a bit of our Python-code in Rust and import this code, as a Python package, into our original project. We end up with a super-fast Python package that we can import and use like any other package. As a bonus we’ll multi-process our Rusty Python package and end up with a function that is roughly 150x faster. Let’s code!


Overview

Here is a quick summary of what we’re going to do in this article. We’ll tackle the problem in 6 steps (of which steps 2, 3 and 4 are devoted to actually writing the package):

  1. Examining our slow function; why is it slow?
  2. Preparing our project
  3. Re-writing the function in Rust
  4. Compile the Rust code and put it in a Python package
  5. Import the Python package into our project
  6. Benchmarking the Python function vs the Rust one

We’ll use a Python package called maturin. This package will compile our Rust code and convert it into a package. The result will be like any other Python package that we can import and use (like pandas).

1. Examining our slow function

First, it’s important that we understand why our function is slow. Let’s imagine that our project requires a function that counts the number of primes between two numbers:

def primecounter_py(range_from: int, range_til: int) -> tuple[int, int]:
  """ Returns the number of primes in [range_from, range_til] and the number of checks performed """
  check_count = 0
  prime_count = 0
  range_from = range_from if range_from >= 2 else 2  # 0 and 1 are not prime
  for num in range(range_from, range_til + 1):
    for divnum in range(2, num):
      check_count += 1
      if num % divnum == 0:
        break  # found a divisor: num is not prime
    else:
      # the inner loop finished without a break: num is prime
      prime_count += 1
  return prime_count, check_count

Please note that:

  1. The number of prime-checks isn’t really necessary for this function but it allows us to compare Python vs Rust in a later part of this article.
  2. The Python code and Rust code in this article are far from optimized for finding primes. The important thing is to demonstrate that we can optimize small chunks of Python with Rust and that we can compare the performance of these functions.

If you call primecounter_py(10, 20) it returns 4 (11, 13, 17, and 19 are primes) and the number of prime-checks the function has performed. These small ranges are executed very quickly but when we use larger ranges you’ll see that performance starts to suffer:

range      milliseconds 
1-1K                  4 
1-10K               310 
1-25K              1754 
1-50K              6456 
1-75K             14019 
1-100K            24194

As you can see, when the input size increases ten-fold, the duration increases much more: going from 1-10K to 1-100K takes roughly 78 times as long (310 ms vs 24194 ms). In other words: the larger the range becomes, the slower it gets, relatively speaking.
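To see where that growth comes from, it can help to count the divisibility checks instead of measuring time. The sketch below mirrors the inner loop of primecounter_py for a single number; the helper names checks_needed and total_checks are hypothetical and only introduced for illustration.

```python
def checks_needed(num: int) -> int:
    """Number of modulo checks the inner loop performs for a single number."""
    for checks, divnum in enumerate(range(2, num), start=1):
        if num % divnum == 0:
            return checks  # composite: the loop stops at the smallest divisor
    return max(num - 2, 0)  # prime: every candidate in range(2, num) was tried


def total_checks(range_til: int) -> int:
    """Total number of checks for the range 2..range_til (inclusive)."""
    return sum(checks_needed(num) for num in range(2, range_til + 1))


if __name__ == "__main__":
    small = total_checks(1_000)
    large = total_checks(10_000)
    print(f"1-1K:  {small} checks")
    print(f"1-10K: {large} checks ({large / small:.0f}x as many)")
```

A composite number is rejected after a handful of checks, but every prime p costs p - 2 checks, so the total work grows much faster than the size of the range.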

Why is the primecounter_py function slow?

Code can be slow for many reasons. It can be I/O-bound (like waiting for an API), hardware-related, or caused by the design of Python as a language. In this article it’s the last case. The way Python is designed, for example how it handles variables, makes it very easy to use, but you pay a small speed penalty that becomes apparent when you have to perform a lot of calculations. On the bright side, this function is very suitable for optimization with Rust.

If you are interested in what Python’s limitations are, I recommend reading the article below. It explains the cause and potential solutions for slowness due to how Python is designed.

Why Python is so slow and how to speed it up
Take a look under the hood to see where Python’s bottlenecks lie

Is concurrency the problem?

Doing multiple things simultaneously can solve a lot of speed problems. In our case we could opt for using multiple processes to divide all tasks over multiple cores instead of the default 1. Still, we go for the optimization in Rust since we can also multi-process the faster function, as you’ll see at the end of this article.

Many cases that involve a lot of I/O (like waiting for an API) can be optimized by using threads. Check out this article or the one below on how to put multiple CPUs to work to increase execution speed.

Applying Python multiprocessing in 2 lines of code
When and how to use multiple cores to execute many times faster

2. Preparing our project

This is the part where we install dependencies and create all files and folders we need to write Rust and compile it into a package.

a. Create a venv

Create a virtual environment and activate it. Then install maturin; this package will help us convert our Rust code to a Python package:

python -m venv venv 
source venv/bin/activate 
pip install maturin

b. Rust files and folders

We’ll create a directory called my_rust_module that will contain our Rust code and cd into that directory.

mkdir my_rust_module 
cd my_rust_module

c. Initializing maturin

Then we call maturin init. It shows you some options. Choose pyo3. Maturin now creates some folders and files. Your project should look like this now:

my_folder 
 |- venv 
 |- my_rust_module 
   |- .github 
   |- src 
    |- lib.rs 
   |- .gitignore 
   |- Cargo.toml 
   |- pyproject.toml

The most important one is /my_rust_module/src/lib.rs. This file will contain the Rust code that we’re about to turn into a Python package.

Notice that maturin also created a Cargo.toml. This is the configuration of our project. It also contains all of our dependencies (like requirements.txt). In my case I’ve edited it to look like this:

[package] 
name = "my_rust_module" 
version = "0.1.0" 
edition = "2021" 
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html 
[lib] 
name = "my_rust_module" 
crate-type = ["cdylib"] 
[dependencies] 
pyo3 = { version = "0.17.3", features = ["extension-module"] }

3. Re-writing the function in Rust

We are now ready to recreate our Python function in Rust. We won’t dive too deep in the Rust syntax but focus more on the way we can make the Rust code work with Python. We’ll first create a pure-Rust function and then put it in a package that we can import and use in Python.

If you’ve never seen Rust code then the code below may be a little confusing. The most important thing is that the primecounter function below is pure Rust; it has nothing to do with Python. Open /my_rust_module/src/lib.rs and give it the following content:

use pyo3::prelude::*; 
 
#[pyfunction] 
fn primecounter(range_from: u64, range_til: u64) -> (u32, u32) {
  // Returns the number of primes between [range_from] and [range_til]
  // (inclusive) and the number of divisibility checks performed
  let mut prime_count: u32 = 0;
  let mut check_count: u32 = 0;
  let from: u64 = if range_from < 2 { 2 } else { range_from };

  for num in from..=range_til {
    // Becomes true as soon as we find a divisor, meaning num is not prime
    let mut divisor_found = false;
    for divnum in 2..num {
      check_count += 1;
      if num % divnum == 0 {
        divisor_found = true;
        break;
      }
    }
    if !divisor_found {
      prime_count += 1;
    }
  }
  (prime_count, check_count)
}
 
/// Put the function in a Python module 
#[pymodule] 
fn my_rust_module(_py: Python, m: &PyModule) -> PyResult<()> { 
    m.add_function(wrap_pyfunction!(primecounter, m)?)?; 
    Ok(()) 
}

Let’s run through the most important things:

  1. The primecounter function is pure Rust
  2. The primecounter function is annotated with the #[pyfunction] attribute. This indicates that we want to expose it as a Python function
  3. In the last few lines we build a pymodule. The my_rust_module function packages the Rust code into a Python module.

4. Rust code -> Python package

This part may seem the hardest but with the help of the maturin package it becomes very easy for us. Just call:

maturin build --release

This compiles all Rust code and wraps it into a Python package that ends up in this directory: your_project_dir/my_rust_module/target/wheels. We install the wheel in the next part.

For Windows users:
In this article I work in a Debian environment (via Windows WSL). This makes compiling code with Rust a little easier since the compilers we need are already installed. Building on Windows is possible as well but you’ll likely receive a message like "Microsoft Visual C++ 14.0 or greater is required". This means you don’t have a compiler. You can solve this by installing C++ build tools that you can download here.

5. Importing our Rusty Python package

We can directly pip install the wheel we’ve created in the previous part (the exact file name depends on your Python version and platform):

pip install target/wheels/my_rust_module-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl

Then it’s just a matter of importing our module and using the function:

import my_rust_module 
 
primecount, eval_count = my_rust_module.primecounter(range_from=0, range_til=500) 
# returns 95 22279
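Before benchmarking, it may be worth confirming that the Rust function counts the same primes as a plain Python reference. The sketch below is one way to do that; reference_prime_count is a hypothetical helper introduced here, and the comparison is skipped when the wheel has not been installed.

```python
def reference_prime_count(range_from: int, range_til: int) -> int:
    """Independent brute-force count of the primes in [range_from, range_til]."""
    return sum(
        all(num % divnum for divnum in range(2, num))
        for num in range(max(range_from, 2), range_til + 1)
    )


try:
    from my_rust_module import primecounter  # the wheel installed above
except ImportError:
    primecounter = None  # module not installed: skip the comparison

if primecounter is not None:
    for range_from, range_til in [(0, 500), (10, 20), (1, 1_000)]:
        prime_count, _ = primecounter(range_from, range_til)
        assert prime_count == reference_prime_count(range_from, range_til)
    print("Rust and Python agree on the prime counts")
```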

6. Benchmarking Rust vs Python function

Let’s check how our functions compare. We’ll call both the Python and the Rust version of our primecounter function with multiple ranges and time them. These are the results:

range    py ms   py evals/sec    rs ms   rs evals/sec
1-1K         4          17.6M     0.19           417M
1-10K      310          18.6M       12           481M
1-25K     1754          18.5M       66           489M
1-50K     6456          18.8M      248           488M
1-75K    14019          18.7M      519           505M
1-100K   24194          18.8M      937           485M

Both our Python and Rust functions return the result and the number of checks they have performed. In the overview above you see that Rust outperforms Python by roughly 24x to 27x, depending on the range, when it comes to these evaluations per second.
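The exact benchmark script isn't shown in this article, so the following is only a minimal sketch of how numbers like these could be measured. The benchmark helper is a hypothetical name; it accepts any function with the same signature and return shape as primecounter_py, which is repeated here so the snippet runs on its own.

```python
import time
from typing import Callable


def primecounter_py(range_from: int, range_til: int) -> tuple[int, int]:
    """The slow Python function from step 1: returns (prime_count, check_count)."""
    check_count = 0
    prime_count = 0
    for num in range(max(range_from, 2), range_til + 1):
        for divnum in range(2, num):
            check_count += 1
            if num % divnum == 0:
                break
        else:
            prime_count += 1
    return prime_count, check_count


def benchmark(
    func: Callable[[int, int], tuple[int, int]], range_from: int, range_til: int
) -> tuple[float, float]:
    """Hypothetical helper: returns (elapsed_ms, evaluations_per_second) for one call."""
    start = time.perf_counter()
    _, check_count = func(range_from, range_til)
    elapsed = time.perf_counter() - start
    return elapsed * 1000, check_count / elapsed


if __name__ == "__main__":
    for range_til in (1_000, 10_000):
        ms, evals_per_sec = benchmark(primecounter_py, 1, range_til)
        print(f"1-{range_til}: {ms:.1f} ms, {evals_per_sec / 1e6:.1f}M evals/sec")
```

Once the wheel is installed, passing my_rust_module.primecounter as func should give the Rust columns; timings will of course differ per machine.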

Counting primes in Rust is a lot faster than in Python (image by author)

The graph above shows a very clear difference in execution time.

7. Bonus: multiprocessing for even more speed

Of course you can multi-process this new Python package! With the code below we divide all numbers that we need to evaluate over all of our cores:

import math
import multiprocessing as mp

import my_rust_module

range_from, range_til = 0, 100_000  # example values

# Batch size: the number of values in the (inclusive) range divided over the available CPUs
batch_size = math.ceil((range_til - range_from + 1) / mp.cpu_count())

# The lines below divide the range over all available CPUs.
# A range of 0 - 10 will be divided over 4 CPUs like:
# [(0, 2), (3, 5), (6, 8), (9, 10)]
# range_til + 1 keeps range_til itself in the last batch, since primecounter includes range_til
number_list = list(range(range_from, range_til + 1))
number_list = [number_list[i * batch_size:(i + 1) * batch_size] for i in range((len(number_list) + batch_size - 1) // batch_size)]
number_list_from_til = [(min(chunk), max(chunk)) for chunk in number_list]

primecount = 0
eval_count = 0
with mp.Pool() as mp_pool:
    results = mp_pool.starmap(my_rust_module.primecounter, number_list_from_til)
    for _count, _evals in results:
        primecount += _count
        eval_count += _evals
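The chunking can also be done without first building a list of every number in the range. The sketch below is an alternative, not the code used for the measurements in this article; split_range is a hypothetical helper that assumes inclusive bounds, matching how primecounter treats range_til.

```python
import math


def split_range(range_from: int, range_til: int, n_chunks: int) -> list[tuple[int, int]]:
    """Split the inclusive range [range_from, range_til] into at most n_chunks (from, til) pairs."""
    total = range_til - range_from + 1
    if total <= 0:
        return []
    batch_size = math.ceil(total / n_chunks)
    return [
        (start, min(start + batch_size - 1, range_til))
        for start in range(range_from, range_til + 1, batch_size)
    ]


if __name__ == "__main__":
    print(split_range(0, 10, 4))  # → [(0, 2), (3, 5), (6, 8), (9, 10)]
```

The resulting pairs can be passed to starmap in the same way as number_list_from_til. Keep in mind that equally sized chunks do not mean equal work: the chunks with the highest numbers need the most checks, so they finish last.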

Let’s try to find all primes between 0 and 100K again. With our current algorithm this means that we have to perform almost half a billion checks. As you see in the overview below, Rust finishes these in 0.88 seconds. With multiprocessing the process finishes in 0.16 seconds, which is 5.5 times faster, clocking in at 2.8 billion calculations per second.

           calculations     duration    calculations/sec
rust:           455.19M    882.03 ms          516.1M/sec
rust MP:        455.19M    160.62 ms            2.8B/sec

Compared to our original (single process) Python function we’ve increased the number of calculations per second from 18.8M to 2.8 billion. This means that our function is now roughly 150x faster.
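The speed-up figures follow directly from the measurements reported in this article. As a quick check, the sketch below redoes the arithmetic; the constant and function names are only illustrative, and the numbers are copied from the two result tables.

```python
CHECKS = 455.19e6      # checks needed for the range 1-100K
PY_MS = 24194          # pure Python, single process
RUST_MS = 882.03       # Rust, single process
RUST_MP_MS = 160.62    # Rust, multiprocessing


def per_second(duration_ms: float) -> float:
    """Checks per second for a run that took duration_ms milliseconds."""
    return CHECKS / (duration_ms / 1000)


mp_gain = RUST_MS / RUST_MP_MS        # gain from multiprocessing alone
total_speedup = PY_MS / RUST_MP_MS    # pure Python vs multi-processed Rust

if __name__ == "__main__":
    print(f"Python:  {per_second(PY_MS) / 1e6:.1f}M checks/sec")
    print(f"Rust MP: {per_second(RUST_MP_MS) / 1e9:.1f}B checks/sec")
    print(f"Multiprocessing gain: {mp_gain:.1f}x, total speed-up: {total_speedup:.0f}x")
```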

Conclusion

As we’ve seen in this article, it’s not all that difficult to extend Python with Rust. If you know when and how to apply this technique you can really improve the execution speed of your program.

I hope this article was clear; if not, please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics.

Happy coding!

— Mike

P.S: like what I’m doing? Follow me!
