Let’s Write a Composable, Easy-to-Use Caching Package in Python

Easy, user-friendly caching tailored to all your needs

Python choosing a caching strategy (image by ChatGPT + amateurish edits by author)

In this article we’ll walk through the process of building a caching system in Python from the ground up. The goal is to…

  • better understand caching and explore various caching strategies
  • understand why we went for a composition-approach over inheritance
  • learn how and when to apply a cache effectively

We’ll focus on building a user-friendly package that allows you to easily add caches to your code. Additionally, we’ll offer a simple means to extend the cache with self-designed behavior. Let’s code!


Before we begin..

This article details creating caches that are part of the Cachr package on PyPI. All code can be found in this GitHub repo. Contributions are always welcome; feel free to submit bug reports, bug fixes, feature requests, documentation improvements or enhancements!


Basics: How Caches Work and Terminology

Caches are high-speed data storage layers designed to temporarily hold data, making it quicker to retrieve. For example, imagine storing product details from a web store in a cache after fetching them from the database. When a user requests the same product information again, you can quickly retrieve it from the cache, significantly reducing response time and easing the load on your database.


Cache Capacity

It isn’t advisable to let your cache grow without limits, as it can eventually consume all available memory. For this reason we give our cache a capacity. A cache with a capacity of 5 will hold at most 5 items. When a new item needs to be added, we first need to evict an existing item to make room.


Choosing What to Evict

When the cache is full and we want to add a new item, we have to determine which item to evict. The simplest approach is to evict the “oldest” item. Other strategies include evicting the Least Recently Used (LRU) item, the Least Frequently Used (LFU) item, or even removing an item at random.

This decision-making process is known as a caching strategy or replacement policy. In this article we’ll develop a package that comes pre-packaged with a few strategies, while also giving you the flexibility to create and apply your own.


Expiring Cache Items

Can an item live in your cache forever? In some cases it might be useful to put a new item in your cache with an “expiration date”, after which it will be automatically removed. This is particularly useful for caching time-sensitive data, such as authentication tokens.

For example, if an authentication token is valid for 9 hours, we can cache the result of our “get-token” function with an expiration time of 9 hours. This way, after the initial request, the token can be quickly retrieved from the cache for the next 9 hours, rather than making repeated HTTP requests to obtain it. This approach speeds up your system and reduces unnecessary network calls.
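
To make the idea concrete, here’s a minimal, standalone sketch of time-based expiry. This is just an illustration, not Cachr’s actual implementation; the name SimpleTTLCache is made up. Each entry is stored together with a timestamp and treated as a cache miss once its age exceeds the time-to-live:

import time
from collections import OrderedDict
from typing import Any, Optional

class SimpleTTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: OrderedDict = OrderedDict()

    def put(self, key: Any, value: Any) -> None:
        # Store the value together with its insertion timestamp
        self._store[key] = (time.monotonic(), value)

    def get(self, key: Any) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        inserted_at, value = entry
        if time.monotonic() - inserted_at > self.ttl:
            # The entry has expired: remove it and treat it as a miss
            del self._store[key]
            return None
        return value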


Writing the Cache

Our goal is to develop a package that contains several pre-built caches that are easy to use and efficient. We also want our package to empower users to create custom caches tailored to their specific needs. This process needs to be straightforward yet flexible enough to accommodate various caching requirements.


General Approach

We’ll start by writing a basic caching class that can be initialized with a caching strategy. This caching strategy determines the cache’s behavior, such as how items are stored and evicted. Furthermore, we make it easy for users to create and apply their own caching strategy, allowing them to fully customize the cache’s functionality to suit their particular use cases.

In technical terms: our cache will be built with the strategy design pattern that uses composition to encapsulate the cache’s desired behavior in specific classes.


Cache Data Structure

At its core, a cache is simply a key-value store. While a standard Python dict could serve this purpose, we’ll opt for an OrderedDict instead. The main advantage is that OrderedDict maintains a doubly linked list internally, which provides the flexibility we need.

An OrderedDict allows us to manipulate the order of items using methods like move_to_end. This is particularly useful for implementing various eviction strategies, as it lets us efficiently reorder items based on usage, making our cache more effective and easier to manage.
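
To see this in action, here’s a quick demonstration of the two OrderedDict features we’ll rely on:

from collections import OrderedDict

d = OrderedDict(a=1, b=2, c=3)
d.move_to_end("a")      # "a" is now the most recently used item
print(list(d))          # ['b', 'c', 'a']
d.popitem(last=False)   # removes the item at the front: ('b', 2)
print(list(d))          # ['c', 'a']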


A Cache with a Strategy

All caching strategies (like the earlier-discussed LRU, LFU, random, etc.) differ from each other in three ways. Each may have its own, unique behavior in:

  • Accessing an item: e.g. update the order of the cache
  • Inserting a new item: e.g. determine where and how a new item is placed in the cache
  • Evicting an item: e.g. which item to remove

To manage these variations in behavior, we’ll use the previously mentioned strategy design pattern to encapsulate behavior in a strategy-class. This class defines three methods that correspond to the three operations listed above. By doing so we can easily swap out and customize caching strategies as needed.

This is what the caching strategy looks like:

from abc import ABC, abstractmethod
from typing import Any


class CacheStrategy(ABC):
    @abstractmethod
    def on_access(self, cache: "Cache", key: Any) -> None:
        """Method that is called before a key is accessed in the cache."""
        pass

    @abstractmethod
    def on_insert(self, cache: "Cache", key: Any) -> None:
        """Method that is called before a new key-value pair is inserted into the cache."""
        pass

    @abstractmethod
    def evict(self, cache: "Cache") -> None:
        """Method that evicts an item."""
        pass

Each strategy will extend the class above and implement its own versions of the methods provided. This design allows us to easily plug different strategies into our Cache.

In technical terms: we can provide instances of subclasses of CacheStrategy to our Cache to determine its behavior.


Creating a Cache that accepts a Strategy

In the snippet below we define our Cache. Notice that it accepts a capacity and a strategy. The strategy is used by the two main methods of the cache: get and put. This is where the cache-specific behavior happens.

from collections import OrderedDict
from typing import Any, Optional


class Cache:
    cache: OrderedDict
    capacity: int

    def __init__(self, capacity: int, cache_strategy: CacheStrategy):
        self.cache: OrderedDict[Any, Any] = OrderedDict()
        self.capacity: int = capacity
        self._cache_strategy = cache_strategy
 
    def get(self, key: Any) -> Optional[Any]: 
        """Retrieve value from cache by provided key - cache hit or miss.""" 
        if key not in self.cache: 
            return None 
        # Call on_access before retrieving the value from the cache 
        self._cache_strategy.on_access(self, key) 
        return self.cache.get(key) 
 
    def put(self, key: Any, value: Any) -> None: 
        """Insert a key-value pair into the cache.""" 
        if key in self.cache: 
            # The key is already in the cache; call on_access 
            self._cache_strategy.on_access(self, key) 
        else: 
            # The key is new: call on_insert 
            self._cache_strategy.on_insert(self, key) 
        self.cache[key] = value

In get, the strategy's on_access method is called, which determines what to do before an item is retrieved from the cache. One possibility is that the item is moved to the end of the cache, marking it as the most recently used.

In put, we first check if the key is already contained in the cache. If so, we just call the strategy’s on_access method again. If the key-value pair is new, we call the strategy’s on_insert method before adding it to the cache. This method determines which key to evict if the cache is at capacity and how to insert the new item into the cache.


Example Strategies

Now that we have a means to define caching strategies and a cache that accepts them, let’s bring everything together. In this part we’ll create two caching strategies and finally create a cache that uses one of them. Check out the list of all of Cachr’s current strategies here.


Example strategy 1: default cache

Creating and adding caching strategies is fairly simple now. We just have to implement the three methods defined by CacheStrategy. The strategy below is very basic: keep track of the order of added items and remove the oldest when the capacity is reached.

class DefaultEvictionPolicy(CacheStrategy): 
    def on_access(self, cache: "Cache", key: Any) -> None: 
        """ Do nothing. """ 
        pass 
 
    def on_insert(self, cache: "Cache", key: Any) -> None: 
        """ If capacity is reached: evict. """ 
        if len(cache.cache) >= cache.capacity: 
            self.evict(cache=cache) 
 
    def evict(self, cache: "Cache") -> None: 
        """ Remove the oldest item.""" 
        cache.cache.popitem(last=False)

In an OrderedDict a new item is placed at the end of the dictionary. Therefore, popping an item at the front removes the oldest item.
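
With this strategy plugged in, the Cache behaves as a simple first-in-first-out cache:

cache = Cache(capacity=2, cache_strategy=DefaultEvictionPolicy())
cache.put(key=1, value="one")
cache.put(key=2, value="two")
print(cache.get(key=1))          # prints "one"; accessing doesn't change the order
cache.put(key=3, value="three")  # evicts key 1, the oldest item
print(cache.get(key=1))          # prints None
print(cache.get(key=3))          # prints "three"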


Example strategy 2: LRU cache

This strategy is very similar to the previous one, but it moves an item to the end of the cache whenever it’s accessed. Remember that the oldest item lives at the front of the cache; therefore the least recently used item will be evicted.

class LRUEvictionPolicy(CacheStrategy):
    def on_access(self, cache: "Cache", key: Any) -> None:
        """Move the accessed item to the end to mark it as most recently used."""
        cache.cache.move_to_end(key)
 
    def on_insert(self, cache: "Cache", key: Any) -> None: 
        """ If capacity is reached: evict. """ 
        if len(cache.cache) >= cache.capacity: 
            self.evict(cache=cache) 
 
    def evict(self, cache: "Cache") -> None: 
        """Evict the oldest / least recently used item. """ 
        if cache.cache: 
            cache.cache.popitem(last=False)

Bringing it all together: creating the cache

Creating a cache is as easy as this:

# Create a cache with a LRU eviction policy 
cache = Cache(capacity=2, cache_strategy=LRUEvictionPolicy()) 
 
# put our cache to the test 
cache.put(key=1, value="one") 
cache.put(key=2, value="two") 
print(cache.get(key=1))    # prints "one" 
cache.put(key=3, value="three") 
print(cache.get(key=1))    # prints "one" 
print(cache.get(key=2))    # prints None (evicted because it's the least recently used)
print(cache.get(key=3))    # prints "three"

User-friendliness

In this part we’ll focus on making our package more user-friendly by pre-defining certain caches and allowing our Cache object to be used as a decorator.


Pre-defined cache types

First we’ll define some pre-built caches that we’ll include in the package. An example of this:

class LRUCache(Cache): 
    def __init__(self, capacity: int): 
        super().__init__( 
          capacity=capacity,  
          cache_strategy=policies.LRUEvictionPolicy() 
        )
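
Following the same pattern we could wrap the default strategy as well. The name FIFOCache below is just for illustration, assuming DefaultEvictionPolicy lives in the same policies module:

class FIFOCache(Cache):
    def __init__(self, capacity: int):
        super().__init__(
          capacity=capacity,
          cache_strategy=policies.DefaultEvictionPolicy()
        )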

Check out this link for a list of all caches.


Decorate functions with a Cache

We add some code to our Cache class that makes it possible to use it as a decorator. This way we can easily decorate functions like in the example below.

@LRUCache(capacity=2) 
def add(x, y): 
    print(f"calculating {x} + {y}..") 
    return x + y 
 
add(1, 2)    # prints out "calculating 1 + 2.."
add(1, 2)    # add function not executed; value (3) retrieved from cache with key (1, 2)
add(2, 1)    # prints out "calculating 2 + 1.."

The arguments you pass to the function become the key, and the function’s return value is what gets stored in the cache. You can find the code that provides this decorator functionality here.
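
Conceptually, this decorator support boils down to making the Cache instance callable. The sketch below illustrates the idea; it’s a simplified version, not Cachr’s exact implementation (for instance, it builds the key naively from the call arguments and treats a cached None as a miss):

import functools

class Cache:
    # ... __init__, get, and put as shown earlier ...

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a hashable key from the call arguments
            key = (args, tuple(sorted(kwargs.items())))
            cached = self.get(key)
            if cached is not None:
                return cached
            # Cache miss: compute, store, and return the result
            result = func(*args, **kwargs)
            self.put(key, result)
            return result
        return wrapper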


Designing a tailor-made cache

Because we used strategies to compose our cache class instead of inheritance, we can easily create our own strategy and apply it to the cache:

from typing import Any

from cachr import Cache, CacheStrategy
 
# Subclass our CacheStrategy 
class PrintCache(CacheStrategy): 
    def on_access(self, cache: "Cache", key: Any) -> None: 
        print(f"accessing {key=}") 
 
    def on_insert(self, cache: "Cache", key: Any) -> None: 
        if len(cache.cache) >= cache.capacity: 
            self.evict(cache=cache) 
        print(f"inserting {key=}") 
 
    def evict(self, cache: "Cache") -> None: 
        if cache.cache: 
            evicted = cache.cache.popitem(last=False) 
            print(f"evicted key={evicted[0]}") 
 
# Create our custom class 
cache = Cache(capacity=2, cache_strategy=PrintCache()) 
 
# Test out our class 
cache.put(key=1, value="one")    # <-- inserting key=1
cache.put(key=2, value="two")    # <-- inserting key=2
cache.get(key=1)                 # <-- accessing key=1 
cache.put(key=3, value="three")  # <-- evicted key=1, inserting key=3 
 
print(cache.size)                # <-- 2

This cache has all the functionality of the pre-built caches: it can be used as a decorator and provides all of the convenience methods, like size and clear.
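
For example, the custom cache from above can decorate a function just like the pre-built ones:

@Cache(capacity=2, cache_strategy=PrintCache())
def multiply(x, y):
    return x * y

multiply(3, 4)    # first call: result is computed and inserted (PrintCache logs the insert)
multiply(3, 4)    # second call: result is served from the cache (PrintCache logs the access)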


Conclusion

In this article we’ve walked through the development of Cachr. Along the way we’ve found out much about how caches work, when to apply one, and how to build one. We’ve opted for composition, using the strategy pattern to create a cache that is easy to extend and fit to your use case. This way we can bundle all of our desired behavior in one small, readable, and understandable class, and even switch out strategies on the fly!

I hope this article is as clear as I intended it to be, but if it’s not, please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics.

Happy coding!

— Mike

P.s: like what I'm doing? Follow me!