Let’s Write a Composable, Easy-to-Use Caching Package in Python
Easy, user-friendly caching tailored to all your needs
In this article we’ll walk through the process of building a caching system in Python from the ground up. The goal is to…
- better understand caching and explore various caching strategies
- understand why we went for a composition approach over inheritance
- learn how and when to apply a cache effectively
We’ll focus on building a user-friendly package that allows you to easily add caches to your code. Additionally, we’ll provide a simple means to extend the cache by providing it with self-designed behavior. Let’s code!
Before we begin..
This article details creating caches that are part of the Cachr package on PyPI. All code can be found in this GitHub repo. Contributions are always welcome; feel free to submit bug reports, bug fixes, feature requests, documentation improvements or enhancements!
Basics: How Caches Work and Terminology
Caches are high-speed data storage layers designed to temporarily hold data, making it quicker to retrieve. For example, imagine storing product details from a web store in a cache after fetching them from the database. When a user requests the same product information again, you can just quickly retrieve it from the cache, significantly reducing response time and easing the load on your database.
Cache Capacity
It isn’t advisable to allow your cache to grow without limits as it can eventually consume all available memory. For this reason we provide a capacity to our cache. A cache with a capacity of 5 will only allow 5 items in the cache. When a new item needs to be added, we first need to evict an existing item to make room.
Choosing What to Evict
When your cache is full and you want to add a new item, you have to determine which item to evict. The simplest approach is to evict the “oldest” item. Other strategies include evicting the Least Recently Used (LRU) item, the Least Frequently Used (LFU), or even removing an item at random.
This decision-making process is known as a caching strategy or replacement policy. In this article we’ll develop a package that comes pre-packaged with a few strategies, while also giving you the flexibility to create and apply your own.
Expiring Cache Items
Can an item live in your cache forever? In some cases it might be useful to put a new item in your cache with an “expiration date”, after which it will be automatically removed. This is particularly useful for caching time-sensitive data, such as authentication tokens.
For example, if an authentication token is valid for 9 hours, we can cache the result of our “get-token-function” with an expiration time of 9 hours. This way, after the initial request, the token can be quickly retrieved from the cache for the next 9 hours, rather than making repeated HTTP requests to obtain the token. This approach speeds up your system and reduces unnecessary network calls.
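To make the idea concrete, here is a minimal sketch of an expiring store; the class and names below are illustrative, not Cachr’s actual implementation. The only ingredient is remembering *when* each entry was added:

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLStore:
    """Illustrative time-to-live store: an entry counts as missing
    once its lifetime has passed."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: Dict[Any, Tuple[float, Any]] = {}

    def put(self, key: Any, value: Any) -> None:
        # Remember when the entry was inserted
        self._store[key] = (time.monotonic(), value)

    def get(self, key: Any) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        inserted_at, value = entry
        if time.monotonic() - inserted_at > self.ttl:
            # Expired: drop the entry and report a cache miss
            del self._store[key]
            return None
        return value
```

A production-grade cache would also purge expired entries proactively so they don’t linger in memory, but the sketch shows the core mechanic.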
Writing the Cache
Our goal is to develop a package that contains several pre-built caches that are easy to use and efficient. We also want our package to empower users to create custom caches tailored to their specific needs. This process needs to be straightforward yet flexible enough to accommodate various caching requirements.
General Approach
We’ll start by writing a basic caching class that can be initialized with a caching strategy. This caching strategy determines the cache’s behavior, such as how items are stored and evicted. Furthermore, we make it easy for users to create and apply their own caching strategy, allowing them to fully customize the cache’s functionality to suit their particular use cases.
In technical terms: our cache will be built with the strategy design pattern that uses composition to encapsulate the cache’s desired behavior in specific classes.
Cache Data Structure
At its core, a cache is simply a key-value store. While a standard Python `dict` could serve this purpose, we’ll opt for an `OrderedDict`. The main advantage is that `OrderedDict` maintains a doubly linked list under the hood, which provides the flexibility we need.
An `OrderedDict` allows us to manipulate the order of items using methods like `move_to_end`. This is particularly useful for implementing various eviction strategies, as it lets us efficiently reorder items based on usage, making our cache more effective and easier to manage.
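A quick demonstration of the two `OrderedDict` methods our eviction strategies will lean on:

```python
from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3

# move_to_end marks "a" as the most recently used item
od.move_to_end("a")
print(list(od))  # ['b', 'c', 'a']

# popitem(last=False) pops from the front, i.e. the oldest item
oldest = od.popitem(last=False)
print(oldest)  # ('b', 2)
```

With these two operations, “oldest at the front, newest at the back” becomes trivial to maintain.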
A Cache with a Strategy
All caching strategies (like the earlier-discussed LRU, LFU, random, etc.) differ from each other in three ways. Each may have its own, unique behavior in:
- Accessing an item: e.g. update the order of the cache
- Inserting a new item: e.g. determine where and how a new item is placed in the cache
- Evicting an item: e.g. which item to remove
To manage these variations in behavior, we’ll use the previously mentioned strategy design pattern to encapsulate behavior in a strategy-class. This class defines three methods that correspond to the three operations listed above. By doing so we can easily swap out and customize caching strategies as needed.
This is what the caching strategy looks like:
```python
from abc import ABC, abstractmethod
from typing import Any


class CacheStrategy(ABC):
    @abstractmethod
    def on_access(self, cache: "Cache", key: Any) -> None:
        """Method that is called before a key is accessed in the cache."""

    @abstractmethod
    def on_insert(self, cache: "Cache", key: Any) -> None:
        """Method that is called before a new key-value pair is inserted into the cache."""

    @abstractmethod
    def evict(self, cache: "Cache") -> None:
        """Method that evicts an item."""
```
Each strategy will extend the class above and implement its own versions of the methods provided. This design allows us to easily plug different strategies into our `Cache`.
In technical terms: we can provide instances of subclasses of `CacheStrategy` to our `Cache` to determine its behavior.
Creating a Cache that accepts a Strategy
In the snippet below we define our `Cache`. Notice that it accepts a capacity and a strategy. The strategy is used by the two main methods of the cache: `get` and `put`. This is where the cache-specific behavior happens.
```python
from collections import OrderedDict
from typing import Any, Optional


class Cache:
    cache: OrderedDict
    capacity: int

    def __init__(self, capacity: int, cache_strategy: CacheStrategy):
        self.cache: OrderedDict[Any, Any] = OrderedDict()
        self.capacity: int = capacity
        self._cache_strategy = cache_strategy

    def get(self, key: Any) -> Optional[Any]:
        """Retrieve value from cache by provided key - cache hit or miss."""
        if key not in self.cache:
            return None
        # Call on_access before retrieving the value from the cache
        self._cache_strategy.on_access(self, key)
        return self.cache.get(key)

    def put(self, key: Any, value: Any) -> None:
        """Insert a key-value pair into the cache."""
        if key in self.cache:
            # The key is already in the cache; call on_access
            self._cache_strategy.on_access(self, key)
        else:
            # The key is new: call on_insert
            self._cache_strategy.on_insert(self, key)
        self.cache[key] = value
```
In `get`, the strategy’s `on_access` method will be called, which determines what to do before an item is retrieved from the cache. One possibility is that the item will be moved to the back of the cache, making it the most recently used.
In `put`, we check whether the key is already contained in the cache. If so, we just call the strategy’s `on_access` method again. If the key-value pair is new, we call the strategy’s `on_insert` method before adding it to the cache. This method determines which key to evict if the cache is at capacity, and how the new item is inserted into the cache.
Example Strategies
Now that we have a means to define our caching strategies and a cache that accepts them, let’s bring everything together. In this part we’ll create two caching strategies and finally create a cache that uses one of them. Check out the list of all of Cachr’s current strategies here.
Example strategy 1: default cache
Creating and adding caching strategies is fairly simple now. We just have to implement the three methods defined by `CacheStrategy`. The strategy below is very basic: keep track of the order of added items and remove the oldest when the capacity is reached.
```python
class DefaultEvictionPolicy(CacheStrategy):
    def on_access(self, cache: "Cache", key: Any) -> None:
        """Do nothing."""

    def on_insert(self, cache: "Cache", key: Any) -> None:
        """If capacity is reached: evict."""
        if len(cache.cache) >= cache.capacity:
            self.evict(cache=cache)

    def evict(self, cache: "Cache") -> None:
        """Remove the oldest item."""
        cache.cache.popitem(last=False)
```
In an `OrderedDict` a new item is placed at the end of the dictionary. Therefore, popping an item from the front removes the oldest item.
Example strategy 2: LRU cache
This strategy is very similar to the previous one, but it moves an item to the end of the cache whenever it’s accessed. Remember that the oldest item lives at the front of the cache; therefore the least recently used item will be evicted.
```python
class LRUEvictionPolicy(CacheStrategy):
    def on_access(self, cache: "Cache", key: Any) -> None:
        """Move the accessed item to the end to mark it as most recently used."""
        cache.cache.move_to_end(key)

    def on_insert(self, cache: "Cache", key: Any) -> None:
        """If capacity is reached: evict."""
        if len(cache.cache) >= cache.capacity:
            self.evict(cache=cache)

    def evict(self, cache: "Cache") -> None:
        """Evict the oldest / least recently used item."""
        if cache.cache:
            cache.cache.popitem(last=False)
```
Bringing it all together: creating the cache
Creating a cache is as easy as this:
```python
# Create a cache with an LRU eviction policy
cache = Cache(capacity=2, cache_strategy=LRUEvictionPolicy())

# Put our cache to the test
cache.put(key=1, value="one")
cache.put(key=2, value="two")
print(cache.get(key=1))  # prints "one"

cache.put(key=3, value="three")
print(cache.get(key=1))  # prints "one"
print(cache.get(key=2))  # prints None (evicted because it was least recently used)
print(cache.get(key=3))  # prints "three"
```
User-friendliness
In this part we’ll focus on making our package more user-friendly by pre-defining certain caches and allowing our `Cache` object to be used as a decorator.
Pre-defined cache types
First we’ll define some pre-built caches that we’ll include in the package. An example of this:
```python
class LRUCache(Cache):
    def __init__(self, capacity: int):
        super().__init__(
            capacity=capacity,
            cache_strategy=policies.LRUEvictionPolicy(),
        )
```
Check out this link for a list of all caches.
Decorate functions with a Cache
We add some code to our `Cache` class that makes it possible to use it as a decorator. This way we can easily decorate functions as in the example below.
```python
@LRUCache(capacity=2)
def add(x, y):
    print(f"calculating {x} + {y}..")
    return x + y

add(1, 2)  # prints "calculating 1 + 2.."
add(1, 2)  # add function not executed; value (3) retrieved from cache with key (1, 2)
add(2, 1)  # prints "calculating 2 + 1.."
```
The arguments you pass to the function become the key, while the function’s return value is what gets stored in the cache. You can find the code that provides the decorator functionality here.
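The repo holds the authoritative version; as a rough sketch of the mechanism (the class and key-building details below are illustrative, not Cachr’s exact code), making a cache instance callable is enough for it to work as a decorator:

```python
import functools
from typing import Any, Callable


class SketchCache:
    """Illustrative cache whose instances can decorate a function."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache: dict = {}

    def put(self, key: Any, value: Any) -> None:
        if len(self.cache) >= self.capacity:
            # Evict the oldest entry (plain dicts keep insertion order)
            self.cache.pop(next(iter(self.cache)))
        self.cache[key] = value

    def __call__(self, func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Build a hashable key from the call arguments
            key = (args, tuple(sorted(kwargs.items())))
            if key in self.cache:
                return self.cache[key]
            result = func(*args, **kwargs)
            self.put(key, result)
            return result

        return wrapper
```

Because `SketchCache(capacity=2)` returns an instance and `@`-application calls that instance with the function, the decorator syntax falls out of `__call__` for free.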
Designing a tailor-made cache
Because we used strategies to compose our cache class instead of inheritance, we can easily create our own strategy and apply it to the cache:
```python
from typing import Any

from cachr import Cache, CacheStrategy


# Subclass our CacheStrategy
class PrintCache(CacheStrategy):
    def on_access(self, cache: "Cache", key: Any) -> None:
        print(f"accessing {key=}")

    def on_insert(self, cache: "Cache", key: Any) -> None:
        if len(cache.cache) >= cache.capacity:
            self.evict(cache=cache)
        print(f"inserting {key=}")

    def evict(self, cache: "Cache") -> None:
        if cache.cache:
            evicted = cache.cache.popitem(last=False)
            print(f"evicted key={evicted[0]}")


# Create our custom cache
cache = Cache(capacity=2, cache_strategy=PrintCache())

# Test out our cache
cache.put(key=1, value="one")    # <-- inserting key=1
cache.put(key=2, value="two")    # <-- inserting key=2
cache.get(key=1)                 # <-- accessing key=1
cache.put(key=3, value="three")  # <-- evicted key=1, inserting key=3
print(cache.size)                # <-- 2
```
This cache has all the functionality of the pre-built caches: it can be used as a decorator and contains all of the convenience methods, like `size` and `clear`.
Conclusion
In this article we’ve walked through the development of Cachr. Along the way we’ve found out much about how caches work, when to apply one and how to build one. We’ve opted for composition, using the strategy pattern to create a cache that is easy to extend and fit to your use case. This way we can bundle all of our desired behavior in one small, readable and understandable class, and even switch out strategies on the fly!
I hope this article was as clear as I intended it to be, but if not, please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics.
Happy coding!
— Mike
P.s: like what I'm doing? Follow me!