Different Flavors of Protocols In Python
The Only Guide you need for Typing, Goose Typing and Protocols in Python!
As we know, Python is a multi-paradigm programming language that enables us to design and create complex systems that interact with each other seamlessly. It is not a coincidence that Python APIs interact with each other cohesively, this is possible because of the interfaces, ABCs and different protocols that we can work with and it's all made possible by the magic of Typing.
The Typing Map
We already went through this once here, but we will mention it again because we need it for the rest of the article.
In Python, we have different approaches to typing:
- First we have the Duck Typing approach, this is the oldest and the most known approach for Python.
- Then the Goose Typing which was introduced in Python 2.6 and with it we started using the isinstance checks.
- The Static Typing most commonly used in Java and C# languages.
- Finally the Static Duck Typing introduced in Python 3.8 with typing.Protocol type hint.
Protocols
A protocol is sort of a role or an implicit interface, a class must implement specific methods, usually Special Data Model Methods, needed by the protocol to fulfill its role, such is the case of the Sequence Protocol we implemented here to have our objects behave and be recognized as sequences.
For example, implementing __getitem__ is enough to allow retrieving items by index, and also to support iteration and the "in" operator. The __getitem__ special method is really the key to the sequence protocol.
To implement a full protocol, we require several methods, but often it is OK to
implement only part of it, given that we're aware of what we are doing.
There are 2 different kinds of protocols:
- The Dynamic Protocols: these are the original, informal and implicit Python protocols we had since the beginning, they are defined by convention and described in the documentation and some of them are recognized by the interpreter, like the Sequence Protocol.
- The Static Protocols: these are protocols with explicit definition and type added in 3.8, they have to be typing.Protocol subclassed.
import io
from typing import Protocol
class FileResource(Protocol):
uri: str
data: str
file: io.FileIO
def __init__(self, uri: str):
self.data = uri
def open(self):
self.file = io.FileIO(self.uri)
return self.file.fileno()
def close(self):
self.file.close()
There are 2 differences between the dynamic and static protocols:
- Static protocols need all the protocol methods to be implemented even if the program doesn't need them, however, the dynamic protocol can be useful with a single protocol method.
- Static protocols can be verified by static type checkers, but dynamic protocols
can’t.
Both do not need inheritance to work, however using static protocols is often discouraged.
Monkey Patching: Implementing Protocols at Runtime
Monkey patching is a process of dynamically altering a module, class, or function at runtime to introduce functionalities or eliminate bugs. For example, the gevent networking library monkey patches parts of Python’s standard library to allow lightweight concurrency without threads or async/await.
Let's try to add the Shuffle feature to our FrenchDeck Sequence at runtime:
import collections
Card = collections.namedtuple('Card', ['rank', 'suit'])
class FrenchDeck:
ranks = [str(n) for n in range(2, 11)] + list('JQKA')
suits = 'spades diamonds clubs hearts'.split()
def __init__(self):
self._cards = [Card(rank, suit) for suit in self.suits for rank in self.ranks]
def __len__(self):
return len(self._cards)
def __getitem__(self, position):
return self._cards[position]
def set_card(self, key, value):
self._cards[key] = value
>>> FrenchDeck.__setitem__ = set_card
>>> shuffle(deck)
>>> deck[:5]
# [Card(rank='3', suit='hearts'), Card(rank='4', suit='diamonds'), Card(rank='4', suit='clubs'), Card(rank='7', suit='hearts'), Card(rank='9', suit='spades')]
The trick here is that the set_card function calls self._card, which is in the context of the class after being patched at runtime is a valid attribute in the class FrenchDeck.
This is a demonstration of monkey patching: modifying a class or module at runtime, without modifying the source code. It can be quite the powerful tool, yet the code responsible for the actual patching is heavily attached to the modified program, regularly manipulating private attributes and methods.
Fail Fast Approach and Defensive Programming
Given Python's nature, Fail fast is the best possible approach to maintain programs. Failing fast means raising runtime errors as soon as possible, for example, rejecting invalid arguments right at the beginning of a function body.
Duck Typing is hard to debug and cannot be checked by static type checkers like mypy, you will only face the music at runtime. This is why we usually rely on a more explicit form of runtime type checking: Goose Typing.
Goose Typing
Goose typing is a technique for runtime type evaluation that makes use of ABCs. Basically, We use abstract base classes (ABCs) to define interfaces for explicit type checking at runtime, and it's also supported by static type checkers.
Goose typing means that:
- We subclass from ABCs only, to make it explicitly known that we are implementing a specific interface.
- We check types at runtime using ABCs instead of concrete classes inheriting from said ABCs, usually with isinstance or issubclass functions.
Usually it is not okay to use the isinstance checks excessively but it is permitted in the case of ABCs if we want to enforce a certain protocol or API behavior.
It is also discouraged to create and use ABCs excessively and can be considered a code smell especially since Python is a duck typed language, we would be stripping it of its super power.
Abstract Base Classes in Python's Standard Library
The standard library provides different ABCs for many use cases most of which are defined in the collections.abc module, however there are other ABCs in different other packages such as the io and numbers packages.
Here's a UML Diagram of the ABCs in the collections.abc module :
- Iterable, Container, Sized: These are needed by every collection in Python, either inherited directly or having the proper protocol implemented.
- Collection: Utility ABC to make it easier to subclass from Iterable, Container, Sized.
- Sequence, Mapping, Set: These are the immutable collection types and each has a mutable subclass.
- MappingView: This ABC is inherited by ValusView, ItemsView and KeysView. These are the types of the objects returned from the mapping methods .values(), .items(),and .keys().
You can dive deeper into the collections ABCs here, for now let's try to subclass an ABC.
Example: Subclassing the MutableSequence ABC
We'll pick up the previous example, FrenchDeck, where we added the support for shuffling by monkey patching the instance. We will inherit from the MutableSequence ABC instead to have the same result.
from collections import namedtuple, abc
Card = namedtuple("Card", ["rank", "suit"])
class FrenchDeck2(abc.MutableSequence):
ranks = [str(n) for n in range(2, 11)] + list("JQKA")
suits = "spades diamonds clubs hearts".split()
def __init__(self):
self._cards = [Card(rank, suit) for suit in self.suits for rank in self.ranks]
def __len__(self):
return len(self._cards)
def __getitem__(self, position):
return self._cards[position]
def __setitem__(self, position, value):
self._cards[position] = value
def __delitem__(self, position):
del self._cards[position]
def insert(self, position, value):
self._cards.insert(position, value)
As you can see, Subclassing the ABC forces us to implement all the methods, even though we only need the __setitem__ method to enable shuffling.
To use ABCs effectively, one must know what's available and if it really is worth it to subclass an ABC, because sometimes a protocol can be exactly what we need.
Create an Abstract Base Class(ABC) in Python
It isn't common, creating ABCs, they need to be justified, we have to be in a certain context where they are used as an extension in a framework. But today we are experimenting and learning, so why not? ✌️
We will implement the original example in the book Fluent Python, as it is, in my opinion, a really simple and effective example.
Our ABC will be called Tombola, it will have 4 methods, 2 of them are abstract. The Tombola will:
- Load items through a load abstract method.
- Remove items through a remove abstract method.
- Inspect items through a concrete inspect method.
- Check if there are item with a loaded concrete method.
import abc
class Tombola(abc.ABC):
@abc.abstractmethod
def load(self, iterable):
"""Add items from an iterable."""
@abc.abstractmethod
def remove(self):
"""Remove item at random, returning it."""
def loaded(self):
return bool(self.inspect())
def inspect(self):
items = []
while True:
try:
items.append(self.pick())
except LookupError:
break
self.load(items)
return tuple(items)
The standard way to declare an ABC is to subclass abc.ABC or any other ABC.
Besides the ABC base class and the @abstractmethod decorator, the abc module
defines the @abstractclassmethod, @abstractstaticmethod, and @abstractproperty decorators, however, these last three were deprecated in Python 3.3.
Subclass the Abstract Base Class(ABC)
Given the Tombola ABC, we’ll now develop two concrete subclasses that satisfy its
interface.
import random
class BingoCage(Tombola):
def __init__(self, items):
self._randomizer = random.SystemRandom()
self._items = []
self.load(items)
def load(self, items):
self._items.extend(items)
self._randomizer.shuffle(self._items)
def loaded(self):
try:
return self._items.pop()
except IndexError:
raise LookupError('pick from empty BingoCage')
def __call__(self):
self.loaded()
class LottoBlower(Tombola):
def __init__(self, iterable):
self._balls = list(iterable)
def load(self, iterable):
self._balls.extend(iterable)
def remove(self):
try:
position = random.randrange(len(self._balls))
except ValueError:
raise LookupError('remove from empty LottoBlower')
return self._balls.pop(position)
def loaded(self):
return bool(self._balls)
def inspect(self):
return tuple(self._balls)
The first class BingoCago implemented the abstract methods only, whereas the second class, LottoBlower, implemented both the abstract and the concrete methods.
So far we have seen explicit forms of inheritance, how about a subtle one? like the Virtual Subclasses!
Virtual Subclasses of Abstract Base Classes(ABCs)
One of the most important dynamic features of Goose Typing is the possibility to declare a class as a virtual subclass of an ABC, even if it doesn't inherit directly from it, this is done with the register Decorator or by simply invoking the register function.
When we register a class as a virtual subclass, we must comply with the ABC's interface, i.e. we must implement the interface defined in said ABC.
We accomplish this by calling a register class method on the ABC. The registered class is then regarded as a virtual subclass of the ABC, and will be accepted as such by issubclass, however it does not obtain any methods or attributes from the ABC.
Here's what the original author of the book Fluent Python had to say about them:
Virtual subclasses do not inherit from their registered ABCs, and are not checked for conformance to the ABC interface at any time, not even when they are instantiated. Also, static type checkers can’t handle virtual subclasses at this time.
For example, let's create a virtual subclass of the previous class example, Tombola:
from random import randrange
from tombola import Tombola
@Tombola.register
class TomboList(list):
def remove(self):
if self:
position = randrange(len(self))
return self.pop(position)
else:
raise LookupError("pop from empty TomboList")
load = list.extend # Tombolist.load is the same as list.extend
def loaded(self):
return bool(self)
def inspect(self):
return tuple(self)
# the functions issubclass and isinstance act as if TomboList is a subclass of Tombola
>>> issubclass(TomboList, Tombola)
True
>>> t = TomboList(range(100))
>>> isinstance(t, Tombola)
True
Just keep in mind that TomboList is not a real subclass of Tombola, virtual subclasses are not real subclasses.
Static Protocols
Static protocols are similar to the dynamic protocols, the difference is that we can enforce type checking, with Mypy for example, to make sure that we respected the defined protocol.
from typing import TypeVar, Protocol
T = TypeVar("T")
class Repeatable(Protocol):
def __mul__(self: T, repeat_count: int) -> T:
...
RT = TypeVar("RT", bound=Repeatable)
def double(x: RT) -> RT:
return x * 2
In the previous example, we explicitly tell Mypy that double takes an argument x that supports x * 2, i.e. the type checker is able to verify that the x parameter is an object that can be multiplied by an integer( enforced by the __mul__ method), and the return value has the same type as x, this is the Repeatable protocol we defined.
Runtime Checking for Static Protocols
Static protocols can be checked at runtime, to enable this we have to decorate our protocol with @runtime_checkable Decorator. This enables us to support isinstance/issubclass checks at runtime.
from typing import Protocol, runtime_checkable
from abc import abstractmethod
@runtime_checkable
class SupportsComplex(Protocol):
"""An ABC with one abstract method __complex__."""
__slots__ = ()
@abstractmethod
def __complex__(self) -> complex:
pass
>>> import numpy as np
>>> c64 = np.complex64(3+4j)
>>> isinstance(c64, SupportsComplex)
True
The key is the __complex__ abstract method. At the time of static type checking, an object will be judged as consistent with the SupportsComplex protocol if it has a __complex__ method that takes only self and returns a complex.
Best Practices for Protocols
- It's always better to implement narrow protocols, i.e. protocols with a single method that enforce a specific behavior or rule.
- Protocols are better written close to the functions that use them, i.e. we define it in client code not in a library to avoid tight coupling.
- Avoid unnecessary tight coupling by defining protocols that will be used once in a specific scope or by defining protocols that implement more than one method, basically Clients should not be forced to depend upon interfaces that they do not use.
- Use plain names for protocols that represent a clear concept (e.g., Iterator,
Container). - Use SupportsX for protocols that provide callable methods (e.g., SupportsInt,
SupportsRead, SupportsReadSeek). - Use HasX for protocols that have readable and/or writable attributes or getter/
setter methods (e.g., HasItems, HasFileno)
Conclusion
Our goal here was to show the different complementary ways to implement an interface, each with its advantages and drawbacks. We went into more details with goose typing as it is an advanced topic that we can have fun learning and experimenting with, and dived into protocols and static protocols and the best practices to design them. Unfortunately this is only the beginning, as mastering these topics is not easy feat, so I will leave you with more resources to help your research.
Further Reading
- Chapter 13 of the book Fluent Python.
- The article "I want a new Duck:
typing.Protocol
and the future of duck typing". - The Quick Python Book.
- The Python in a Nutshell book.