Iterators vs. Generators vs. Classic Coroutines in Python
Diving into Python's Iteration Arsenal: Explore the Magic of Iterators, Generators, and Coroutines to Streamline Data Handling and Asynchronous Programming.
Iteration is a fundamental concept in programming. It allows us to process a series of data items one at a time, which is often necessary when the data is too large to fit in memory.
In Python, the Iterator design pattern is built into the language itself: it lets us create iterators that traverse the elements of any standard collection, such as lists, dictionaries, and strings, one item at a time.
Today, we will cover these topics:
- Iterators and how to implement them.
- Generator Functions and how they differ from iterators.
- Leveraging the built-in generator functions in Python.
- Combining Generators and Sub-Generators.
- Differences between Generators and Classic Coroutines.
Iterables and Iterators
It's important that we start by defining what an iterable in Python is, and to do this we'll rely on what the author of Fluent Python said:
> Any object from which the `iter` built-in function can obtain an iterator. Objects implementing an `__iter__` method returning an iterator are iterable. Sequences are always iterable, as are objects implementing a `__getitem__` method that accepts 0-based indexes.
So from this we can understand that iterators are obtained from iterables!
How does Python iterate over objects?
Python automatically invokes the `iter(x)` built-in function whenever it needs to iterate over an object `x`. The `iter()` function follows these steps:
- It checks whether the object implements the `__iter__` method and, if so, calls it to acquire an iterator.
- If the `__iter__` method is absent but `__getitem__` exists, `iter()` creates an iterator that attempts to retrieve items by index-based access, starting from index 0.
- If both of the above steps fail, Python raises a `TypeError` stating that the object is not iterable.
That is why all Python sequences are iterable: by definition, they all implement `__getitem__`. The standard sequences also implement `__iter__`, and we should implement it in our custom sequences too!
Here's an example of a Sentence sequence that we can iterate over thanks to its `__getitem__` method:
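Below is a minimal sketch of such a class, closely following the Sentence example from Fluent Python; the sample text is arbitrary:

```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)  # eagerly builds the full list of words

    def __getitem__(self, index):
        return self.words[index]  # 0-based index access is all iter() needs as a fallback

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

# Python falls back on __getitem__, starting at index 0, to iterate:
for word in Sentence('"The time has come," the Walrus said'):
    print(word)
```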
To implement our own iterators, we have to follow the iterator protocol, i.e. implement the two methods defining the iterator interface in Python:
- `__next__`: returns the next item in the sequence and raises a `StopIteration` exception when there are no further items available.
- `__iter__`: returns the iterator itself, enabling iterators to be used wherever an iterable is required, such as within a for loop.
Mistakes often arise when constructing iterables and iterators due to a confusion between the two concepts. To clarify, iterables possess an `__iter__` method responsible for creating a fresh iterator with each invocation. On the other hand, iterators incorporate a `__next__` method that returns individual items and an `__iter__` method that returns the iterator itself.
It's important to note that iterators can indeed be iterated over, yet the reverse is not true: iterables do not inherently function as iterators.
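To make the distinction concrete, here is a sketch of an iterable Sentence whose `__iter__` returns a separate iterator object implementing the iterator protocol (SentenceIterator is an illustrative name):

```python
import re

RE_WORD = re.compile(r'\w+')

class Sentence:
    """Iterable: each call to __iter__ builds and returns a fresh iterator."""

    def __init__(self, text):
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        return SentenceIterator(self.words)

class SentenceIterator:
    """Iterator: __next__ returns one item at a time, __iter__ returns self."""

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self

print(list(Sentence('Hello iterator world')))  # ['Hello', 'iterator', 'world']
```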
Top Iterable reducing functions in Python's standard library
Reducing functions process iterables to return a single result. While all the mentioned built-in functions can be replicated using `functools.reduce`, they are available as built-ins for convenience, as they simplify common use cases.
- `all(it)`: returns True if all items in the iterable are truthy, otherwise False. An empty iterable returns True.
- `any(it)`: returns True if any item in the iterable is truthy, otherwise False. An empty iterable returns False.
- `max(it, [key=,] [default=])`: returns the maximum value of the items in the iterable. An optional key function can specify a custom ordering. If the iterable is empty, the optional default is returned (otherwise a ValueError is raised).
- `min(it, [key=,] [default=])`: returns the minimum value of the items in the iterable, with the same optional key and default arguments as max.
- `functools.reduce(func, it, [initial])`: applies a function to the first pair of items in the iterable, then to that result and the third item, and so on. The function must take two arguments and return a single value. An optional initial value can be used to start the reduction.
- `sum(it, start=0)`: returns the sum of all items in the iterable, with the optional start value added to the total. (Use `math.fsum()` for better precision when adding floats.)
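A quick demonstration of these reducers; the sample list is arbitrary:

```python
from functools import reduce
import math

nums = [3, 1, 4, 1, 5]

print(all(n > 0 for n in nums))           # True: every item is truthy
print(any(n > 4 for n in nums))           # True: at least one item is greater than 4
print(max(nums, key=lambda n: -n))        # 1: the key function inverts the ordering
print(min([], default=0))                 # 0: the default is returned for an empty iterable
print(reduce(lambda a, b: a * b, nums))   # 60: running product of all items
print(sum(nums, start=10))                # 24: the start value is added to the total
print(math.fsum([0.1] * 10))              # 1.0: precise float summation
```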
Generators
Generators give us a convenient and memory-efficient way to create iterators. They are defined as ordinary functions but use the `yield` keyword to produce values one at a time, only when needed, as opposed to building an entire sequence in memory upfront.
This is possible because generator functions return generator objects, which are themselves iterators since they implement the iterator interface! In other words, a generator function is a generator factory.
A generator function creates a generator object that encapsulates the function's body. Calling `next()` on the generator advances execution to the next `yield`, returns the yielded value, and suspends the function at that point. When the function body completes, the generator object raises `StopIteration`, adhering to the iterator protocol.
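A minimal sketch of this behavior; the function name is illustrative:

```python
def gen_123():
    yield 1
    yield 2
    yield 3

g = gen_123()   # calling the function builds a generator object; the body has not run yet
print(next(g))  # 1 -> execution runs up to the first yield and suspends there
print(next(g))  # 2
print(next(g))  # 3
print(next(g))  # raises StopIteration: the function body has completed
```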
The `re.finditer` function is a lazy version of `re.findall`. Instead of a list, `re.finditer` returns a generator yielding `re.Match` instances on demand. If there are many matches, `re.finditer` saves a lot of memory. Using it in the version of Sentence below, the next word is only read from the text when it is needed!
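Here is a sketch of that lazy version, again modeled on the Fluent Python Sentence example:

```python
import re

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text  # no eager word list is built anymore

    def __iter__(self):
        # A generator function: each call returns a new generator object
        # that yields one word at a time, on demand.
        for match in RE_WORD.finditer(self.text):
            yield match.group()

for word in Sentence('Lazy words, produced on demand'):
    print(word)
```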
Generators are a great shortcut, but the code can be made even more concise with a generator expression.
Generator Expressions: how to use them here?
Generator expressions can replace simple generator functions: just as list comprehensions construct lists, generator expressions construct generator objects.
The key distinction is in the `__iter__` method. Instead of a generator function, a generator expression is used within `__iter__` to create and return a generator object. The outcome is unchanged: the caller still receives a generator object. Generator expressions are essentially a more concise alternative to simple generator functions.
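A sketch of the same Sentence class, now using a generator expression inside `__iter__`:

```python
import re

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text

    def __iter__(self):
        # The generator expression builds and returns a generator object directly.
        return (match.group() for match in RE_WORD.finditer(self.text))
```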
Top Generator Functions in Python's Standard Library
The Python standard library offers a variety of generators, ranging from text file objects that yield lines one by one, to powerful functions like `os.walk()`, which generates filenames while navigating a directory tree. This simplifies tasks like recursive file searches, making them as straightforward as using a basic loop.
We will start with the filtering generator functions (a short demo follows the list):
- `itertools.compress(it, selector_it)`: consumes two iterables in parallel and yields items from it whenever the corresponding item in selector_it is truthy.
- `itertools.dropwhile(predicate, it)`: consumes it, skipping items while predicate computes truthy, then yields every remaining item.
- `filter(predicate, it)`: applies predicate to each item of it, yielding the item if predicate(item) is truthy; if predicate is None, only truthy items are yielded.
- `itertools.filterfalse(predicate, it)`: same as filter, with the predicate logic negated, i.e. yields items whenever predicate computes falsy.
- `itertools.islice(it, start, stop, step=1)`: yields items from a slice of it, similar to s[:stop] or s[start:stop:step], except that it can be any iterable and the operation is lazy.
- `itertools.takewhile(predicate, it)`: yields items while predicate computes truthy, then stops and no further checks are made.
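A short demonstration of the filtering generators; the sample data is arbitrary:

```python
import itertools

letters = ['a', 'b', 'c', 'd', 'e']

print(list(itertools.compress(letters, [1, 0, 1, 0, 1])))            # ['a', 'c', 'e']
print(list(itertools.dropwhile(lambda c: c < 'c', letters)))         # ['c', 'd', 'e']
print(list(filter(lambda c: c in 'aeiou', letters)))                 # ['a', 'e']
print(list(itertools.filterfalse(lambda c: c in 'aeiou', letters)))  # ['b', 'c', 'd']
print(list(itertools.islice(letters, 1, 4, 2)))                      # ['b', 'd']
print(list(itertools.takewhile(lambda c: c < 'c', letters)))         # ['a', 'b']
```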
The next group contains the mapping generators; these yield items computed from each individual item in the input iterables (a short demo follows the list):
- `itertools.accumulate(it, [func])`: produces accumulated sums. If func is specified, it yields the result of applying it to the first pair of items, then to the previous result and the next item, and so forth.
- `enumerate(iterable, start=0)`: yields 2-tuples of the form (index, item), with index counting from start and item taken from the provided iterable.
- `map(func, it1, [it2, …, itN])`: applies func to each item of it1, yielding the results. If N iterables are given, func must accept N arguments and the iterables are consumed in parallel.
- `itertools.starmap(func, it)`: applies func to each item of it, with the item unpacked as positional arguments, i.e. func(*item), yielding the results.
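A short demonstration of the mapping generators; the sample data is arbitrary:

```python
import itertools
import operator

nums = [1, 2, 3, 4]

print(list(itertools.accumulate(nums)))                         # [1, 3, 6, 10] running sums
print(list(itertools.accumulate(nums, operator.mul)))           # [1, 2, 6, 24] running products
print(list(enumerate('abc', start=1)))                          # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(map(operator.add, nums, [10, 20, 30, 40])))          # [11, 22, 33, 44]
print(list(itertools.starmap(operator.mul, [(2, 3), (4, 5)])))  # [6, 20] each tuple is unpacked
```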
Next, we have the group of merging generators; all of these yield items from multiple input iterables (a short demo follows the list):
- `itertools.chain(it1, …, itN)`: sequentially yields all elements from it1, followed by those from it2, and so forth, creating one seamless stream.
- `itertools.chain.from_iterable(it)`: successively yields all elements from each iterable produced by it, forming an uninterrupted sequence. The items of it must themselves be iterable; for instance, it could be a list of tuples.
- `itertools.product(it1, …, itN, repeat=1)`: generates the Cartesian product, i.e. it creates N-tuples by combining elements from each input iterable, akin to nested for loops. The repeat parameter allows the input iterables to be reused multiple times.
- `zip(it1, …, itN, strict=False)`: produces N-tuples by concurrently extracting items from the given iterables, stopping silently when the first iterable is exhausted. If strict=True is passed, a ValueError is raised when the iterables have different lengths.
- `itertools.zip_longest(it1, …, itN, fillvalue=None)`: produces N-tuples by concurrently extracting items from the given iterables, continuing until the longest iterable is exhausted. Gaps in the tuples are filled with the specified fill value.
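A short demonstration of the merging generators; the sample data is arbitrary:

```python
import itertools

print(list(itertools.chain('AB', range(2))))                        # ['A', 'B', 0, 1]
print(list(itertools.chain.from_iterable([(1, 2), (3, 4)])))        # [1, 2, 3, 4]
print(list(itertools.product('AB', range(2))))                      # [('A', 0), ('A', 1), ('B', 0), ('B', 1)]
print(list(zip('ABC', range(2))))                                   # [('A', 0), ('B', 1)] stops at the shortest
print(list(itertools.zip_longest('ABC', range(2), fillvalue='?')))  # [('A', 0), ('B', 1), ('C', '?')]
```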
There are many other generator functions built into Python and offered out of the box; it is essential that we make use of them instead of reinventing the wheel!
Sub-Generators and the yield from Expression
The `yield from` expression is used to delegate part of a generator's work to another generator, the sub-generator: every value yielded by the sub-generator is passed straight through to the caller of the outer generator, and the outer generator only resumes past the `yield from` expression once the sub-generator is exhausted.
There are multiple uses for this:
- Simplified delegation: `yield from` simplifies the code when we want to delegate the iteration from one generator to another. Instead of manually looping through the inner generator and yielding its values one by one, we can use `yield from` to delegate the responsibility.
- Transparent passthrough: `yield from` acts as a transparent passthrough. It passes the values from the inner generator directly to the caller of the outer generator, so we don't need to manually yield each value from the inner generator.
- Handling nested generators: when working with nested generators (a generator that yields another generator), `yield from` helps to flatten the structure and provides a more intuitive way of working with the combined output.
For example, the following code uses `yield from` to define a generator that iterates over the Fibonacci sequence:
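A minimal sketch of such a generator (the function names are illustrative), where the outer generator delegates to an infinite Fibonacci sub-generator:

```python
import itertools

def fibonacci():
    # An infinite sub-generator producing the Fibonacci sequence.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def first_fibonacci(n):
    # Delegation: every value yielded by the sub-generator is passed
    # straight through to the caller of first_fibonacci().
    yield from itertools.islice(fibonacci(), n)

print(list(first_fibonacci(10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```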
Classic Coroutines
Understanding classic coroutines in Python can be confusing because they are actually generators used in a different way. With that said, let's try to dive in slowly.
TL;DR: classic coroutines are generators driven through `.send()` rather than `next()`; their modern successors, native coroutines, are defined using the `async def` syntax, use the `await` keyword, and can run concurrently!
What is a classic Coroutine?
As we saw, generators are frequently used as iterators, yet they can also serve as coroutines.
Coroutines are essentially generator-style functions that allow us to write code that can run concurrently with other code. They can be used to perform operations that would otherwise block, such as network I/O or file I/O, since they can be paused and resumed during their execution. They enable asynchronous programming by providing a way to yield control back to the event loop or to other coroutines, allowing efficient multitasking without blocking the execution of other tasks.
Modern (native) coroutines are defined using the `async def` syntax and use the `await` keyword to pause the coroutine's execution until a certain condition is met. They work in conjunction with an event loop, typically provided by a library like `asyncio`, to manage the execution flow of multiple coroutines concurrently.
For example, the following code defines a simple coroutine:
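A minimal sketch of such a coroutine; the function name is illustrative:

```python
import asyncio

async def greet():
    print("Hello ...")
    await asyncio.sleep(1)  # pause here and hand control back to the event loop
    print("... world!")

asyncio.run(greet())
```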
The `await` keyword is used to indicate the point where the coroutine should pause its execution until the awaited operation is complete.
We can also write a more advanced example that simulates a blocking network operation, like this:
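The sketch below follows the description in the next paragraph; the URL is hypothetical and the download is simulated with `asyncio.sleep()`:

```python
import asyncio

async def download_file(url):
    print(f"Starting download from {url}")
    await asyncio.sleep(3)  # stands in for the real, slow network I/O
    print("Download finished")
    return "payload"

async def main():
    # Schedule the coroutine as a task so it runs concurrently with main().
    task = asyncio.create_task(download_file("https://example.com/data.bin"))
    while not task.done():
        print("Still downloading...")
        await asyncio.sleep(1)  # sleep for a second, yielding control to the event loop
    print(f"Result: {task.result()}")

asyncio.run(main())
```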
The `asyncio.run()` function starts the event loop and runs the `main()` coroutine until it completes. The `main()` function creates a task to run the `download_file()` coroutine concurrently. The `while` loop in the `main()` function checks whether the task is still running; if it is, the loop sleeps for 1 second and then checks again.
When the `download_file()` coroutine finishes downloading the file, `main()` exits its loop and returns, and `asyncio.run()` returns with it.
Why use a classic coroutine?
There are multiple selling points, but the most important are:
- Non-Blocking: Coroutines allow non-blocking I/O operations. While waiting for an I/O operation, the coroutine yields control back to the event loop, allowing other tasks to execute.
- Concurrent Execution: Multiple coroutines can run concurrently within the same thread or process, thanks to the event loop's management.
- Simplified Asynchronous Code: Coroutines provide a more intuitive and readable way to write asynchronous code compared to traditional callback-based approaches.
Classic coroutines are an integral part of modern Python asynchronous programming, offering a more structured and readable way to manage asynchronous tasks compared to traditional callback-based approaches.
Conclusion
In the landscape of Python programming, understanding and mastering the trio of iterators, generators, and coroutines unlocks a realm of possibilities for efficient data processing, memory management, and asynchronous programming.
Iterators, the foundational concept, pave the way for controlled traversal of sequences, offering a consistent interface for diverse data structures. They provide the essential underpinning for Python's `for` loops and the iterable protocol, enabling ease of use and code readability.
Generators, a natural evolution of iterators, shine as memory-efficient workhorses. With the simplicity of functions adorned with the `yield` keyword, generators dynamically generate values, presenting a potent solution for handling extensive datasets and infinite series. Their lazy evaluation and ability to pause and resume execution provide a key to avoiding memory bottlenecks and enhancing code efficiency.
Coroutines, propelled by the `async def` declaration and powered by the `await` keyword, usher Python into the realm of asynchronous programming. They bring concurrency to the forefront, enabling non-blocking I/O operations and responsiveness, crucial for applications that juggle multiple tasks concurrently. Coroutines have revolutionized the way we approach network operations, event-driven programming, and other tasks that thrive on concurrency.
Understanding the synergy among these concepts is paramount. Iterators serve as the foundation upon which generators and coroutines are built. Generators encapsulate the logic of iterators while optimizing memory usage, and coroutines extend the capabilities of generators, introducing the paradigm of asynchronous programming.
As Python's versatility continues to propel it to the forefront of programming languages, a comprehensive grasp of iterators, generators, and coroutines empowers developers to navigate data efficiently, create lean and responsive applications, and harness the full potential of Python's dynamic capabilities. Whether traversing data structures, streamlining memory usage, or orchestrating concurrent operations, these concepts remain invaluable tools in every Python programmer's arsenal.
Further Reading
- The Fluent Python book, chapter 17.
- The itertools documentation.
- The Python Cookbook, chapter 4.