Bruno Behnken

Oct 01, 2023

Python Generators

I talked about Python Iterators in my last post, and how they work within loop structures. Now let's talk about Python Generators. The first thing worth mentioning is that a Generator is actually a special Iterator that automatically implements the __iter__ and __next__ methods, so you don't have to. Second, you don't have to declare a Generator class (though you can still do it if you want to). A function that contains yield is called a Generator function, and calling it will automatically instantiate a new Generator and return it to the caller.

yield works similarly to the return statement, but with one key difference: return hands a value back to the caller and ends the function call, removing its frame from the call stack; yield hands a value back to the caller, transfers execution control to the caller, and saves the state of the function call so it can be restored later.

Why would you want to restore the context of a function call after returning a value? Because you may need to return more values. Let's explain this better with the same example we used in the last post: building a custom range.

def our_range(lower_boundary, upper_boundary):
    i = lower_boundary
    while i < upper_boundary:
        yield i
        i += 1

Now let's try that in a for loop:

>>> for number in our_range(1, 10):
...     print(number, end=' ')
... 
1 2 3 4 5 6 7 8 9 

You are now probably wondering how this works. When our_range is called, a Generator object is returned to be used by the for loop. As we learned in the last post, at every iteration the for loop calls the Generator's __next__ function. On the first call, the function body starts executing: i is set to lower_boundary, the while condition is evaluated, and yield i runs. At this point, 1 is yielded (returned) to the for loop and assigned to number, which is then printed.

On the next iteration, __next__ is called again, and instead of executing the function from the start, execution resumes on the line after the yield; in this case, i += 1, which assigns the value 2 to i. The while condition is evaluated again, and yield i executes once more, this time yielding the value 2 to the for loop, which assigns it to number and prints it. This goes on until the condition in the while loop evaluates to False. When that happens, the function ends its execution without yielding a value, meaning the Generator is exhausted. When an exhausted Generator has its __next__ function called, it raises a StopIteration exception, which, in our case, is caught by the for and causes the loop to end, finishing our execution.
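You can watch this happen step by step by driving the Generator by hand with the built-in next function (which calls __next__ under the hood):

```python
def our_range(lower_boundary, upper_boundary):
    i = lower_boundary
    while i < upper_boundary:
        yield i
        i += 1

gen = our_range(1, 3)
print(next(gen))  # 1 (runs the body until the first yield)
print(next(gen))  # 2 (resumes at i += 1, loops, yields again)
try:
    next(gen)
except StopIteration:
    print('exhausted')  # the while condition is now False
```

This is exactly what the for loop does for us, including catching the final StopIteration.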

In a simple Iterator, when the __next__ function ends its execution, all of its local context is lost, so any values we can't afford to lose must be kept as attributes of the Iterator object. That is what I did with the i variable in the Iterators post. When using a Generator, we can keep these values in local variables inside the Generator function, since the context is not lost between calls.
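To make the comparison concrete, here is a sketch of what our_range looks like as a hand-written Iterator class, with i kept as an attribute, along the lines of the previous post:

```python
class OurRange:
    """Class-based iterator: state must live in attributes, not locals."""

    def __init__(self, lower_boundary, upper_boundary):
        self.i = lower_boundary
        self.upper_boundary = upper_boundary

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.upper_boundary:
            raise StopIteration
        value = self.i
        self.i += 1
        return value

print(list(OurRange(1, 10)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The Generator version expresses the same logic in five lines, with the bookkeeping handled for us.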

The Real Generator Deal

You may be thinking "nice, but what real benefit comes from this 'saving the context' approach?" Well, the real deal is this: an Iterator, generally speaking, must have all of its data in memory to iterate through it; a Generator, because it saves its context at every iteration, can generate each value on demand, and therefore never needs all the values available in memory at once.

Again, let's explain this better with an example: we will make a script that reads values from a txt file, performs an operation on each of them (by calling the function perform_operation), and saves the results to a csv file.

def read_txt(filename):
    with open(filename) as file:
        return file.read().split('\n')

csv_file = open('filename.csv', 'w')
for line in read_txt('file.txt'):
    print(perform_operation(line), file=csv_file)

Pay attention to the way we are reading the data. The read method brings the whole file into memory so it can be split into a list of lines. This means the script requires memory at least the size of the file. If you are processing a big file in a memory-restricted environment (for example, a container), your script may fail simply because it ran out of memory, raising a MemoryError. Now let's fix that with a Generator.

def read_txt(filename):
    with open(filename) as file:
        for row in file:
            yield row

csv_file = open('filename.csv', 'w')
for line in read_txt('file.txt'):
    print(perform_operation(line), file=csv_file)

Now we are reading and yielding one row at a time, which means that instead of requiring memory the size of the whole file, we only require memory the size of a single line.
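Here is a self-contained version of the script you can run end to end. Since the original perform_operation is not defined in the post, I use a stand-in that upper-cases each line, and a temporary directory instead of fixed filenames:

```python
import os
import tempfile

def perform_operation(line):
    # Stand-in for the real operation: strip the newline and upper-case
    return line.strip().upper()

def read_txt(filename):
    # Yield one row at a time; 'with' also closes the file for us
    with open(filename) as file:
        for row in file:
            yield row

# Create a small txt file to process
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, 'file.txt')
with open(txt_path, 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

# Process it line by line into the csv file
csv_path = os.path.join(workdir, 'filename.csv')
with open(csv_path, 'w') as csv_file:
    for line in read_txt(txt_path):
        print(perform_operation(line), file=csv_file)

with open(csv_path) as f:
    print(f.read())  # ALPHA, BETA, GAMMA on separate lines
```

At no point is more than one line of the txt file held in memory by our code.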

Generator Comprehension

This advantage becomes even clearer when we use generator comprehensions, which work the same way as list comprehensions. As an example, let's iterate through a million numbers. Using a list, this would be:

numbers = [i for i in range(1_000_000)]
for number in numbers:
    print(number)

numbers is a list containing a million elements. Let's check its size (remember to import sys first):

>>> sys.getsizeof(numbers)
8448728

If, instead of a list, we used a Generator, the code would be:

numbers = (i for i in range(1_000_000))
for number in numbers:
    print(number)

Let's now check the numbers Generator size:

>>> sys.getsizeof(numbers)
112

Both snippets will behave the same, but the one using the Generator requires far less memory.
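You can check this for yourself in a script. Note that the exact sizes reported by sys.getsizeof vary between Python versions, but the gap between the two is always dramatic:

```python
import sys

n = 1_000_000
squares_list = [i * i for i in range(n)]
squares_gen = (i * i for i in range(n))

# The generator's size is fixed; the list's size grows with n
print(sys.getsizeof(squares_gen))   # a couple hundred bytes at most
print(sys.getsizeof(squares_list))  # several megabytes

# Both produce the same values when consumed
print(sum(squares_gen) == sum(squares_list))  # True
```

Generator comprehensions shine when fed directly into consumers like sum, max, or any, which only ever need one value at a time.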

To infinity and beyond

Because they only store the current element of a sequence, Generators are useful for representing infinite sequences. If your teacher or boss asks you to build a sequence with all the natural numbers, you can either say that it is impossible, or you can give them this Generator:

def natural_numbers():
    i = 0  # My natural numbers start with 0, yours can start with 1 if you want to (:
    while True:
        yield i
        i += 1

This Generator will never stop giving numbers, so it is a viable way of representing a sequence that never ends.
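Of course, consuming an infinite Generator directly in a for loop would never terminate, so it is usually paired with something that takes a finite slice of it, such as itertools.islice:

```python
from itertools import islice

def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

# Take just the first five naturals from the infinite sequence
first_five = list(islice(natural_numbers(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

islice only ever asks the Generator for the values it needs, so the infinite tail is never produced.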

A few more Generator tricks

Generators are versatile, and here are some more things you can do with them.

Multiple yields

Unlike the return statement, yield can be used multiple times in the same function and execute more than once. Suppose you want a Generator that yields a number, then yields the square of that number, and then increments the number. This can be done with the following code:

def numbers_and_squares():
    i = 0
    while True:
        yield i
        yield i ** 2
        i += 1

If we call the __next__ function repeatedly, the yields will be 0 0 1 1 2 4 3 9 ....
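We can confirm this by driving the Generator with next; each call resumes at whichever yield comes next:

```python
def numbers_and_squares():
    i = 0
    while True:
        yield i       # first resume point
        yield i ** 2  # second resume point
        i += 1

gen = numbers_and_squares()
print([next(gen) for _ in range(8)])  # [0, 0, 1, 1, 2, 4, 3, 9]
```

Execution alternates between the two yields, with i += 1 running only after both have fired.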

The close function

As we already know, Generators can represent infinite sequences, which means they will never stop yielding values. What if we want them to stop? Maybe we want to prevent an infinite loop, or maybe we have decided a value is "big enough". We can use the close function for that. Let's put a stop to our infinite natural_numbers Generator.

def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

numbers = natural_numbers()
for number in numbers:
    if number >= 1:
        numbers.close()
    print(number)

On the first iteration the for loop will print 0; on the second iteration it will close the Generator and print 1; on the next call the closed Generator will raise a StopIteration exception, which is caught by the for, ending its loop.
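Calling close outside a for loop makes the effect easier to see. Once closed, the Generator behaves like an exhausted one:

```python
def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

numbers = natural_numbers()
print(next(numbers))  # 0
print(next(numbers))  # 1
numbers.close()       # the generator is now permanently exhausted
try:
    next(numbers)
except StopIteration:
    print('closed')
```

Under the hood, close raises a GeneratorExit exception inside the Generator at the paused yield, which ends its execution cleanly.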

The throw function

As we saw, a closed Generator raises a StopIteration exception when called. What if you don't want this exception? Maybe you want a ValueError, or an EOFError. You can make a Generator raise a custom exception using the throw function.

def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

numbers = natural_numbers()
for number in numbers:
    if number >= 1:
        numbers.throw(EOFError)
    print(number)

Again, on the first iteration the for loop prints 0. On the second iteration, numbers.throw(EOFError) raises the EOFError immediately inside the Generator, at the paused yield; the print on the following line never runs. Since the Generator does not catch the exception, it propagates straight back to the caller, and since we did not wrap the for loop in a try/except block, it breaks our execution:

Traceback (most recent call last):
  File "<input>", line 10, in <module>
  File "<input>", line 4, in natural_numbers
EOFError
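The Generator can also catch a thrown exception itself, which turns throw into a way for the caller to signal the Generator without killing it. A sketch of that pattern, using an assumed convention of jumping ahead by 100 on EOFError:

```python
def natural_numbers():
    i = 0
    while True:
        try:
            yield i
        except EOFError:
            # The caller signalled us: skip ahead instead of dying
            i += 100
            continue
        i += 1

numbers = natural_numbers()
print(next(numbers))            # 0
print(numbers.throw(EOFError))  # 100 (caught inside, generator kept going)
print(next(numbers))            # 101
```

When the Generator catches the exception and reaches another yield, that yielded value becomes the return value of the throw call.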

The send function

Last, but not least, I have to tell you a secret I've been hiding until now: yield is not a statement. It is an expression, which means its value can be assigned to a variable. You may think this value is the same value the yield returns to its caller, but it is actually the opposite: the caller can give the Generator a value, and that value becomes the result of the yield expression. This is possible by using the send function. Let's suppose we want our natural_numbers Generator to stop generating if the caller gives it a number bigger than 10. The code would then be:

def natural_numbers():
    i = 0
    while True:
        number = yield i
        if isinstance(number, int) and number > 10:
            break
        i += 1

Now let's test it:

>>> a = natural_numbers()
>>> next(a)
0
>>> next(a)
1
>>> a.send(5)
2
>>> next(a)
3
>>> a.send(100)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration

If you have a keen eye, you noticed that when we call the send function, the Generator yields another value. This means that calling send triggers a __next__, right? Wrong. Again, the opposite is happening: calling __next__ is equivalent to calling send(None). This is why we test whether number is an int: when we call __next__, None is assigned to number. Also, notice that when we send a number bigger than 10, the execution breaks right away; we don't need to call __next__ for that to happen. Keep that in mind when sending values to Generators.
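A classic use of send is a Generator that accumulates the values it receives. As an illustration (not from the original post), here is a running-average Generator; note the initial next call, which is needed to advance it to the first yield before anything can be sent:

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # receive a number, hand back the current average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)            # prime the generator: run it to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```

Forgetting the priming next is a common mistake: sending a non-None value to a Generator that has not started yet raises a TypeError.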

Final thoughts

Generators are a broad topic, and I have only covered part of it. If you want to know more, I recommend reading the Python Wiki page on Generators and searching for more material online. There are plenty of good resources out there.