Iterables, functions, and generators in Python
To learn about the yield keyword, there are a variety of other terms and concepts that you must be comfortable with first. Before we dive into all of that, let’s keep the end in mind and talk through why yield is beneficial.
yield can improve memory efficiency, and often speed as a result, when looping over a large iterable. These benefits will be less pronounced for small data sets; however, as the data grows, so too does the benefit of using yield.
Now, onto the explanation. In short, yield is used in place of the return keyword inside a function, and calling such a function produces a generator, which is used as a one-time iterable. If you’re like me when I first learned this, you got lost along the way. So, to unpack this concept, let’s start from the ground and work our way up.
What Is an Iterable?
A bit tongue-in-cheek, but an iterable is a data type that can be iterated over. To clarify, it’s a data type whose individual elements are themselves intelligible data: the characters of a string, the keys of a dictionary, the items in a list or tuple, and so on. Functionally, data is an iterable if it works in the following code snippet:
for x in my_iterable: pass
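As a small illustration of that test, the sketch below runs the same loop over a few built-in types. The names examples and seen are made up for this example, and the integer 42 is included only to show what happens with a value that is not iterable.

```python
# Hypothetical sample values, one per built-in type mentioned above.
examples = {
    "string": "abc",
    "dictionary": {"a": 1, "b": 2},
    "list": [1, 2, 3],
    "tuple": (4, 5, 6),
}

seen = {}
for label, my_iterable in examples.items():
    elements = []
    for x in my_iterable:      # works because each value is an iterable
        elements.append(x)
    seen[label] = elements
    print(label, elements)     # iterating a dictionary yields its keys

# A plain integer is not iterable, so the same loop raises TypeError.
try:
    for x in 42:
        pass
    int_is_iterable = True
except TypeError:
    int_is_iterable = False
print("int is iterable:", int_is_iterable)
```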
What’s important to note is that, for a container such as a list, the individual elements are accessed one at a time but the entire iterable is still stored in memory. That means that if you’re iterating over a list of the integers from 0 to 5,000,000,000, then each of those values is taking up resources. (Python 3’s built-in range is an exception: it computes its values on demand instead of storing them.) Even though as the programmer you efficiently write a set of repeated commands inside the loop, the application still needs to store that which is being iterated over.
What Is a Function?
A function is a set of commands that is defined once and executed only when it is invoked, or called. What this means is that even if your function’s processes were to take up large amounts of memory and resources, that will only happen when the function is called. The definition of the function does nothing on its own.
It’s helpful to think of a function as a black box that takes inputs (arguments) and produces an expected output (return value). Thus, a function very much takes on the shape of a formula or an equation, except that it goes beyond arithmetic.
I described a function as a black box because what happens inside the function really shouldn’t matter to the user. You only need to know what to supply to the function and what to expect as an output. Ideally, a function works only with the arguments it is given. Similarly, with the exception of the return value, the variables created inside the function are not accessible outside of it.
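A minimal sketch of both points, using a hypothetical rectangle_area function invented for this example: defining the function runs none of its body, and the local variable result cannot be reached from outside.

```python
def rectangle_area(width, height):
    # This line runs only when the function is called, not when it is defined.
    print("rectangle_area is running")
    result = width * height    # local variable, private to the function
    return result

print("function defined, nothing has run yet")

area = rectangle_area(3, 4)    # the body executes here
print(area)

# The local name `result` does not exist outside the function.
try:
    result
    local_is_visible = True
except NameError:
    local_is_visible = False
print("local variable visible outside:", local_is_visible)
```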
What Is a Generator?
A generator is a type of iterable that’s meant for one-time use. The design intention is that an element within the iterable is no longer needed once it has been consumed. Therefore, rather than creating and storing the entirety of an iterable up front, a generator produces its elements one at a time, on demand. This saves memory and, for large data sets, can improve performance as well. To see this in action, let’s compare a list to a generator:
my_list = [n for n in range(10)]
print(my_list)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

my_generator = (n for n in range(10))
print(my_generator)  # <generator object <genexpr> at 0x10daaa480>
First, the syntactic difference is that the list comprehension is written with square brackets [] while the generator expression is written with parentheses (). Notice that the list is fully stored and readily printable, while printing the generator only shows a reference to a generator object.
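To get a rough feel for the memory difference, the sketch below compares the two with sys.getsizeof. Treat the numbers as indicative only: getsizeof reports the size of the container object itself (for a list, its array of references) and not the integers it points to, and the exact values depend on the Python build.

```python
import sys

# Same one million values, built eagerly as a list and lazily as a generator.
my_list = [n for n in range(1_000_000)]
my_generator = (n for n in range(1_000_000))

list_size = sys.getsizeof(my_list)            # grows with the number of items
generator_size = sys.getsizeof(my_generator)  # small, regardless of the range

print("list:", list_size, "bytes")
print("generator:", generator_size, "bytes")
```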
They seem different, but they behave similarly when working within a loop.
for element in my_list:
    print(element)
# prints 0, 1, 2, 3, ... etc., one value per line

for element in my_generator:
    print(element)
# prints 0, 1, 2, 3, ... etc., one value per line
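The similarity ends after the first pass, though. As a quick sketch of the one-time-use behavior described earlier, the example below loops over a list and a generator twice each; the variable names are chosen just for this illustration.

```python
my_list = [n for n in range(3)]
my_generator = (n for n in range(3))

# A list can be iterated as many times as you like.
list_first_pass = [x for x in my_list]
list_second_pass = [x for x in my_list]

# A generator is exhausted after one pass; the second pass finds nothing left.
generator_first_pass = [x for x in my_generator]
generator_second_pass = [x for x in my_generator]

print(list_first_pass, list_second_pass)
print(generator_first_pass, generator_second_pass)
```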
What Does the “Yield” Keyword Do?
Finally, the explanation for our initial question. The yield keyword takes the place of return in a function definition, and calling that function then creates a generator. This can then be used as a normal iterable, reaping the benefits of both generators and functions.
def createGenerator():
    for i in range(100):
        yield i

my_generator = createGenerator()
print(my_generator)  # <generator object createGenerator at 0x102dd2480>

for i in my_generator:
    print(i)  # prints 0-99
So, how does this work? When the returned generator is first used (not in the assignment, but in the for loop), the function body executes until it reaches the yield statement. There, it hands the current value to the loop and pauses (see why it’s called yield) until it is used again. Then, it picks up where it left off. After the final value has been yielded, the loop asks for one more, so any code after the yield statement executes before the generator finishes.
Let’s add some print code to see the sequence of events:
def createGenerator():
    print("Beginning of generator")
    for i in range(3):
        yield i
    print("After yield")

print("Before assignment")
my_generator = createGenerator()
print("After assignment")

for i in my_generator:
    print(i)  # prints 0-2

"""
Before assignment
After assignment
Beginning of generator
0
1
2
After yield
"""
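The for loop hides the pausing and resuming, so here is a sketch that drives a generator by hand with the built-in next() function. The count_up_to function and its log argument are hypothetical helpers written for this illustration; the log records which parts of the function body have run so far.

```python
def count_up_to(limit, log):
    """Yield 0 .. limit-1, noting in `log` each time the body makes progress."""
    log.append("start")
    for i in range(limit):
        yield i                          # pause here and hand back i
        log.append(f"resumed after {i}")
    log.append("end")

log = []
gen = count_up_to(2, log)
log_after_call = list(log)     # calling the function ran none of its body

first = next(gen)              # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)             # resumes after the first yield, pauses at the second
log_after_second = list(log)

try:
    next(gen)                  # resumes, finishes the body, then signals exhaustion
    exhausted = False
except StopIteration:
    exhausted = True

print(log_after_call)
print(first, log_after_first)
print(second, log_after_second)
print(exhausted, log)
```

A for loop does the same thing behind the scenes: it keeps asking for the next value and stops quietly when StopIteration is raised.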
In summary, the yield keyword modifies a function’s behavior so that calling it produces a generator, which pauses at each yield statement during iteration. The function body isn’t executed until iteration begins, and values are produced one at a time, which leads to improved resource management and, for large data sets, better overall performance. Use generators (and functions that yield) for large data sets meant for single-use iteration.
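As a closing sketch of that advice, the hypothetical squares function below feeds values straight into the built-in sum() without ever building a list of them. The limit of 100,000 is an arbitrary choice for the example.

```python
def squares(limit):
    """Yield n * n for each n from 0 up to (but not including) limit."""
    for n in range(limit):
        yield n * n

# sum() consumes the generator one value at a time; no full list is ever stored.
total = sum(squares(100_000))
print(total)
```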