Iterables, functions, and generators in Python
To learn about the yield keyword, there are a few other terms and concepts you need to be comfortable with first. Before we dive into those, let’s keep the end in mind and talk through why yield is beneficial.
Using yield improves memory efficiency, and often speed as a result, when looping over a large iterable. The benefit is less pronounced for small data sets, but it grows as the data does.
Now, onto the explanation. In short, yield takes the place of the return keyword in a function, and calling that function produces a generator, which is used as a one-time iterable. If you’re like me when I first learned this, you got lost along the way. So, to unpack this concept, let’s start from the ground and work our way up.
What Is an Iterable?
A bit tongue-in-cheek, but an iterable is a data type that can be iterated over. To clarify, it’s a data type made up of individual elements that are meaningful on their own: the characters of a string, the keys of a dictionary, the items in a list or tuple, etc. Functionally, data is an iterable if it works in the following code snippet:
for x in my_iterable:
    pass
What’s important to note is that while individual elements are accessed one at a time, a container such as a list, tuple, or string still holds all of its elements in memory. That means that if you’re iterating over a list of the numbers from 0 to 5,000,000,000, every one of those values is taking up resources. (In Python 3, range itself computes its values on demand, but a list built from it does not.) Even though as the programmer you efficiently write a set of repeated commands inside the loop, the application still needs to store what is being iterated over.
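To make the "works in a for loop" test concrete, here is a minimal sketch. The helper name is_iterable is made up for illustration; it relies only on the built-in iter(), which raises TypeError for objects that cannot be iterated over.

```python
def is_iterable(obj):
    """Hypothetical helper: return True if obj can be used in a for loop."""
    try:
        iter(obj)  # built-in iter() raises TypeError for non-iterables
    except TypeError:
        return False
    return True


samples = ["abc", {"a": 1}, [1, 2], (1, 2), 42]
results = [is_iterable(s) for s in samples]
print(results)  # [True, True, True, True, False]

# Iterating a string gives its characters; iterating a dict gives its keys.
print(list("abc"))     # ['a', 'b', 'c']
print(list({"a": 1}))  # ['a']
```

The integer 42 is the only sample that fails, since a single number has no elements to step through.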
What Is a Function?
A function is a set of commands that form a definition that’s not executed except when invoked or called. What this means is that even if your function’s processes were to take up large amounts of memory and resources, that will only happen when the function is called. The definition of the function does nothing on its own.
It’s helpful to think of a function as a black box that takes inputs (arguments) and produces an expected output (return value). Thus, a function very much takes on the shape of a formula or an equation, except that it goes beyond arithmetic.
I described a function as a black box because what happens inside the function really shouldn’t matter to the user. You only need to know what to supply to the function and what to expect as an output. Ideally, the arguments are the only thing the function relies on from the outside. Similarly, variables created inside the function are local: apart from the return value, nothing inside the function is accessible outside of it.
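As a small illustration of the black-box idea, the sketch below uses a made-up function, rectangle_area. It shows that the caller sees only the return value, while a variable created inside the function is not visible afterwards.

```python
def rectangle_area(width, height):
    """Hypothetical example function: inputs in, one value out."""
    area = width * height  # 'area' is local to the function
    return area


result = rectangle_area(3, 4)
print(result)  # 12

# The local variable 'area' does not exist outside the function.
try:
    print(area)
    leaked = True
except NameError:
    leaked = False

print(leaked)  # False
```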
What Is a Generator?
A generator is a type of iterable that’s meant for one-time use. The design intention is that an element within the iterable is no longer needed once it has been consumed. Therefore, rather than creating and storing the entirety of an iterable, a generator produces each element one at a time, only when it is asked for. This saves memory, and often time as well. To see this in action, let’s compare a list to a generator:
my_list = [n for n in range(10)]
print(my_list)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

my_generator = (n for n in range(10))
print(my_generator)  # <generator object <genexpr> at 0x10daaa480>
First, the syntactic difference: a list comprehension is written with square brackets [] while a generator expression is written with parentheses (). Notice that the list is fully stored and readily printable, while printing the generator only shows a reference to a generator object.
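The storage difference can be observed with sys.getsizeof from the standard library. This is a rough sketch rather than a benchmark: the size N is arbitrary, getsizeof reports only the size of the container object itself, and the exact numbers vary by Python version and platform, so only the comparison is shown.

```python
import sys

N = 1_000_000  # arbitrary size chosen for illustration

big_list = [n for n in range(N)]  # builds and stores every element now
big_gen = (n for n in range(N))   # stores only how to produce the elements

# Size of each object itself (for the list, not counting the ints it references).
list_size = sys.getsizeof(big_list)
gen_size = sys.getsizeof(big_gen)
print(list_size > gen_size)  # True

# Both produce the same values when consumed.
list_total = sum(big_list)
gen_total = sum(big_gen)
print(list_total == gen_total)  # True
```

The generator stays small no matter how large N gets, because it never holds more than its current position.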
They seem different, but they behave similarly when working within a loop.
for element in my_list:
    print(element)

"""
0
1
2
3
... etc.
"""

for element in my_generator:
    print(element)

"""
0
1
2
3
... etc.
"""
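Where the two do differ is on a second pass. The following sketch, using a shorter range for brevity, shows what "one-time use" means in practice: a list can be looped over repeatedly, while a generator is empty once it has been consumed.

```python
my_list = [n for n in range(3)]
my_generator = (n for n in range(3))

# A list can be iterated as many times as you like.
list_first = [x for x in my_list]
list_second = [x for x in my_list]
print(list_first)   # [0, 1, 2]
print(list_second)  # [0, 1, 2]

# A generator is exhausted after the first full pass.
gen_first = [x for x in my_generator]
gen_second = [x for x in my_generator]
print(gen_first)   # [0, 1, 2]
print(gen_second)  # []
```

If the values are needed again, create a new generator or store the results in a list.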
What Does the “Yield” Keyword Do?
Finally, the explanation for our initial question. The yield keyword takes the place of return in a function definition, and calling that function creates a generator. This can then be used as a normal iterable, reaping the benefits of both generators and functions.
def createGenerator():
    for i in range(100):
        yield i

my_generator = createGenerator()
print(my_generator)  # <generator object createGenerator at 0x102dd2480>

for i in my_generator:
    print(i)  # prints 0-99
So, how does this work? Calling the function only creates the generator; none of the function body runs yet. When the generator is first used (not in the assignment but in the for loop), the function body executes until it reaches the yield statement. There, it hands back a value and pauses (see why it’s called yield) until the next value is requested. Then, it picks up where it left off. When the loop asks for a value after the last yield, any remaining code after the yield runs, the function ends, and the iteration stops.
Let’s add some print code to see the sequence of events:
def createGenerator():
    print("Beginning of generator")
    for i in range(3):
        yield i
    print("After yield")

print("Before assignment")
my_generator = createGenerator()
print("After assignment")

for i in my_generator:
    print(i)  # prints 0-2

"""
Before assignment
After assignment
Beginning of generator
0
1
2
After yield
"""
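The same pause-and-resume behavior can be stepped through by hand with the built-in next(), which is what a for loop calls behind the scenes. This is a minimal sketch; the function name create_generator and the shortened range are chosen only for illustration.

```python
def create_generator():
    print("start")
    for i in range(2):
        yield i
    print("after loop")


gen = create_generator()  # nothing is printed yet
first = next(gen)         # prints "start", then pauses at the first yield
second = next(gen)        # resumes and pauses at the second yield

try:
    next(gen)             # resumes, prints "after loop", then the function ends
    finished = False
except StopIteration:
    finished = True       # an exhausted generator signals the end this way

print(first, second, finished)  # 0 1 True
```

A for loop handles StopIteration for you, which is why the earlier loop simply ends after the last value.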
In summary, the yield keyword changes a function’s behavior so that calling it produces a generator, which pauses at each yield during iteration. The function body isn’t executed until iteration begins, and values are produced only as they are needed, which leads to better resource management and often better overall performance. Use generators (and generator functions) to produce large data sets meant for single-use iteration.
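As a closing sketch of that single-use pattern, the hypothetical generator function squares below is paired with itertools.islice from the standard library. Because values are produced on demand, asking for only the first five does only five steps of work, even when the stated limit is enormous.

```python
from itertools import islice


def squares(limit):
    """Hypothetical generator function: yield i * i for i in 0..limit-1."""
    for i in range(limit):
        yield i * i


# Only the first five values are ever computed, despite the huge limit.
first_five = list(islice(squares(10**12), 5))
print(first_five)  # [0, 1, 4, 9, 16]

# A generator can feed an aggregate directly, without building a list first.
total = sum(squares(1000))
print(total)  # 332833500
```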
Source: medium