Python generators are a powerful and efficient way to generate and manipulate large amounts of data. They allow you to create sequences of values on the fly, without having to store all the values in memory at once. Instead, each value is generated and returned one at a time, as needed. This makes generators ideal for working with large datasets or infinite sequences, where it would be impractical or impossible to generate all the values upfront. In this blog post, we will explore the basics of Python generators, how they work, and some common use cases.
![](https://static.wixstatic.com/media/9d30f2_a6fecc5441f94e469f71b9f2f2405cd6~mv2.jpg/v1/fill/w_980,h_709,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/9d30f2_a6fecc5441f94e469f71b9f2f2405cd6~mv2.jpg)
Photo Credit - istockphoto
Lazy vs. Eager Execution
Before diving into the details of generators, let's get familiar with some fancy terms, because we developers like to show off sometimes ;)
In Python, lazy execution and eager execution refer to two different approaches for evaluating expressions or processing data.
Lazy execution, also known as "deferred execution," is an approach where expressions are only evaluated or executed when their values are actually needed. This can be especially useful when working with large datasets or complex computations where it may not be necessary to compute all the results up front. Instead, computations can be done incrementally as needed, potentially saving time and memory.
Eager execution, on the other hand, is an approach where expressions are evaluated or executed as soon as they are defined. This is the default behavior in Python and is useful in situations where it is necessary to compute all the results up front before proceeding with further processing.
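The difference is easy to see in code. A list comprehension is eager, while a generator expression (written with parentheses) is lazy; this small sketch contrasts the two:

```python
# Eager: the full list is built in memory immediately.
squares_eager = [n * n for n in range(5)]
print(squares_eager)        # [0, 1, 4, 9, 16]

# Lazy: a generator expression computes each value only when asked for it.
squares_lazy = (n * n for n in range(5))
print(next(squares_lazy))   # 0 -- only the first value has been computed so far
print(list(squares_lazy))   # [1, 4, 9, 16] -- the rest, computed on demand
```

Note that consuming the generator with list() exhausts it; unlike the list, it cannot be iterated a second time.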
So now you can see why I brought up these fancy terms: the protagonist of this post is a great example of lazy execution. Now let's try to understand how generators work.
The Right and Left Hands of a Generator - __iter__ and __next__
We have all worked with loops. But mostly we have used them in the eager execution paradigm, looping over lists, tuples, dicts, and so on. Now let's write our own class that we can loop over. This will be fun for people reading about it for the first time.
```python
class TestIterator:
    def __init__(self, max_val):
        self.next_val = 0
        self.max_val = max_val

    def __iter__(self):
        return self

    def __next__(self):
        result = self.next_val
        if self.next_val >= self.max_val:
            raise StopIteration
        self.next_val += 1
        return result
```
This code might look a little complex, but don't worry, we will go through it line by line.
Any object we can loop over is called an iterable. Such objects must implement an __iter__ method, which returns an iterator object. The iterator in turn must implement a __next__ method, which returns the next element in the iteration and raises a StopIteration exception once the iteration is complete. For simplicity we return self from __iter__; more complex scenarios can use a separate iterator object.
In the above code we initialized next_val to 0. It acts as the counter whose value is returned on each iteration. The __iter__ method does nothing fancy, it just returns self. In __next__ we wrote the logic to return values starting from 0, incrementing the counter each time. And once max_val is reached, we raise a StopIteration exception.
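A for loop is really just sugar over this protocol: it calls iter() once and then next() repeatedly until StopIteration is raised. We can drive the class from the post by hand to see this:

```python
class TestIterator:
    def __init__(self, max_val):
        self.next_val = 0
        self.max_val = max_val

    def __iter__(self):
        return self

    def __next__(self):
        result = self.next_val
        if self.next_val >= self.max_val:
            raise StopIteration
        self.next_val += 1
        return result

it = iter(TestIterator(3))   # iter() calls __iter__, which returns self
print(next(it))              # 0 -- each next() call invokes __next__
print(next(it))              # 1
print(next(it))              # 2
# a further next(it) would raise StopIteration, which is how a for loop knows to stop
```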
Now let's see the above code in action, and you will soon realise you just built one of Python's built-in functions all by yourself.
```python
for val in TestIterator(10):
    print(val)  # Do something with this data
```
This code will print the numbers from 0 up to (but not including) the number passed as an argument to the class. Wait a minute. This looks like something we have used a lot in our code.
Did we just implement a mini version of the famous range() function? Oh yes we did.
So we can see how we get the values one after the other, rather than loading all the data into memory first and then operating on it. This is how lazy execution works behind the scenes.
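The memory saving is easy to observe. As a rough sketch, sys.getsizeof reports the direct size of an object, and a lazy generator stays tiny no matter how many values it will eventually produce (the exact byte counts vary by Python version):

```python
import sys

eager = [n for n in range(1_000_000)]  # one million ints stored in a list
lazy = (n for n in range(1_000_000))   # just a small generator object

print(sys.getsizeof(eager))  # several megabytes for the list itself
print(sys.getsizeof(lazy))   # a few hundred bytes, regardless of the range size
```

Keep in mind sys.getsizeof only measures the container's own footprint, but the contrast still makes the point.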
Rise of generators
All of the above code looks cool but becomes a pain when we have to write it regularly. Python being a language for lazy coders, there is always a lazier way to do things. So rather than writing all that bulky code, we can simplify it as follows using generators.
```python
def test_iterator(max_value):
    num = 0
    while max_value > num:
        yield num
        num += 1
```
That's it. This code is equivalent to the TestIterator class. Any function containing a yield statement returns a generator when called, and Python implements __iter__ and __next__ for us. Now we can loop over this function just like we did earlier and get the same result.
```python
for val in test_iterator(10):
    print(val)  # Do something useful with this value.
```
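A nice way to see the laziness is to drive the generator by hand with next(). Each call runs the function body until it hits yield, hands back the value, and pauses there with all its local state intact:

```python
def test_iterator(max_value):
    num = 0
    while max_value > num:
        yield num
        num += 1

gen = test_iterator(3)  # no code in the function body has run yet
print(next(gen))        # 0 -- runs until the first yield, then pauses
print(next(gen))        # 1 -- resumes right after the yield, num is remembered
print(list(gen))        # [2] -- consumes whatever values are left
```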
So why use generators when we can simply use loops? Well, the biggest drawback of looping over a normal collection is that the data must be present entirely in memory. Now let us say you are given a 10 GB log file to parse and your system has just 4 GB of RAM. You won't be able to load this log file into memory. Here generators come to the rescue: rather than loading the entire file into memory, you can read it one line at a time.
Talk is cheap. Show me the code.
```python
def read_log_file(file_handle):
    for line in file_handle:
        yield line

def main(filename):
    with open(filename, 'r') as f:
        for line in read_log_file(f):
            print(line)  # do something useful with this line
```
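Generators also compose nicely into pipelines, where each stage lazily pulls from the previous one and only one line is ever in flight. As a sketch (the "ERROR" filter and the io.StringIO fake log are just illustrative stand-ins for a real file), we could stack a filtering stage on top of the reader above:

```python
import io

def read_log_file(file_handle):
    for line in file_handle:
        yield line

def error_lines(lines):
    # second generator stage: keep only lines mentioning "ERROR"
    for line in lines:
        if "ERROR" in line:
            yield line

# io.StringIO stands in for a real log file in this sketch
fake_log = io.StringIO("INFO ok\nERROR disk full\nINFO fine\nERROR timeout\n")

for line in error_lines(read_log_file(fake_log)):
    print(line.rstrip())
```

No stage reads ahead of the consumer, so the same pipeline would work unchanged on that 10 GB file.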
I hope you understood the concept of generators, and I would love to answer any questions of yours in the comments below.