Python Simplified

Understanding Generators in Python

Generators

Generators are just another way to create iterators in Python. To understand them fully, you first need to know what iterables and iterators are and how they work. I suggest you go through our blog, where we have covered iterables and iterators in great detail.

To recap –

Iterables are objects that can be iterated over and are capable of returning one item at a time. For example, lists, tuples, sets, dictionaries, strings, and range objects are iterables. An iterable is an object that implements the __iter__ dunder method.

Iterators are objects that implement the iterator protocol, which consists of the __iter__ and __next__ dunder methods. When you pass an iterable to the built-in iter() function, it calls the iterable's __iter__ method and returns an iterator object. Calling next() on that iterator (which invokes its __next__ method) then returns the elements one by one.

In the example below, mylist is an iterable because you can iterate over each element in the list using a for loop. Note that during the execution of the for loop, Python obtains an iterator from mylist internally (by calling iter() on it) and then uses that iterator to step through all the elements.

				
>>> mylist = ['C', 'C++', 'Java', 'Python', 'R']

>>> for item in mylist:
...    print(item)
C
C++
Java
Python
R
				
			

If you are still confused about iterables and iterators make sure to read our blog on iterables and iterators before you continue.
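
As a rough sketch of what the for loop above does behind the scenes, the loop can be unrolled by hand with the built-in iter() and next() functions. The names it and collected are only illustrative and are not part of the original example.

```python
mylist = ['C', 'C++', 'Java', 'Python', 'R']

# iter() calls mylist.__iter__() and returns a list iterator
it = iter(mylist)

collected = []
while True:
    try:
        # next() calls it.__next__() and returns the next element
        item = next(it)
    except StopIteration:
        # raised once the iterator has no more elements
        break
    collected.append(item)
    print(item)
```

The for loop catches StopIteration for you, which is why you never see that exception when looping normally.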

I hope you are now able to recall what iterables and iterators are. Let's jump into the discussion of generators.

There are two ways to create generators in Python: generator functions and generator expressions.

Generator Functions

You know that functions contain a set of instructions that get executed and return the value to the caller. As soon as the function returns the value to the caller, all the local variables will be destroyed. When the same function is called again, new variables are created. 

But with the help of generator functions, you can resume the operation from where it left off previously. 

So, what are generator functions? Any function that contains one or more yield keywords is called a generator function. By using generator functions, just like iterators, you can access one item at a time. As mentioned in the beginning, generators are just another way of creating iterators.

Calling a generator function returns a generator object. This generator object implements the iterator protocol. If you remember from our previous discussion on iterables and iterators, the iterator protocol means the object implements the __iter__ and __next__ dunder methods.
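
To make this concrete, here is a minimal sketch that checks these claims using the standard library's inspect module. The generator function count_up_to is a hypothetical example written only for this check.

```python
import inspect


def count_up_to(n):
    """Hypothetical generator function used only for illustration."""
    for i in range(1, n + 1):
        yield i


gen = count_up_to(3)

# The function is a generator function, and calling it gives a generator object
is_gen_func = inspect.isgeneratorfunction(count_up_to)
is_gen_obj = inspect.isgenerator(gen)

# The generator object implements both halves of the iterator protocol
has_protocol = hasattr(gen, "__iter__") and hasattr(gen, "__next__")

# Like any iterator, iter() on a generator returns the generator itself
iter_returns_self = iter(gen) is gen

values = list(gen)
print(is_gen_func, is_gen_obj, has_protocol, iter_returns_self, values)
```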

Yield keyword

Python treats a function differently when its body contains the yield keyword.

  • Because the function body contains yield, calling the function does not run the body. It returns a generator object instead.
  • When the generator object runs and reaches a yield, it hands the yielded value back to the caller and suspends execution, preserving the local variables.
  • The next call to the generator (via next(), which invokes the __next__ method) resumes where it left off earlier.
  • Once the function body finishes and there is nothing left to yield, StopIteration is raised.
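
The suspend-and-resume behaviour described above can be observed with inspect.getgeneratorstate() from the standard library. This is a minimal sketch; two_steps is a hypothetical generator function made up for illustration.

```python
import inspect


def two_steps():
    """Hypothetical generator with two yield statements."""
    yield 'first'
    yield 'second'


gen = two_steps()
states = [inspect.getgeneratorstate(gen)]   # created, body not started yet

next(gen)                                   # runs up to the first yield
states.append(inspect.getgeneratorstate(gen))

next(gen)                                   # resumes, runs up to the second yield
states.append(inspect.getgeneratorstate(gen))

try:
    next(gen)                               # body finishes, nothing left to yield
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(gen))

print(states)
```

The generator stays suspended between calls to next() and is closed once StopIteration has been raised.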

Let’s see a few examples to understand generator functions.

In the example below, there are three yield statements, so my_func is a generator function. When you call my_func(), a generator object is created. When you then call next() on the gen object, Python executes lines 2-4 of the function and suspends execution, preserving the variables. When next() is called again, Python executes lines 5-7, and on the following call lines 8-10.

				
>>> def my_func():
...     x = 1
...     print(x)
...     yield 'first'
...     x += 1
...     print(x)
...     yield 'second'
...     x += 1
...     print(x)
...     yield 'third'
    
>>> gen = my_func()

>>> next(gen)
1
'first'

>>> next(gen)
2
'second'

>>> next(gen)
3
'third'
				
			

This second example is self-explanatory. generate_ints is a generator function because it contains the yield keyword. Calling generate_ints() returns a generator object, which we assign to gen. By repeatedly calling next() on gen, you iterate over all the elements, and at the end StopIteration is raised.

				
>>> def generate_ints(n):
...     for i in range(1, n):
...         yield i**2

>>> gen = generate_ints(5)
>>> gen  
<generator object generate_ints at ...>

>>> next(gen)
1

>>> next(gen)
4

>>> next(gen)
9

>>> next(gen)
16

>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
				
			

Here is another example that generates Fibonacci numbers using a generator. fib() is a generator function that yields all the Fibonacci numbers up to and including n. The for loop then iterates over the generator object, which stops once the next Fibonacci value would exceed 100.

				
>>> def fib(n):
...    first, second = 0, 1
...    while True:
...        if first > n:
...            break
...        yield first
...        first, second = second, first + second       

>>> gen = fib(100)
>>> for x in gen:
...    print(x, end=" ")
0 1 1 2 3 5 8 13 21 34 55 89
				
			

Since generators are also iterators, you can use them the same way you use any iterator. For example, you can pass a generator to a for loop, to constructors such as list(), or to functions such as max().

				
>>> def myfunc():
...    for i in range(1, 6):
...        yield i**2

>>> gen = myfunc()
>>> for i in gen:
...    print(i)
1
4
9
16
25

>>> gen = myfunc()  # the for loop exhausted the previous generator, so create a new one
>>> list(gen)
[1, 4, 9, 16, 25]

>>> gen = myfunc()
>>> max(gen)
25
				
			

Yield Vs return keywords

By now, you should be aware of the main difference between yield and return. To recap, once return is encountered, all local variables are destroyed and control is sent back to the caller. When a yield statement is encountered, the local variables are preserved and control is sent back to the caller, so the function can resume later from the same point.
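
A small side-by-side sketch may help here. Both functions below are hypothetical examples written for this comparison; they compute the same squares, one with return and one with yield.

```python
def squares_with_return(n):
    """Builds the whole list first, then returns it once."""
    result = []
    for i in range(1, n + 1):
        result.append(i ** 2)
    return result  # the function ends here and its locals are discarded


def squares_with_yield(n):
    """Hands back one square at a time and pauses in between."""
    for i in range(1, n + 1):
        yield i ** 2  # execution pauses here; i is kept for the next call


eager = squares_with_return(3)   # all values are available at once
lazy = squares_with_yield(3)     # nothing has been computed yet

first = next(lazy)    # runs the loop body once
second = next(lazy)   # resumes with i preserved and moves to the next value
print(eager, first, second)
```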

Generator Expressions

I am sure you are already aware of list comprehensions. You can go through our detailed blog on list comprehensions here.

Generator expressions look very similar to list comprehensions. The one obvious difference you will notice is that list comprehensions are enclosed within square brackets [ ] whereas generator expressions are enclosed within parentheses ( ).

List comprehension example: a list comprehension returns the result in the form of a list.

				
>>> squares = [num**2 for num in range(5)]
>>> squares
[0, 1, 4, 9, 16]
				
			

Generator expression example: a generator expression returns a generator object. Once you have the generator object, it works the same way as we saw in the generator function section above. Refer to the example below.

				
>>> squares = (num**2 for num in range(5))
>>> squares
<generator object <genexpr> at 0x0000022097492AC0>

>>> next(squares)
0

>>> next(squares)
1

>>> next(squares)
4

>>> next(squares)
9

>>> next(squares)
16

>>> next(squares)
--------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-26-e7cf8d24b3b2> in <module>
----> 1 next(squares)
StopIteration:
				
			

The differences between list comprehensions and generator expressions:

  • List comprehensions are enclosed within square brackets [] whereas generator expressions are enclosed within parentheses ().
  • List comprehensions are evaluated eagerly, meaning the whole list is built immediately. Generator expressions are evaluated lazily, meaning each value is produced only when it is requested, one at a time.
  • A list comprehension produces a list, which is an iterable, whereas a generator expression produces a generator, which is an iterator.
  • Generator expressions are more memory efficient than list comprehensions, because a list comprehension creates the whole list and stores every element in memory, whereas a generator expression creates only a generator object and produces one element at a time.
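
The memory difference can be roughly illustrated with sys.getsizeof(). Treat this as a sketch: getsizeof() reports only the size of the container object itself (not the integers it refers to), and the exact numbers vary between Python versions and platforms. The value of n is arbitrary.

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

squares_list = [num ** 2 for num in range(n)]   # builds all n values now
squares_gen = (num ** 2 for num in range(n))    # builds nothing yet

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

# The generator still produces every value when it is consumed
total = sum(squares_gen)
```

The list's size grows with n, while the generator object stays small no matter how many values it will eventually produce.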

Real-world uses of generators

Having understood generators, let’s look at a couple of real-world use cases.

(1) One real-world example that I have personally used is indexing documents into Elasticsearch. The requirement was to read the data from a large csv file, process it, and then index it into Elasticsearch.

Below is sample code from the official Elasticsearch repo. As you can see, the generator function reads one record at a time from the csv file. For each record read from the csv file, it creates a document (or record) called doc, which is then used to index (or insert) into Elasticsearch. Refer to the Elasticsearch repo for the full code.

				
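import csv

# DATASET_PATH is defined elsewhere in the full example; it points to the csv file.
def generate_actions():
    with open(DATASET_PATH, mode="r") as f:
        reader = csv.DictReader(f)

        # The loop must stay inside the with block so the file is still open.
        for row in reader:
            doc = {
                "_id": row["CAMIS"],
                "name": row["DBA"],
                "borough": row["BORO"],
                "cuisine": row["CUISINE DESCRIPTION"],
                "grade": row["GRADE"] or None,
            }
            lat = row["Latitude"]
            lon = row["Longitude"]

            # Add a location only when both coordinates are present.
            if lat not in ("", "0") and lon not in ("", "0"):
                doc["location"] = {"lat": float(lat), "lon": float(lon)}
            # Yield every document, one at a time, with or without a location.
            yield doc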
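
The same read-one-record-at-a-time pattern can be tried without Elasticsearch or a file on disk. The sketch below is only an illustration: SAMPLE_CSV, generate_docs, and the column names are hypothetical, and an in-memory io.StringIO stands in for the large csv file.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a large file on disk
SAMPLE_CSV = "id,name\n1,alpha\n2,beta\n3,gamma\n"


def generate_docs(file_obj):
    """Yield one document per CSV row instead of building a full list."""
    reader = csv.DictReader(file_obj)
    for row in reader:
        yield {"_id": row["id"], "name": row["name"]}


docs = generate_docs(io.StringIO(SAMPLE_CSV))

first_doc = next(docs)    # only the first data row has been read so far
remaining = list(docs)    # consume the rest of the rows
print(first_doc, remaining)
```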
				
			

(2) Here is another example I came across on StackOverflow. The use case is reading data from a database such as PostgreSQL or MySQL and doing some processing on it.

Say you have a table named domains that holds millions of domain names for which you want to update the Alexa ranking. The query SELECT domain FROM domains will pull millions of records, but your server won’t be able to handle all the data at once. What could be the solution? Generators come to the rescue.

You can write a generator function like the one below, which reads 1000 records at a time from the database using a cursor. You can then do all the processing, such as getting the Alexa ranking for each domain and updating the table, inside the doSomethingWith() function.

				
def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result
				
			
				
import MySQLdb

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")
for result in ResultGenerator(cursor):
    doSomethingWith(result)
db.close()
				
			

I believe these two examples give you a sense of how generators are used in real-world applications.
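
If you want to try the batching pattern without a MySQL server, here is a runnable sketch that swaps in the standard library's sqlite3 module with an in-memory database. The table contents, the snake_case name result_generator, and the small batch size are assumptions made only for this illustration.

```python
import sqlite3


def result_generator(cursor, batchsize=1000):
    """Fetch rows in batches but yield them to the caller one at a time."""
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result


# Hypothetical in-memory table standing in for the real domains table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (domain TEXT)")
conn.executemany(
    "INSERT INTO domains VALUES (?)",
    [(f"example{i}.com",) for i in range(5)],
)

cursor = conn.cursor()
cursor.execute("SELECT domain FROM domains ORDER BY domain")

# A batch size of 2 forces several fetchmany() calls for 5 rows
fetched = [row[0] for row in result_generator(cursor, batchsize=2)]
conn.close()
print(fetched)
```

The caller sees a plain stream of rows, while at most one batch is held in memory by the generator at any time.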

When not to use generators

  • Use a list when you need to iterate over the elements multiple times, because a generator can be iterated over only once.
  • Use a list when you need random access to elements, because a generator does not keep all of its elements available in memory.
  • Use a list when you need list operations such as len(), pop(), sort(), or reversed(). A generator does not hold all of its elements, so these operations are not available on it.
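
The limitations listed above are easy to reproduce. In this short sketch, the generator expression and the variable names are made up for illustration.

```python
gen = (n * 2 for n in range(3))
first_pass = list(gen)     # consumes every value
second_pass = list(gen)    # the generator is already exhausted

errors = []
gen = (n * 2 for n in range(3))
# len(), reversed(), and indexing all need the full sequence up front
for operation in (len, reversed, lambda g: g[0]):
    try:
        operation(gen)
    except TypeError as exc:
        errors.append(type(exc).__name__)

print(first_pass, second_pass, errors)
```

If you need any of these operations, convert the generator to a list first with list(gen), at the cost of holding every element in memory.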

Conclusion

In this article, you have gained a solid understanding of generators, their types with examples, their real-world use cases, and when not to use them. I hope you find the article useful. I am curious to know how you are using generators. Please let me know in the comments.

Chetan Ambi

A Software Engineer & Team Lead with over 10 years of IT experience, a Technical Blogger with a passion for cutting edge technology. Currently working in the field of Python, Machine Learning & Data Science. Chetan Ambi holds a Bachelor of Engineering Degree in Computer Science.