How to Use Generators in Python and 3 Use-cases That Simplify Your Code

What will you learn?

What is a Generator in Python and how to use them to work with large datasets in a Pythonic fashion.

What is a Generator?

A Generator is a function that returns a lazy iterator. Said differently, you can iterate over it like any iterator, but it is lazy: the code in the function body only runs as you iterate.

A simple example could be as follows.

def my_generator():
    # Do something
    yield 5
    # Do something more
    yield 8
    # Do something else
    yield 12

Then you can iterate over the generator as follows.

for item in my_generator():
    print(item)

This will print 5, 8, and 12.

At first sight, this doesn’t look very useful. But let’s understand a bit better what happens.

In the first iteration of the for-loop, Python executes the code in the my_generator function until it reaches the first yield.

There it pauses and returns the value after yield.

In the next iteration, execution continues where it left off and runs until it reaches the next yield.

Again it pauses and returns the value after yield.

And so on, until there are no more yield statements; the generator is then exhausted and the loop ends.
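You can see this pausing behavior directly by driving the generator with next() instead of a for-loop. A minimal sketch, reusing the generator from above:

```python
def my_generator():
    yield 5
    yield 8
    yield 12

# Calling the generator function returns a generator object;
# none of the body has run yet.
gen = my_generator()

# Each next() call runs the body until it hits the next yield,
# then pauses and hands back that value.
print(next(gen))  # 5
print(next(gen))  # 8
print(next(gen))  # 12

# One more next(gen) would raise StopIteration,
# which is exactly what makes a for-loop stop.
```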

Now why is that powerful?

Let’s explore some use-cases.

#1 Pre-processing a work item

If you have a pipeline of work items, there is often a pre-processing step. It is tempting to combine the pre-processing with the actual processing, but your code becomes more readable and maintainable if you split them up.

Explore the example.

def pre_process_items():
    # Prepare one work item per line: a character-frequency dict.
    with open('data.txt') as f:
        for row in f:
            row = row.strip()
            freq = {c: row.count(c) for c in set(row)}
            yield freq

total_freq = {}
for item in pre_process_items():
    for k, v in item.items():
        total_freq[k] = total_freq.get(k, 0) + v

In this case you prepare the work item in pre_process_items().


This way you divide your code into a piece that prepares data and another one where you process the data. This makes the code easier to understand.
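As a standalone sketch of the dict comprehension used in the pre-processing step, here is what it produces for a single row:

```python
row = "hello"

# For each distinct character in the row, count how often it occurs.
freq = {c: row.count(c) for c in set(row)}

print(freq)  # {'h': 1, 'e': 1, 'l': 2, 'o': 1} (key order may vary)
```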

#2 Filtering work items

Often you have a list of possible work items, but only a few of them actually need to be processed.

A simple example is processing a Log-file, where we are only interested in a specific log-level.

def get_warnings(log_file):
    for row in open(log_file):
        if 'WARNING' in row:
            yield row

for warning in get_warnings('log_file.txt'):
    print(warning)

This example shows how a generator simplifies filtering.
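The same filter can also be written as a generator expression, which is just as lazy as the function above. In this sketch the log lines are hard-coded instead of read from a file, purely for illustration:

```python
lines = [
    'INFO  service started',
    'WARNING disk space low',
    'INFO  heartbeat',
    'WARNING high latency',
]

# A generator expression: filters lazily, one row at a time.
warnings = (row for row in lines if 'WARNING' in row)

for warning in warnings:
    print(warning)
```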


#3 API calls

A great use-case is making API calls. These may require setup, filtering of the result, and possibly reformatting.

import pandas_datareader as pdr
from datetime import datetime, timedelta

def get_stocks(tickers):
    d = datetime.now() - timedelta(days=7)
    for ticker in tickers:
        data = pdr.get_data_yahoo(ticker, d)
        close_price = list(data['Close'])
        yield close_price

for prices in get_stocks(['AAPL', 'TWTR']):
    print(prices)

The advantage of this is that the API call is only made when you actually need the data (lazy loading). Say you have a list of thousands of tickers: if you had to make all the calls before you could start processing, you would wait a long time.

With Generators you can utilize the power of lazy-loading.
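To make the lazy-loading point concrete, here is a sketch with a stand-in for the API call (the real pdr.get_data_yahoo is replaced by a hypothetical fetch that just records which tickers were requested). Combined with itertools.islice, only the calls you consume are ever made:

```python
from itertools import islice

calls_made = []

def get_prices(tickers):
    # Stand-in for a real API call; records which tickers were fetched.
    for ticker in tickers:
        calls_made.append(ticker)
        yield f"prices for {ticker}"

tickers = ['AAPL', 'TWTR', 'MSFT', 'GOOG']

# islice pulls only the first two items, so only two "API calls" happen;
# MSFT and GOOG are never fetched.
for prices in islice(get_prices(tickers), 2):
    print(prices)

print(calls_made)  # ['AAPL', 'TWTR']
```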

Want to learn more?

If this is something you like and you want to get started with Python, then check out my 8-hour FREE video course with full explanations, projects at each level, and guided solutions.

The course is structured with the following resources to improve your learning experience.

  • 17 video lessons teaching you everything you need to know to get started with Python.
  • 34 Jupyter Notebooks with lesson code and projects.
  • A FREE 70+ pages eBook with all the learnings from the lessons.
