## What will we cover in this tutorial?

We will compare the speed of calculations and modifications on frames from an OpenCV video stream, with and without Numba optimization.

In this tutorial we will divide each frame into boxes of equal size and calculate the average color of each box. Then we create a frame in which each box is filled with that average color.

See the effect in the video below. These calculations are expensive in Python, hence we will compare the performance with and without Numba.

## Step 1: Understand the process requirements

Each video frame from OpenCV is an image represented by a NumPy array. In this example we will use the webcam to capture a video stream and do the calculations and modifications live on the stream. This places high demands on the processing time of each frame.

To keep the motion fluid we need to show a new frame every 1/25 of a second. That leaves at most 0.04 seconds per frame for capturing, processing, and updating the window with the video stream.

Since capturing and updating the window also take time, it is uncertain how much of that budget is left for the frame processing (calculations and modifications), but 0.04 seconds per frame is an upper bound.

## Step 2: The calculations and modifications on each frame

Let’s have some fun. The calculations and modifications we want to apply to each frame are as follows.

• Calculations. We divide each frame into small areas of 6×16 pixels and calculate the average color for each area. To get the average color we calculate the average of each channel (BGR).
• Modification. We fill each area entirely with its average color.

This can be done by adding this function to process each frame.

```
import numpy as np

def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame
```

The frame is divided into areas of the box size (box_height x box_width). For each box (roi: Region of Interest) we calculate the average (mean) value of each of the 3 color channels (b_mean, g_mean, r_mean) and overwrite the area with that average color.
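As an aside, the same box averaging can be done without the Python-level double loop by using a NumPy reshape trick. This is a sketch, not part of the tutorial's approach; `process_vectorized` is a hypothetical name, and it assumes the frame dimensions are divisible by the box size.

```python
import numpy as np

def process_vectorized(frame, box_height=6, box_width=16):
    # Assumes the frame height/width are divisible by the box size
    h, w, c = frame.shape
    # View the frame as a grid of (box_height x box_width) boxes
    boxes = frame.reshape(h // box_height, box_height, w // box_width, box_width, c)
    # Average each box per channel, then broadcast the means back to full size
    means = boxes.mean(axis=(1, 3), keepdims=True)
    out = np.broadcast_to(means, boxes.shape).reshape(h, w, c)
    return out.astype(frame.dtype)
```

The reshape is a pure view of the original memory layout, so no copying happens until the final broadcast, which avoids the per-box Python overhead entirely.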

## Step 3: Testing performance for this frame process

To get an estimate of the time spent in the function process, the cProfile library is quite good. It profiles the time spent in each function call, which gives us a measure of how much time the function process takes.

We can accomplish that by running this code.

```
import cv2
import numpy as np
import cProfile

def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    for _ in range(iterations):
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))

        frame = process(frame)

        # Show the frame in a window
        cv2.imshow('WebCam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything is done, release the capture
    cap.release()
    cv2.destroyAllWindows()

cProfile.run("main()")
```

The interesting output line is given here.

```   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
300    7.716    0.026   50.184    0.167 TEST2.py:8(process)
```

This says we use 0.026 seconds per call in the process function. That is acceptable only if the accumulated overhead from the other functions in the main loop is less than 0.014 seconds.

If we investigate the other calls further, we see the following.

```   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
300    5.132    0.017    5.132    0.017 {method 'read' of 'cv2.VideoCapture' objects}
300    0.073    0.000    0.073    0.000 {resize}
300    2.848    0.009    2.848    0.009 {waitKey}
300    0.120    0.000    0.120    0.000 {flip}
300    0.724    0.002    0.724    0.002 {imshow}
```

This gives an overhead of approximately 0.028 seconds (0.017 + 0.009 + 0.002) from the read, waitKey, and imshow calls in each iteration (resize and flip are negligible). This adds up to a total of 0.054 seconds per frame, or a frame rate of about 18.5 frames per second (FPS).
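The arithmetic behind these numbers can be sanity-checked in a few lines:

```python
# Per-call times (seconds) taken from the cProfile output above
process_time = 0.026
overhead = 0.017 + 0.009 + 0.002  # read + waitKey + imshow

total = process_time + overhead   # total time per frame
fps = 1 / total                   # resulting frame rate
print(round(total, 3), round(fps, 1))
```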

This is too slow to keep the stream running smoothly.

## Step 4: Introducing Numba to optimize performance

The Numba library is designed to just-in-time compile code to make loops over NumPy arrays faster. Wow. That is just what we need here. Let’s jump right in and see how it does.

```
import cv2
import numpy as np
from numba import jit
import cProfile

@jit(nopython=True)
def process(frame, box_height=6, box_width=16):
    height, width, _ = frame.shape
    for i in range(0, height, box_height):
        for j in range(0, width, box_width):
            roi = frame[i:i + box_height, j:j + box_width]
            b_mean = np.mean(roi[:, :, 0])
            g_mean = np.mean(roi[:, :, 1])
            r_mean = np.mean(roi[:, :, 2])
            roi[:, :, 0] = b_mean
            roi[:, :, 1] = g_mean
            roi[:, :, 2] = r_mean
    return frame

def main(iterations=300):
    # Get the webcam (default webcam is 0)
    cap = cv2.VideoCapture(0)
    # If your webcam does not support 640 x 480, this will find another resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    for _ in range(iterations):
        # Read a frame from the webcam
        _, frame = cap.read()
        # Flip the frame
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (640, 480))

        frame = process(frame)

        # Show the frame in a window
        cv2.imshow('WebCam', frame)

        # Check if q has been pressed to quit
        if cv2.waitKey(1) == ord('q'):
            break

    # When everything is done, release the capture
    cap.release()
    cv2.destroyAllWindows()

main(iterations=1)
cProfile.run("main(iterations=300)")
```

Notice that we first call main with a single iteration. This is done to call the process function once before we measure the performance, since Numba compiles the function on the first call and keeps the compiled version for subsequent calls.

The result is as follows.

```   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
300    1.187    0.004    1.187    0.004 TEST2.py:7(process)
```

This estimates 0.004 seconds per call, resulting in a total time of 0.032 seconds per iteration (0.028 + 0.004). That is sufficient to keep a performance of more than 24 frames per second (FPS).

Also, this improves the performance of the frame processing by a factor of 6.5 (7.716 / 1.187).

## Conclusion

We got the desired speedup to run a live stream from the webcam and process it frame by frame by using Numba. The speedup was approximately a factor of 6.5.

## What will we cover in this tutorial?

We compare the difference between using a weighted average and a normal average over the last frames streaming from your webcam, using OpenCV in Python.

The effect can be seen in the video below, and the code used to create it is provided below.

## The code

The code is straightforward and not optimized. The average is calculated by using a deque from Python's collections library to create a circular buffer.
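The circular-buffer behavior of deque comes from its maxlen argument: once the buffer is full, appending a new item silently drops the oldest one. A minimal illustration:

```python
from collections import deque

buffer = deque(maxlen=3)
for i in range(5):
    buffer.append(i)

# Only the 3 most recent items survive
print(list(buffer))  # → [2, 3, 4]
```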

The two classes AverageBuffer and WeightedAverageBuffer share the same code for the constructor and apply, but each have their own implementation of get_frame, which calculates the average and the weighted average, respectively.

Please notice that the code is not written for efficiency; the AverageBuffer has some easy performance wins if the average is calculated more efficiently.

An important point here is that the frames are stored as float32 in the buffers. This is necessary for the calculations we do on the frames later, where we multiply them by a factor, say 4.

Example: the frames are uint8, which holds integers 0 to 255. Say we multiply a value of 128 by 4. This gives 128*4 = 512, which as a uint8 wraps around to 0. Hence we get an undesirable effect. Therefore we convert the frames to float32 to avoid this.
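The wrap-around is easy to demonstrate with NumPy directly; note how the uint8 result differs from the float32 one:

```python
import numpy as np

frame = np.array([128], dtype='uint8')
print(frame * 4)                      # wraps around: 512 % 256 = 0
print(frame.astype('float32') * 4)    # correct: 512.0
```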

```
import cv2
import numpy as np
from collections import deque

class AverageBuffer:
    def __init__(self, maxlen):
        self.buffer = deque(maxlen=maxlen)
        self.shape = None

    def apply(self, frame):
        self.shape = frame.shape
        self.buffer.append(frame)

    def get_frame(self):
        mean_frame = np.zeros(self.shape, dtype='float32')
        for item in self.buffer:
            mean_frame += item
        mean_frame /= len(self.buffer)
        return mean_frame.astype('uint8')

class WeightedAverageBuffer(AverageBuffer):
    def get_frame(self):
        mean_frame = np.zeros(self.shape, dtype='float32')
        i = 0
        for item in self.buffer:
            i += 4
            mean_frame += item * i
        # The weights are 4, 8, ..., i; their sum is i*(i + 4)/8
        mean_frame /= (i * (i + 4)) / 8.0
        return mean_frame.astype('uint8')

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)

average_buffer = AverageBuffer(30)
weighted_buffer = WeightedAverageBuffer(30)

while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)
    frame = cv2.resize(frame, (320, 240))

    frame_f32 = frame.astype('float32')
    average_buffer.apply(frame_f32)
    weighted_buffer.apply(frame_f32)

    cv2.imshow('WebCam', frame)
    cv2.imshow("Average", average_buffer.get_frame())
    cv2.imshow("Weighted average", weighted_buffer.get_frame())

    if cv2.waitKey(1) == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()
```

## What will we cover in this tutorial?

How to convert a webcam stream into a black and white line drawing using OpenCV and Python. Also, how to adjust the parameters while running the live stream.

See the result here.

## The things you need to use

There are two things you need to use in order to get a good line drawing of your image.

1. GaussianBlur to smooth out the image, as detecting lines is sensitive to noise.
2. Canny that detects the lines.

For the Gaussian blur a 5×5 filter is advised. Canny then takes two threshold parameters. To find the optimal values for your setup, we have inserted two trackbars where you can set them to any value and see the result.

The code is given below.

```
import cv2
import numpy as np

# Setup camera
cap = cv2.VideoCapture(0)
# Set a smaller resolution
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Dummy callback for the trackbars
def nothing(x):
    pass

canny = "Canny"
cv2.namedWindow(canny)
cv2.createTrackbar('Threshold 1', canny, 0, 255, nothing)
cv2.createTrackbar('Threshold 2', canny, 0, 255, nothing)

while True:
    # Capture frame-by-frame
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)

    t1 = cv2.getTrackbarPos('Threshold 1', canny)
    t2 = cv2.getTrackbarPos('Threshold 2', canny)
    gb = cv2.GaussianBlur(frame, (5, 5), 0)
    can = cv2.Canny(gb, t1, t2)

    cv2.imshow(canny, can)

    # Paint the detected edges white on the original frame
    frame[np.where(can)] = 255
    cv2.imshow('WebCam', frame)
    if cv2.waitKey(1) == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()
```

## What will we cover in this tutorial?

How to find all the locations of your followers on Twitter and create a choropleth map (maps where the color of each shape is based on the value of an associated variable) with all countries. This will all be done by using Python.

This is done in connection with my interest in where the followers of my Twitter account are from. Today my result looks like this.

## Step 1: How to get the followers from your Twitter account

If you are new to the Twitter API you will need to create a developer account to get your secret key. You can follow this tutorial to create your developer account and get the needed tokens.

When that is done, you can use the tweepy library to connect to the Twitter API. The call api.get_follower_ids() will give you a list of all your followers by user-id.

```
import tweepy

# Used to connect to the Twitter API
# You need your own keys/secret/tokens here
consumer_key = "--- INSERT YOUR KEY HERE ---"
consumer_secret = "--- INSERT YOUR SECRET HERE ---"
access_token = "--- INSERT YOUR TOKEN HERE ---"
access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

def get_twitter_api():
    # Authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api

# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    # - This only returns the first 5,000 followers, then you need to use a cursor to get more.
    followers = api.get_follower_ids()
    print("Followers", len(followers))

if __name__ == "__main__":
    process()
```

Which will print out the number of followers you have on your account.

## Step 2: Get the location of your followers

How do we transform the twitter user-ids to a location?

We need to look them all up. Luckily, not one-by-one. We can do it in chunks of 100 users per call.

The function api.lookup_users(…) can look up 100 users per call, by user-ids or user-names.
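The chunking itself is plain Python slicing; a list comprehension splits the id list into pieces of at most 100:

```python
def chunk(items, size=100):
    # Split a list into consecutive chunks of at most `size` elements
    return [items[i:i + size] for i in range(0, len(items), size)]

ids = list(range(250))
chunks = chunk(ids)
print(len(chunks))  # 3 chunks: 100, 100, 50
```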

```
import tweepy

# Used to connect to the Twitter API
# You need your own keys/secret/tokens here
consumer_key = "--- INSERT YOUR KEY HERE ---"
consumer_secret = "--- INSERT YOUR SECRET HERE ---"
access_token = "--- INSERT YOUR TOKEN HERE ---"
access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

def get_twitter_api():
    # Authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api

# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    # - This only returns the first 5,000 followers, then you need to use a cursor to get more.
    followers = api.get_follower_ids()
    print("Followers", len(followers))

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_id=follower_chunk)
        # Process each user to get location
        for user in users:
            # Print user location
            print(user.location)

if __name__ == "__main__":
    process()
```

Before you execute this code, you should know that it will print all the locations your followers have set.

## Step 3: Map all user locations to the same format

When users write their locations, it is done in various ways, as this example shows.

```India
Kenya
Temecula, CA
Atlanta, GA
Florida, United States
Atlanta, GA
Miami, FL
Republic of the Philippines
Tampa, FL
Sammamish, WA
Coffee-machine
```

And as the last example shows, it might not be a real location. Hence, we need to see whether we can find the location by asking a service. For this purpose we will use the GeoPy library, which is a client for several popular geocoding web services.

Hence, for each of the user-specified locations (as in the examples above) we will call GeoPy and use its result as the location. This brings everything into the same format and clarifies whether the location exists.

```
import tweepy
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim

# Used to connect to the Twitter API
# You need your own keys/secret/tokens here
consumer_key = "--- INSERT YOUR KEY HERE ---"
consumer_secret = "--- INSERT YOUR SECRET HERE ---"
access_token = "--- INSERT YOUR TOKEN HERE ---"
access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

def get_twitter_api():
    # Authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api

# Used to map the twitter user location description to a standard format
def lookup_location(location):
    geo_locator = Nominatim(user_agent="LearnPython")
    try:
        location = geo_locator.geocode(location, language='en')
    except GeocoderTimedOut:
        return None
    return location

# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    # - This only returns the first 5,000 followers, then you need to use a cursor to get more.
    followers = api.get_follower_ids()
    print("Followers", len(followers))

    # Used to store all the locations from users
    locations = {}

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_id=follower_chunk)
        # Process each user to get location
        for user in users:
            # Call used to transform the user's description of location to the same format
            location = lookup_location(user.location)
            # Add it to our counter
            if location:
                # Keep only the country (the last part of the address)
                country = location.address.split(',')[-1].strip()
                if country in locations:
                    locations[country] += 1
                else:
                    locations[country] = 1

if __name__ == "__main__":
    process()
```

As you can see, it counts the occurrences of each location found. The split and strip are used to keep the country and leave out the rest of the address, if any.
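Since Nominatim returns addresses with the country as the last comma-separated part (the assumption the code above relies on), the split and strip reduce a full address to just the country:

```python
address = "Atlanta, Georgia, United States"
country = address.split(',')[-1].strip()
print(country)  # → United States
```

An address with no commas, such as "Kenya", is left unchanged by the same expression.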

## Step 4: Reformat the locations into a Pandas DataFrame

We want to reformat the locations into a DataFrame to be able to join (merge) it with GeoPandas, which contains the choropleth map we want to use.

To convert the locations into a DataFrame we need to restructure them. This also helps us remove duplicates. As an example, United States and United States of America both appear. To handle that we map all country names to a 3-letter code, using the pycountry library.

```
import tweepy
import pycountry
import pandas as pd
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim

# Used to connect to the Twitter API
# You need your own keys/secret/tokens here
consumer_key = "--- INSERT YOUR KEY HERE ---"
consumer_secret = "--- INSERT YOUR SECRET HERE ---"
access_token = "--- INSERT YOUR TOKEN HERE ---"
access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

def get_twitter_api():
    # Authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api

# Helper function to map country names to alpha_3 representation
# Some are not supported - and are hard-coded in
# Function used to map country names from GeoPandas and the country names from geo_locator
def lookup_country_code(country):
    try:
        alpha_3 = pycountry.countries.lookup(country).alpha_3
        return alpha_3
    except LookupError:
        if country == 'The Netherlands':
            country = 'NLD'
        elif country == 'Democratic Republic of the Congo':
            # ISO 3166-1 alpha-3 code for the Democratic Republic of the Congo
            country = 'COD'
        return country

# Used to map the twitter user location description to a standard format
def lookup_location(location):
    geo_locator = Nominatim(user_agent="LearnPython")
    try:
        location = geo_locator.geocode(location, language='en')
    except GeocoderTimedOut:
        return None
    return location

# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    # - This only returns the first 5,000 followers, then you need to use a cursor to get more.
    followers = api.get_follower_ids()
    print("Followers", len(followers))

    # Used to store all the locations from users
    locations = {}

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_id=follower_chunk)
        # Process each user to get location
        for user in users:
            # Call used to transform the user's description of location to the same format
            location = lookup_location(user.location)
            # Add it to our counter
            if location:
                # Keep only the country (the last part of the address)
                country = location.address.split(',')[-1].strip()
                if country in locations:
                    locations[country] += 1
                else:
                    locations[country] = 1

    # We reformat the output of locations
    # Done for two reasons
    # - 1) Some locations have two entries (e.g., United States and United States of America)
    # - 2) To map them into a simple format to join it with GeoPandas
    reformat = {'alpha_3': [], 'followers': []}
    for location in locations:
        print(location, locations[location])
        loc = lookup_country_code(location)
        if loc in reformat['alpha_3']:
            index = reformat['alpha_3'].index(loc)
            reformat['followers'][index] += locations[location]
        else:
            reformat['alpha_3'].append(loc)
            reformat['followers'].append(locations[location])

    # Convert the reformat dict into a DataFrame to join (merge) with GeoPandas
    followers = pd.DataFrame.from_dict(reformat)
    pd.set_option('display.max_columns', 50)
    pd.set_option('display.width', 1000)
    pd.set_option('display.max_rows', 300)
    print(followers.sort_values(by=['followers'], ascending=False))

if __name__ == "__main__":
    process()
```

That makes it ready to join (merge) with GeoPandas.

## Step 5: Merge it with GeoPandas and show the choropleth map

Now for the fun part. We only need to load the geo data from GeoPandas and merge our newly created DataFrame with it. Finally, plot and show it using matplotlib.pyplot.

```
import tweepy
import pycountry
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim

# Used to connect to the Twitter API
# You need your own keys/secret/tokens here
consumer_key = "--- INSERT YOUR KEY HERE ---"
consumer_secret = "--- INSERT YOUR SECRET HERE ---"
access_token = "--- INSERT YOUR TOKEN HERE ---"
access_token_secret = "--- INSERT YOUR TOKEN SECRET HERE ---"

def get_twitter_api():
    # Authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    return api

# Helper function to map country names to alpha_3 representation
# Some are not supported - and are hard-coded in
# Function used to map country names from GeoPandas and the country names from geo_locator
def lookup_country_code(country):
    try:
        alpha_3 = pycountry.countries.lookup(country).alpha_3
        return alpha_3
    except LookupError:
        if country == 'The Netherlands':
            country = 'NLD'
        elif country == 'Democratic Republic of the Congo':
            # ISO 3166-1 alpha-3 code for the Democratic Republic of the Congo
            country = 'COD'
        return country

# Used to map the twitter user location description to a standard format
def lookup_location(location):
    geo_locator = Nominatim(user_agent="LearnPython")
    try:
        location = geo_locator.geocode(location, language='en')
    except GeocoderTimedOut:
        return None
    return location

# This function is used to process it all
def process():
    # Connecting to the twitter api
    api = get_twitter_api()

    # Get the list of all your followers - it only gives user-id's
    # - we need to gather all user data after
    # - This only returns the first 5,000 followers, then you need to use a cursor to get more.
    followers = api.get_follower_ids()
    print("Followers", len(followers))

    # Used to store all the locations from users
    locations = {}

    # We need to chunk it up in sizes of 100 (max for api.lookup_users)
    followers_chunks = [followers[i:i + 100] for i in range(0, len(followers), 100)]
    # Process each chunk - we can call for 100 users per call
    for follower_chunk in followers_chunks:
        # Get a list of users (with location data)
        users = api.lookup_users(user_id=follower_chunk)
        # Process each user to get location
        for user in users:
            # Call used to transform the user's description of location to the same format
            location = lookup_location(user.location)
            # Add it to our counter
            if location:
                # Keep only the country (the last part of the address)
                country = location.address.split(',')[-1].strip()
                if country in locations:
                    locations[country] += 1
                else:
                    locations[country] = 1

    # We reformat the output of locations
    # Done for two reasons
    # - 1) Some locations have two entries (e.g., United States and United States of America)
    # - 2) To map them into a simple format to join it with GeoPandas
    reformat = {'alpha_3': [], 'followers': []}
    for location in locations:
        print(location, locations[location])
        loc = lookup_country_code(location)
        if loc in reformat['alpha_3']:
            index = reformat['alpha_3'].index(loc)
            reformat['followers'][index] += locations[location]
        else:
            reformat['alpha_3'].append(loc)
            reformat['followers'].append(locations[location])

    # Convert the reformat dict into a DataFrame to join (merge) with GeoPandas
    followers = pd.DataFrame.from_dict(reformat)
    pd.set_option('display.max_columns', 50)
    pd.set_option('display.width', 1000)
    pd.set_option('display.max_rows', 300)
    print(followers.sort_values(by=['followers'], ascending=False))

    # Load the world map shapes from GeoPandas
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    # Remove the columns not needed
    world = world.drop(['pop_est', 'continent', 'iso_a3', 'gdp_md_est'], axis=1)
    # Map the same naming convention as followers (the above DataFrame)
    # - this step is needed, because the iso_a3 column was missing a few countries
    world['iso_a3'] = world.apply(lambda row: lookup_country_code(row['name']), axis=1)
    # Merge the tables (DataFrames)
    table = world.merge(followers, how="left", left_on=['iso_a3'], right_on=['alpha_3'])

    # Plot the data in a graph
    table.plot(column='followers', figsize=(8, 6))
    plt.show()

if __name__ == "__main__":
    process()
```

Resulting in the following output (for the PythonWithRune Twitter account, not yours).