deepFace + Python: Create a Look-alike Algorithm

What will we cover in this tutorial?

We will explore the deepFace library, which includes state-of-the-art face recognition algorithms. deepFace is a Python library just as we like it: you can do complicated stuff with only a few lines of code.

In this tutorial we will use the deepFace library to create a look-alike algorithm. That is, we will calculate which movie star you look most like. Your movie-star look-alike.

Step 1: Collect your library of movie stars you want to compare yourself to

Well, this requires you to use pictures of movie stars that are available online.

My library consists of the following images.

My library of images

Just to clarify, the deepFace logo is not part of the library of images. But I wonder if it can detect a face in it?

Let’s try that (the below code is not part of the official tutorial). If you need help installing the deepFace library, read this tutorial.

from deepface import DeepFace

result = DeepFace.analyze("deepFaceLogo.png", actions=['age', 'gender', 'race', 'emotion'])
print(result)

Where deepFaceLogo.png is the following image.

ValueError: Face could not be detected. Please confirm that the picture is a face photo or consider to set enforce_detection param to False.

Bummer. The deepFace library cannot detect a face in its own logo.
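As the error message suggests, the detection stage can be bypassed by setting enforce_detection to False; the analysis then runs on the full image instead of a detected face crop. A minimal sketch (still not part of the official tutorial):

from deepface import DeepFace

# Skip the failing detection stage and analyze the full image
result = DeepFace.analyze("deepFaceLogo.png",
                          actions=['age', 'gender', 'race', 'emotion'],
                          enforce_detection=False)
print(result)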

Well, back on track. Find a collection of movie stars and save them in a folder.

Step 2: Understand how easy deepFace is to use

A modern face recognition pipeline consists of 4 common stages: detect, align, represent and verify. DeepFace handles all these common stages in the background.

https://pypi.org/project/deepface/

A normal process would first identify where in the picture a face is located (the detect stage). This is needed to remove all unnecessary background in the image and only focus on the face. Then it would proceed to align it, so the eyes are on a horizontal line (the align stage). That is, the head is not tilting to one side. This makes the face recognition algorithm work better, as all faces will have the same alignment. Further, the image (the aligned face part) is converted into the representation used by the model (the represent stage). Finally, we verify by checking if the distance between the representations is small enough. That is, the smaller the distance, the more certain we are that it is the right person.

Let’s take a simple example. Let’s check whether I am Angelina Jolie. (If you need help installing the deepFace library, read this tutorial.)

from deepface import DeepFace

result = DeepFace.verify("rune.jpg", "pics-db/Angelina Jolie.jpg", model_name="VGG-Face")
print(result)

I have added the picture of me (rune.jpg) in the folder where I run my Python program, and a pics-db folder with a picture of Angelina Jolie.

And the result looks like this.

{
  'verified': False,
  'distance': 0.7834609150886536,
  'max_threshold_to_verify': 0.4,
  'model': 'VGG-Face',
  'similarity_metric': 'cosine'
}

The algorithm has determined that I am not Angelina Jolie (‘verified’: False). Now to the interesting part: it gives a distance (‘distance’: 0.7834609150886536), which we will use to determine which movie star you look the most like. Here it is important to understand that the lower the distance, the more alike you look. Also, you see that the maximum threshold to verify is 0.4. That is, the distance should be less than 0.4 to conclude that it is the same person in the two pictures.
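In other words, the verification boils down to a simple threshold comparison. A minimal sketch of that rule, using the keys from the result dictionary above:

# Same person if and only if the distance is below the model's threshold
def is_same_person(result):
    return result['distance'] <= result['max_threshold_to_verify']

print(is_same_person(result))  # False for the result above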

Step 3: Making our Look-alike algorithm

We have collected a library of images and placed them in pics-db. Then I try with a picture of me (the same one I used in the last step). (If you need help installing the deepFace library, read this tutorial.)

from deepface import DeepFace
import glob

pic = "rune.jpg"
files = glob.glob("pics-db/*")

model = "VGG-Face"
results = []
for file in files:
    result = DeepFace.verify(pic, file, model_name=model)
    results.append((file, result['distance']))

results.sort(key=lambda x: x[1])
print("Model:", model)
for file, distance in results:
    print(file, distance)

Now I am a bit nervous: which one do I look most like?

Model: VGG-Face
pics-db/Tom Cruise.jpg 0.4702601432800293
pics-db/The Rock.jpg 0.493824303150177
pics-db/Robert Downey Jr.jpg 0.4991753101348877
pics-db/Daniel Craig.jpg 0.5135003626346588
pics-db/Christian Bale.jpg 0.5176380276679993
pics-db/Brad Pitt.jpg 0.5225759446620941
pics-db/Will Smith.jpg 0.5245362818241119
pics-db/Michael Douglas.png 0.5407796204090118
pics-db/Keanu Reeves.jpg 0.5416552424430847
pics-db/Angelina Jolie.jpg 0.7834609150886536

Wow. I like this immediately. Tom Cruise, The Rock, Robert Downey Jr. Luckily, I look the least like Angelina Jolie, which is not that surprising (at least I would think so).

Are we done?

Maybe, it depends. I guess you could make an app or web service with a movie star look-alike algorithm.

You can also play around with different models. The deepFace library contains the following: “VGG-Face”, “Facenet”, “OpenFace”, “DeepFace”, “DeepID”, and “Dlib”.
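A sketch of how you could compare them, wrapping the loop from step 3 in a function and running it once per model. Note that distances are not directly comparable across models, as each model has its own threshold:

from deepface import DeepFace
import glob

def rank_lookalikes(pic, model):
    # Compare one photo against every image in the library for a given model
    results = []
    for file in glob.glob("pics-db/*"):
        result = DeepFace.verify(pic, file, model_name=model)
        results.append((file, result['distance']))
    results.sort(key=lambda x: x[1])
    return results

for model in ["VGG-Face", "Facenet", "OpenFace", "DeepFace", "DeepID", "Dlib"]:
    print("Model:", model)
    for file, distance in rank_lookalikes("rune.jpg", model):
        print(file, distance)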

The results are not the same.

Model: Facenet
pics-db/Tom Cruise.jpg 0.7826492786407471
pics-db/Brad Pitt.jpg 0.870269775390625
pics-db/The Rock.jpg 0.8774074390530586
pics-db/Keanu Reeves.jpg 0.9102083444595337
pics-db/Daniel Craig.jpg 0.914682649075985
pics-db/Christian Bale.jpg 0.9467008262872696
pics-db/Robert Downey Jr.jpg 1.0119212558493018
pics-db/Angelina Jolie.jpg 1.0243268758058548
pics-db/Michael Douglas.png 1.067187249660492
pics-db/Will Smith.jpg 1.2313244044780731

The Facenet model says I look less like Will Smith and more like Angelina Jolie. Not sure I trust this one.

If you enjoyed this, I recommend reading the following tutorial by the author of deepface, Sefik Ilkin Serengil.

How to Get Started with DeepFace using PyCharm

What will we cover in this tutorial?

In this tutorial we will show you how to set up your virtual environment in PyCharm to use DeepFace. Then we will run a small program using DeepFace.

This tutorial has been done for both Mac and Windows. See the additional notes for Windows at the end if you experience problems.

Step 1: Importing the DeepFace library and run our first program

You know how it works.

from deepface import DeepFace


demography = DeepFace.analyze("angelina.jpg", actions=['age', 'gender', 'race', 'emotion'])
print("Age: ", demography["age"])
print("Gender: ", demography["gender"])
print("Emotion: ", demography["dominant_emotion"])
print("Race: ", demography["dominant_race"])

PyCharm will show that deepface is not available, and you click to install it.

It seems to work. But then…

Okay. Let’s do it the easy way. Just add an import dlib in your code.

But it fails.

Step 2: Install CMake to make it work

If you search the internet you will see that you also need to install CMake to make DeepFace work.

So let’s import that as well.

We end up with this code.

from deepface import DeepFace
import cmake
import dlib


demography = DeepFace.analyze("angelina.jpg", actions=['age', 'gender', 'race', 'emotion'])
print("Age: ", demography["age"])
print("Gender: ", demography["gender"])
print("Emotion: ", demography["dominant_emotion"])
print("Race: ", demography["dominant_race"])

Where you first install cmake by hovering your mouse over the red line under cmake and choosing install. Then you do the same for dlib.

And by magic it works.
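If you prefer the command line over PyCharm’s quick-fix, the same packages can presumably be installed with pip from your project’s virtual environment (the package names are as on PyPI):

pip install cmake dlib deepface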

Step 3: Run our little program

You need to get the picture of Angelina (angelina.jpg). Apparently, the authors of DeepFace have a thing for her and use her as an example.

Angelina

Then when you run the program you will get the following output, as it needs to download a lot of stuff.

Actions to do:  ['age', 'gender', 'race', 'emotion']
facial_expression_model_weights.h5 will be downloaded...
Downloading...
From: https://drive.google.com/uc?id=13iUHHP3SlNg53qSuQZDdHDSDNdBP9nwy
To: /Users/admin/.deepface/weights/facial_expression_model_weights.zip
5.54MB [00:00, 10.4MB/s]
age_model_weights.h5 will be downloaded...
Downloading...
From: https://drive.google.com/uc?id=1YCox_4kJ-BYeXq27uUbasu--yz28zUMV
To: /Users/admin/.deepface/weights/age_model_weights.h5
539MB [01:04, 8.36MB/s]
Downloading...
From: https://drive.google.com/uc?id=1wUXRVlbsni2FN9-jkS_f4UTUrm1bRLyk
To: /Users/admin/.deepface/weights/gender_model_weights.h5
79.7MB [00:08, 10.1MB/s]gender_model_weights.h5 will be downloaded...
537MB [00:58, 9.16MB/s]
Downloading...
From: https://drive.google.com/uc?id=1nz-WDhghGQBC4biwShQ9kYjvQMpO6smj
To: /Users/admin/.deepface/weights/race_model_single_batch.zip
78.3MB [00:08, 9.37MB/s]race_model_single_batch.h5 will be downloaded...
511MB [00:54, 9.35MB/s]
Analyzing:   0%|          | 0/1 [00:00<?, ?it/s]
Finding actions:   0%|          | 0/4 [00:00<?, ?it/s]
Action: age:   0%|          | 0/4 [00:00<?, ?it/s]    
Action: age:  25%|██▌       | 1/4 [00:01<00:05,  1.79s/it]
Action: gender:  25%|██▌       | 1/4 [00:01<00:05,  1.79s/it]
Action: gender:  50%|█████     | 2/4 [00:02<00:03,  1.54s/it]
Action: race:  50%|█████     | 2/4 [00:02<00:03,  1.54s/it]  
Action: race:  75%|███████▌  | 3/4 [00:03<00:01,  1.23s/it]
Action: emotion:  75%|███████▌  | 3/4 [00:03<00:01,  1.23s/it]
Action: emotion: 100%|██████████| 4/4 [00:03<00:00,  1.13it/s]
Analyzing:   0%|          | 0/1 [00:03<?, ?it/s]

Age:  33.10586443589396
Gender:  Woman
Emotion:  neutral
Race:  white

Hmm… Apparently she is 33.1 years old in that picture. Angelina is also a woman and has neutral emotions. Finally, she is white.

Let’s try to see what it says about me.

Me

I am happy to see this.

Age:  27.25606073226045
Gender:  Man
Emotion:  happy
Race:  white

I am still in my 20s. How can you not love that? DeepFace is considered state of the art, so let’s not doubt it.

Additional notes for Windows

To ensure that it also worked on Windows, I tried it out and ran into a few challenges there.

First of all, I was using a 32-bit version of Python, which caused tensorflow not to install, as pip could not find a matching library. Hence, make sure to install the 64-bit version of Python on Windows. You can have both versions running in parallel. The easiest way to get PyCharm to use your 64-bit version is to create a new project and set its default Python interpreter to the 64-bit version.

Secondly, it was still having trouble. It did not have a new enough version of the C/C++ compiler. In this case I needed to update my Visual Studio.

Then the above tutorial just ran like a charm on PyCharm.

How to Get Started with Yolo in Python

What will we cover in this tutorial?

How do you get started with YOLO in Python? What do you need to download? This tutorial will also give a simple guide to how to use it in Python. The code is kept as simple as possible, with explanations.

Step 1: Download the Yolo stuff

The easy way to get things working is to just download the repository from GitHub as a zip file. You find the darknet repository here.

You can also download it as a zip directly from here. The zip file should be unpacked in the folder where you develop your code. I renamed the resulting folder to yolo.

The next thing you need is the trained model, which you find on https://pjreddie.com/darknet/yolo/. Look for the following on the page and click on the weights.

We will use the YOLOv3-tiny, which you also can get directly from here.

The downloaded file should be placed in the folder where you develop your code.

Step 2: Load the network and apply it on an image

The code below is structured as follows. First you configure the location of the downloaded repository. Remember, I put it in the folder where I run my program and renamed it to yolo.

It then loads the labels of the possible objects, which are located in a file called coco.names. This is simply because the labels the network returns are indices into the names in coco.names. Further, it assigns some random colors to the labels, such that different labels have different colors.

After that it reads the network. Then it determines the output layers. It is a bit unintuitive, but in the case of yolov3-tiny.cfg only two output layers are needed, which is what it extracts there.

It loads the image (from the repository), transforms it into a blob that the network understands, and runs the network on it.

import numpy as np
import time
import cv2
import os


DARKNET_PATH = 'yolo'

# Read labels that are used on object
labels = open(os.path.join(DARKNET_PATH, "data", "coco.names")).read().splitlines()
# Make random colors with a seed, such that they are the same next time
np.random.seed(0)
colors = np.random.randint(0, 255, size=(len(labels), 3)).tolist()

# Give the configuration and weight files for the model and load the network.
net = cv2.dnn.readNetFromDarknet(os.path.join(DARKNET_PATH, "cfg", "yolov3-tiny.cfg"), "yolov3-tiny.weights")
# Determine the output layer, now this piece is not intuitive
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# Load the image
image = cv2.imread(os.path.join(DARKNET_PATH, "data", "dog.jpg"))
# Get the shape
h, w = image.shape[:2]
# Load it as a blob and feed it to the network
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
# Get the output
layer_outputs = net.forward(ln)
end = time.time()

Then we need to parse the result in layer_outputs.

Step 3: Parse the result from layer_outputs (Yolo output)

This is a bit tricky at first. You first need to understand the overall flow.

First, you will run through all the results in the layers (we have two layers). Second, you will remove overlapping results, as there might be multiple boxes that identify the same object, just with slightly different bounding boxes. Third, and finally, you need to draw the remaining boxes with labels (and colors) on the image.

To go through that process we need three lists to keep track of it. One for the actual boxes that encapsulate the identified objects (boxes). Then the corresponding confidences (confidences), that is, how sure the algorithm is. Finally, the class ids, which are used to look up the names we have in the labels (class_ids).

Each detection is a vector where the first 4 entries contain the position and size of the identified object, the fifth entry is the overall objectness confidence, and the following entries contain the confidence scores for all the possible object classes in the network.

# Initialize the lists we need to interpret the results
boxes = []
confidences = []
class_ids = []

# Loop over the layers
for output in layer_outputs:
    # For the layer loop over all detections
    for detection in output:
        # The first 4 entries contain the object position and size, the
        # fifth the objectness - so the class scores start at entry 5
        scores = detection[5:]
        # Take the class with the maximal score
        class_id = np.argmax(scores).item()
        # The maximal score is the confidence
        confidence = scores[class_id].item()

        # Ensure we have some reasonable confidence, else ignore
        if confidence > 0.3:
            # The first four entries have the location and size (center, size)
            # It needs to be scaled up as the result is given in relative size (0.0 to 1.0)
            box = detection[0:4] * np.array([w, h, w, h])
            center_x, center_y, width, height = box.astype(int).tolist()

            # Calculate the upper corner
            x = center_x - width//2
            y = center_y - height//2

            # Add our findings to the lists
            boxes.append([x, y, width, height])
            confidences.append(confidence)
            class_ids.append(class_id)

# Only keep the best boxes of the overlapping ones
idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.3)

# Ensure at least one detection exists - needed otherwise flatten will fail
if len(idxs) > 0:
    # Loop over the indexes we are keeping
    for i in idxs.flatten():
        # Get the box information
        x, y, w, h = boxes[i]

        # Make a rectangle
        cv2.rectangle(image, (x, y), (x + w, y + h), colors[class_ids[i]], 2)
        # Make and add text
        text = "{}: {:.4f}".format(labels[class_ids[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, colors[class_ids[i]], 2)

# Write the image with boxes and text
cv2.imwrite("example.png", image)
Resulting image

The full code together

The full source code put together.

import numpy as np
import time
import cv2
import os


DARKNET_PATH = 'yolo'

# Read labels that are used on object
labels = open(os.path.join(DARKNET_PATH, "data", "coco.names")).read().splitlines()
# Make random colors with a seed, such that they are the same next time
np.random.seed(0)
colors = np.random.randint(0, 255, size=(len(labels), 3)).tolist()

# Give the configuration and weight files for the model and load the network.
net = cv2.dnn.readNetFromDarknet(os.path.join(DARKNET_PATH, "cfg", "yolov3-tiny.cfg"), "yolov3-tiny.weights")
# Determine the output layer, now this piece is not intuitive
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# Load the image
image = cv2.imread(os.path.join(DARKNET_PATH, "data", "dog.jpg"))
# Get the shape
h, w = image.shape[:2]
# Load it as a blob and feed it to the network
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
start = time.time()
# Get the output
layer_outputs = net.forward(ln)
end = time.time()


# Initialize the lists we need to interpret the results
boxes = []
confidences = []
class_ids = []

# Loop over the layers
for output in layer_outputs:
    # For the layer loop over all detections
    for detection in output:
        # The first 4 entries contain the object position and size, the
        # fifth the objectness - so the class scores start at entry 5
        scores = detection[5:]
        # Take the class with the maximal score
        class_id = np.argmax(scores).item()
        # The maximal score is the confidence
        confidence = scores[class_id].item()

        # Ensure we have some reasonable confidence, else ignore
        if confidence > 0.3:
            # The first four entries have the location and size (center, size)
            # It needs to be scaled up as the result is given in relative size (0.0 to 1.0)
            box = detection[0:4] * np.array([w, h, w, h])
            center_x, center_y, width, height = box.astype(int).tolist()

            # Calculate the upper corner
            x = center_x - width//2
            y = center_y - height//2

            # Add our findings to the lists
            boxes.append([x, y, width, height])
            confidences.append(confidence)
            class_ids.append(class_id)

# Only keep the best boxes of the overlapping ones
idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.3, 0.3)

# Ensure at least one detection exists - needed otherwise flatten will fail
if len(idxs) > 0:
    # Loop over the indexes we are keeping
    for i in idxs.flatten():
        # Get the box information
        x, y, w, h = boxes[i]

        # Make a rectangle
        cv2.rectangle(image, (x, y), (x + w, y + h), colors[class_ids[i]], 2)
        # Make and add text
        text = "{}: {:.4f}".format(labels[class_ids[i]], confidences[i])
        cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, colors[class_ids[i]], 2)

# Write the image with boxes and text
cv2.imwrite("example.png", image)

OpenCV: A Simple Approach to Counting Cars

KISS – Keep it simple s…

In this tutorial we will make a simple car counter using OpenCV from Python. It will not be a perfect solution, but it will be easy to understand and in some cases better.

The counter takes advantage of the simple assumption that objects moving through a defined box on the right side of the road are cars driving in one direction, and objects moving through a defined box on the left side of the road are cars driving in the other direction.

This is of course not a perfect assumption, but it makes things easier. There is no need to identify if it is a car or not. This is actually an advantage, since by default car cascade classifiers might not recognize cars from the angle your camera is set at. At least, I had problems with that. I could train my own cascade classifier, but why not try to do something smart.

Step 1: Get a live feed from the webcam in OpenCV

First you need to ensure you have installed OpenCV. If you use PyCharm we can recommend you read this tutorial on how to set it up.

Getting a live feed from your webcam can be achieved with the following lines of code.

import cv2


cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while cap.isOpened():
    _, frame = cap.read()

    cv2.imshow("Car counter", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

The cv2.VideoCapture(0) assumes that you only have one webcam. If you have more, you might need to change 0 to something else.

The cap.set(…) calls set the width and height of the camera frames. In order to get good performance it is good to scale down. This can also be achieved by scaling down the picture you process afterwards.

Then cap.read() reads the next frame. It also returns a return value, but we ignore it with the underscore (_). The cv2.imshow(…) will create a window showing the frame. Finally, cv2.waitKey(1) waits 1 millisecond and checks if q was pressed. If so, it breaks out, releases the camera, and destroys the window.

Step 2: Identify moving objects with OpenCV

The simple idea is to compare each frame with the previous one. If there is a difference, we have a moving object. Of course, it is a bit more complex, as we also want to identify where the objects are and avoid identifying differences that are just due to noise in the picture.

As with most processing of moving images, we start by converting the frames to gray tones (cv2.cvtColor(…)). Then we blur them to minimize details in the picture (cv2.GaussianBlur(…)). This helps us avoid falsely identifying movement that is just noise or minor changes.

When that is done, we compare the converted frame with the one from the previous frame (cv2.absdiff(…)). This gives an idea of what has changed. We apply a threshold (cv2.threshold(…)) on it and then dilate (cv2.dilate(…)) the changes to make them easier to identify with cv2.findContours(…).

It boils down to the following code.

import cv2
import imutils


cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# We will keep the last frame in order to see if there has been any movement
last_frame = None

while cap.isOpened():
    _, frame = cap.read()

    # Processing of frames is done in gray
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # We blur it to minimize reaction to small details
    gray = cv2.GaussianBlur(gray, (21, 21), 0)

    # Need to check if we have a last_frame, if not get it
    if last_frame is None:
        last_frame = gray
        continue

    # Get the difference from last_frame
    delta_frame = cv2.absdiff(last_frame, gray)
    last_frame = gray
    # Have some threshold on what is enough movement
    thresh = cv2.threshold(delta_frame, 25, 255, cv2.THRESH_BINARY)[1]
    # This dilates with two iterations
    thresh = cv2.dilate(thresh, None, iterations=2)
    # Returns a list of objects
    contours = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Converts it
    contours = imutils.grab_contours(contours)

    # Loops over all objects found
    for contour in contours:
        # Gets a bounding box and puts it on the frame
        (x, y, w, h) = cv2.boundingRect(contour)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Let's show the frame in our window
    cv2.imshow("Car counter", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

If you don’t watch out for what is happening, it could turn into a picture like this one (I am sure you take more care than me).

Example frame of moving objects.

One thing to notice is that we could set a lower limit on the sizes of the moving objects. This can be achieved by inserting a check before we draw the green boxes, as sketched below.
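A sketch of that check inside the contour loop from the code above, using the same area limit the full program in step 4 uses (the value 500 can be adjusted to your camera and scene):

    # Loops over all objects found
    for contour in contours:
        # Skip moving objects that are too small to be interesting
        if cv2.contourArea(contour) < 500:
            continue

        # Gets a bounding box and puts it on the frame
        (x, y, w, h) = cv2.boundingRect(contour)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)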

Step 3: Creating a helper class to track counts

To make our life easier we introduce a helper class to represent a box on the screen that keeps track of how many objects have moved through it.

class Box:
    def __init__(self, start_point, width_height):
        self.start_point = start_point
        self.end_point = (start_point[0] + width_height[0], start_point[1] + width_height[1])
        self.counter = 0
        self.frame_countdown = 0

    def overlap(self, start_point, end_point):
        if self.start_point[0] >= end_point[0] or self.end_point[0] <= start_point[0] or \
                self.start_point[1] >= end_point[1] or self.end_point[1] <= start_point[1]:
            return False
        else:
            return True

The class takes the starting point (start_point) and the width and height (width_height) in the constructor. As we will need both start_point and end_point when drawing the box in the frame, we calculate the latter immediately in the constructor (__init__(…)).

Further, we have a counter to keep track of how many objects have passed through the box. There is also a frame_countdown, which is used to minimize multiple counts of the same moving object. What can happen is that in one frame the moving object is identified, while in the next it is not, but then it is identified again. If that all happens within the box, the object would be counted twice. Hence, we have a countdown that requires a minimum number of frames between identified moving objects before we assume it is a new one.
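A quick usage example of the class with made-up coordinates, to show how the overlap check behaves:

box = Box((100, 200), (10, 80))             # spans (100, 200) to (110, 280)
print(box.overlap((105, 250), (130, 300)))  # True - the rectangles intersect
print(box.overlap((200, 200), (220, 280)))  # False - no horizontal overlap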

Step 4: Using the helper class and start the counting

Now we need to put all the code together.

It requires a few things. Before we enter the main while loop, we need to set up the boxes we want to count moving objects in. Here we set up two, one for each direction the cars can drive. Inside the contours loop, we set a lower limit on the contour sizes. Then we go through all the boxes, update the appropriate variables, and build the text string. After that, it prints the text on the frame and draws all the boxes on it.

import cv2
import imutils


class Box:
    def __init__(self, start_point, width_height):
        self.start_point = start_point
        self.end_point = (start_point[0] + width_height[0], start_point[1] + width_height[1])
        self.counter = 0
        self.frame_countdown = 0

    def overlap(self, start_point, end_point):
        if self.start_point[0] >= end_point[0] or self.end_point[0] <= start_point[0] or \
                self.start_point[1] >= end_point[1] or self.end_point[1] <= start_point[1]:
            return False
        else:
            return True


cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# We will keep the last frame in order to see if there has been any movement
last_frame = None

# To build a text string with counting status
text = ""

# The boxes we want to count moving objects in
boxes = []
boxes.append(Box((100, 200), (10, 80)))
boxes.append(Box((300, 350), (10, 80)))

while cap.isOpened():
    _, frame = cap.read()

    # Processing of frames is done in gray
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # We blur it to minimize reaction to small details
    gray = cv2.GaussianBlur(gray, (5, 5), 0)

    # Need to check if we have a last_frame, if not get it
    if last_frame is None or last_frame.shape != gray.shape:
        last_frame = gray
        continue

    # Get the difference from last_frame
    delta_frame = cv2.absdiff(last_frame, gray)
    last_frame = gray
    # Have some threshold on what is enough movement
    thresh = cv2.threshold(delta_frame, 25, 255, cv2.THRESH_BINARY)[1]
    # This dilates with two iterations
    thresh = cv2.dilate(thresh, None, iterations=2)
    # Returns a list of objects
    contours = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Converts it
    contours = imutils.grab_contours(contours)

    # Loops over all objects found
    for contour in contours:
        # Skip if contour is small (can be adjusted)
        if cv2.contourArea(contour) < 500:
            continue

        # Gets a bounding box and puts it on the frame
        (x, y, w, h) = cv2.boundingRect(contour)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

        # The text string we will build up
        text = "Cars:"
        # Go through all the boxes
        for box in boxes:
            box.frame_countdown -= 1
            if box.overlap((x, y), (x + w, y + h)):
                if box.frame_countdown <= 0:
                    box.counter += 1
                # The number might be adjusted, it is just set based on my settings
                box.frame_countdown = 20
            text += " (" + str(box.counter) + ", " + str(box.frame_countdown) + ")"

    # Set the text string we build up
    cv2.putText(frame, text, (10, 20), cv2.FONT_HERSHEY_PLAIN, 2, (0, 255, 0), 2)

    # Let's also insert the boxes
    for box in boxes:
        cv2.rectangle(frame, box.start_point, box.end_point, (255, 255, 255), 2)

    # Let's show the frame in our window
    cv2.imshow("Car counter", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Step 5: Real life test on counting cars (not just moving objects)

The real question is: does it work, or did we oversimplify the problem? If it works, we have created a very small piece of code (compared to other implementations) which can count cars all day long.

I adjusted the parameters a bit and got the following with my first real trial.

Counting correctly.

Please notice, that it does not start from zero in the video. But it counts the number of cars in each direction correctly. As expected, it counts when each car reaches the white bar.

The number of cars is the first number, while the second is just visible for me to see if my guess for the number of skipped frames was usable.

Are we done?

Not at all. This was just to see if we could make something simple and fast to count cars. My first problem was that the pre-trained car recognition models (car cascade classifiers) were not happy about the angle of the cars from my window. I first thought of training my own cascade classifier, but then I thought it was more fun to try something simpler.

There are a lot of parameters which can be tuned to make it more reliable, but the main point is that it counted correctly in the given test. I can see one challenge: if a big truck drives by from left to right, it might get in the way of the other counter. This is a potential weakness of this simple approach.

Master Markowitz Portfolio Optimization (Efficient Frontier) in Python using Pandas

What is Markowitz Portfolio Optimization (Efficient Frontier)?

The Efficient Frontier takes a portfolio of investments and optimizes the expected return with regard to the risk. That is, it finds the optimal return for a given risk.

According to investopedia.org, the return is based on the expected Compound Annual Growth Rate (CAGR) and the risk metric is the standard deviation of the return.

But what does all that mean? We will learn that in this tutorial.

Step 1: Get the time series of your stock portfolio

We will use the following portfolio of 4 stocks: Apple (AAPL), Microsoft (MSFT), IBM (IBM), and Nvidia (NVDA).

To get the time series we will use the Yahoo! Finance API through the Pandas-datareader.

We will look 5 years back.

import pandas_datareader as pdr
import pandas as pd
import datetime as dt
from dateutil.relativedelta import relativedelta

years = 5
end_date = dt.datetime.now()
start_date = end_date - relativedelta(years=years)
close_price = pd.DataFrame()
tickers = ['AAPL','MSFT','IBM','NVDA']
for ticker in tickers:
  tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
  close_price[ticker] = tmp['Close']

print(close_price)

Resulting in the following output (or the first few lines).

                  AAPL        MSFT         IBM        NVDA
Date                                                      
2015-08-25  103.739998   40.470001  140.960007   20.280001
2015-08-26  109.690002   42.709999  146.699997   21.809999
2015-08-27  112.919998   43.900002  148.539993   22.629999
2015-08-28  113.290001   43.930000  147.979996   22.730000
2015-08-31  112.760002   43.520000  147.889999   22.480000

It contains the daily closing prices for each ticker for the last 5 years from the current date.

Step 2: Calculate the CAGR, returns, and covariance

To calculate the expected return, we use the Compound Annual Growth Rate (CAGR) based on the last 5 years. The CAGR is used as investopedia suggests. An alternative that is also used is the mean of the returns. The key thing is to have some common measure of the return.

The CAGR is calculated as follows.

CAGR = (end price / start price)^(1/years) - 1
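As a quick sanity check of the formula, here it is in Python with made-up numbers: a stock that doubles over 5 years grows roughly 14.9% per year.

start_price = 100.0
end_price = 200.0
years = 5
# (200/100)^(1/5) - 1 = 2^0.2 - 1, roughly 0.1487
cagr = (end_price / start_price) ** (1 / years) - 1
print(cagr)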

We will also calculate the covariance, as we will use it to calculate the variance of a weighted portfolio. Remember that the standard deviation is given by the following.

sigma = sqrt(variance)

A portfolio is a vector w with the balances of each stock. For example, w = [0.2, 0.3, 0.4, 0.1] says that we have 20% in the first stock, 30% in the second, 40% in the third, and 10% in the final stock. It all sums up to 100%.

Given a weight w of the portfolio, you can calculate the variance of the stocks by using the covariance matrix.

variance = w^T Cov w

Where Cov is the covariance matrix.

This results in the following pre-computations.

returns = close_price/close_price.shift(1)
cagr = (close_price.iloc[-1]/close_price.iloc[0])**(1/years) - 1
cov = returns.cov()

print(cagr)
print(cov)

Where you can see the output here.

# CAGR:
AAPL    0.371509
MSFT    0.394859
IBM    -0.022686
NVDA    0.905011
dtype: float64

# Covariance
          AAPL      MSFT       IBM      NVDA
AAPL  0.000340  0.000227  0.000152  0.000297
MSFT  0.000227  0.000303  0.000164  0.000306
IBM   0.000152  0.000164  0.000260  0.000210
NVDA  0.000297  0.000306  0.000210  0.000879

Step 3: Plot the return and risk

This is where the power of computing comes into the picture. The idea is simply to try random portfolios and see how they rate with regard to expected return and risk.

It is that simple. Make a random weighted distribution of your portfolio and plot the point of expected return (based on our CAGR) and the risk based on the standard deviation calculated by the covariance.

import matplotlib.pyplot as plt
import numpy as np

def random_weights(n):
    k = np.random.rand(n)
    return k / sum(k)

exp_return = []
sigma = []
for _ in range(20000):
  w = random_weights(len(tickers))
  exp_return.append(np.dot(w, cagr.T))
  sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

plt.plot(sigma, exp_return, 'ro', alpha=0.1) 
plt.show()

We introduce a helper function random_weights, which returns a weighted portfolio. That is, it returns a vector with entries that sum up to one. This will give a way to distribute our portfolio of stocks.

Then we iterate 20,000 times (it could be any value, we just want enough points to plot our graph). In each iteration we make a random weight w, then calculate the expected return as the dot-product of w and the transposed cagr. This is done using NumPy’s dot-product function.

What the dot-product np.dot(w, cagr.T) does is take elements pairwise from w and cagr, multiply them, and sum up. The transpose is only about the orientation, to make the shapes match.
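For example, with made-up numbers close to the CAGR values from step 2:

import numpy as np

w = np.array([0.2, 0.3, 0.4, 0.1])
cagr = np.array([0.37, 0.39, -0.02, 0.90])
# 0.2*0.37 + 0.3*0.39 + 0.4*(-0.02) + 0.1*0.90 = 0.273
print(np.dot(w, cagr.T))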

The standard deviation (assigned to sigma) is calculated similarly, using the formula from the last step: variance = w^T Cov w (which consists of dot-products).

This results in the following graph.

Returns vs risks

The graph outlines a parabola. The optimal values lie along the upper half of the parabola line. Hence, for a given risk, the optimal portfolio is the one on the upper border of the filled parabola.

Considerations

The Efficient Frontier gives you a way to balance your portfolio. The above code can find such a portfolio by trial and error, but it still leaves out some considerations.

How often should you re-balance? It has a cost to do that.

The theory behind it makes some assumptions that may not hold in reality. As investopedia points out, it assumes that asset returns follow a normal distribution, but in reality returns can be more than 3 standard deviations away. Also, the theory builds on the assumption that investors are rational in their investments, which most consider flawed, as more factors play into investment decisions.

The full source code

Below here you find the full source code from the tutorial.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd
from dateutil.relativedelta import relativedelta
import matplotlib.pyplot as plt
import numpy as np


years = 5
end_date = dt.datetime.now()
start_date = end_date - relativedelta(years=years)
close_price = pd.DataFrame()
tickers = ['AAPL', 'MSFT', 'IBM', 'NVDA']
for ticker in tickers:
    tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
    close_price[ticker] = tmp['Close']

returns = close_price / close_price.shift(1)
cagr = (close_price.iloc[-1] / close_price.iloc[0]) ** (1 / years) - 1
cov = returns.cov()

def random_weights(n):
    k = np.random.rand(n)
    return k / sum(k)

exp_return = []
sigma = []
for _ in range(20000):
    w = random_weights(len(tickers))
    exp_return.append(np.dot(w, cagr.T))
    sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

plt.plot(sigma, exp_return, 'ro', alpha=0.1)
plt.show()

Install OpenCV 4 in PyCharm

What will we cover?

You want to start your first OpenCV project in PyCharm.

import cv2

And you get.

From PyCharm

You press Install package cv2, but you get.

Error message from PyCharm (lower right corner).

What to do? No worries. We will cover that in this survival guide and it is not complex.

Step 1: Understand how PyCharm works with a Virtual Environment

When you create a new project in PyCharm you get prompted with this screen (PyCharm 2020.2).

Creating a project OpenCV in PyCharm

It says Python interpreter: New Virtualenv environment. What does that mean?

Well, it creates an isolated environment for your project. Then each project can have its own dependencies and libraries without impacting other projects.

Remember kindergarten? There was only one sandbox, and there was not room enough for multiple projects in it. Like building a sand castle, making a river, and whatever else you did as a kid. The problem was, if you wanted to build a castle while your fellow kindergarten friends wanted to play mountain collapse (you know, when a mountain collapses), then their game would destroy your well-engineered 5 feet tall castle. It was the beginning of a riot.

Think of a kindergarten where there is one sandbox for each project you could imagine. One for castle building. One for mountain collapse. You see? Now everyone can play in their own world or environment.

The virtual environment is like that. You can go crazy in it without destroying other awesome projects you do. Hence, if you feel like making a mountain collapse project, you should not fear it will destroy your well engineered castle project.
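Outside PyCharm, the same kind of virtual environment can be created from the command line with Python’s built-in venv module (a sketch for Mac/Linux; on Windows the activate script lives in venv\Scripts instead):

python3 -m venv venv
source venv/bin/activate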

Step 2: How does this virtual environment work, and why does it matter for OpenCV?

Good question.

If you follow some manual online you might end up installing OpenCV on your base system and not in the virtual environment in your project.

But where is the virtual environment located? It depends on two things. First, where your projects are located. Second, what the name of your project is.

I used the default location when I installed PyCharm, which is PyCharmProjects in my home folder. Further, in this case I called the project OpenCV.

If I open a command line I can type the following to get to the location.

Command line terminal

Then you will see a folder called venv, which is short for virtual environment. Go into that folder and continue down into the bin (binary) folder.

Command line terminal

Now you are located where you can install the OpenCV library.

Step 3: Installing the OpenCV library in your virtual environment

We use pip, the package manager for Python. You want to ensure you use the pip from the virtual environment above (the one in the bin folder).

./pip install opencv-python
From command line terminal

You might get a bit different output, as I already had the library cached.

Back in PyCharm it will update and look like this.

Back in PyCharm the red line disappeared

Now you are ready for your first test.

Step 4: Testing that OpenCV works

Let’s find a picture.

Castle

Download the above image and save it as Castle.png in your project folder.

import cv2

img = cv2.imread("Castle.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cv2.imshow("Over the Clouds", img)
cv2.imshow("Over the Clouds - gray", gray)

cv2.waitKey(0)
cv2.destroyAllWindows()

Which should result in something like this.

The end result

RandomForestClassifier: Predict Stock Market Direction

What will we cover in this tutorial?

A Random Forest Classifier is an approach to minimize the overfitting a single Decision Tree is prone to. A forest classifier simply contains a set of decision trees and uses majority voting to make the prediction.

In this tutorial we will try to use that on the stock market, by creating a few indicators. This tutorial will give a framework to explore if it can predict the direction of a stock. Given a set of indicators, will the stock go up or down the next trading day.

This is a simplified version of predicting the actual stock value the next day.

Step 1: Getting data and calculate some indicators

If you are new to stock indicators, we highly recommend reading about the MACD, RSI, and Stochastic Oscillator, where the MACD tutorial also covers how to calculate the EMA. Here we assume familiarity with those indicators. Also, we assume you are familiar with Pandas DataFrames and Pandas-datareader.

import pandas_datareader as pdr
import datetime as dt
import numpy as np

ticker = "^GSPC" # The S&P 500 index
data = pdr.get_data_yahoo(ticker, dt.datetime(2010,1,1), dt.datetime.now(), interval='d')

# Calculate the EMA10 > EMA30 signal
ema10 = data['Close'].ewm(span=10).mean()
ema30 = data['Close'].ewm(span=30).mean()
data['EMA10gtEMA30'] = np.where(ema10 > ema30, 1, -1)

# Calculate where Close is > EMA10
data['ClGtEMA10'] = np.where(data['Close'] > ema10, 1, -1)

# Calculate the MACD signal
exp1 = data['Close'].ewm(span=12).mean()
exp2 = data['Close'].ewm(span=26).mean()
macd = exp1 - exp2
macd_signal = macd.ewm(span=9).mean()
data['MACD'] = macd_signal - macd

# Calculate RSI
delta = data['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down
data['RSI'] = 100 - (100/(1 + rs))

# Stochastic Oscillator
high14= data['High'].rolling(14).max()
low14 = data['Low'].rolling(14).min()
data['%K'] = (data['Close'] - low14)*100/(high14 - low14)

# Williams Percentage Range
data['%R'] = -100*(high14 - data['Close'])/(high14 - low14)

days = 6

# Price Rate of Change
ct_n = data['Close'].shift(days)
data['PROC'] = (data['Close'] - ct_n)/ct_n

print(data)

The choice of indicators is arbitrary, but they are among the popular ones. It is up to you to change them to other indicators and experiment with them.

                  High         Low        Open       Close       Volume   Adj Close  EMA10gtEMA30  ClGtEMA10      MACD         RSI         %K        %R      PROC
Date                                                                                                                                                             

2020-08-17  3387.590088  3379.219971  3380.860107  3381.989990  3671290000  3381.989990             1          1 -2.498718   68.294286  96.789344  -3.210656  0.009164
2020-08-18  3395.060059  3370.149902  3387.040039  3389.780029  3881310000  3389.780029             1          1 -1.925573   69.176468  97.234576  -2.765424  0.008722
2020-08-19  3399.540039  3369.659912  3392.510010  3374.850098  3884480000  3374.850098             1          1 -0.034842   65.419555  86.228281 -13.771719  0.012347
2020-08-20  3390.800049  3354.689941  3360.479980  3385.510010  3642850000  3385.510010             1          1  0.949607   66.805725  87.801036 -12.198964  0.001526
2020-08-21  3399.959961  3379.310059  3386.010010  3397.159912  3705420000  3397.159912             1          1  1.249066   68.301209  97.534948  -2.465052  0.007034

Step 2: Understand how the Decision Tree works

Trees are the foundation of the Forest. Or rather, Decision Trees are the foundation of a Forest Classifier. Hence, it is a good starting point to understand how a Decision Tree works. Luckily, they are quite easy to understand.

Let’s try to investigate a Decision Tree that is based on two of the indicators above. We take the RSI (Relative Strength Index) and %K (Stochastic Oscillator). A Decision Tree could look like this (depending on the training data).

Decision Tree for %K and RSI

When we get a new data row with %K and RSI indicators, it will start at the top of the Decision Tree.

  • At the first node it will check if %K <= 4.615; if so, it takes the left child, otherwise the right child.
  • The gini tells us the probability that a randomly chosen element would be incorrectly labeled. Hence, a low value close to 0 is good.
  • Samples tells us how many of the samples of the training set reached this node.
  • Finally, the value tells us how the samples are distributed between the classes. In the final decision nodes (the leaves), the category with the most elements is the prediction.

Looking at the above Decision Tree, it does not seem to be very good. The majority of samples end up in the fifth node with a gini of 0.498, close to random, right? And it will label them 1, growth.

But this is the idea with Forest Classifiers: take a bunch of Decision Trees, which individually might not be good, and use the majority vote of them to classify.
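If you want to inspect such a tree yourself, scikit-learn can draw the individual trees of a fitted forest. A sketch, assuming the rfc classifier and the predictors list from step 3 below (plot_tree requires scikit-learn 0.21 or newer):

from sklearn import tree
import matplotlib.pyplot as plt

# Draw the first decision tree of the fitted forest
tree.plot_tree(rfc.estimators_[0], feature_names=predictors, filled=True)
plt.show()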

Step 3: Create the Forest Classifier

Now that we understand how the Decision Tree and the Forest Classifier work, we just need to run the magic, as it is all done by calling a library function.

import pandas_datareader as pdr
import datetime as dt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import RandomForestClassifier


ticker = "^GSPC"
data = pdr.get_data_yahoo(ticker, dt.datetime(2010,1,1), dt.datetime.now(), interval='d')

# Calculate the EMA10 > EMA30 signal
ema10 = data['Close'].ewm(span=10).mean()
ema30 = data['Close'].ewm(span=30).mean()
data['EMA10gtEMA30'] = np.where(ema10 > ema30, 1, -1)

# Calculate where Close is > EMA10
data['ClGtEMA10'] = np.where(data['Close'] > ema10, 1, -1)

# Calculate the MACD signal
exp1 = data['Close'].ewm(span=12).mean()
exp2 = data['Close'].ewm(span=26).mean()
macd = exp1 - exp2
macd_signal = macd.ewm(span=9).mean()
data['MACD'] = macd_signal - macd

# Calculate RSI
delta = data['Close'].diff()
up = delta.clip(lower=0)
down = -1*delta.clip(upper=0)
ema_up = up.ewm(com=13, adjust=False).mean()
ema_down = down.ewm(com=13, adjust=False).mean()
rs = ema_up/ema_down
data['RSI'] = 100 - (100/(1 + rs))

# Stochastic Oscillator
high14= data['High'].rolling(14).max()
low14 = data['Low'].rolling(14).min()
data['%K'] = (data['Close'] - low14)*100/(high14 - low14)

# Williams Percentage Range
data['%R'] = -100*(high14 - data['Close'])/(high14 - low14)

days = 6

# Price Rate of Change
ct_n = data['Close'].shift(days)
data['PROC'] = (data['Close'] - ct_n)/ct_n

# Set class labels to classify
data['Return'] = data['Close'].pct_change(1).shift(-1)
data['class'] = np.where(data['Return'] > 0, 1, 0)

# Clean for NAN rows
data = data.dropna()
# Minimize dataset
data = data.iloc[-200:]


# Data to predict
predictors = ['EMA10gtEMA30', 'ClGtEMA10', 'MACD', 'RSI', '%K', '%R', 'PROC']
X = data[predictors]
y = data['class']

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

# Train the model
rfc = RandomForestClassifier(random_state=0)
rfc = rfc.fit(X_train, y_train)

# Test the model by doing some predictions
y_pred = rfc.predict(X_test)

# See how accurate the predictions are
report = classification_report(y_test, y_pred)
print('Model accuracy', accuracy_score(y_test, y_pred, normalize=True))
print(report)

First some notes on a few lines. The train_test_split divides the data into a training set and a test set. The test set is set to be 30% of the data, and the split is done in a randomized way.

Next we create a RandomForestClassifier and fit it.

Then we use our newly created classifier (rfc) to predict on the test set (X_test).

Finally, we calculate the accuracy and the report.

Model accuracy 0.6333333333333333
              precision    recall  f1-score   support

           0       0.56      0.38      0.45        24
           1       0.66      0.81      0.73        36

    accuracy                           0.63        60
   macro avg       0.61      0.59      0.59        60
weighted avg       0.62      0.63      0.62        60

The model accuracy is 0.63, which seems quite good. It is better than random, at least. You can also see that the precision for 1 (growth) is higher than for 0 (loss, or negative growth), with 0.66 and 0.56, respectively.

Does that mean it is all good and we can beat the market?

No, far from it. Also, notice I chose to only use the last 200 trading days in my experiment, out of the 2,500+ available.

Running a few experiments showed that the prediction accuracy was close to 50% if all days were used. That means it was basically not possible to predict.

Step 4: A few more tests on stocks

I have run a few experiments on different stocks, also varying the number of days used.

Stock      100 days   200 days   400 days
S&P 500    0.53       0.63       0.52
AAPL       0.53       0.62       0.54
F          0.67       0.57       0.54
KO         0.47       0.52       0.53
IBM        0.57       0.52       0.57
MSFT       0.50       0.50       0.48
AMZN       0.57       0.47       0.58
TSLA       0.50       0.60       0.53
NVDA       0.57       0.53       0.54

The accuracy for each stock and number of days used.

Looking at the above table I am not convinced about my hypothesis. First, that 200 days performed better might be specific to the stock. Also, if you re-run the tests you get new numbers, as the training and test datasets differ from run to run.

I did try a few with the full dataset, and I still think it performed worse (all close to 0.50).

The above looks fine, as it can mostly predict better than just guessing. But there are still a few cases where that does not hold.

Next steps

A few things to remember here.

Firstly, the indicators were chosen at random from among the common ones. A further investigation of this could be an idea. An indicator that does not help the prediction can highly bias the results.

Secondly, I might have falsely hypothesized that it was more accurate when we limited the data to a smaller set than the original.

Thirdly, it could be that the stocks also have a bias in one direction. If we limit the data to a smaller period, a bull market will primarily have growth days, hence a biased guess on growth will do better than 0.50. This factor should be investigated further, to see if it favors the predictions.

From HTML Tables to Excel with Pandas: Free Cash Flow and Revenue of Microsoft

What will we cover in this tutorial?

Yes, you can do it manually. Copy from an HTML table and paste into an Excel spreadsheet. Or you can dive into how to pull data directly from the internet into Excel. Sometimes that is not convenient, as some data needs to be transformed and you need to do it often.

In this tutorial we will show how this can be easily automated with Python using Pandas.

That is, we go from data that needs to be transformed, like $102,000, into 102000. Also, how to join (or merge) different data sources before we create an Excel spreadsheet.

Step 1: The first data source: Revenue of Microsoft

There are many sources where you can get this data, but Macrotrends has it nicely in a table, with data going more than 10 years back.

First things first, let’s take a look at the data. You can use Pandas read_html to get the data from the tables on a page, given a URL.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
print(revenue)

Where we know it is in the first table on the page. The first few lines of the output are given here.

    Microsoft Annual Revenue(Millions of US $) Microsoft Annual Revenue(Millions of US $).1
0                                         2020                                     $143,015
1                                         2019                                     $125,843
2                                         2018                                     $110,360
3                                         2017                                      $96,571
4                                         2016                                      $91,154

The first things to manage are the column names and setting the year as the index.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
print(revenue)

A first few lines.

      Revenue
Year          
2020  $143,015
2019  $125,843
2018  $110,360
2017   $96,571
2016   $91,154

That helped. But we still need to convert the Revenue column to integers. This is a bit tricky and can be done in various ways. We first need to remove the $-sign, then the commas, before we convert it.

revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')

And that covers it.

Step 2: Getting another data source: Free Cash Flow for Microsoft

We want to combine this data with the Free Cash Flow (FCF) of Microsoft.

The data can be gathered the same way, and the column and index can be set similarly.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')
print(fcf)

The first few lines are.

     FCF
Year
2020 45234.0
2019 38260.0
2018 32252.0
2017 31378.0
2016 24982.0

All ready to be joined with the other data.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')

# print(revenue)

url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')

data = revenue.join(fcf)

# Let's reorder it
data = data.iloc[::-1].copy()

Where we also reorder it, to have the early years at the top. Notice the copy(), which is not strictly necessary, but it makes a hard copy of the data and not just a view.

      Revenue      FCF
Year                  
2005    39788  15793.0
2006    44282  12826.0
2007    51122  15532.0
2008    60420  18430.0
2009    58437  15918.0

Wow. Ready to export.

Step 3: Exporting it to Excel

This is too easy to have an entire section for it.

data.to_excel('Output.xlsx')

Isn’t it beautiful? Of course, you need to execute this after all the lines above.

In total.

import pandas as pd


url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/revenue"
tables = pd.read_html(url)

revenue = tables[0]
revenue.columns = ['Year', 'Revenue']
revenue = revenue.set_index('Year')
revenue['Revenue'] = pd.to_numeric(revenue['Revenue'].str[1:].str.replace(',',''), errors='coerce')

# print(revenue)

url = "https://www.macrotrends.net/stocks/charts/MSFT/microsoft/free-cash-flow"
tables = pd.read_html(url)
fcf = tables[0]
fcf.columns = ['Year', 'FCF']
fcf = fcf.set_index('Year')

data = revenue.join(fcf)

# Let's reorder it
data = data.iloc[::-1].copy()

# Export to Excel
data.to_excel('Output.xlsx')

Which will result in an Excel spreadsheet called Output.xlsx.

The Excel spread sheet. I added the graph.

There are many things you might find easier in Excel, like playing around with different types of visualization. On the other hand, there might be many aspects you find easier in Python. I know I do. Almost all of them. Not kidding. Still, Excel is a powerful tool utilized by many specialists, and Python skills seem to be in demand in connection with Excel.

Multiple Time Frame Analysis on a Stock using Pandas

What will we investigate in this tutorial?

A key element of success in trading is to understand the market and the trend of the stock before you buy it. In this tutorial we will not cover how to read the market, but instead take a top-down approach to analyzing stock prices. We will use what is called Multiple Time Frame Analysis on a stock, starting with a 1-month, then a 1-week, and finally a 1-day perspective. At the end, we will compare that with a Simple Moving Average on a monthly scale.

Step 1: Gather the data with different time frames

We will use the Pandas-datareader library to collect the time series of a stock. The library has an endpoint to read data from Yahoo! Finance, which we will use as it does not require registration and can deliver the data we need.

import pandas_datareader as pdr
import datetime as dt


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')

Where the key is to set the interval to ‘d’ (Day), ‘wk’ (Week), and ‘mo’ (Month).

This will give us 3 DataFrames, each indexed with different intervals.

Daily.

                  High         Low  ...      Volume   Adj Close
Date                                ...                        
2019-01-02  101.750000   98.940002  ...  35329300.0   98.860214
2019-01-03  100.190002   97.199997  ...  42579100.0   95.223351
2019-01-04  102.510002   98.930000  ...  44060600.0   99.652115
2019-01-07  103.269997  100.980003  ...  35656100.0   99.779205
2019-01-08  103.970001  101.709999  ...  31514400.0  100.502670

Weekly.

                  High         Low  ...       Volume   Adj Close
Date                                ...                         
2019-01-01  103.269997   97.199997  ...  157625100.0   99.779205
2019-01-08  104.879997  101.260002  ...  150614100.0   99.769432
2019-01-15  107.900002  101.879997  ...  127262100.0  105.302940
2019-01-22  107.879997  104.660004  ...  142112700.0  102.731720
2019-01-29  106.379997  102.169998  ...  203449600.0  103.376968

Monthly.

                  High         Low  ...        Volume   Adj Close
Date                                ...                          
2019-01-01  107.900002   97.199997  ...  7.142128e+08  102.096245
2019-02-01  113.239998  102.349998  ...  4.690959e+08  109.526405
2019-03-01  120.820000  108.800003  ...  5.890958e+08  115.796768
2019-04-01  131.369995  118.099998  ...  4.331577e+08  128.226700
2019-05-01  130.649994  123.040001  ...  5.472188e+08  121.432449
2019-06-01  138.399994  119.010002  ...  5.083165e+08  132.012497
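As a side note (not part of the original tutorial), the weekly and monthly series can also be derived from the daily data with resample, if you prefer a single download. A small sketch; note that the index labels will differ slightly from Yahoo!’s (for example, resample('M') labels each row by month end):

# Sketch: derive weekly/monthly closes from the daily DataFrame 'day' above
week_close = day['Close'].resample('W').last()   # last close of each week
month_close = day['Close'].resample('M').last()  # last close of each month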

Step 2: Combine data and interpolate missing points

The challenge in connecting the DataFrames is that they have different index entries. If we combine the daily data points with the weekly ones, there will be many dates where the daily series has a value but the weekly series does not.
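A minimal sketch of that naive combination (using the day and week DataFrames from Step 1) produces the table below:

import pandas as pd

# Sketch: combine daily and weekly closes without any interpolation
data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']   # only dates present in the weekly index get a value
print(data)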

                   day        week
Date                              
2019-01-02  101.120003         NaN
2019-01-03   97.400002         NaN
2019-01-04  101.930000         NaN
2019-01-07  102.059998         NaN
2019-01-08  102.800003  102.050003
...                ...         ...
2020-08-13  208.699997         NaN
2020-08-14  208.899994         NaN
2020-08-17  210.279999         NaN
2020-08-18  211.490005  209.699997
2020-08-19  209.699997  209.699997

To deal with that, we can interpolate the missing values by using the DataFrame interpolate function.

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')

data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='linear')
print(data)

Which results in the following output.

                   day        week
Date                              
2019-01-02  101.120003         NaN
2019-01-03   97.400002         NaN
2019-01-04  101.930000         NaN
2019-01-07  102.059998         NaN
2019-01-08  102.800003  102.050003
...                ...         ...
2020-08-13  208.699997  210.047998
2020-08-14  208.899994  209.931998
2020-08-17  210.279999  209.815997
2020-08-18  211.490005  209.699997
2020-08-19  209.699997  209.699997

Where the missing points (except the leading entries) are filled in linearly. This can be done for months as well, but we need to be more careful, for three reasons. First, some dates (the 1st of each month) do not exist in the data DataFrame. To solve that we use an outer join, which will include them. Second, this introduces some extra dates that are not trading dates, so we need to delete them afterwards, which we do by dropping the joined column (drop) and removing rows with NA values (dropna). Third, the monthly view looks backwards: the value dated 1 January is only finalized on the last trading day of January. Therefore we shift the monthly series one period in the join, so that each month's value first appears on the date when it is actually known.
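To see what the shift does in isolation, here is a small sketch (using the month DataFrame from Step 1):

# Sketch: shift() moves each monthly close one row forward in time, so
# January's close (indexed 2019-01-01) lands on the 2019-02-01 row.
print(month['Close'].head(3))
print(month['Close'].shift().head(3))

With that in place, the combined program looks as follows.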

import pandas_datareader as pdr
import datetime as dt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')


data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
# Fill the gaps in the weekly series, weighted by the actual dates in the index
data['week'] = data['week'].interpolate(method='index')
# The outer join includes the 1st-of-month dates; shift() moves each month's
# close one period forward, so it first appears when it is known
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
# Drop the helper column and remove the non-trading dates the join introduced
data = data.drop(columns=['Close']).dropna()
# 20-trading-day Simple Moving Average (used again in Step 5)
data['SMA20'] = data['day'].rolling(20).mean()

Step 3: Visualize the output and take a look at it

Visualizing the result is straightforward using matplotlib.

import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')


data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()

data.plot()
plt.show()

Which results in the following graph.

As expected, the shifted monthly price equals the closing price of the last trading day of the previous month. Hence, the monthly curve appears to cross the day curve on the 1st of every month (which is almost true).

To really appreciate the Multiple Time Frame Analysis, it is better to keep the graphs separate and interpret each of them in isolation, as sketched below.
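One easy way to do that (a sketch; the original shows separate screenshots) is pandas' subplots option, which draws each column in its own panel:

# Sketch: one panel per time frame instead of one shared axis
data.plot(subplots=True, figsize=(10, 8))
plt.show()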

Step 4: How to use these different time frames in the analysis

Given the picture, it is a good idea to start top-down. First, look at the monthly picture, which shows the overall trend.

Month view of MSFT.

In the case of MSFT there is a clear growing trend, with the exception of two declines. The overall impression is of a company in growth that does not seem to slow down. Even Dow theory (see this tutorial on it) suggests that there will be secondary movements in a general bull trend.

Secondly, we will look at the weekly view.

Weekly view of MSFT

Here the impression is a bit more volatile. It shows many smaller ups and downs, with a big one in March 2020. It could also indicate a small decline in the growth right at the end. Dow theory could suggest that the trend will turn, though it is not certain.

Finally, the daily view gives an even more volatile picture, which can be used to decide when to enter the market.

Day view of MSFT

Here you could also be a bit worried. Is this the start of a smaller bear market?

To sum up: in the month-view we have concluded growth. The week-view shows signs of a possible change. Finally, the day-view also shows signs of a possible decline.

As an investor, and based on the above, I would not enter the market right now. If both the month-view and the week-view showed growth while the day-view showed a decline, that would be a good indicator: you want the higher levels to show growth, while the day-view might show a small dip.

Finally, remember that you should never rely on just one signal when deciding whether to enter the market.

Step 5: Is monthly the same as a Simple Moving Average?

Good question, I am glad you asked. The Simple Moving Average (SMA) can be calculated easily on DataFrames using the rolling and mean functions.

The best way is to just try it.

import pandas_datareader as pdr
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd


ticker = "MSFT"
start = dt.datetime(2019, 1, 1)
end = dt.datetime.now()
day = pdr.get_data_yahoo(ticker, start, end, interval='d')
week = pdr.get_data_yahoo(ticker, start, end, interval='wk')
month = pdr.get_data_yahoo(ticker, start, end, interval='mo')


data = pd.DataFrame()
data['day'] = day['Close']
data['week'] = week['Close']
data['week'] = data['week'].interpolate(method='index')
data = data.join(month['Close'].shift(), how='outer')
data['month'] = data['Close'].interpolate(method='index')
data = data.drop(columns=['Close']).dropna()
data['SMA20'] = data['day'].rolling(20).mean()

data.plot()
plt.show()

As you can see, the SMA is not as reactive to the crisis in March 2020 as the monthly view is. This does not make one better than the other, but it shows a difference in how they react.

Comparing the month-view with a Simple Moving Average of a month (20 trading days)

Please remember that the monthly view is only updated at the end of a month, while the SMA is updated daily.

Another difference is that the SMA is an average of the last 20 trading days, while the monthly value is the actual Close of the last day of the month. This implies that the monthly view can be much more volatile than the SMA.
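To quantify that difference, a small sketch (reusing the data DataFrame from Step 5) compares the two series directly:

# Sketch: how far apart are the interpolated month series and the 20-day SMA?
diff = data['month'] - data['SMA20']
print(diff.describe())   # spread of the differences over the whole period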

Conclusion

It is advisable to start the analysis from the bigger time frames and zoom in from there. This way you first look at the overall trends and get the bigger picture of the market. It should keep you from getting fixated on a small detail and instead help you understand the market on a higher level.