How To Get Started with a Predictive Machine Learning Program in Python in 5 Easy Steps

What will you learn?

  • How to predict from a dataset with Machine Learning
  • How to implement that in Python
  • How to get data from Twitter
  • How to install the necessary libraries to do Machine Learning in Python

Step 1: Install the necessary libraries

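The sklearn library (published on PyPI as scikit-learn) provides simple and efficient tools for predictive data analysis.

You can install it by typing the following in your command line. If pip rejects the sklearn name, install it as scikit-learn instead; the sklearn package is only a thin alias that pulls scikit-learn in.

pip install sklearn

It will most likely install a few additional libraries that it depends on.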

Collecting sklearn
  Downloading sklearn-0.0.tar.gz (1.1 kB)
Collecting scikit-learn
  Downloading scikit_learn-0.23.1-cp38-cp38-macosx_10_9_x86_64.whl (7.2 MB)
     |████████████████████████████████| 7.2 MB 5.0 MB/s 
Collecting numpy>=1.13.3
  Downloading numpy-1.18.4-cp38-cp38-macosx_10_9_x86_64.whl (15.2 MB)
     |████████████████████████████████| 15.2 MB 12.6 MB/s 
Collecting joblib>=0.11
  Downloading joblib-0.15.1-py3-none-any.whl (298 kB)
     |████████████████████████████████| 298 kB 8.1 MB/s 
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-2.1.0-py3-none-any.whl (12 kB)
Collecting scipy>=0.19.1
  Downloading scipy-1.4.1-cp38-cp38-macosx_10_9_x86_64.whl (28.8 MB)
     |████████████████████████████████| 28.8 MB 5.8 MB/s 
Using legacy setup.py install for sklearn, since package 'wheel' is not installed.
Installing collected packages: numpy, joblib, threadpoolctl, scipy, scikit-learn, sklearn
    Running setup.py install for sklearn ... done
Successfully installed joblib-0.15.1 numpy-1.18.4 scikit-learn-0.23.1 scipy-1.4.1 sklearn-0.0 threadpoolctl-2.1.0

In my installation these were numpy, joblib, threadpoolctl, scipy, and scikit-learn.

Step 2: The dataset

The machine learning algorithm needs a dataset to train on. To keep this tutorial simple, I only used a small set. I looked through the top tweets from CNN Breaking and categorised them into positive and negative tweets (I know it can be subjective).

negative = [
    "Protesters who were marching from Minneapolis to St. Paul were tear gassed by police as they tried to cross the Lake Street Marshall Bridge ",
    "The National Guard has been activated in Washington, D.C. to assist police handling protests around the White House",
    "Police have been firing tear gas at the protesters near the 5th Precinct in Minneapolis, where some in the crowd have responded with projectiles of their own",
    "Texas and Colorado have activated the National Guard respond to protests",
    "The mayor of Rochester, New York, has declared a state of emergency and ordered a curfew from 9 p.m. Saturday to 7 a.m. Sunday",
    "Cleveland, Ohio, has enacted a curfew that will go into effect at 8 p.m. Saturday and last through 8 a.m. Sunday",
    "A police car appears to be on fire in Los Angeles. Police officers are holding back a line of demonstrators to prevent them from getting close to the car."
            ]

positive = [
    "Two NASA astronauts make history with their successful launch into space aboard a SpaceX rocket",
    "After questionable weather, officials give the all clear for the SpaceX launch",
    "NASA astronauts Bob Behnken and Doug Hurley climb aboard SpaceX's Crew Dragon spacecraft as they prepare for a mission to the International Space Station",
    "New York Gov. Andrew Cuomo signs a bill giving death benefits to families of frontline workers who died battling the coronavirus pandemic"
]

Step 3: Train the model

The data needs to be labelled before it can be fed into the training algorithm. Hence, we build the structure the training step expects: a list of texts and a matching list of targets (0 for positive, 1 for negative).

def prepare_data(positive, negative):
    data = positive + negative
    target = [0]*len(positive) + [1]*len(negative)
    return {'data': data, 'target': target}
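To see what the structure looks like, here is a minimal sketch that calls the same function on three made-up texts (the example strings are hypothetical, not part of the dataset above).

```python
def prepare_data(positive, negative):
    # Texts first, then one label per text: 0 = positive, 1 = negative
    data = positive + negative
    target = [0] * len(positive) + [1] * len(negative)
    return {'data': data, 'target': target}


# Hypothetical mini dataset, only to show the shape of the result
example = prepare_data(["good news"], ["bad news", "worse news"])
print(example['data'])
print(example['target'])
```

Each entry in target lines up with the text at the same index in data, which is the pairing train_test_split relies on in the next step.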

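The actual training is done by using the sklearn library. The tweets are split into a training part and a test part, turned into TF-IDF vectors, and used to fit a multinomial naive Bayes classifier.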

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

def train_data_set(data_set):
    x_train, x_test, y_train, y_test = train_test_split(data_set['data'], data_set['target'])

    transfer = TfidfVectorizer()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    estimator = MultinomialNB()
    estimator.fit(x_train, y_train)

    score = estimator.score(x_test, y_test)
    print("score:\n", score)
    return {'transfer': transfer, 'estimator': estimator}

Step 4: Get some tweets from CNN Breaking and predict

For this step to work you need to set up tokens for the Twitter API. You can follow this tutorial in order to do that.

Once you have the tokens, you can use the following code to get it running.

import tweepy


def setup_twitter():
    consumer_key = "REPLACE WITH YOUR KEY"
    consumer_secret = "REPLACE WITH YOUR SECRET"
    access_token = "REPLACE WITH YOUR TOKEN"
    access_token_secret = "REPLACE WITH YOUR TOKEN SECRET"

    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    return api


def mood_on_cnn(api, predictor):
    stat = [0, 0]
    for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
        sentence_x = predictor['transfer'].transform([status.full_text])
        y_predict = predictor['estimator'].predict(sentence_x)

        stat[y_predict[0]] += 1

    return stat

Step 5: Putting it all together

That is it.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import tweepy


negative = [
    "Protesters who were marching from Minneapolis to St. Paul were tear gassed by police as they tried to cross the Lake Street Marshall Bridge ",
    "The National Guard has been activated in Washington, D.C. to assist police handling protests around the White House",
    "Police have been firing tear gas at the protesters near the 5th Precinct in Minneapolis, where some in the crowd have responded with projectiles of their own",
    "Texas and Colorado have activated the National Guard respond to protests",
    "The mayor of Rochester, New York, has declared a state of emergency and ordered a curfew from 9 p.m. Saturday to 7 a.m. Sunday",
    "Cleveland, Ohio, has enacted a curfew that will go into effect at 8 p.m. Saturday and last through 8 a.m. Sunday",
    "A police car appears to be on fire in Los Angeles. Police officers are holding back a line of demonstrators to prevent them from getting close to the car."
            ]

positive = [
    "Two NASA astronauts make history with their successful launch into space aboard a SpaceX rocket",
    "After questionable weather, officials give the all clear for the SpaceX launch",
    "NASA astronauts Bob Behnken and Doug Hurley climb aboard SpaceX's Crew Dragon spacecraft as they prepare for a mission to the International Space Station",
    "New York Gov. Andrew Cuomo signs a bill giving death benefits to families of frontline workers who died battling the coronavirus pandemic"
]


def prepare_data(positive, negative):
    data = positive + negative
    target = [0]*len(positive) + [1]*len(negative)
    return {'data': data, 'target': target}


def train_data_set(data_set):
    x_train, x_test, y_train, y_test = train_test_split(data_set['data'], data_set['target'])

    transfer = TfidfVectorizer()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    estimator = MultinomialNB()
    estimator.fit(x_train, y_train)

    score = estimator.score(x_test, y_test)
    print("score:\n", score)
    return {'transfer': transfer, 'estimator': estimator}


def setup_twitter():
    consumer_key = "REPLACE WITH YOUR KEY"
    consumer_secret = "REPLACE WITH YOUR SECRET"
    access_token = "REPLACE WITH YOUR TOKEN"
    access_token_secret = "REPLACE WITH YOUR TOKEN SECRET"

    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    return api


def mood_on_cnn(api, predictor):
    stat = [0, 0]
    for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
        sentence_x = predictor['transfer'].transform([status.full_text])
        y_predict = predictor['estimator'].predict(sentence_x)

        stat[y_predict[0]] += 1

    return stat


data_set = prepare_data(positive, negative)
predictor = train_data_set(data_set)

api = setup_twitter()
stat = mood_on_cnn(api, predictor)

print(stat)
print("Mood (0 good, 1 bad)", stat[1]/(stat[0] + stat[1]))

I got the following output on the day of writing this tutorial.

score:
 1.0
[751, 2455]
Mood (0 good, 1 bad) 0.765751715533375

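The breaking news items come out as predominantly negative: roughly 77% of the tweets were classified as negative. Keep in mind that the model was trained on only 11 hand-labelled tweets, so the score of 1.0 comes from a very small test split and the result should be read as a toy example.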
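The mood figure is simply the share of tweets that landed in class 1. A minimal sketch of that arithmetic, using the counts from the output above (the helper name mood_ratio and the guard for an empty timeline are my own additions):

```python
def mood_ratio(stat):
    """Return the share of tweets classified as negative (label 1)."""
    total = stat[0] + stat[1]
    if total == 0:
        # No tweets were read, so there is no ratio to report
        return None
    return stat[1] / total


stat = [751, 2455]  # counts taken from the output above
print("Mood (0 good, 1 bad)", mood_ratio(stat))
```

The guard matters in practice: if the timeline request returns nothing, the original division would raise a ZeroDivisionError.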

How to Reformat a Text File in Python

The input file and the desired output

The task is to reformat the following input format.

Computing
“I do not fear computers. I fear lack of them.”
— Isaac Asimov

“A computer once beat me at chess, but it was no match for me at kick boxing.”
— Emo Philips

“Computer Science is no more about computers than astronomy is about telescopes.”
— Edsger W. Dijkstra

To the following output format.

“I do not fear computers. I fear lack of them.” (Isaac Asimov)
“A computer once beat me at chess, but it was no match for me at kick boxing.” (Emo Philips)
“Computer Science is no more about computers than astronomy is about telescopes.” (Edsger W. Dijkstra)

The Python code doing the job

The following simple code did the reformatting in less than a second for a file that contained several hundred quotes.

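# Read all lines from the input file (the quotes contain non-ASCII characters)
with open("input", encoding="utf-8") as file:
    content = file.readlines()

lines = []
next_line = ""
for line in content:
    line = line.strip()
    # Skip empty lines and single-word lines such as the category header
    if len(line) > 0 and len(line.split()) > 1:
        if line[0] == '“':
            # A quote starts a new output line
            next_line = line
        elif line[0] == '—':
            # An author line completes it: drop the leading dash and space
            next_line += " (" + line[2:] + ")"
            lines.append(next_line)
            next_line = ""

# Write one "quote (author)" line per quote
with open("output", "w", encoding="utf-8") as file:
    for line in lines:
        file.write(line + "\n")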
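The same logic can be wrapped in a function that works on a list of lines, which makes it easy to try without touching any files. This is only a sketch: the name reformat_quotes is hypothetical, and unlike the script above it ignores an author line that has no quote before it.

```python
def reformat_quotes(raw_lines):
    """Join each quote line with the author line that follows it."""
    result = []
    pending = ""
    for line in raw_lines:
        line = line.strip()
        # Skip empty lines and single-word lines such as the category header
        if len(line.split()) <= 1:
            continue
        if line.startswith("“"):
            pending = line
        elif line.startswith("—") and pending:
            # Drop the leading dash and any space after it
            result.append(pending + " (" + line[1:].strip() + ")")
            pending = ""
    return result


sample = """Computing
“I do not fear computers. I fear lack of them.”
— Isaac Asimov

“A computer once beat me at chess, but it was no match for me at kick boxing.”
— Emo Philips
""".splitlines()

for line in reformat_quotes(sample):
    print(line)
```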

How to Fetch CNN Breaking Tweets and Make Simple Statistics Automated with Python

What will we cover

  • We will use the tweepy library
  • Read the newest tweets from CNN Breaking
  • Make simple word statistics on the news tweets
  • See if we can learn anything from it

Preliminaries

The Code that does the magic

import tweepy

# personal details: insert your key, secret, token and token_secret here
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

# authentication of consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

# authentication of access token and secret
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication
api = tweepy.API(auth)

# Use a dictionary to count the appearances of words
stat = {}

# Read the tweets from @cnnbrk and make the statistics
for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
    for word in status.full_text.split():
        if word in stat:
            stat[word] += 1
        else:
            stat[word] = 1

# Let's just print the top words (with the check below, top = 10 prints 11 lines)
top = 10

# Let us sort them on the value in reverse order to get the highest first
for word in sorted(stat, key=stat.get, reverse=True):
    # leave out all the small words
    if len(word) > 6:
        print(word, stat[word])
        top -= 1
        if top < 0:
            break

The result of the above (done May 30th, 2020)

coronavirus 441
@CNNPolitics: 439
President 380
updates: 290
impeachment 148
officials 130
according 100
Trump's 98
Democratic 96
against 88
Department 83

The coronavirus is still the most frequent subject in the breaking news.

Next steps

  • It should be extended to have a more intelligent interpretation of the data.
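One small step in that direction is to normalise the words before counting, so that "updates:" and "updates" end up in the same bucket. The sketch below uses collections.Counter on a few made-up headlines; the function name top_words and the sample texts are hypothetical, and no Twitter access is needed to run it.

```python
import string
from collections import Counter


def top_words(texts, min_length=7, top=10):
    """Count words of at least min_length characters across all texts."""
    counts = Counter()
    for text in texts:
        for word in text.split():
            # Strip surrounding ASCII punctuation and ignore case
            word = word.strip(string.punctuation).lower()
            if len(word) >= min_length:
                counts[word] += 1
    return counts.most_common(top)


# Hypothetical headlines, only for illustration
sample_texts = [
    "Officials give updates: coronavirus cases rise",
    "Coronavirus updates from officials, according to the Department",
    "President responds to coronavirus updates",
]

for word, count in top_words(sample_texts):
    print(word, count)
```

Here min_length=7 mirrors the len(word) > 6 filter in the script above, and most_common(top) returns exactly top entries.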

Understand the Password Validation in Mac in 3 Steps – Implement the Validation in Python

What will you learn?

  • The password validation process in Mac
  • How to extract the password validation values
  • Implementing the check in Python
  • Understand why the values are as they are
  • The importance of using a salt value with the password
  • Learn why the hash function is iterated multiple times

The Mac password validation process

Every time you log into your Mac it needs to verify that you used the correct password before giving you access.

The validation process reads hash, salt and iteration values from storage and uses them to validate your password.

The 3 steps below help you locate your values and show how the validation process is done.

Step 1: Locating and extracting the hash, salt and iteration values

You need to use a terminal to extract the values. The following command should print them in a readable way.

sudo defaults read /var/db/dslocal/nodes/Default/users/<username>.plist ShadowHashData | tr -dc 0-9a-f | xxd -r -p | plutil -convert xml1 - -o -

Replace <username> with your actual user name. The command will prompt you for the admin password.

This should result in an output similar to this.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>SALTED-SHA512-PBKDF2</key>
	<dict>
		<key>entropy</key>
		<data>
                1meJW2W6Zugz3rKm/n0yysV+5kvTccA7EuGejmyIX8X/MFoPxmmbCf3BE62h
                6wGyWk/TXR7pvXKg\njrWjZyI+Fc3aKfv1LNQ0/Qrod3lVJcWd9V6Ygt+MYU
                8Eptv3uwDcYf6Z5UuF+Hg67rpoDAWhJrC1\nPEfL3vcN7IoBqC5NkIU=
		</data>
		<key>iterations</key>
		<integer>45454</integer>
		<key>salt</key>
		<data>
		6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=
		</data>
	</dict>
</dict>
</plist>

Step 2: Understand the output

The output consists of four pieces.

  • Key value: SALTED-SHA512-PBKDF2
  • Entropy: Base64 encoded data.
  • Number of iterations: 45454
  • Salt: Base64 encoded data

The key value tells you which algorithm is used (SHA512) and how it is used (PBKDF2).

The entropy is the actual result of the validation algorithm determined by the key value. This “value” is not an encryption of the password, which means you cannot recover the password from that value, but you can validate whether a password matches it.

Confused? I know. But you will understand when we implement the solution.

The number of iterations, here 45454, is the number of times the hash function is called. Why would you call the hash function multiple times? Follow along and you will see.

Finally, we have the salt value. It ensures that the entropy value cannot simply be looked up or compared against other users' values to determine the password. This is also explained with an example below.

Step 3: Validating the password with Python

Before we explain the above, we need to have Python do the password check.

import hashlib
import base64

iterations = 45454
salt = base64.b64decode("6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=".encode())
password = "password".encode()

value = hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128)
print(base64.b64encode(value))

Which will generate the following output

b'1meJW2W6Zugz3rKm/n0yysV+5kvTccA7EuGejmyIX8X/MFoPxmmbCf3BE62h6wGyWk/TXR7pvXKgjrWjZyI+Fc3aKfv1LNQ0/Qrod3lVJcWd9V6Ygt+MYU8Eptv3uwDcYf6Z5UuF+Hg67rpoDAWhJrC1PEfL3vcN7IoBqC5NkIU='

That matches the entropy content of the file.

So what happened in the above Python code?

We use the hashlib library to do all the work for us. It takes the algorithm (sha512), the password (yes, I used the password ‘password’ in this example; you should not actually use that for anything you want to keep secret from the public), the salt, the number of iterations and the length of the output in bytes (128).
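Put together, the validation is: derive a value from the candidate password with the stored salt and iteration count, then compare it with the stored entropy. Here is a minimal sketch of that check. The function name verify_password and the demo salt and iteration count are my own assumptions, chosen so the example runs quickly; this is not how macOS itself is implemented.

```python
import base64
import hashlib
import hmac


def verify_password(candidate, salt_b64, iterations, entropy_b64):
    """Return True if the candidate password derives to the stored entropy."""
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(entropy_b64)
    derived = hashlib.pbkdf2_hmac('sha512', candidate.encode(), salt,
                                  iterations, len(expected))
    # Constant-time comparison of the two byte strings
    return hmac.compare_digest(derived, expected)


# Demo values (hypothetical): a fixed salt and a low iteration count
demo_salt = base64.b64encode(b"demo-salt-not-random").decode()
demo_iterations = 1000
demo_entropy = base64.b64encode(
    hashlib.pbkdf2_hmac('sha512', b"password",
                        base64.b64decode(demo_salt), demo_iterations, 128)
).decode()

print(verify_password("password", demo_salt, demo_iterations, demo_entropy))
print(verify_password("passward", demo_salt, demo_iterations, demo_entropy))
```

Note that the password is never stored or recovered anywhere in this check; only the derived values are compared.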

Now we are ready to explore the questions.

Why use a Hash value and not an encryption of the password?

If the password was encrypted, then an admin on your network would be able to decrypt it and misuse it.

Hence, to keep it safe from that, an iterated hash value of your password is used.

A hash function is a one-way function that maps any input to a fixed-size output. A hash function has these important properties with regard to passwords.

  • It will always map the same input to the same output. Hence, your password will always be mapped to the same value.
  • A small change in the input will give a big change in output. Hence, if you change one character in the password (say, from ‘password’ to ‘passward’) the hash value will be totally different.
  • It is not easy to find the given input to a hash value. Hence, it is not easily feasible to find your password given the hash value.
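The first two properties are easy to try out with a plain SHA512 hash (without salt or iterations). This is just a small sketch; the helper name sha512_hex is my own.

```python
import hashlib


def sha512_hex(text):
    """Return the SHA512 hash of a text as a hex string."""
    return hashlib.sha512(text.encode()).hexdigest()


first = sha512_hex("password")
again = sha512_hex("password")
changed = sha512_hex("passward")

# Same input, same output
print(first == again)
# One changed character, a completely different output
print(first == changed)
differing = sum(1 for a, b in zip(first, changed) if a != b)
print(differing, "of", len(first), "hex characters differ")
```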

Why use multiple iterations of the hash function?

To slow it down.

Basically, the way you find passwords is by trying all possibilities. You try ‘a’ and hash it to check whether that gives the stored value. Then you try ‘b’, and so on.

If that process is slow, you decrease the odds of someone finding your password.

To demonstrate this we can use the cProfile library to investigate the difference in run-time. First let us try it with the 45454 iterations in the hash function.

import hashlib
import base64
import cProfile


def crack_password(entropy, iterations, salt):
    alphabet = "abcdefghijklmnopqrtsuvwxyz"
    for c1 in alphabet:
        for c2 in alphabet:
            password = str.encode(c1 + c2)
            value = base64.b64encode(hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128))
            if value == entropy:
                return password


entropy = "kRqabDBsvkyAhpzzVWJtdqbtqgkgNPwr5gqWG6jvw73hxc7CCvC4E33WyR5bxKmAXG5vAG9/ue+DC7BYLHRfOTE/dLKSMdpE9RFH7ZlTp7GHdH5b5vaqQCcKlXAwkky786zvpucDIgGGTOyw6kKB5hqIXLX9chDvcPQksVrjmUs=".encode()
iterations = 45454
salt = base64.b64decode("6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=".encode())

cProfile.run("crack_password(entropy, iterations, salt)")

This results in the following run time.

        1    0.011    0.011   58.883   58.883 ShadowFile.py:6(crack_password)

About 1 minute.

Now we change the number of iterations to 1.

import hashlib
import base64
import cProfile


def crack_password(entropy, iterations, salt):
    alphabet = "abcdefghijklmnopqrtsuvwxyz"
    for c1 in alphabet:
        for c2 in alphabet:
            password = str.encode(c1 + c2)
            value = base64.b64encode(hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128))
            if value == entropy:
                return password


entropy = "kRqabDBsvkyAhpzzVWJtdqbtqgkgNPwr5gqWG6jvw73hxc7CCvC4E33WyR5bxKmAXG5vAG9/ue+DC7BYLHRfOTE/dLKSMdpE9RFH7ZlTp7GHdH5b5vaqQCcKlXAwkky786zvpucDIgGGTOyw6kKB5hqIXLX9chDvcPQksVrjmUs=".encode()
iterations = 1
salt = base64.b64decode("6VuJKkHVTdDelbNMPBxzw7INW2NkYlR/LoW4OL7kVAI=".encode())

cProfile.run("crack_password(entropy, iterations, salt)")

I guess you are not surprised it takes less than 1 second.

        1    0.002    0.002    0.010    0.010 ShadowFile.py:6(crack_password)

Hence, an attacker can check far more passwords per second if the hash is only iterated once.
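If you only want to compare the cost of a single guess, you do not need the full cracking loop. The following sketch times one PBKDF2 call with 1 iteration and one with 45454 iterations; the helper name time_one_guess and the demo salt are assumptions, and the absolute numbers will differ from machine to machine.

```python
import hashlib
import time


def time_one_guess(iterations):
    """Return the seconds spent on a single PBKDF2 computation."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac('sha512', b"guess", b"demo-salt", iterations, 128)
    return time.perf_counter() - start


t_one = time_one_guess(1)
t_many = time_one_guess(45454)

print("1 iteration:     ", t_one, "seconds")
print("45454 iterations:", t_many, "seconds")
```

Multiply the second number by the number of candidate passwords and you see why the iteration count hurts an attacker much more than it hurts you at login.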

Why use a Salt?

This is interesting.

Well, say that another user used the password ‘password’ and there was no salt.

import hashlib
import base64

iterations = 45454
salt = base64.b64decode("".encode())
password = "password".encode()

value = hashlib.pbkdf2_hmac('sha512', password, salt, iterations, 128)
print(base64.b64encode(value))
b'kRqabDBsvkyAhpzzVWJtdqbtqgkgNPwr5gqWG6jvw73hxc7CCvC4E33WyR5bxKmAXG5vAG9/ue+DC7BYLHRfOTE/dLKSMdpE9RFH7ZlTp7GHdH5b5vaqQCcKlXAwkky786zvpucDIgGGTOyw6kKB5hqIXLX9chDvcPQksVrjmUs='

Then both users would end up with exactly the same hash value, and anyone who cracked one password would have cracked both.

Hence, for each user password, there is a new random salt used.
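A minimal sketch of that idea: generate a fresh random salt every time a password is set, and two users with the same password get different values. The function name hash_with_new_salt and the low iteration count are assumptions to keep the example fast; the 32-byte salt length matches the decoded salt in the plist above.

```python
import hashlib
import os


def hash_with_new_salt(password, iterations=1000):
    """Hash a password with a freshly generated random salt."""
    salt = os.urandom(32)
    value = hashlib.pbkdf2_hmac('sha512', password.encode(), salt,
                                iterations, 128)
    return salt, value


salt_a, hash_a = hash_with_new_salt("password")
salt_b, hash_b = hash_with_new_salt("password")

# Same password, different salts, different hash values
print(hash_a == hash_b)
```

The salt is stored next to the hash, as in the plist above, so the validation can redo the same computation.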

How to proceed from here?

If you want to crack passwords, then I would recommend you use Hashcat.

How Caesar Cipher Teaches us the Most Valuable Lesson – Learn Kerckhoffs’s Principle in 5 Steps with Python Code

What will we cover?

  • Understand the challenge to send a secret message
  • Understand the Caesar Cipher
  • How to create an implementation of that in Python
  • How to break the Caesar Cipher
  • Understand the importance of Kerckhoffs’s Principle

Step 1: Understand the challenge to send a secret message

In cryptography you have three people involved in almost any scenario. We have Alice who wants to send a message to Bob. But Alice wants to send it in such a way that Eve (the evil person) cannot understand it.

But let’s break with tradition and introduce an additional person, Mike. Mike is the messenger. Because we are back in the times of Caesar. Alice represents one of Caesar’s close generals who needs to send a message to the front lines of the army. Bob is in the front line and waits for a command from Alice. DO ATTACK or NO ATTACK.

Alice will use Mike, the messenger, to send that message to Bob.

Alice is of course afraid that Eve, the evil enemy, will capture Mike along the way.

Of course, as Alice is smart, she knows that Mike should not understand the message he is delivering, and Eve should not be able to understand it either. It should only be of value to Bob, when Mike gives him the message.

That is the problem that Caesar wanted to solve with his cipher system.

Step 2: Understand the Caesar Cipher

Let’s do this a bit backwards.

You receive the message BRX DUH DZHVRPH.

That is pretty much impossible to understand. But if you were told that this is the Caesar Cipher using a shift of 3 characters, then maybe it makes sense.

Each plaintext letter is replaced by the letter 3 positions further down the alphabet, wrapping around at the end. Hence, A will be a D.

Reversing this, you see that the encrypted B will map to the plaintext Y.

If you continue this process you will get YOU ARE AWESOME.

That is a nice message to get.
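If you do not want to do the shifting by hand, the following sketch does it with simple index arithmetic. The function name shift_text is my own; a negative shift moves letters back up the alphabet, which is what decryption needs.

```python
def shift_text(text, shift):
    """Shift every upper-case letter by 'shift' positions, wrapping around."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    result = ""
    for c in text:
        if c in letters:
            result += letters[(letters.index(c) + shift) % len(letters)]
        else:
            # Spaces and other characters are left untouched
            result += c
    return result


print(shift_text("BRX DUH DZHVRPH", -3))  # decrypt: shift 3 back
print(shift_text("YOU ARE AWESOME", 3))   # encrypt: shift 3 forward
```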

Step 3: How to create an implementation of that in Python

Well, that is easy. There are many ways to do it. I will make use of a dictionary to make my life easy.

def generate_key(n):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    key = {}
    cnt = 0
    for c in letters:
        key[c] = letters[(cnt + n) % len(letters)]
        cnt += 1
    return key


def get_decryption_key(key):
    dkey = {}
    for c in key:
        dkey[key[c]] = c
    return dkey


def encrypt(key, message):
    cipher = ""
    for c in message:
        if c in key:
            cipher += key[c]
        else:
            cipher += c
    return cipher


# This is setting up your Caesar Cipher key
key = generate_key(3)
# Hmm... I guess this will print the key
print(key)
# This will encrypt the message you have chosen with your key
message = "YOU ARE AWESOME"
cipher = encrypt(key, message)
# I guess we should print out your AWESOME message
print(cipher)
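The code above defines get_decryption_key but never calls it. Here is a compact sketch of the full round trip, where Bob decrypts by running the same encrypt function with the reversed dictionary. The three functions are rewritten in a shorter form so the example is self-contained; they are meant to behave like the ones above.

```python
def generate_key(n):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    # Map each letter to the letter n positions further down, wrapping around
    return {c: letters[(i + n) % len(letters)] for i, c in enumerate(letters)}


def get_decryption_key(key):
    # Reverse the mapping: cipher letter -> plaintext letter
    return {cipher_letter: plain_letter for plain_letter, cipher_letter in key.items()}


def encrypt(key, message):
    # Characters that are not in the key (such as spaces) are kept as they are
    return "".join(key.get(c, c) for c in message)


key = generate_key(3)
dkey = get_decryption_key(key)

cipher = encrypt(key, "YOU ARE AWESOME")
print(cipher)
print(encrypt(dkey, cipher))
```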

Step 4: How to break the Caesar Cipher

Looked at like this, there is a flaw in the system. Can you see what?

Yes, of course you can. We are in the 2020s and not back in the times of Caesar.

The key space is too small.

Breaking it basically takes the following code.

# this is us breaking the cipher
print(cipher)
for i in range(26):
    dkey = generate_key(i)
    message = encrypt(dkey, cipher)
    print(message)

You read the code correctly. There are only 26 keys. That means that even back in the days of Caesar this could be done by hand.
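With only 26 candidates you do not even have to read them yourself. The sketch below tries every shift and keeps the one that produces the most known words. The tiny word list COMMON_WORDS and the function names are hypothetical; a real attack would use a proper dictionary or letter frequencies.

```python
def shift_text(text, shift):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return "".join(
        letters[(letters.index(c) + shift) % 26] if c in letters else c
        for c in text
    )


# Hypothetical mini dictionary, just enough for the demo
COMMON_WORDS = {"YOU", "ARE", "THE", "AND", "ATTACK"}


def break_caesar(cipher):
    """Try all 26 shifts and return the (shift, plaintext) with most known words."""
    best_shift, best_plain, best_hits = 0, cipher, -1
    for shift in range(26):
        candidate = shift_text(cipher, -shift)
        hits = sum(1 for word in candidate.split() if word in COMMON_WORDS)
        if hits > best_hits:
            best_shift, best_plain, best_hits = shift, candidate, hits
    return best_shift, best_plain


print(break_caesar("BRX DUH DZHVRPH"))
```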

This leads us to the most valuable lesson in cryptography and most important principle.

Step 5: Understand the importance of Kerckhoffs’s Principle

Let’s just recap what happened here.

Alice sent a message to Bob that Eve captured. Eve did not understand it.

But the reason why Eve did not understand it was not that she did not have the key.

No, it was that she did not know the algorithm.

If Eve knew the algorithm of the Caesar Cipher, she would not need the secret key to break it.

This leads to the most important lesson in cryptography: Kerckhoffs’s Principle.

Eve should not be able to break the cipher even when she knows the algorithm.

Kerckhoffs’s Principle

That seems counterintuitive, right? Yes, but think about it: if your system is secure against any attack even if you reveal your algorithm, then it gives you more confidence that it is secure.

Your security should not be based on keeping the algorithm secret. No, it should be based on the secret key.

Is that principle followed?

No.

Most government ciphers are kept secret.

Many secret encryption algorithms that leaked were broken.

This also includes the ones used for mobile traffic in the old 2G (GSM) network: A5/1 and the export version A5/2.

Learn the Basics in PyCharm – How to Program as a Professional with Python

What is PyCharm?

PyCharm is an integrated development environment (IDE) used in computer programming, specifically for the Python language.

Learn more about it here. Where to download it?

Is it free? New to Python?

Get Started in PyCharm and Create Your First Program in less than 5 Minutes

How do you start in PyCharm? How do you create a project, and what is that? How do you get from first start to running your first program in PyCharm? Want to learn more about Python?

Learn the Basics in PyCharm Debugger in 6 Minutes

In this video we are going to learn the basics in the PyCharm Debugger.

There are a lot of nice things you can do. But basically you just need a small percentage of those in order to get started. Follow me in a simple walk through debugging a Python program.

Want to learn more about debugging? Debugging is one of those tasks you hate and love. You hate when your program doesn’t do as you expect. But you love when you figure out why.

A debugger helps you in getting from HATE to LOVE.


Queue vs Python list – Comparing the Performance – Can a simple Queue beat the default Python list?

How to profile a program in Python

In this video we will see how cProfile (part of the Python standard library) can help you measure run-times in your Python program.
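The code from the video is not included here, so the following is only a minimal sketch of how cProfile can be used. The function slow_sum is a made-up workload; the report is collected with pstats and sorted by cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """A deliberately simple workload to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Collect the report in a string and show the 5 most expensive entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

For a quick one-off you can also write cProfile.run("slow_sum(100_000)"), which prints a similar table directly.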

Queue vs Python lists

In this video we will compare the performance of a simple Queue implemented directly in Python (no optimisations) with the default Python list.

Can it compete on performance?

This is where time complexity analysis comes into the picture. Insertion and deletion in a Queue both have O(1) time complexity. With a Python list used as a queue, removing the first element with pop(0) has O(n) time complexity, because all remaining elements have to be shifted.

But does the performance and run-time show the same? Here we compare the run-time by using cProfile in Python.
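The implementation from the video is not shown here, so this is a hypothetical sketch of such a comparison: a linked-list Queue with O(1) enqueue and dequeue against a list that removes from the front with pop(0). It uses timeit for brevity instead of cProfile, and the timings depend on your machine and on n.

```python
import timeit


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class Queue:
    """A simple linked-list queue: enqueue at the tail, dequeue at the head."""

    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.value


def run_queue(n):
    q = Queue()
    for i in range(n):
        q.enqueue(i)
    return [q.dequeue() for _ in range(n)]


def run_list(n):
    items = []
    for i in range(n):
        items.append(i)
    # pop(0) shifts all remaining elements one position
    return [items.pop(0) for _ in range(n)]


n = 20_000
print("Queue:", timeit.timeit(lambda: run_queue(n), number=1))
print("list :", timeit.timeit(lambda: run_list(n), number=1))
```

For small n the list can still win, because it is implemented in C while the Queue pays for Python-level object creation. The O(n) cost of pop(0) only shows as n grows.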


Find the Nearest Smaller Element on Left Side in an Array – Understand the Challenge to Solve it Efficiently

The Nearest Smaller Element problem explained:

Given an array (that is, a list) of integers, for each element find the nearest smaller element on the left side of it.

The naive solution has time complexity O(n^2). Can you solve it in O(n)? Well, you need a Stack to do that.

The naive solution is, for each element, to check the elements to the left of it, one by one, until the first smaller one is found.

The worst-case run time for that would be O(n^2). For an array of length n, it would take 0 + 1 + 2 + 3 + … + (n-1) = (n-1)*n/2 = O(n^2) comparisons.

But with a stack we can improve that.
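The solution from the video is not reproduced here, so take the following as one possible sketch of the stack idea. The stack holds the candidates that can still be the nearest smaller element for something further right; every element is pushed once and popped at most once, which gives O(n) in total. A Python list is used as the stack, and None marks elements with no smaller element to the left.

```python
def nearest_smaller_left(array):
    """For each element, return the nearest smaller element to its left (or None)."""
    result = []
    stack = []
    for value in array:
        # Elements that are not smaller than value can never be an answer again
        while stack and stack[-1] >= value:
            stack.pop()
        result.append(stack[-1] if stack else None)
        stack.append(value)
    return result


print(nearest_smaller_left([2, 5, 3, 7, 8, 1, 9]))
```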


Automate Posting on Facebook in Python – Follow these 7 easy steps

Overview

After these steps you will be able to automate the process of posting on Facebook with a Python script. In this example I will show how it is done on a Facebook brand page, Learn Python With Rune.

What you need.

  • A Graph API token, which you get by registering as a developer on Facebook and creating an App there.
  • A simple Python program using the facebook library

Step 1: Registering as developer at Facebook

To register as a developer at Facebook you need to log in to developers.facebook.com.

Press Log In in the top right corner and log in with your Facebook credentials.

Step 2: Create an App

You need to create an App to get the graph API token.

Under My Apps you press Create App.

Press the Manage Pages, Ads or Groups.

You enter App Display Name, which will be the name that is used when posting from this App. Hence, choose a name that you would like people to see in the post.

Fill out your email (probably it is automatically there) and press Create App ID.

Step 3: Create Graph API token

Under Tools choose Graph API Explorer.

Ensure that the right Facebook App is chosen. Then under User or Page choose Get Page Access Token.

It will prompt you to log in to your Facebook account and ask permission for sharing your page.

Agree with that.

Then you will get back to the previous screen.

There you want to add the pages_manage_posts permission, which will grant you access to create posts.

Then click Generate Access Token and you will be prompted to agree with the new access rights on your Facebook page.

Step 4: Prolong your graph API token

The graph API token is quite short lived, so you want to extend it.

Press the info at the graph API token.

Then press the Open in Access Token Tool.

At the bottom you will find Extend Access Token. Press that.

Step 6: Install facebook-sdk library

To make your life easy in Python, you need to install the facebook-sdk library.

pip install facebook-sdk

Step 7: The Python magic

You need to insert your Access Token in the code.

Also, insert the page ID you want. You can find your Page ID with this page.

import facebook

page_access_token = "" # Replace with your access token
graph = facebook.GraphAPI(page_access_token)
facebook_page_id = "" # insert your page ID here.
graph.put_object(facebook_page_id, "feed", message='test message')

That’s it. Enjoy.

Automate a Quotation Image for Twitter in Python in 5 Easy Steps

The challenge

You have a quote and an image.

    quote = "Mostly, when you see programmers, they aren’t doing anything.  One of the attractive things about programmers is that you cannot tell whether or not they are working simply by looking at them.  Very often they’re sitting there seemingly drinking coffee and gossiping, or just staring into space.  What the programmer is trying to do is get a handle on all the individual and unrelated ideas that are scampering around in his head."
    quote_by = "Charles M. Strauss"

len(quote) = 430

The quote is long (430 chars) and the picture might be too bright to write in white text. Also, it will take time to find the right font size.

Can you automate getting a result that fits the Twitter recommended picture size?

Notice the following.

  • The picture is dimmed to make the white text easily readable.
  • The font size is automatically adjusted to fit the picture.
  • The picture is cropped to fit the recommended Twitter size (1024 x 512).
  • A logo of your choice is added.

If this is what you are looking for, then this will automate that.
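The cropping step comes down to a bit of arithmetic: find the largest centred box with the same 2:1 ratio as 1024 x 512, and scale that box down afterwards. The sketch below only computes the box; the function name center_crop_box is hypothetical and it is not necessarily how the code in Step 4 does it. The result is in (left, upper, right, lower) order, which is the form Pillow's Image.crop accepts.

```python
def center_crop_box(width, height, target_width=1024, target_height=512):
    """Return (left, upper, right, lower) of the largest centred box
    with the same aspect ratio as the target size."""
    target_ratio = target_width / target_height
    if width / height > target_ratio:
        # Image is too wide: keep the full height and trim the sides
        new_width = round(height * target_ratio)
        left = (width - new_width) // 2
        return (left, 0, left + new_width, height)
    # Image is too tall (or already fits): keep the full width, trim top and bottom
    new_height = round(width / target_ratio)
    top = (height - new_height) // 2
    return (0, top, width, top + new_height)


print(center_crop_box(2000, 1500))  # a 4:3 picture
print(center_crop_box(3000, 1000))  # a very wide picture
```

With Pillow installed, the box could be passed to Image.crop and the result resized to (1024, 512).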

Step 1: Find your background picture and quote

In this tutorial I will use the background image and quote given above. But you can change that to your needs.

If you choose a shorter quote, the font size will adjust accordingly.

Example given here.

Notice the following.

  • The font size of the “quoted by” (here Learn Python With Rune) is adjusted to not fill more than half of the picture width.
  • You can modify the margins as you like – say you want more space between the text and the logo.

Step 2: Install the PILLOW library

The what? Install the pillow library. You can find installation documentation in their docs.

Or just type

pip install pillow

Pillow is a fork of PIL, the Python Imaging Library. Basically, you need it for processing images.
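If you are unsure whether the installation worked, a quick check like the one below can help. Pillow is imported under the package name PIL; `pillow_installed` is a hypothetical helper name used only for this sketch.

```python
import importlib.util


def pillow_installed():
    """Return True if the PIL package (provided by pillow) can be found."""
    return importlib.util.find_spec("PIL") is not None


if pillow_installed():
    print("Pillow is available")
else:
    print("Pillow is missing, run: pip install pillow")
```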

Step 3: Download a font you want to use

You can find various fonts at fonts.google.com.

For this purpose, I used the Balsamiq Sans font.

I have placed the bold and the italic versions in a folder named font.
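A missing font file only shows up as an error once the script tries to load it, so it can be worth checking the paths first. This is a small sketch under the assumption that the files are named as in the code in Step 4; `missing_fonts` is a hypothetical helper.

```python
from pathlib import Path


def missing_fonts(font_paths):
    """Return the paths from font_paths that do not point to an existing file."""
    return [path for path in font_paths if not Path(path).is_file()]


# file names as used in the main() function of Step 4
fonts = ["font/BalsamiqSans-Bold.ttf", "font/BalsamiqSans-Italic.ttf"]
missing = missing_fonts(fonts)
if missing:
    print("Missing font files:", ", ".join(missing))
else:
    print("All font files found")
```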

Step 4: The actual Python code

This is the fun part. The actual code.

from PIL import Image, ImageDraw, ImageFont, ImageEnhance

# picture setup - it is set up for Twitter recommendations
WIDTH = 1024
HEIGHT = 512
# the margins are set to my preferences
MARGIN = 50
MARGIN_TOP = 50
MARGIN_BOTTOM = 150
LOGO_MARGIN = 25

# font variables
FONT_SIZES = [110, 100, 90, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]
FONT_QUOTE = 'font-text'
FONT_QUOTED_BY = 'font-quoted-by'
FONT_SIZE = 'font-size'
FONT_QUOTED_BY_SIZE = 'font-quoted-by-size'

# Font colors
WHITE = 'rgb(255, 255, 255)'
GREY = 'rgb(200, 200, 200)'

# output text
OUTPUT_QUOTE = 'quote'
OUTPUT_QUOTED_BY = 'quoted-by'
OUTPUT_LINES = 'lines'


def text_wrap_and_font_size(output, font_style, max_width, max_height):
    for font_size in FONT_SIZES:
        output[OUTPUT_LINES] = []
        font = ImageFont.truetype(font_style[FONT_QUOTE], size=font_size, encoding="unic")
        output[OUTPUT_QUOTE] = " ".join(output[OUTPUT_QUOTE].split())
        if font.getsize(output[OUTPUT_QUOTE])[0] <= max_width:
            output[OUTPUT_LINES].append(output[OUTPUT_QUOTE])
        else:
            words = output[OUTPUT_QUOTE].split()
            line = ""
            for word in words:
                # only add a separating space when the line already has text
                candidate = line + " " + word if line else word
                if font.getsize(candidate)[0] <= max_width:
                    line = candidate
                else:
                    output[OUTPUT_LINES].append(line)
                    line = word
            output[OUTPUT_LINES].append(line)
        line_height = font.getsize('lp')[1]

        quoted_by_font_size = font_size
        quoted_by_font = ImageFont.truetype(font_style[FONT_QUOTED_BY], size=quoted_by_font_size, encoding="unic")
        while quoted_by_font.getsize(output[OUTPUT_QUOTED_BY])[0] > max_width//2:
            quoted_by_font_size -= 1
            quoted_by_font = ImageFont.truetype(font_style[FONT_QUOTED_BY], size=quoted_by_font_size, encoding="unic")

        if line_height*len(output[OUTPUT_LINES]) + quoted_by_font.getsize('lp')[1] < max_height:
            font_style[FONT_SIZE] = font_size
            font_style[FONT_QUOTED_BY_SIZE] = quoted_by_font_size
            return True

    # we did not find a font size that fits the text within the block
    return False


def draw_text(image, output, font_style):

    draw = ImageDraw.Draw(image)
    lines = output[OUTPUT_LINES]
    font = ImageFont.truetype(font_style[FONT_QUOTE], size=font_style[FONT_SIZE], encoding="unic")
    line_height = font.getsize('lp')[1]

    y = MARGIN_TOP
    for line in lines:
        x = (WIDTH - font.getsize(line)[0]) // 2
        draw.text((x, y), line, fill=WHITE, font=font)

        y = y + line_height

    quoted_by = output[OUTPUT_QUOTED_BY]
    quoted_by_font = ImageFont.truetype(font_style[FONT_QUOTED_BY], size=font_style[FONT_QUOTED_BY_SIZE], encoding="unic")
    # position the quoted_by in the far right, but within margin
    x = WIDTH - quoted_by_font.getsize(quoted_by)[0] - MARGIN
    draw.text((x, y), quoted_by, fill=GREY, font=quoted_by_font)
    return image


def generate_image_with_quote(input_image, quote, quote_by, font_style, output_image):
    image = Image.open(input_image)

    # darken the image to make output more visible
    enhancer = ImageEnhance.Brightness(image)
    image = enhancer.enhance(0.5)

    # resize the image to fit Twitter
    image = image.resize((WIDTH, HEIGHT))

    # set logo on image; convert to RGBA because the logo is also used as the paste mask
    logo_im = Image.open("pics/logo.png").convert("RGBA")
    l_width, l_height = logo_im.size
    image.paste(logo_im, (WIDTH - l_width - LOGO_MARGIN, HEIGHT - l_height - LOGO_MARGIN), logo_im)

    output = {OUTPUT_QUOTE: quote, OUTPUT_QUOTED_BY: quote_by}

    # find the line wrapping and font size; stop early if no font size fits
    if not text_wrap_and_font_size(output, font_style, WIDTH - 2*MARGIN, HEIGHT - MARGIN_TOP - MARGIN_BOTTOM):
        raise ValueError("The quote does not fit with any of the FONT_SIZES")

    # now it is time to draw the quote on our image and save it
    image = draw_text(image, output, font_style)
    image.save(output_image)


def main():
    # setup input and output image
    input_image = "pics/background.jpg"
    output_image = "quote_of_the_day.png"

    # setup font type
    font_style = {FONT_QUOTE: "font/BalsamiqSans-Bold.ttf", FONT_QUOTED_BY: "font/BalsamiqSans-Italic.ttf"}

    quote = "Mostly, when you see programmers, they aren’t doing anything.  One of the attractive things about programmers is that you cannot tell whether or not they are working simply by looking at them.  Very often they’re sitting there seemingly drinking coffee and gossiping, or just staring into space.  What the programmer is trying to do is get a handle on all the individual and unrelated ideas that are scampering around in his head."
    # quote = "YOU ARE AWESOME!"
    quote_by = "Charles M. Strauss"
    # quote_by = "Learn Python With Rune"

    # generates the quote image
    generate_image_with_quote(input_image, quote, quote_by, font_style, output_image)


if __name__ == "__main__":
    main()
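The code above measures text with font.getsize, which was removed in Pillow 10. On newer versions, a small helper built on font.getbbox can take its place. The sketch below is one possible replacement, not part of the original code: `text_size` is a hypothetical name, and it returns the right and bottom edges of the bounding box, which should correspond closely to the old (width, height) result.

```python
def text_size(font, text):
    """Return (width, height) of `text` for a font object that has getbbox.

    getbbox returns (left, top, right, bottom) relative to the drawing
    origin, so right and bottom include the font's offset.
    """
    left, top, right, bottom = font.getbbox(text)
    return right, bottom


# Possible usage with Pillow (assumes the font file from Step 3 exists):
# from PIL import ImageFont
# font = ImageFont.truetype("font/BalsamiqSans-Bold.ttf", size=40)
# width, height = text_size(font, "lp")
```

With this helper, a call such as font.getsize(line)[0] would become text_size(font, line)[0].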

Notice that you can change the input and output image names and locations in the main() function. You can also set up the fonts for the quote and the quoted-by there.

Finally, you can change the quote and quote_by values in the main() function.
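One detail to be aware of: generate_image_with_quote calls image.resize((WIDTH, HEIGHT)), which stretches a background that is not already in a 2:1 format. If you prefer a true crop, the box to cut out can be computed first. The sketch below is one way to do that; `center_crop_box` is a hypothetical helper, and it assumes you want the crop centered.

```python
def center_crop_box(width, height, target_width=1024, target_height=512):
    """Return (left, upper, right, lower) of the largest centered box
    with the target aspect ratio inside a width x height image."""
    target_ratio = target_width / target_height
    if width / height > target_ratio:
        # image is too wide: keep the full height and trim the sides
        new_width = round(height * target_ratio)
        left = (width - new_width) // 2
        return (left, 0, left + new_width, height)
    # image is too tall (or already matches): keep the full width
    new_height = round(width / target_ratio)
    upper = (height - new_height) // 2
    return (0, upper, width, upper + new_height)


print(center_crop_box(4000, 3000))

# Possible usage inside generate_image_with_quote, before the resize:
# image = image.crop(center_crop_box(*image.size))
# image = image.resize((WIDTH, HEIGHT))
```

Pillow also ships ImageOps.fit, which crops and resizes in a single call, if you would rather not compute the box yourself.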

https://www.learnpythonwithrune.org/beginnerpython/