A Simple 7 Step Guide to Implement a Prediction Model to Filter Tweets Based on Dataset Interactively Read from Twitter

What will we learn in this tutorial

  • How Machine Learning works and predicts.
  • What you need to install to implement your Prediction Model in Python
  • A simple way to implement a Prediction Model in Python with persistence
  • How to simplify the connection to the Twitter API using tweepy
  • Collect the training dataset from Twitter interactively in a Python program
  • Use the persistent model to predict the tweets you like

Step 1: Quick introduction to Machine Learning

Machine Learning: the input to the Learner is Features X (the dataset) together with Targets Y. The Learner outputs a Model, which can predict targets Y for future inputs X.
  • The Learner (or Machine Learning Algorithm) is the program that creates a machine learning model from the input data.
  • Features X is the dataset used by the Learner to generate the Model.
  • Targets Y contains the category of each data item in the Features X dataset.
  • The Model takes new inputs X (similar to those in Features X) and predicts a target Y from the categories in Targets Y.

We will implement a simple model that can classify tweets into two categories: allow and reject.

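To make these terms concrete, here is a minimal sketch with made-up example tweets, using the same TF-IDF and Naive Bayes approach that the class in Step 3 will wrap.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Features X: the raw data items (these tweet texts are invented for illustration).
features_x = [
    "Breaking news: major storm hits the coast",   # a tweet we would allow
    "Win a free prize now, just click this link",  # a tweet we would reject
]
# Targets Y: the category of each item in Features X.
targets_y = [0, 1]  # 0 = allow, 1 = reject

# The Learner (TF-IDF vectorizer + Naive Bayes) produces the Model.
vectorizer = TfidfVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(features_x), targets_y)

# The Model predicts a target Y for a new input X.
print(model.predict(vectorizer.transform(["Click to win a free prize"])))  # e.g. [1]
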
Step 2: Install sklearn library (skip if you already have it)

The Python code will be using the sklearn library.

To install it, simply write the following on the command line.

pip install scikit-learn

Alternatively, you might want to install it locally in your user space.

pip install scikit-learn --user

Step 3: Create a simple Prediction Model in Python to Train and Predict on tweets

The implementation wraps the machine learning model in a class. The class has the following methods.

  • create_dataset: Creates a dataset from a list of data items representing the allow category and a list representing the reject category. The dataset is divided into features and targets.
  • train_dataset: Once the dataset is created, it should be trained to build the model, consisting of the predictor (transfer and estimator).
  • predict: Is called after the model is trained. It predicts whether an input belongs to the allow category.
  • persist: Is called to save the model for later use, so that we do not need to collect data and train it again. It should only be called after the dataset has been created and the model has been trained (after create_dataset and train_dataset).
  • load: Loads a saved model so it is ready to predict new input.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import joblib


class PredictionModel:
    def __init__(self):
        self.predictor = {}
        self.dataset = {'features': [], 'targets': []}
        self.allow_id = 0
        self.reject_id = 1

    def create_dataset(self, allow_data, reject_data):
        features_x = allow_data + reject_data
        targets_y = [self.allow_id]*len(allow_data) + [self.reject_id]*len(reject_data)
        self.dataset = {'features': features_x, 'targets': targets_y}

    def train_dataset(self):
        x_train, x_test, y_train, y_test = train_test_split(self.dataset['features'], self.dataset['targets'])

        transfer = TfidfVectorizer()
        x_train = transfer.fit_transform(x_train)
        x_test = transfer.transform(x_test)

        estimator = MultinomialNB()
        estimator.fit(x_train, y_train)

        score = estimator.score(x_test, y_test)  # accuracy on the held-out test split
        self.predictor = {'transfer': transfer, 'estimator': estimator}

    def predict(self, text):
        sentence_x = self.predictor['transfer'].transform([text])
        y_predict = self.predictor['estimator'].predict(sentence_x)
        return y_predict[0] == self.allow_id

    def persist(self, output_name):
        joblib.dump(self.predictor['transfer'], output_name+".transfer")
        joblib.dump(self.predictor['estimator'], output_name+".estimator")

    def load(self, input_name):
        self.predictor['transfer'] = joblib.load(input_name+'.transfer')
        self.predictor['estimator'] = joblib.load(input_name+'.estimator')

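Before wiring the class up to Twitter, here is a quick, hypothetical usage sketch with made-up texts that shows the intended call order (the file name example_model is only an example).

allow_data = ["Breaking: new climate report published", "Live updates from the summit"]
reject_data = ["Win a free iPhone, click here", "Limited time offer, act now"]

model = PredictionModel()
model.create_dataset(allow_data, reject_data)  # build Features X and Targets Y
model.train_dataset()                          # fit the TF-IDF transfer and the estimator
model.persist("example_model")                 # writes example_model.transfer and example_model.estimator

restored = PredictionModel()
restored.load("example_model")                 # reload later without retraining
print(restored.predict("Breaking: summit report published"))  # True if predicted as allow
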
Step 4: Get a Twitter API access

Go to https://developer.twitter.com/en and get your consumer_key, consumer_secret, access_token, and access_token_secret.

api_key = {
    'consumer_key': "",
    'consumer_secret': "",
    'access_token': "",
    'access_token_secret': ""
}

If in doubt, Twitter's developer documentation explains in more detail how to get them.

Step 5: Simplify your Twitter connection

If you do not already have the tweepy library, then install it with:

pip install tweepy

As you will only read tweets from users, the following class will help you to simplify your code.

import tweepy


class TwitterConnection:
    def __init__(self, api_key):
        # authentication of consumer key and secret
        auth = tweepy.OAuthHandler(api_key['consumer_key'], api_key['consumer_secret'])

        # authentication of access token and secret
        auth.set_access_token(api_key['access_token'], api_key['access_token_secret'])
        self.api = tweepy.API(auth)

    def get_tweets(self, user_name, number=0):
        if number > 0:
            return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items(number)
        else:
            return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items()

  • __init__: The class sets up the Twitter API in the init-function.
  • get_tweets: Returns the tweets from a user_name (screen_name).

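As a minimal usage sketch, assuming api_key is the dictionary from Step 4 filled with your credentials, you can fetch and print the latest tweets from a user like this.

connection = TwitterConnection(api_key)

# Print the 5 most recent tweets from a user (the screen name is just an example).
for tweet in connection.get_tweets("@cnnbrk", number=5):
    print(tweet.full_text)
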
Step 6: Collect the dataset (Features X and Target Y) from Twitter

To simplify your life you will use the TwitterConnection class and the PredictionModel class from above.

def get_features(auth, user_name, output_name):
    positives = []
    negatives = []
    twitter_con = TwitterConnection(auth)
    tweets = twitter_con.get_tweets(user_name)
    for tweet in tweets:
        print(tweet.full_text)
        print("a/r/e (allow/reject/end)? ", end='')
        response = input()
        if response.lower() == 'a':
            positives.append(tweet.full_text)
        elif response.lower() == 'e':
            break
        else:
            negatives.append(tweet.full_text)
    model = PredictionModel()
    model.create_dataset(positives, negatives)
    model.train_dataset()
    model.persist(output_name)

The function reads the tweets from user_name and, for each one, prompts you whether it should be added to the tweets you allow or reject.

When you do not feel like “training” your set any further (i.e., collecting more training data), you can press e.

The function will then create the dataset, train the model, and finally persist it.

Step 7: See how good it predicts your tweets based on your model

The following code will print the first number tweets from user_name that your model allows.

def fetch_tweets_prediction(auth, user_name, input_name, number):
    model = PredictionModel()
    model.load(input_name)
    twitter_con = TwitterConnection(auth)
    tweets = twitter_con.get_tweets(user_name)
    for tweet in tweets:
        if model.predict(tweet.full_text):
            print(tweet.full_text)
            number -= 1
        if number <= 0:
            break

The final piece is then to call the functions. Remember to fill in your values for the api_key.

api_key = {
    'consumer_key': "",
    'consumer_secret': "",
    'access_token': "",
    'access_token_secret': ""
}

get_features(api_key, "@cnnbrk", "cnnbrk")
fetch_tweets_prediction(api_key, "@cnnbrk", "cnnbrk", 10)

Conclusion

I trained my set with 30-40 tweets using the above code. On the training set it did not have any false positives (that is, an allow that was a reject in the dataset), but it did have false rejects.

The full code is here.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
import joblib
import tweepy


class PredictionModel:
    def __init__(self):
        self.predictor = {}
        self.dataset = {'features': [], 'targets': []}
        self.allow_id = 0
        self.reject_id = 1

    def create_dataset(self, allow_data, reject_data):
        features_x = allow_data + reject_data
        targets_y = [self.allow_id]*len(allow_data) + [self.reject_id]*len(reject_data)
        self.dataset = {'features': features_x, 'targets': targets_y}

    def train_dataset(self):
        x_train, x_test, y_train, y_test = train_test_split(self.dataset['features'], self.dataset['targets'])

        transfer = TfidfVectorizer()
        x_train = transfer.fit_transform(x_train)
        x_test = transfer.transform(x_test)

        estimator = MultinomialNB()
        estimator.fit(x_train, y_train)

        score = estimator.score(x_test, y_test)  # accuracy on the held-out test split
        self.predictor = {'transfer': transfer, 'estimator': estimator}

    def predict(self, text):
        sentence_x = self.predictor['transfer'].transform([text])
        y_predict = self.predictor['estimator'].predict(sentence_x)
        return y_predict[0] == self.allow_id

    def persist(self, output_name):
        joblib.dump(self.predictor['transfer'], output_name+".transfer")
        joblib.dump(self.predictor['estimator'], output_name+".estimator")

    def load(self, input_name):
        self.predictor['transfer'] = joblib.load(input_name+'.transfer')
        self.predictor['estimator'] = joblib.load(input_name+'.estimator')


class TwitterConnection:
    def __init__(self, api_key):
        # authentication of consumer key and secret
        auth = tweepy.OAuthHandler(api_key['consumer_key'], api_key['consumer_secret'])

        # authentication of access token and secret
        auth.set_access_token(api_key['access_token'], api_key['access_token_secret'])
        self.api = tweepy.API(auth)

    def get_tweets(self, user_name, number=0):
        if number > 0:
            return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items(number)
        else:
            return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items()


def get_features(auth, user_name, output_name):
    positives = []
    negatives = []
    twitter_con = TwitterConnection(auth)
    tweets = twitter_con.get_tweets(user_name)
    for tweet in tweets:
        print(tweet.full_text)
        print("y/n/e (positive/negative/end)? ", end='')
        response = input()
        if response.lower() == 'y':
            positives.append(tweet.full_text)
        elif response.lower() == 'e':
            break
        else:
            negatives.append(tweet.full_text)
    model = PredictionModel()
    model.create_dataset(positives, negatives)
    model.train_dataset()
    model.persist(output_name)


def fetch_tweets_prediction(auth, user_name, input_name, number):
    model = PredictionModel()
    model.load(input_name)
    twitter_con = TwitterConnection(auth)
    tweets = twitter_con.get_tweets(user_name)
    for tweet in tweets:
        if model.predict(tweet.full_text):
            print("POS", tweet.full_text)
            number -= 1
        else:
            pass
            # print("NEG", tweet.full_text)
        if number <= 0:
            break

api_key = {
    'consumer_key': "_",
    'consumer_secret': "_",
    'access_token': "_-_",
    'access_token_secret': "_"
}

get_features(api_key, "@cnnbrk", "cnnbrk")
fetch_tweets_prediction(api_key, "@cnnbrk", "cnnbrk", 10)

Automate Posting on Facebook in Python – Follow these 7 easy steps

Overview

After these steps you will be able to automate the process of posting on Facebook with a Python script. In this example I will show how it is done on a Facebook brand page, Learn Python With Rune.

What you need:

  • A Graph API token, which you get by registering as a developer on Facebook and creating an App there.
  • A simple Python program using the facebook-sdk library.

Step 1: Registering as developer at Facebook

To register as a developer at Facebook you need to log in to developer.facebook.com

Press Log In in the top right corner and log in with your Facebook credentials.

Step 2: Create an App

You need to create an App to get the graph API token.

Under My Apps you press Create App.

Press the Manage Pages, Ads or Groups.

Enter the App Display Name, which will be the name used when posting from this App. Hence, choose a name you would like people to see in the post.

Fill out your email (probably it is automatically there) and press Create App ID.

Step 3: Create Graph API token

Under Tools choose Graph API Explorer.

Ensure that the right Facebook App is chosen. Then under User or Page choose Get Page Access Token.

It will prompt you to log in to your Facebook account and ask permission for sharing your page.

Agree with that.

Then you will get back to the Graph API Explorer.

There you want to add pages_manage_posts, which will grant you access to create posts.

Then click Generate Access Token and you will be prompted to agree with the new access rights on your Facebook page.

Step 4: Prolong your Graph API token

The Graph API token is quite short-lived, so you want to extend it.

Press the info icon next to the Graph API token.

Then press Open in Access Token Tool.

At the bottom you will find Extend Access Token. Press that.

Step 6: Install facebook-sdk library

To make your life easy in Python, you need to install the facebook-sdk library.

pip install facebook-sdk

Step 7: The Python magic

You need to insert your Access Token in the code.

Also, insert the Page ID of the page you want to post to. You can find your Page ID in the About section of your Facebook page.

import facebook

page_access_token = "" # Replace with you access token
graph = facebook.GraphAPI(page_access_token)
facebook_page_id = "" # insert you page ID here.
graph.put_object(facebook_page_id, "feed", message='test message')

That’s it. Enjoy.