    A Simple 7 Step Guide to Implement a Prediction Model to Filter Tweets Based on Dataset Interactively Read from Twitter

    What will we learn in this tutorial?

    • How Machine Learning works and predicts.
    • What you need to install to implement your Prediction Model in Python
    • A simple way to implement a Prediction Model in Python with persistence
    • How to simplify the connection to the Twitter API using tweepy
    • Collect the training dataset from Twitter interactively in a Python program
    • Use the persistent model to predict the tweets you like

    Step 1: Quick introduction to Machine Learning

    Machine Learning: The input to the Learner is Features X (the dataset) with Targets Y. The Learner outputs a Model, which can predict targets (Y) for future inputs (X).
    • The Learner (or Machine Learning Algorithm) is the program that creates a machine learning model from the input data.
    • The Features X is the dataset used by the Learner to generate the Model.
    • The Target Y contains the categories for each data item in the Feature X dataset.
    • The Model takes new inputs X (similar to those in Features) and predicts a target Y, from the categories in Target Y.

    We will implement a simple model that classifies tweets into two categories: allow and reject.
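
    To make the Learner and Model roles concrete, here is a minimal sketch of a toy learner in plain Python. It is a simplified word-count Naive Bayes written for illustration, not the sklearn implementation used in the later steps; the names learn and model and the example texts are made up.

```python
import math
from collections import Counter

def learn(features_x, targets_y):
    """Toy Learner: counts words per category and returns a Model (a function)."""
    word_counts = {}                      # category -> Counter of words
    category_counts = Counter(targets_y)  # category -> number of training texts
    for text, category in zip(features_x, targets_y):
        word_counts.setdefault(category, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}

    def model(text):
        """Toy Model: returns the most likely category for a new input text."""
        best_category, best_score = None, -math.inf
        for category, counts in word_counts.items():
            total = sum(counts.values())
            # log prior plus log likelihood of each word, with add-one smoothing
            score = math.log(category_counts[category] / len(targets_y))
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total + len(vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category

    return model

# Made-up Features X and Targets Y
features_x = ["breaking news election results", "election debate tonight",
              "win a free phone now", "free prize click now"]
targets_y = ["allow", "allow", "reject", "reject"]

model = learn(features_x, targets_y)
prediction = model("free phone prize")
```

    The learner only sees features_x and targets_y; the returned model is then applied to texts it has never seen.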

    Step 2: Install sklearn library (skip if you already have it)

    The Python code will be using the sklearn library.

    To install it, run the following on the command line.

    pip install scikit-learn
    

    Alternatively, you might want to install it locally in your user space.

    pip install scikit-learn --user
    

    Step 3: Create a simple Prediction Model in Python to Train and Predict on tweets

    The implementation wraps the machine learning model in a class. The class has the following methods.

    • create_dataset: Creates a dataset from a list of texts you allow and a list of texts you reject. The dataset is divided into features and targets.
    • train_dataset: Once the dataset is created, it is used to train the model. The result is the predictor, which consists of a transfer (the TF-IDF vectorizer that turns text into numbers) and an estimator (the Naive Bayes classifier).
    • predict: Called after the model is trained. It tells whether the input text is predicted to be in the allow category.
    • persist: Saves the model for later use, so that we do not need to collect data and train again. It should only be called after the dataset has been created and the model has been trained (after create_dataset and train_dataset).
    • load: Loads a saved model so it is ready to predict new input.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import train_test_split
    import joblib
    
    class PredictionModel:
        def __init__(self):
            self.predictor = {}
            self.dataset = {'features': [], 'targets': []}
            self.allow_id = 0
            self.reject_id = 1
        def create_dataset(self, allow_data, reject_data):
            # Features X are the texts; targets Y are the matching category ids.
            features_x = allow_data + reject_data
            targets_y = [self.allow_id]*len(allow_data) + [self.reject_id]*len(reject_data)
            self.dataset = {'features': features_x, 'targets': targets_y}
        def train_dataset(self):
            x_train, x_test, y_train, y_test = train_test_split(self.dataset['features'], self.dataset['targets'])
            transfer = TfidfVectorizer()
            x_train = transfer.fit_transform(x_train)
            x_test = transfer.transform(x_test)
            estimator = MultinomialNB()
            estimator.fit(x_train, y_train)
            score = estimator.score(x_test, y_test)
            self.predictor = {'transfer': transfer, 'estimator': estimator}
        def predict(self, text):
            sentence_x = self.predictor['transfer'].transform([text])
            y_predict = self.predictor['estimator'].predict(sentence_x)
            return y_predict[0] == self.allow_id
        def persist(self, output_name):
            joblib.dump(self.predictor['transfer'], output_name+".transfer")
            joblib.dump(self.predictor['estimator'], output_name+".estimator")
        def load(self, input_name):
            self.predictor['transfer'] = joblib.load(input_name+'.transfer')
            self.predictor['estimator'] = joblib.load(input_name+'.estimator')
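
    As a possible alternative to saving the transfer and the estimator in two files, sklearn can chain both into one Pipeline object that is persisted as a single file. The sketch below assumes a tiny made-up dataset and a temporary file name (toy_model.joblib); it is not part of the class above.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy data; 0 = allow and 1 = reject, as in PredictionModel
texts = ["breaking news election results", "election debate tonight",
         "win a free phone now", "free prize click now"]
labels = [0, 0, 1, 1]

# The pipeline vectorizes the raw text and then classifies it
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(texts, labels)

# Persist the whole pipeline in one file and load it again
path = os.path.join(tempfile.mkdtemp(), "toy_model.joblib")
joblib.dump(pipeline, path)
restored = joblib.load(path)

# The restored pipeline should give the same predictions as the original
same = list(restored.predict(texts)) == list(pipeline.predict(texts))
```

    With a pipeline, predict takes the raw text directly, so there is no separate transform call to remember.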
    

    Step 4: Get a Twitter API access

    Go to https://developer.twitter.com/en and get your consumer_key, consumer_secret, access_token, and access_token_secret.

    api_key = {
        'consumer_key': "",
        'consumer_secret': "",
        'access_token': "",
        'access_token_secret': ""
    }
    

    Also see here for a deeper tutorial on how to get them if in doubt.
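
    If you prefer not to paste secrets into the source file, one option is to read them from environment variables. The sketch below assumes variable names such as TWITTER_CONSUMER_KEY, which are a convention chosen here and not something Twitter or tweepy requires; load_api_key is a hypothetical helper.

```python
import os

REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")

def load_api_key(environ=None, prefix="TWITTER_"):
    """Build the api_key dict from variables like TWITTER_CONSUMER_KEY."""
    if environ is None:
        environ = os.environ
    api_key = {}
    missing = []
    for name in REQUIRED_KEYS:
        variable = prefix + name.upper()
        value = environ.get(variable, "")
        if value:
            api_key[name] = value
        else:
            missing.append(variable)
    if missing:
        # Fail early with a clear message instead of a later authentication error
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return api_key
```

    Calling load_api_key() without arguments reads the real environment; passing a plain dict makes the helper easy to try out.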

    Step 5: Simplify your Twitter connection

    If you do not already have the tweepy library, install it with the following command.

    pip install tweepy
    

    As you will only read tweets from users, the following class will help you to simplify your code.

    import tweepy
    
    class TwitterConnection:
        def __init__(self, api_key):
            # authentication of consumer key and secret
            auth = tweepy.OAuthHandler(api_key['consumer_key'], api_key['consumer_secret'])
            # authentication of access token and secret
            auth.set_access_token(api_key['access_token'], api_key['access_token_secret'])
            self.api = tweepy.API(auth)
        def get_tweets(self, user_name, number=0):
            if number > 0:
                return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items(number)
            else:
                return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items()
    
    • __init__: The class sets up the Twitter API in the init-function.
    • get_tweets: Returns the tweets from a user_name (screen_name).
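
    While developing, it can help to have an offline stand-in that offers the same get_tweets interface but needs no API keys. The sketch below is a hypothetical FakeTwitterConnection that serves made-up texts; it only mimics the full_text attribute used in this tutorial, not a real tweepy status object.

```python
from itertools import islice
from types import SimpleNamespace

class FakeTwitterConnection:
    """Offline stand-in with the same get_tweets interface as TwitterConnection."""

    def __init__(self, texts):
        self.texts = list(texts)

    def get_tweets(self, user_name, number=0):
        # Each fake tweet only carries the full_text attribute
        tweets = (SimpleNamespace(full_text=text) for text in self.texts)
        # As in TwitterConnection, number > 0 limits the result
        return islice(tweets, number) if number > 0 else tweets

fake_con = FakeTwitterConnection(["first tweet", "second tweet", "third tweet"])
first_two = [tweet.full_text for tweet in fake_con.get_tweets("@someone", 2)]
```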

    Step 6: Collect the dataset (Features X and Target Y) from Twitter

    To simplify your life, you will use the TwitterConnection and PredictionModel classes above.

    def get_features(auth, user_name, output_name):
        positives = []
        negatives = []
        twitter_con = TwitterConnection(auth)
        tweets = twitter_con.get_tweets(user_name)
        for tweet in tweets:
            print(tweet.full_text)
            print("y/n/e (allow/reject/end)? ", end='')
            response = input()
            if response.lower() == 'y':
                positives.append(tweet.full_text)
            elif response.lower() == 'e':
                break
            else:
                negatives.append(tweet.full_text)
        model = PredictionModel()
        model.create_dataset(positives, negatives)
        model.train_dataset()
        model.persist(output_name)
    

    The function reads the tweets from user_name and asks for each one whether you allow or reject it. Answer y to allow the tweet; any answer other than y or e rejects it.

    When you do not feel like collecting more training data, answer e.

    The function then creates the dataset, trains the model, and finally persists it.
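
    The labeling loop can also be kept separate from Twitter and from the keyboard, which makes it easy to try out. The sketch below is one way to do that; label_texts and its ask parameter are hypothetical names, and it follows the same rule as get_features (y allows, e ends, anything else rejects).

```python
def label_texts(texts, ask=input):
    """Sort texts into (positives, negatives) based on the answers from ask."""
    positives, negatives = [], []
    for text in texts:
        response = ask(text + "\ny/n/e (allow/reject/end)? ").strip().lower()
        if response == "e":
            break                    # stop collecting training data
        if response == "y":
            positives.append(text)   # allowed
        else:
            negatives.append(text)   # everything else counts as a reject
    return positives, negatives

# Scripted answers replace the keyboard for a quick check
answers = iter(["y", "n", "Y", "e"])
positives, negatives = label_texts(
    ["t1", "t2", "t3", "t4", "t5"], ask=lambda prompt: next(answers))
```

    With the default ask=input the function prompts on the console, just like the loop in get_features.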

    Step 7: See how well the model predicts your tweets

    The following code prints the first number tweets by user_name that your model allows.

    def fetch_tweets_prediction(auth, user_name, input_name, number):
        model = PredictionModel()
        model.load(input_name)
        twitter_con = TwitterConnection(auth)
        tweets = twitter_con.get_tweets(user_name)
        for tweet in tweets:
            if model.predict(tweet.full_text):
                print(tweet.full_text)
                number -= 1
            if number <= 0:
                break
    

    Then your final piece is to call it. Remember to fill out your values for the api_key.

    api_key = {
        'consumer_key': "",
        'consumer_secret': "",
        'access_token': "",
        'access_token_secret': ""
    }
    get_features(api_key, "@cnnbrk", "cnnbrk")
    fetch_tweets_prediction(api_key, "@cnnbrk", "cnnbrk", 10)
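
    The counting in fetch_tweets_prediction can also be expressed with itertools.islice, which stops reading once enough allowed items are found. The sketch below uses made-up texts and a simple stand-in for model.predict; first_allowed is a hypothetical helper.

```python
from itertools import islice

def first_allowed(texts, is_allowed, number):
    """Return at most `number` texts that is_allowed accepts, in order."""
    allowed = (text for text in texts if is_allowed(text))
    # islice stops pulling from the source once `number` items are found
    return list(islice(allowed, number))

# Made-up stand-ins for tweet texts and for model.predict
sample_texts = ["news: storm warning", "ad: buy now",
                "news: election", "news: sports"]
picked = first_allowed(sample_texts, lambda text: text.startswith("news"), 2)
```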
    

    Conclusion

    I trained my model on 30-40 tweets with the above code. On the training set it did not have any false positives (that is, an allow that was a reject in the dataset), but it did have false rejects.
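
    Both kinds of error can be counted with a few lines of Python. The sketch below uses made-up labels and predictions purely to show the counting, with 0 for allow and 1 for reject as in PredictionModel; the numbers are not the results from my run.

```python
ALLOW, REJECT = 0, 1

def count_errors(labels, predictions):
    """Return (false_allows, false_rejects) for paired labels and predictions."""
    pairs = list(zip(labels, predictions))
    # False allow: labeled reject but predicted allow (a false positive)
    false_allows = sum(1 for label, pred in pairs
                       if label == REJECT and pred == ALLOW)
    # False reject: labeled allow but predicted reject
    false_rejects = sum(1 for label, pred in pairs
                        if label == ALLOW and pred == REJECT)
    return false_allows, false_rejects

# Made-up example data
labels = [ALLOW, ALLOW, REJECT, REJECT, ALLOW]
predictions = [ALLOW, REJECT, REJECT, REJECT, REJECT]
false_allows, false_rejects = count_errors(labels, predictions)
```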

    The full code is here.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import train_test_split
    import joblib
    import tweepy
    
    class PredictionModel:
        def __init__(self):
            self.predictor = {}
            self.dataset = {'features': [], 'targets': []}
            self.allow_id = 0
            self.reject_id = 1
        def create_dataset(self, allow_data, reject_data):
            # Features X are the texts; targets Y are the matching category ids.
            features_x = allow_data + reject_data
            targets_y = [self.allow_id]*len(allow_data) + [self.reject_id]*len(reject_data)
            self.dataset = {'features': features_x, 'targets': targets_y}
        def train_dataset(self):
            x_train, x_test, y_train, y_test = train_test_split(self.dataset['features'], self.dataset['targets'])
            transfer = TfidfVectorizer()
            x_train = transfer.fit_transform(x_train)
            x_test = transfer.transform(x_test)
            estimator = MultinomialNB()
            estimator.fit(x_train, y_train)
            score = estimator.score(x_test, y_test)
            self.predictor = {'transfer': transfer, 'estimator': estimator}
        def predict(self, text):
            sentence_x = self.predictor['transfer'].transform([text])
            y_predict = self.predictor['estimator'].predict(sentence_x)
            return y_predict[0] == self.allow_id
        def persist(self, output_name):
            joblib.dump(self.predictor['transfer'], output_name+".transfer")
            joblib.dump(self.predictor['estimator'], output_name+".estimator")
        def load(self, input_name):
            self.predictor['transfer'] = joblib.load(input_name+'.transfer')
            self.predictor['estimator'] = joblib.load(input_name+'.estimator')
    
    class TwitterConnection:
        def __init__(self, api_key):
            # authentication of consumer key and secret
            auth = tweepy.OAuthHandler(api_key['consumer_key'], api_key['consumer_secret'])
            # authentication of access token and secret
            auth.set_access_token(api_key['access_token'], api_key['access_token_secret'])
            self.api = tweepy.API(auth)
        def get_tweets(self, user_name, number=0):
            if number > 0:
                return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items(number)
            else:
                return tweepy.Cursor(self.api.user_timeline, screen_name=user_name, tweet_mode="extended").items()
    
    def get_features(auth, user_name, output_name):
        positives = []
        negatives = []
        twitter_con = TwitterConnection(auth)
        tweets = twitter_con.get_tweets(user_name)
        for tweet in tweets:
            print(tweet.full_text)
            print("y/n/e (positive/negative/end)? ", end='')
            response = input()
            if response.lower() == 'y':
                positives.append(tweet.full_text)
            elif response.lower() == 'e':
                break
            else:
                negatives.append(tweet.full_text)
        model = PredictionModel()
        model.create_dataset(positives, negatives)
        model.train_dataset()
        model.persist(output_name)
    
    def fetch_tweets_prediction(auth, user_name, input_name, number):
        model = PredictionModel()
        model.load(input_name)
        twitter_con = TwitterConnection(auth)
        tweets = twitter_con.get_tweets(user_name)
        for tweet in tweets:
            if model.predict(tweet.full_text):
                print("POS", tweet.full_text)
                number -= 1
            else:
                pass
                # print("NEG", tweet.full_text)
            if number <= 0:
                break
    api_key = {
        'consumer_key': "_",
        'consumer_secret': "_",
        'access_token': "_-_",
        'access_token_secret': "_"
    }
    get_features(api_key, "@cnnbrk", "cnnbrk")
    fetch_tweets_prediction(api_key, "@cnnbrk", "cnnbrk", 10)
    
