
    How To Get Started with a Predictive Machine Learning Program in Python in 5 Easy Steps

    What will you learn?

    • How to predict from a dataset with Machine Learning
    • How to implement that in Python
    • How to get data from Twitter
    • How to install the necessary libraries to do Machine Learning in Python

    Step 1: Install the necessary libraries

    The sklearn library provides simple and efficient tools for predictive data analysis.

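    You can install it by typing the following in your command line. Note that the package is published on PyPI as scikit-learn; the sklearn name used here was an alias that has since been deprecated, so on a current setup run pip install scikit-learn instead.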

    pip install sklearn
    

    It will most likely install a few additional libraries that it depends on.

    Collecting sklearn
      Downloading sklearn-0.0.tar.gz (1.1 kB)
    Collecting scikit-learn
      Downloading scikit_learn-0.23.1-cp38-cp38-macosx_10_9_x86_64.whl (7.2 MB)
         |████████████████████████████████| 7.2 MB 5.0 MB/s 
    Collecting numpy>=1.13.3
      Downloading numpy-1.18.4-cp38-cp38-macosx_10_9_x86_64.whl (15.2 MB)
         |████████████████████████████████| 15.2 MB 12.6 MB/s 
    Collecting joblib>=0.11
      Downloading joblib-0.15.1-py3-none-any.whl (298 kB)
         |████████████████████████████████| 298 kB 8.1 MB/s 
    Collecting threadpoolctl>=2.0.0
      Downloading threadpoolctl-2.1.0-py3-none-any.whl (12 kB)
    Collecting scipy>=0.19.1
      Downloading scipy-1.4.1-cp38-cp38-macosx_10_9_x86_64.whl (28.8 MB)
         |████████████████████████████████| 28.8 MB 5.8 MB/s 
    Using legacy setup.py install for sklearn, since package 'wheel' is not installed.
    Installing collected packages: numpy, joblib, threadpoolctl, scipy, scikit-learn, sklearn
        Running setup.py install for sklearn ... done
    Successfully installed joblib-0.15.1 numpy-1.18.4 scikit-learn-0.23.1 scipy-1.4.1 sklearn-0.0 threadpoolctl-2.1.0
    

    In my installation, the additional libraries were numpy, joblib, threadpoolctl, scipy, and scikit-learn.
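
    To check that the installation worked, a quick import test can help. This is a minimal sketch, assuming the libraries listed above were installed into the Python environment you are running; the versions printed will depend on your setup.

```python
# Check that scikit-learn and its main dependencies can be imported.
import numpy
import scipy
import sklearn

versions = {
    "scikit-learn": sklearn.__version__,
    "numpy": numpy.__version__,
    "scipy": scipy.__version__,
}
for name, version in versions.items():
    print(name, version)
```

    If any of the imports fails, the corresponding package is missing from the environment.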

    Step 2: The dataset

    The machine learning algorithm needs a dataset to train on. To keep this tutorial simple, I only used a small set. I looked through the top tweets from CNN Breaking and categorised them into positive and negative tweets (I know this can be subjective).

    negative = [
        "Protesters who were marching from Minneapolis to St. Paul were tear gassed by police as they tried to cross the Lake Street Marshall Bridge ",
        "The National Guard has been activated in Washington, D.C. to assist police handling protests around the White House",
        "Police have been firing tear gas at the protesters near the 5th Precinct in Minneapolis, where some in the crowd have responded with projectiles of their own",
        "Texas and Colorado have activated the National Guard respond to protests",
        "The mayor of Rochester, New York, has declared a state of emergency and ordered a curfew from 9 p.m. Saturday to 7 a.m. Sunday",
        "Cleveland, Ohio, has enacted a curfew that will go into effect at 8 p.m. Saturday and last through 8 a.m. Sunday",
        "A police car appears to be on fire in Los Angeles. Police officers are holding back a line of demonstrators to prevent them from getting close to the car."
    ]
    positive = [
        "Two NASA astronauts make history with their successful launch into space aboard a SpaceX rocket",
        "After questionable weather, officials give the all clear for the SpaceX launch",
        "NASA astronauts Bob Behnken and Doug Hurley climb aboard SpaceX's Crew Dragon spacecraft as they prepare for a mission to the International Space Station",
        "New York Gov. Andrew Cuomo signs a bill giving death benefits to families of frontline workers who died battling the coronavirus pandemic"
    ]
    

    Step 3: Train the model

    The training algorithm needs labelled data: each tweet paired with its category. Hence, we build the required structure of the data set, labelling positive tweets 0 and negative tweets 1.

    def prepare_data(positive, negative):
        data = positive + negative
        target = [0]*len(positive) + [1]*len(negative)
        return {'data': data, 'target': target}
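
    To see what this structure looks like, here is a small sketch that runs the same function on made-up sentences. The toy_positive and toy_negative lists are invented for illustration and are not part of the dataset above.

```python
def prepare_data(positive, negative):
    data = positive + negative
    target = [0] * len(positive) + [1] * len(negative)
    return {'data': data, 'target': target}

# Two made-up sentences per category, purely for illustration.
toy_positive = ["Successful rocket launch", "Officials give the all clear"]
toy_negative = ["Curfew declared in the city", "Police fire tear gas"]

toy_set = prepare_data(toy_positive, toy_negative)

# Each text lines up with the label at the same index.
for text, label in zip(toy_set['data'], toy_set['target']):
    print(label, text)
```

    The texts and labels are kept in two parallel lists, which is the form train_test_split expects in the next step.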
    

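    The actual training is done with the sklearn library. The text is first converted to TF-IDF feature vectors, and a multinomial Naive Bayes classifier is then trained on them.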

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import train_test_split
    
    def train_data_set(data_set):
        # Hold out part of the data (25% by default) to test the model
        x_train, x_test, y_train, y_test = train_test_split(data_set['data'], data_set['target'])
        # Turn the text into TF-IDF vectors; the vocabulary is learned from the training data only
        transfer = TfidfVectorizer()
        x_train = transfer.fit_transform(x_train)
        x_test = transfer.transform(x_test)
        # Train a multinomial Naive Bayes classifier
        estimator = MultinomialNB()
        estimator.fit(x_train, y_train)
        # Mean accuracy on the held-out tweets
        score = estimator.score(x_test, y_test)
        print("score:\n", score)
        return {'transfer': transfer, 'estimator': estimator}
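
    To make the two stages more concrete, the following sketch vectorizes four made-up sentences and classifies two new ones. The sentences and variable names are assumptions for illustration only; the calls are the same TfidfVectorizer and MultinomialNB calls used above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training sentences (0 = positive, 1 = negative).
train_texts = [
    "rocket launch success",
    "astronauts launch into space",
    "police fire tear gas",
    "curfew after protests",
]
train_labels = [0, 0, 1, 1]

# fit_transform learns the vocabulary and returns one row per sentence,
# one column per distinct word.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(train_texts)
print("rows, columns:", matrix.shape)
print("vocabulary:", sorted(vectorizer.vocabulary_))

classifier = MultinomialNB()
classifier.fit(matrix, train_labels)

# New text must go through transform (not fit_transform), so that it is
# mapped onto the vocabulary learned above; unknown words are ignored.
new_texts = ["tear gas at protests", "new rocket launch"]
predicted = classifier.predict(vectorizer.transform(new_texts))
print(list(zip(new_texts, predicted)))
```

    This is also why train_data_set returns both the vectorizer and the estimator: a new tweet can only be classified after it has been transformed with the same fitted vectorizer.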
    

    Step 4: Get some tweets from CNN Breaking and predict

    For this step to work, you need to set up tokens for the Twitter API. You can follow this tutorial to do that.

    Once you have the tokens, you can use the following code to fetch the tweets and classify them.

    import tweepy
    
    def setup_twitter():
        consumer_key = "REPLACE WITH YOUR KEY"
        consumer_secret = "REPLACE WITH YOUR SECRET"
        access_token = "REPLACE WITH YOUR TOKEN"
        access_token_secret = "REPLACE WITH YOUR TOKEN SECRET"
        # authentication of consumer key and secret
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        # authentication of access token and secret
        auth.set_access_token(access_token, access_token_secret)
        api = tweepy.API(auth)
        return api
    
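    def mood_on_cnn(api, predictor):
        # stat[0] counts tweets predicted positive, stat[1] tweets predicted negative
        stat = [0, 0]
        # Page through the @cnnbrk timeline; extended mode gives the full tweet text
        for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
            sentence_x = predictor['transfer'].transform([status.full_text])
            y_predict = predictor['estimator'].predict(sentence_x)
            stat[y_predict[0]] += 1
        return stat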
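
    If you do not have Twitter tokens yet, the counting logic can be tried offline. The sketch below uses a hypothetical helper, mood_of_texts, which is not part of the original program: it takes a plain list of strings instead of an API object. The stand-in predictor is trained on made-up sentences only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def mood_of_texts(texts, predictor):
    """Count predicted labels for a list of plain strings (no Twitter needed)."""
    stat = [0, 0]
    features = predictor['transfer'].transform(texts)
    for label in predictor['estimator'].predict(features):
        stat[int(label)] += 1
    return stat

# Stand-in predictor trained on made-up sentences (0 = positive, 1 = negative).
train_texts = [
    "rocket launch success",
    "astronauts launch into space",
    "police fire tear gas",
    "curfew after protests",
]
train_labels = [0, 0, 1, 1]
transfer = TfidfVectorizer()
estimator = MultinomialNB().fit(transfer.fit_transform(train_texts), train_labels)
predictor = {'transfer': transfer, 'estimator': estimator}

sample = ["police fire tear gas", "rocket launch success", "curfew after protests"]
stat = mood_of_texts(sample, predictor)
print(stat)
```

    The predictor dictionary has the same keys as the one returned by train_data_set, so the real predictor can be passed in instead of the stand-in.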
    

    Step 5: Putting it all together

    That is it. Here is the complete program.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import train_test_split
    import tweepy
    
    negative = [
        "Protesters who were marching from Minneapolis to St. Paul were tear gassed by police as they tried to cross the Lake Street Marshall Bridge ",
        "The National Guard has been activated in Washington, D.C. to assist police handling protests around the White House",
        "Police have been firing tear gas at the protesters near the 5th Precinct in Minneapolis, where some in the crowd have responded with projectiles of their own",
        "Texas and Colorado have activated the National Guard respond to protests",
        "The mayor of Rochester, New York, has declared a state of emergency and ordered a curfew from 9 p.m. Saturday to 7 a.m. Sunday",
        "Cleveland, Ohio, has enacted a curfew that will go into effect at 8 p.m. Saturday and last through 8 a.m. Sunday",
        "A police car appears to be on fire in Los Angeles. Police officers are holding back a line of demonstrators to prevent them from getting close to the car."
    ]
    positive = [
        "Two NASA astronauts make history with their successful launch into space aboard a SpaceX rocket",
        "After questionable weather, officials give the all clear for the SpaceX launch",
        "NASA astronauts Bob Behnken and Doug Hurley climb aboard SpaceX's Crew Dragon spacecraft as they prepare for a mission to the International Space Station",
        "New York Gov. Andrew Cuomo signs a bill giving death benefits to families of frontline workers who died battling the coronavirus pandemic"
    ]
    
    def prepare_data(positive, negative):
        data = positive + negative
        target = [0]*len(positive) + [1]*len(negative)
        return {'data': data, 'target': target}
    
    def train_data_set(data_set):
        x_train, x_test, y_train, y_test = train_test_split(data_set['data'], data_set['target'])
        transfer = TfidfVectorizer()
        x_train = transfer.fit_transform(x_train)
        x_test = transfer.transform(x_test)
        estimator = MultinomialNB()
        estimator.fit(x_train, y_train)
        score = estimator.score(x_test, y_test)
        print("score:\n", score)
        return {'transfer': transfer, 'estimator': estimator}
    
    def setup_twitter():
        consumer_key = "REPLACE WITH YOUR KEY"
        consumer_secret = "REPLACE WITH YOUR SECRET"
        access_token = "REPLACE WITH YOUR TOKEN"
        access_token_secret = "REPLACE WITH YOUR TOKEN SECRET"
        # authentication of consumer key and secret
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        # authentication of access token and secret
        auth.set_access_token(access_token, access_token_secret)
        api = tweepy.API(auth)
        return api
    
    def mood_on_cnn(api, predictor):
        stat = [0, 0]
        for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
            sentence_x = predictor['transfer'].transform([status.full_text])
            y_predict = predictor['estimator'].predict(sentence_x)
            stat[y_predict[0]] += 1
        return stat
    
    data_set = prepare_data(positive, negative)
    predictor = train_data_set(data_set)
    api = setup_twitter()
    stat = mood_on_cnn(api, predictor)
    print(stat)
    print("Mood (0 good, 1 bad)", stat[1]/(stat[0] + stat[1]))
    

    I got the following output on the day of writing this tutorial.

    score:
     1.0
    [751, 2455]
    Mood (0 good, 1 bad) 0.765751715533375
    

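    Breaking news items tend to be quite negative in tone, and the model seems to reflect that: about 77% of the tweets were classified as negative. Keep in mind that the model was trained on only 11 hand-labelled tweets, so the score of 1.0 comes from a test set of just three tweets.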
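
    A single train/test split on so little data can vary a lot from run to run. One way to get a steadier estimate is cross-validation, which averages the score over several splits. The following is a sketch under assumptions, not part of the original program: the sentences are made up, and the pipeline and cross_val_score helpers from sklearn replace the manual split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up sentences for illustration only (0 = positive, 1 = negative).
texts = [
    "astronauts launch into space",
    "successful rocket launch",
    "officials give the all clear",
    "bill gives benefits to families",
    "police fire tear gas at protesters",
    "curfew declared after protests",
    "national guard activated for protests",
    "police car on fire",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# The pipeline refits the vectorizer inside every fold, so no test text
# leaks into the vocabulary.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())

# cv=4 gives four stratified folds, each tested on one sentence per class.
scores = cross_val_score(model, texts, labels, cv=4)
print("fold scores:", scores)
print("mean accuracy:", scores.mean())
```

    With a larger labelled set of tweets, the same pattern would give a more trustworthy accuracy figure than one split.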
