How to Fetch CNN Breaking Tweets and Automatically Compute Simple Word Statistics with Python

What we will cover

  • Use the tweepy library
  • Read the newest tweets from CNN Breaking (@cnnbrk)
  • Compute simple word statistics on the news tweets
  • See if we can learn anything from them

Preliminaries

The Code that does the magic

import tweepy

# Personal details: insert your key, secret, token and token secret here
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

# Set up OAuth with the consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

# Attach the access token and secret
auth.set_access_token(access_token, access_token_secret)

# Create the actual API interface, using the authentication
api = tweepy.API(auth)

# Use a dictionary to count the appearances of words
stat = {}

# Read the tweets from @cnnbrk and make the statistics
for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
    for word in status.full_text.split():
        if word in stat:
            stat[word] += 1
        else:
            stat[word] = 1

# Print only the top 10
top = 10

# Sort the words by count in descending order to get the most frequent first
for word in sorted(stat, key=stat.get, reverse=True):
    # Leave out all the short words (6 characters or fewer)
    if len(word) > 6:
        print(word, stat[word])
        top -= 1
        # Stop once exactly 10 words have been printed
        if top <= 0:
            break
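
The counting and ranking steps can be tried without Twitter credentials. The following is a minimal sketch, not the script above: it assumes a hypothetical list `sample_tweets` in place of the API results and a hypothetical helper `top_words`, and it swaps the manual dictionary for `collections.Counter` from the standard library.

```python
from collections import Counter

# Hypothetical stand-in for the tweet texts returned by the API
sample_tweets = [
    "President speaks about coronavirus updates",
    "Officials say coronavirus cases are rising",
    "President meets officials",
]


def top_words(texts, n=10, min_len=7):
    """Return the n most common whitespace-separated words
    that are at least min_len characters long."""
    counts = Counter(
        word
        for text in texts
        for word in text.split()
        if len(word) >= min_len
    )
    return counts.most_common(n)


# Print the three most frequent long words in the sample
for word, count in top_words(sample_tweets, n=3):
    print(word, count)
```

The default `min_len=7` mirrors the `len(word) > 6` filter in the script, and `most_common(n)` replaces the manual sort and countdown.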

The result of a run of the script on May 30th, 2020:

coronavirus 441
@CNNPolitics: 439
President 380
updates: 290
impeachment 148
officials 130
according 100
Trump's 98
Democratic 96
against 88
Department 83

Coronavirus is the most frequent word in the fetched tweets, so it was still the dominant breaking-news subject at the time of this run.

Next steps

  • Extend the script to interpret the data more intelligently.
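
One possible first step, sketched here under assumptions, is to normalise each token before counting. The result list contains entries such as "@CNNPolitics:" and "updates:", which are a mention and a word with trailing punctuation. The `normalize` and `count_words` helpers and the tiny `STOP_WORDS` set below are illustrative names, not part of the original script, and a real stop-word list would be much longer.

```python
import string
from collections import Counter

# Illustrative stop-word set (assumption); extend as needed
STOP_WORDS = {"according", "against", "because", "through"}


def normalize(token):
    """Return a cleaned, lowercased word, or None if the token should be skipped."""
    # Skip mentions and links entirely
    if token.startswith(("@", "http://", "https://")):
        return None
    # Strip punctuation from both ends, e.g. "updates:" -> "updates"
    word = token.strip(string.punctuation).lower()
    # Drop a possessive "'s" so "Trump's" and "Trump" count together
    if word.endswith("'s"):
        word = word[:-2]
    if not word or word in STOP_WORDS:
        return None
    return word


def count_words(texts):
    """Count normalised words across an iterable of tweet texts."""
    counts = Counter()
    for text in texts:
        for token in text.split():
            word = normalize(token)
            if word is not None:
                counts[word] += 1
    return counts


# Example with a made-up tweet text
print(count_words(["Coronavirus updates: coronavirus, @CNNPolitics: says"]))
```

Lowercasing merges "Coronavirus" with "coronavirus", which changes the counts compared with the case-sensitive script; whether that is desirable depends on what the statistics are meant to show.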
