What will we cover
- We will use the tweepy library
- Read the newest tweets from CNN Breaking
- Make simple word statistics on the news tweets
- See if we can learn anything from it
Preliminaries
- Basic understanding of Python
- Set up access to the Twitter developer API. See this tutorial on how to do it.
The Code that does the magic
import tweepy
# personal details insert your key, secret, token and token_secret here
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
# authentication of consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# authentication of access token and secret
auth.set_access_token(access_token, access_token_secret)
# Creation of the actual interface, using authentication
api = tweepy.API(auth)
# Use a dictionary to count the appearances of words
stat = {}
# Read the tweets from @cnnbrk and make the statistics
for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
    for word in status.full_text.split():
        if word in stat:
            stat[word] += 1
        else:
            stat[word] = 1
# Print only the most frequent long words; the loop stops once top drops below 0,
# so with top = 10 it prints 11 words
top = 10
# Sort the words by count in descending order to get the highest first
for word in sorted(stat, key=stat.get, reverse=True):
    # leave out all the small words (6 characters or fewer)
    if len(word) > 6:
        print(word, stat[word])
        top -= 1
        if top < 0:
            break
The result of the above (done May 30th, 2020)
coronavirus 441
@CNNPolitics: 439
President 380
updates: 290
impeachment 148
officials 130
according 100
Trump's 98
Democratic 96
against 88
Department 83
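Coronavirus is the most frequent long word in these tweets, which suggests it is still the dominant breaking-news subject. Note that tokens such as @CNNPolitics: and updates: keep their punctuation, because the text is split on whitespace only.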
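The counting and sorting steps can be tried without Twitter access. The following is a minimal sketch, not the code used for the results above: it assumes the tweet texts are already available as plain strings, and the helper count_long_words and the sample_texts list are hypothetical names made up for illustration. It uses collections.Counter from the standard library in place of the hand-written dictionary.

```python
from collections import Counter


def count_long_words(texts, min_length=7):
    """Count whitespace-separated words with at least min_length characters."""
    counts = Counter()
    for text in texts:
        # Same rule as len(word) > 6 in the loop above
        counts.update(word for word in text.split() if len(word) >= min_length)
    return counts


# Hypothetical sample texts standing in for status.full_text values
sample_texts = [
    "President speaks about coronavirus response",
    "New coronavirus updates: officials respond",
]

counts = count_long_words(sample_texts)
# most_common(n) returns the n words with the highest counts, highest first
for word, count in counts.most_common(10):
    print(word, count)
```

Counter.most_common(10) returns at most 10 entries, so it replaces both the manual sort and the top counter.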
Next steps
- The analysis should be extended to interpret the data more intelligently.
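One possible first step in that direction is to clean the tokens before counting them. The sketch below is only one way to do it, under stated assumptions: the STOP_WORDS set is a deliberately tiny, hypothetical list, the helper clean_tokens is an illustrative name, and the sample string is made up. It skips mentions, hashtags and links, strips surrounding punctuation, and lowercases each word.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "is"}


def clean_tokens(text):
    """Yield lowercase words, skipping mentions, hashtags, links and stop words."""
    for token in text.split():
        # Drop @mentions, #hashtags and URLs entirely
        if token.startswith(("@", "#", "http")):
            continue
        # Remove leading and trailing punctuation, e.g. "updates:" becomes "updates"
        word = token.strip(string.punctuation).lower()
        if word and word not in STOP_WORDS:
            yield word


# Made-up sample text standing in for one status.full_text value
sample = "@CNNPolitics: The President's updates: coronavirus, coronavirus https://example.com"
counts = Counter(clean_tokens(sample))
print(counts.most_common(3))
```

With this kind of cleaning, entries like @CNNPolitics: would disappear from the statistics and updates: would be counted together with updates.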