What we will cover
- We will use the tweepy library
- Read the newest tweets from CNN Breaking
- Make simple word statistics on the news tweets
- See if we can learn anything from it
Preliminaries
- Simple Python understanding
- Set up access to the Twitter developer API. See this tutorial on how to do it.
The Code that does the magic
import tweepy

# Personal details: insert your key, secret, token and token_secret here
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

# Authentication of consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

# Authentication of access token and secret
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication
api = tweepy.API(auth)

# Use a dictionary to count the appearances of words
stat = {}

# Read the tweets from @cnnbrk and make the statistics
for status in tweepy.Cursor(api.user_timeline, screen_name='@cnnbrk', tweet_mode="extended").items():
    # Split on whitespace only, so punctuation stays attached to the words
    for word in status.full_text.split():
        if word in stat:
            stat[word] += 1
        else:
            stat[word] = 1

# Let's just print the top 10
top = 10

# Sort the words on their count in reverse order to get the highest first
for word in sorted(stat, key=stat.get, reverse=True):
    # Leave out all the small words (six characters or fewer)
    if len(word) > 6:
        print(word, stat[word])
        top -= 1
        # Stop once ten words have been printed
        if top <= 0:
            break
The result of the above (run on May 30th, 2020; this run printed eleven entries)
coronavirus 441
@CNNPolitics: 439
President 380
updates: 290
impeachment 148
officials 130
according 100
Trump's 98
Democratic 96
against 88
Department 83
With 441 mentions, "coronavirus" is the most frequent long word in these tweets, so the coronavirus is still the dominant breaking-news subject today.
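The counting and ranking step can be tried without API credentials. The sketch below is a minimal offline version, assuming the tweet texts are already available as a list of strings. The function name `top_long_words`, its parameters, and the sample texts are hypothetical and made up for illustration. It also swaps in `collections.Counter` for the hand-written dictionary.

```python
from collections import Counter


def top_long_words(texts, min_length=7, top=10):
    """Return the `top` most frequent words with at least `min_length` characters."""
    counts = Counter()
    for text in texts:
        # Same tokenisation as the script: split on whitespace only
        counts.update(word for word in text.split() if len(word) >= min_length)
    # most_common sorts by count, highest first
    return counts.most_common(top)


# Hypothetical sample texts, not real tweets
sample = [
    "coronavirus updates: officials report new cases",
    "officials say coronavirus testing expands",
    "coronavirus briefing scheduled for today",
]

for word, count in top_long_words(sample, top=3):
    print(word, count)
```

A `min_length` of 7 matches the `len(word) > 6` filter in the script. Filtering before counting instead of while printing gives the same ranking and keeps the short words out of the counter.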
Next steps
- It should be extended to interpret the data more intelligently, for example by cleaning up tokens such as "@CNNPolitics:" and "updates:" before counting them.
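One possible first step, sketched under assumptions: the result list contains tokens such as `@CNNPolitics:`, `updates:` and `Trump's`, which are account mentions or carry punctuation. The hypothetical `normalize` helper below lowercases a token, skips mentions and links, and strips surrounding punctuation and a trailing possessive. These rules are illustrative choices, not part of the original script.

```python
import string


def normalize(token):
    """Return a cleaned, lowercased word, or None if the token should be skipped."""
    # Skip account mentions and links (an illustrative choice)
    if token.startswith("@") or token.startswith("http"):
        return None
    # Remove punctuation around the word and ignore case
    word = token.strip(string.punctuation).lower()
    # Drop a trailing possessive, e.g. "trump's" -> "trump"
    if word.endswith("'s"):
        word = word[:-2]
    # An empty result means the token was punctuation only
    return word or None


# Hypothetical sample text, not a real tweet
text = "@CNNPolitics: Trump's updates: officials respond"
words = [w for w in map(normalize, text.split()) if w]
print(words)
```

In the script, each word from `status.full_text.split()` could be passed through such a helper before it is counted, so that variants like `updates:` and `updates` end up under the same key.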