Plot Tweets Locations on a Leaflet Map using Python in 3 Easy Steps

What will we cover?

  • How to plot locations of tweets on a leaflet map using Python
  • Setup your access to the Twitter API
  • How to collect location data from Twitter and tweets.
  • Finally, how to plot it on an interactive leaflet map.

Step 1: Getting ready to collect data from Twitter

Twitter is an amazing place to explore data as the API is easy to get access to and the data is public available to everyone. This is also the case if you want to plot Tweet Locations on a Leaflet Map using Python.

Using Python to interact with Twitter is easy and does require a lot to get started. I prefer to use the tweepy library, which is, as they say, “an easy-to-use Python library to accessing the Twitter API”.

To install the tweepy library, simply type the following in a command shell.

pip install tweepy

The next step is to gather your key values to access the API.

You can get them from

If you need help to get them, I can suggest you follow this tutorial first, which will help you set everything up correctly.

Step 2: Collect the locations from the Tweets

Exploring the data available on a tweet, it has a coordinates and place field.

If you read the first word then you realize.

  • Coordinates: Nullable. Represent the geographic location of this Tweet as reported by the user or client application.
  • Place: Nullable. When present, indicates that the tweet is associated (but not necessarily originating from) a Place.

Nullable, which mean that it can be null, i.e., have no value.

But let us see how often they are set.

import tweepy
def get_twitter_api():
    # personal details
    consumer_key = "___INSERT_YOUR_VALUE_HERE___"
    consumer_secret = "___INSERT_YOUR_VALUE_HERE___"
    access_token = "___INSERT_YOUR_VALUE_HERE___"
    access_token_secret = "___INSERT_YOUR_VALUE_HERE___"
    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
    return api
def get_twitter_location(search):
    api = get_twitter_api()
    count = 0
    for tweet in tweepy.Cursor(, q=search).items(500):
        if hasattr(tweet, 'coordinates') and tweet.coordinates is not None:
            count += 1
            print("Coordinates", tweet.coordinates)
        if hasattr(tweet, 'location') and tweet.location is not None:
            count += 1
            print("Coordinates", tweet.location)

Which resulted in 0. I would not expect this to be the case, but you never know.

Hence, the second best thing you can use, is then the location of the user. Most users have a location given in the user object you see the following.

User Object from
User Object from

This results in the following way to collect it. We need to check for the object being None.

def get_tweets(search):
    api = get_twitter_api()
    location_data = []
    for tweet in tweepy.Cursor(, q=search).items(500):
        if hasattr(tweet, 'user') and hasattr(tweet.user, 'screen_name') and hasattr(tweet.user, 'location'):
            if tweet.user.location:
                location_data.append((tweet.user.screen_name, tweet.user.location))
    return location_data

Here we collect all the locations of the users of the tweets and return a list of them.

Step 3: Plot the data on an interactive map

The folium library is amazing to plot data on an interactive leaflet map.

To install the folium library simply type the following command in a terminal.

pip install folium

Or read more here, on how to install it.

We also need to find the coordinates from each location. This can be done by using the library geopy. It can be installed by typing the following command in a terminal.

pip install geopy

Or read more here.

Given that the plotting is done by the following lines of code. Please notice, I put a try-except around the geocode call, as it tends to get an timeout.

import folium
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim

def put_markers(map, data):
    geo_locator = Nominatim(user_agent="LearnPython")
    for (name, location) in data:
        if location:
                location = geo_locator.geocode(location)
            except GeocoderTimedOut:
            if location:
                folium.Marker([location.latitude, location.longitude], popup=name).add_to(map)

if __name__ == "__main__":
    map = folium.Map(location=[0, 0], zoom_start=2)
    location_data = get_tweets("#100DaysOfCode")
    put_markers(map, location_data)"index.html")

This results in the following beautiful map.

Interactive map.
Interactive map.

Want to learn more Python? Also, check out my online course on Python.

How to Create a Sentiment Analysis model to Predict the Mood of Tweets with Python – 4 Steps to Compare the Mood of Python vs Java

What will we cover in this tutorial?

  • We will learn how the supervised Machine Learning algorithm Sentiment Analysis can be used on twitter data (also, called tweets).
  • The model we use will be Naive Bayes Classifier.
  • The tutorial will help install the necessary Python libraries to get started and how to download training data.
  • Then it will give you a full script to train the model.
  • Finally, we will use the trained model to compare the “mood” of Python with Java.

Step 1: Install the Natural Language Toolkit Library and Download Collections

We will use the Natural Language Toolkit (nltk) library in this tutorial.

NLTK is a leading platform for building Python programs to work with human language data.

To install the library you should run the following command in a terminal or see here for other alternatives.

pip install nltk

To have the data available that you need to run the following program or see installing NLTK Data.

import nltk

This will prompt you with a screen similar to this. And select all packages you want to install (I took them all).

Download all packages to NLTK (Natural Language Toolkit)
Download all packages to NLTK (Natural Language Toolkit)

After download you can use the twitter_samples as you need in the example.

Step 2: Reminder of the Sentiment Analysis learning process (Machine Learning)

On a high level you can divide Machine Learning into two phases.

  • Phase 1: Learning
  • Phase 2: Prediction

The Sentiment Analysis model is supervised learning process. The process is defined in the picture below.

The Sentiment Analysis model (Machine Learning) Learning phase
The Sentiment Analysis model (Supervised Machine Learning) Learning phase

On a high level the the learning process of Sentiment Analysis model has the following steps.

  • Training & test data
    • The Sentiment Analysis model is a supervised learning and needs data representing the data that the model should predict. We will use tweets.
    • The data should be categorized into the groups it should be able to distinguish. In our example it will be in positive tweets and negative tweets.
  • Pre-processing
    • First you need to remove “noise”. In our case we remove URL links and Twitter user names.
    • Then you Lemmatize the data to have the words in the same form.
    • Further, you remove stop words as they have no impact of the mood in the tweet.
    • The data then needs to be formatted for the algorithm.
    • Finally, you need to divide it into a training data and testing data.
  • Learning
    • This is where the algorithm builds the model using the training data.
  • Testing
    • Then we test the accuracy of the model with the categorized test data.

Step 3: Train the Sample Data

The twitter_sample contains 5000 positive and 5000 negative tweets, all ready and classified to use in for your training model.

import random
import pickle
from nltk.corpus import twitter_samples
from nltk.stem import WordNetLemmatizer
from nltk.tag import pos_tag
from nltk.corpus import stopwords
from nltk import NaiveBayesClassifier
from nltk import classify

def clean_data(token):
    return [item for item in token if not item.startswith("http") and not item.startswith("@")]

def lemmatization(token):
    lemmatizer = WordNetLemmatizer()
    result = []
    for token, tag in pos_tag(token):
        tag = tag[0].lower()
        token = token.lower()
        if tag in "nva":
            result.append(lemmatizer.lemmatize(token, pos=tag))
    return result

def remove_stop_words(token, stop_words):
    return [item for item in token if item not in stop_words]

def transform(token):
    result = {}
    for item in token:
        result[item] = True
    return result

def main():
    # Step 1: Gather data
    positive_tweets_tokens = twitter_samples.tokenized('positive_tweets.json')
    negative_tweets_tokens = twitter_samples.tokenized('negative_tweets.json')
    # Step 2: Clean, Lemmatize, and remove Stop Words
    stop_words = stopwords.words('english')
    positive_tweets_tokens_cleaned = [remove_stop_words(lemmatization(clean_data(token)), stop_words) for token in positive_tweets_tokens]
    negative_tweets_tokens_cleaned = [remove_stop_words(lemmatization(clean_data(token)), stop_words) for token in negative_tweets_tokens]
    # Step 3: Transform data
    positive_tweets_tokens_transformed = [(transform(token), "Positive") for token in positive_tweets_tokens_cleaned]
    negative_tweets_tokens_transformed = [(transform(token), "Negative") for token in negative_tweets_tokens_cleaned]

    # Step 4: Create data set
    dataset = positive_tweets_tokens_transformed + negative_tweets_tokens_transformed
    train_data = dataset[:7000]
    test_data = dataset[7000:]
    # Step 5: Train data
    classifier = NaiveBayesClassifier.train(train_data)
    # Step 6: Test accuracy
    print("Accuracy is:", classify.accuracy(classifier, test_data))
    # Step 7: Save the pickle
    f = open('my_classifier.pickle', 'wb')
    pickle.dump(classifier, f)

if __name__ == "__main__":

The code is structured in steps. If you are not comfortable how a the flow of a general machine learning flow is, I can recommend to read this tutorial here or this one.

  • Step 1: Collect and categorize It reads the 5000 positive and 5000 negative twitter samples we downloaded with the call.
  • Step 2: The data needs to be cleaned, Lemmatized and removed for stop words.
    • The clean_data call removes links and twitter users.
    • The call to lemmatization puts words in their base form.
    • The call to remove_stop_words removes all the stop words that have no affect on the mood of the sentence.
  • Step 3: Format data This step transforms the data to the desired format for the NaiveBayesClassifier module.
  • Step 4: Divide data Creates the full data set. Makes a shuffle to take them in different order. Then takes 70% as training data and 30% test data.
    • This data is mixed different from run to run. Hence, it might happen that you will not get the same accuracy like I will in my run.
    • The training data is used to make the model to predict from.
    • The test data is used to compute the accuracy of the model to predict.
  • Step 5: Training model This is the training of the NaiveBayesClassifier model.
    • This is where all the magic happens.
  • Step 6: Accuracy This is testing the accuracy of the model.
  • Step 7: Persist To save the model for use.

I got the following output from the above program.

Accuracy is: 0.9973333333333333
Most Informative Features
                      :) = True           Positi : Negati =   1010.7 : 1.0
                     sad = True           Negati : Positi =     25.4 : 1.0
                     bam = True           Positi : Negati =     20.2 : 1.0
                  arrive = True           Positi : Negati =     18.3 : 1.0
                     x15 = True           Negati : Positi =     17.2 : 1.0
               community = True           Positi : Negati =     14.7 : 1.0
                    glad = True           Positi : Negati =     12.6 : 1.0
                   enjoy = True           Positi : Negati =     12.0 : 1.0
                    kill = True           Negati : Positi =     12.0 : 1.0
                     ugh = True           Negati : Positi =     11.3 : 1.0

Step 4: Use the Sentiment Analysis prediction model

Now we can determine the mood of a tweet. To have some fun let us try to figure out the mood of tweets with Python and compare it with Java.

To do that, you need to have setup your twitter developer account. If you do not have that already, then see the this tutorial on how to do that.

In the code below you need to fill out your consumer_key, consumer_secret, access_token, and access_token_secret.

import pickle
import tweepy

def get_twitter_api():
    # personal details
    consumer_key = "___INSERT YOUR DATA HERE___"
    consumer_secret = "___INSERT YOUR DATA HERE___"
    access_token = "___INSERT YOUR DATA HERE___"
    access_token_secret = "___INSERT YOUR DATA HERE___"
    # authentication of consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # authentication of access token and secret
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    return api

# This function uses the functions from the learner code above
def tokenize(tweet):
    return remove_noise(word_tokenize(tweet))

def get_classifier(pickle_name):
    f = open(pickle_name, 'rb')
    classifier = pickle.load(f)
    return classifier

def find_mood(search):
    classifier = get_classifier('my_classifier.pickle')
    api = get_twitter_api()
    stat = {
        "Positive": 0,
        "Negative": 0
    for tweet in tweepy.Cursor(, q=search).items(1000):
        custom_tokens = tokenize(tweet.text)
        category = classifier.classify(dict([token, True] for token in custom_tokens))
        stat[category] += 1
    print("The mood of", search)
    print(" - Positive", stat["Positive"], round(stat["Positive"]*100/(stat["Positive"] + stat["Negative"]), 1))
    print(" - Negative", stat["Negative"], round(stat["Negative"]*100/(stat["Positive"] + stat["Negative"]), 1))

if __name__ == "__main__":

That is it. Obviously the mood of Python is better. It is easier than Java.

The mood of #java
 - Positive 524 70.4
 - Negative 220 29.6
The mood of #python
 - Positive 753 75.3
 - Negative 247 24.7

If you want to learn more about Python I can encourage you to take my course here.

3 Steps to Plot Shooting Incident in NY on a Map Using Python

What will you learn in this tutorial?

  • Where to find interesting data contained in CSV files.
  • How to extract a map to plot the data on.
  • Use Python to easily plot the data from the CSV file no the map.

Step 1: Collect the data in CSV format

You can find various interesting data in CSV format on that you can play around with in Python.

In this tutorial we will focus on Shooting Incidents in NYPD from the last year. You can find the data on with NYPD Shooting Incident Data (Year To Date) with NYPD Shooting Incident Data (Year To Date)

You can download the CSV file containing all the data by pressing on the download link.

To download CSV file press the download.
To download CSV file press the download.

Looking at the data you see that each incident has latitude and longitude coordinates.

{'INCIDENT_KEY': '184659172', 'OCCUR_DATE': '06/30/2018 12:00:00 AM', 'OCCUR_TIME': '23:41:00', 'BORO': 'BROOKLYN', 'PRECINCT': '75', 'JURISDICTION_CODE': '0', 'LOCATION_DESC': 'PVT HOUSE                     ', 'STATISTICAL_MURDER_FLAG': 'false', 'PERP_AGE_GROUP': '', 'PERP_SEX': '', 'PERP_RACE': '', 'VIC_AGE_GROUP': '25-44', 'VIC_SEX': 'M', 'VIC_RACE': 'BLACK', 'X_COORD_CD': '1020263', 'Y_COORD_CD': '184219', 'Latitude': '40.672250312', 'Longitude': '-73.870176252'}

That means we can plot on a map. Let us try to do that.

Step 2: Export a map to plot the data

We want to plot all the shooting incidents on a map. You can use OpenStreetMap to get an image of a map.

We want a map of New York, which you can find by locating it on OpenStreetMap or pressing the link.

OpenStreetMap (sorry for the Danish language)

You should press the blue Download in the low right corner of the picture.

Also, remember to get the coordinates of the image in the left side bar, we will need them for the plot.

map_box = [-74.4461, -73.5123, 40.4166, 41.0359]

Step 3: Writing the Python code that adds data to the map

Importing data from a CVS file is easy and can be done through the standard library csv. Making plot on a graph can be done in matplotlib. If you do not have it installed already, you can do that by typing the following in a command line (or see here).

pip install matplotlib

First you need to transform the CVS data of the longitude and latitude to floats.

import csv

# The name of the input file might need to be adjusted, or the location needs to be added if it is not located in the same folder as this file.
csv_file = open('nypd-shooting-incident-data-year-to-date-1.csv')
csv_reader = csv.DictReader(csv_file)
longitude = []
latitude = []
for row in csv_reader:

Now you have two lists (longitude and latitude), which contains the coordinates to plot.

Then for the actual plotting into the image.

import matplotlib.pyplot as plt

# The boundaries of the image map
map_box = [-74.4461, -73.5123, 40.4166, 41.0359]
# The name of the image of the New York map might be different.
map_img = plt.imread('map.png')
fig, ax = plt.subplots()
ax.scatter(longitude, latitude)
ax.set_ylim(map_box[2], map_box[3])
ax.set_xlim(map_box[0], map_box[1])
ax.imshow(map_img, extent=map_box, alpha=0.9)


This will result in the following beautiful map of New York, which highlights where the shooting in the last year has occurred.

Shootings in New York in the last year. Plot by Python using matplotlib.
Shootings in New York in the last year. Plot by Python using matplotlib.

Now that is awesome. If you want to learn more, this and more is covered in my online course. Check it out.

You can also read about how to plot the mood of tweets on a leaflet map.