Start Data Science with Python

Start your Data Science journey here with Python

Do you want to master Data Science? Don’t know where to begin?

Well, you are not alone. The internet is full of tutorials, courses, and advice.

It can be hard to keep focus and not be distracted by all the offers and promises out there on the internet.

If you want to master it and you like a practical approach, where you start out simple and build more and more upon that, then you are in the right place.

Start simple. Code it yourself. Add more stuff. Code it yourself. Continue until you are the next guru in the field.

Below you will find tutorials that will guide you from a simple practical start and build more and more as you go along.

Also consider signing up for my online course.

Step 1: The first simple step! What is Data Science?

There is no good definition for that. Wikipedia writes the following.

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data.

https://en.wikipedia.org/wiki/Data_science

We live in a world with an abundance of data. Everything worth knowing is out there on the internet somewhere. Companies collect data about customers to increase their profits. User surveys. Profile data. Data about data, or metadata, is collected about everything we do. Data is collected for the sake of data. Data. Data.

But it is often structured in different ways. It is often not easy to get any value out of the data. This leads to how I see the art of a Data Scientist.

A Data Scientist is a person who, given a pile of data, can extract something valuable out of it.

What is value? Well, that depends. If you work for a big corporation, well, value equals a way to increase profits. If you work for yourself or science, it might be just something that interests you.

Step 2: Let’s get started with the first project!

When it comes to Data Science, the best way to learn is by starting projects and learning along the way.

But don’t you need to master advanced math and statistics and Python?

Yes, you do. On the other hand, with limited math, limited statistics, and limited Python skills you can get started. Then you can learn the more advanced stuff along the way.

In this first tutorial you will get familiar with Pandas and how it can help you scrape HTML tables from the internet and sum up the numbers in them.
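To give a feel for what that looks like, here is a minimal sketch. It assumes Pandas is installed together with an HTML parser such as lxml, and it reads a small made-up table from a string instead of a real web page, so the names and numbers are purely illustrative. With a real page you would pass its URL to the same function.

```python
from io import StringIO

import pandas as pd

# Made-up HTML table standing in for a page you would scrape (illustrative data).
html = """
<table>
  <thead>
    <tr><th>Item</th><th>Amount</th></tr>
  </thead>
  <tbody>
    <tr><td>A</td><td>100</td></tr>
    <tr><td>B</td><td>250</td></tr>
    <tr><td>C</td><td>50</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
df = tables[0]

# Sum one column of the scraped table.
total = df["Amount"].sum()
print(df)
print("Total:", total)
```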

Step 3: Group data by country

What if you want to group data by country? Good question.

In this tutorial we will do that.

Also, notice all the cleaning of data. Data is not straightforward to use. Do not get too caught up in the details at this stage. It will come along the way.
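As a rough sketch of what cleaning and grouping can look like, the example below uses a small made-up data set (the column names and numbers are assumptions for illustration, not the data from the tutorial). It strips stray spaces from the country names, turns text like "1,200" into numbers, and then sums per country.

```python
import pandas as pd

# Made-up, messy data: extra spaces, thousands separators, a missing value.
df = pd.DataFrame({
    "country": [" Denmark", "Denmark ", "Sweden", "Sweden", "Norway"],
    "sales": ["1,200", "800", "500", None, "300"],
})

# Cleaning: remove surrounding whitespace so the names match each other.
df["country"] = df["country"].str.strip()

# Cleaning: drop the thousands separator and convert the text to numbers.
df["sales"] = pd.to_numeric(df["sales"].str.replace(",", "", regex=False))

# Group by country and sum; missing values are skipped by default.
totals = df.groupby("country")["sales"].sum()
print(totals)
```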

Step 4: How to correlate data?

Without going deep into the math, you can actually learn whether data is correlated or not.

Notice all the cleaning and adjustments needed in this code. It might seem a bit difficult at first. Don’t worry too much about the details so far. Just try it out on your own computer.
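If you want a first taste before opening the tutorial, here is a small sketch on made-up numbers (the column names and values are assumptions chosen for illustration). It uses the corr method of a DataFrame, which by default computes the Pearson correlation between every pair of numeric columns. Values close to 1 or -1 suggest a strong linear relationship, values close to 0 suggest little or none.

```python
import pandas as pd

# Made-up data: study hours, exam score, and shoe size for five people.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70],
    "shoe_size": [42, 38, 44, 39, 41],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr)

# Pick out single pairs to compare them.
hours_vs_score = corr.loc["hours", "score"]
hours_vs_shoe = corr.loc["hours", "shoe_size"]
print("hours vs score:", hours_vs_score)
print("hours vs shoe size:", hours_vs_shoe)
```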

Sorry, I know, you use a laptop.

Step 5: Discover things you did not know?

The best part of being a Data Scientist is to play around with data. Where does the majority of the world population live?

You need to build an additional tool for that. Combining stuff together can be a lot of fun.

Try it out in this tutorial.

Step 6: Understand the difference between vectorized and lambda functions

A big advantage of Pandas is its vectorized approach to doing things.

This saves a lot of for-loops. On top of that, it saves time. It is faster to do it vectorized.

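The reason is that the whole operation can be done in a single function call to a library written in C or C++.

A lambda function passed to apply also spares you from writing the for-loop yourself, but it still runs Python code for each row or element, so it is usually slower. But when to use what?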
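To make the comparison concrete, here is a minimal sketch on a tiny made-up DataFrame (the column names are assumptions for illustration). Both versions compute the same result. The vectorized one multiplies whole columns at once, while the lambda version calls a Python function once per row, which tends to be noticeably slower on large data.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "quantity": [1, 2, 3],
})

# Vectorized: one operation on whole columns.
df["total_vectorized"] = df["price"] * df["quantity"]

# Lambda with apply: the function is called once for every row.
df["total_lambda"] = df.apply(
    lambda row: row["price"] * row["quantity"], axis=1
)

same_result = (df["total_vectorized"] == df["total_lambda"]).all()
print(df)
print("Same result:", same_result)
```

A common rule of thumb is to reach for the vectorized form first and keep apply with a lambda for logic that has no vectorized equivalent.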

Step 7: Merge in all ways possible

As a Data Scientist you will often need to merge data together.

This is a key skill to have.

As you have seen, it is not straightforward. But now you will see that you can merge data in different ways.
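As a small sketch of what "different ways" means, the example below merges two tiny hand-made tables on a shared country column (the tables are made up for illustration). The how parameter of merge decides which rows survive: only the matches, everything from the left table, or everything from both.

```python
import pandas as pd

# Two small hand-made tables that only partly overlap.
currencies = pd.DataFrame({
    "country": ["Denmark", "Sweden", "Norway"],
    "currency": ["DKK", "SEK", "NOK"],
})
capitals = pd.DataFrame({
    "country": ["Denmark", "Sweden", "Finland"],
    "capital": ["Copenhagen", "Stockholm", "Helsinki"],
})

# Inner: keep only countries present in both tables.
inner = currencies.merge(capitals, on="country", how="inner")

# Left: keep every row of the left table; missing capitals become NaN.
left = currencies.merge(capitals, on="country", how="left")

# Outer: keep every country from either table.
outer = currencies.merge(capitals, on="country", how="outer")

print(inner, left, outer, sep="\n\n")
```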

Step 8: NumPy on real data

NumPy is amazing when it comes to numerical data sets.

In this practical tutorial we will see how you can utilize the power of NumPy.
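Here is a tiny sketch of the kind of thing NumPy makes easy, using made-up temperature readings (the values are assumptions for illustration). Arithmetic applies to the whole array at once, and a comparison gives a mask you can filter with.

```python
import numpy as np

# Made-up temperature readings in degrees Celsius.
temps_c = np.array([12.5, 15.0, 9.5, 20.0, 18.0])

# Arithmetic works on the whole array without a loop.
temps_f = temps_c * 9 / 5 + 32

# Summary statistics are built in.
mean_c = temps_c.mean()

# A boolean mask selects only the readings above 14 degrees.
warm = temps_c[temps_c > 14]

print("Fahrenheit:", temps_f)
print("Mean:", mean_c)
print("Warm readings:", warm)
```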

Step 9: NumPy on survey data

Surveys are a common thing to analyze.

In this tutorial we will see how to explore data from a big survey and how to visualize results from it.
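A first step with survey answers is often just counting them. The sketch below does that with NumPy on a handful of made-up answers (the answers are assumptions for illustration, not data from the survey in the tutorial). The resulting counts or shares are what you would typically feed into a bar chart.

```python
import numpy as np

# Made-up answers to a single survey question.
answers = np.array(["yes", "no", "yes", "unsure", "yes", "no"])

# unique returns the distinct answers (sorted) and how often each occurs.
options, counts = np.unique(answers, return_counts=True)

# Turn counts into shares of all answers.
shares = counts / counts.sum()

for option, count, share in zip(options, counts, shares):
    print(f"{option}: {count} ({share:.0%})")
```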

Step 10: How to create awesome interactive maps

Another thing a Data Scientist can do is to visualize data on maps.

The first step is to understand how that can be done easily.

Step 11: Another great example with a video tutorial

You can follow the same video inside the tutorial. It will walk you through the code.

Step 12: Use the right scale

Another common problem when visualizing data is the scale. Maybe all the data is gathered at one end of the scale. This makes the coloring quite monotone.

Learn an approach to deal with that.
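One common option, shown here as an assumption rather than as the exact method used in the tutorial, is a logarithmic scale. The sketch below uses made-up values where one huge number dominates. On a linear scale almost everything ends up near 0, while on a log scale the values spread out evenly between 0 and 1, which gives a color map more to work with.

```python
import numpy as np

# Made-up values where one very large number dominates.
values = np.array([1, 10, 100, 1_000, 1_000_000])

# Linear scaling to the range 0..1: small values are squeezed near 0.
linear = values / values.max()

# Log scaling to the range 0..1: each factor of 10 takes an equal step.
log_scaled = np.log10(values) / np.log10(values.max())

print("Linear:", linear)
print("Log:   ", log_scaled)
```

Note that a log scale only works for positive values, so zeros or negative numbers need separate handling.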

Step 13-16: A big exploration!

Now you are ready for the big thing.

Let’s try to make a longer exploration and break it down into smaller steps.

Step 17: Creating videos with data visualization

A key skill as a Data Scientist is to make a huge amount of data easy to digest in a short time.

Sometimes that can be done elegantly with a video. Here we show how the situation evolves over time.

Next step?

If you are serious about this then you should check out my Online Course on Data Science.

It goes into more detail, and you will master when to use the different data types.