Data Science with Python is a 12+ hour FREE course – a journey from zero to mastery.
This is a full 12-hour Expert Data Science course. We focus on getting you started with Data Science using the most efficient tools and the right understanding of what adds value in a Data Science Project.
Most people spend too much time covering too many technologies without adding value and end up creating poor-quality Data Science Projects. You don’t want to end up like that! Follow the Secret Data Science Blueprint, which gives you a focused template covering all you need to create successful Data Science Projects.
This course consists of the following content.
Mastering the Data Science Workflow, along with the right tools, is crucial to becoming an Expert Data Scientist. This course will cover all you need to start your journey towards Data Science Mastery.
At the end of the course you will get a template covering all aspects of the Data Science Workflow, ensuring your Data Science Project follows this flow and is done effectively with Python code using the right libraries.
In the first lesson you learn about the following.
All the code examples are available on GitHub.
This will give you an understanding of what Data Science is and how to become successful: how to focus your effort to get the fastest results and not waste time learning every possible technology for Data Science.
See the video below and read this tutorial.
Data Visualization for Data Science is not just about how to present your data; it is also about data quality and exploring data to understand its nature.
Data visualization helps you understand data fast. The human brain is not good at understanding rows of numeric data, but when we are presented with data visually, we absorb information quickly. It improves our insight into data and enables us to make faster decisions.
In this lesson we will learn how to use the pandas DataFrame integration with Matplotlib to visualize data in charts, and to understand the power of visualization for three purposes: Data Quality, Data Exploration, and Data Presentation.
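To give a feel for how this works, here is a minimal sketch using a made-up temperature dataset (the course uses real data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use an interactive backend to view plots
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly temperature data, for illustration only
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "temp_c": [2.1, 3.4, 6.8, 11.2],
})

# Data quality and exploration: a quick summary often reveals outliers or gaps
print(df.describe())

# Data presentation: pandas plots directly through Matplotlib
ax = df.plot(x="month", y="temp_c", kind="bar", legend=False)
ax.set_ylabel("Temperature (°C)")
plt.savefig("temperatures.png")  # or plt.show() in an interactive session
```

The same `plot` call covers all three purposes: the chart that presents your data is also the chart that exposes quality problems during exploration.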
See the video and read this tutorial.
When working with tabular data (spreadsheets, databases, etc.), pandas is the right tool, with a built-in Data Structure.
pandas makes it easy to acquire, explore, clean, process, analyze, and visualize your data inside the DataFrame (pandas Data Structure).
pandas comes with a big framework of tools – which can be intimidating. In this lesson we will break down what you need, how to find help, and how to work with pandas DataFrames.
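A minimal sketch of working with a DataFrame, using a small made-up dataset (names and numbers here are illustrative, not from the course):

```python
import pandas as pd

# A small hypothetical dataset; in the course the data comes from real sources
df = pd.DataFrame({
    "city": ["Oslo", "Aarhus", "Malmo"],
    "population": [709_000, 285_000, 357_000],
})

# Explore: first rows, dimensions, and a summary statistic
print(df.head())
print(df.shape)                  # (3, 2)
print(df["population"].mean())

# Process: filter rows with boolean indexing
large = df[df["population"] > 300_000]
print(large["city"].tolist())    # ['Oslo', 'Malmo']
```

These few operations – construct, inspect, filter – already cover a surprising share of day-to-day DataFrame work.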
Later in this course we will cover how to get data from various sources: including Web Scraping, Databases, CSV files, Parquet files, and Excel files. We will also cover how to combine data from various sources, as well as calculating descriptive statistics.
See the video and read this guide.
Web Scraping is not only fun – it enables you to get data from any webpage and do your own analysis on the data.
With the pandas library in Python you can do that in two steps and have the data prepared for further processing.
First – get the URL of the page and use pandas read_html to parse the data from the webpage into a list of DataFrames.
Second – Data Wrangling – which means transforming the data into the right format for further processing. That can be extracting numeric values from entries like “$ 1,234,567” or converting dates to date objects.
In this lesson you will learn how to do that easily with the pandas library using DataFrames.
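The two steps can be sketched like this. Here an inline HTML string stands in for a downloaded page, so the example runs without a network connection; with a live page you would pass the URL to `pd.read_html` instead (note that `read_html` needs an HTML parser such as lxml installed):

```python
from io import StringIO
import pandas as pd

# Stand-in for a real webpage; the table content is made up for illustration
html = """
<table>
  <tr><th>Company</th><th>Revenue</th><th>Founded</th></tr>
  <tr><td>Acme</td><td>$ 1,234,567</td><td>1999-05-01</td></tr>
  <tr><td>Globex</td><td>$ 987,654</td><td>2004-11-20</td></tr>
</table>
"""

# Step 1: parse every table on the page into a list of DataFrames
df = pd.read_html(StringIO(html))[0]

# Step 2: data wrangling - extract numeric values and convert dates
df["Revenue"] = df["Revenue"].str.replace(r"[$\s,]", "", regex=True).astype(int)
df["Founded"] = pd.to_datetime(df["Founded"])
print(df.dtypes)
```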
See the video and read this guide.
Learn how to get data from Databases into a pandas DataFrame. We will also learn how to join data from multiple tables into a single DataFrame.
We will cover what Relational Databases are and how they model data in rows and columns across a series of tables. You will learn how a database resembles a collection of DataFrames or Excel sheets of data.
You will get the most common SQL (Structured Query Language) statements needed to get data from a database into a DataFrame. It is actually fewer than you think.
We will work on real SQLite databases, but you will know how to connect to other databases.
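As a sketch, here is the pattern with an in-memory SQLite database built on the spot (the tables and values are made up, not the course's dataset):

```python
import sqlite3
import pandas as pd

# Build a small in-memory SQLite database as a stand-in for a real one
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE population (city_id INTEGER, people INTEGER);
    INSERT INTO city VALUES (1, 'Dallas'), (2, 'Austin');
    INSERT INTO population VALUES (1, 1300000), (2, 960000);
""")

# A JOIN pulls related rows from two tables into one DataFrame
query = """
    SELECT city.name, population.people
    FROM city JOIN population ON city.id = population.city_id
"""
df = pd.read_sql(query, con)
print(df)
con.close()
```

Connecting to another database engine mostly means swapping the connection object; the `pd.read_sql` call stays the same.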
In the project we will work with the Dallas Shooting dataset and visualize it on an interactive map using the Folium framework.
See the video and read this guide.
You will learn how to read data from CSV, Excel and Parquet files into pandas DataFrames.
This includes what the file formats contain, how they differ, and which format you should store your data in – which is context dependent.
You will also learn the most commonly used arguments for each method used to read data into a pandas DataFrame.
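A small sketch of `read_csv` with a few of its commonly used arguments. An inline string stands in for a file on disk (the data is made up); with a real file you would pass a path:

```python
from io import StringIO
import pandas as pd

# Stand-in for a CSV file; note the non-default separator
csv_text = "date;sales\n2023-01-01;1200\n2023-01-02;1350\n"

# Commonly used read_csv arguments: sep, parse_dates, index_col
df = pd.read_csv(StringIO(csv_text), sep=";", parse_dates=["date"],
                 index_col="date")
print(df)

# The same DataFrame can be written to (and read back from) other formats:
# df.to_excel("sales.xlsx")       # needs openpyxl
# df.to_parquet("sales.parquet")  # needs pyarrow or fastparquet
```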
Finally, you will get a collection of great places to find data online.
In the project we are given a Data Science problem and need to find good data to answer the question. We will learn a valuable lesson in how different representations of the same data can give different views.
See the video and read this guide.
You will learn how to combine data from two pandas DataFrames into one using Merge, Join, and Concat.
This will teach you how to add additional data to an existing dataset. As an example, you will combine metadata with the total population by country dataset from the World Bank.
The dataset comes with additional metadata, which you will merge into the main dataset. This adds information like “region” and “income group” to each country.
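The merge step looks roughly like this. The two DataFrames below are small made-up stand-ins for the World Bank data and its metadata file:

```python
import pandas as pd

# Hypothetical stand-ins for the population data and its metadata
population = pd.DataFrame({
    "Country Code": ["DNK", "NOR", "SWE"],
    "Population": [5_900_000, 5_400_000, 10_400_000],
})
metadata = pd.DataFrame({
    "Country Code": ["DNK", "NOR", "SWE"],
    "Region": ["Europe & Central Asia"] * 3,
    "Income Group": ["High income"] * 3,
})

# A left merge keeps every row of the main dataset and adds matching metadata
merged = population.merge(metadata, on="Country Code", how="left")
print(merged)
```

`how="left"` is a deliberate choice here: countries missing from the metadata stay in the result (with NaN), rather than silently disappearing.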
In the project you will explore whether GDP is correlated with SPI (Statistical Performance Indicators). In general you will look into what the SPI tells us, and whether there are regional differences in the SPI score, before looking at the correlation.
See the video and read this guide.
In this lesson you will learn the most common statistics you need for Data Science.
Statistics is one of the areas where most people get lost and scared, because there is so much to learn and it is difficult to know what is relevant.
Here we will show you what you need to understand – and surprisingly, it is not that difficult. By the end of this video you will know the most important statistics: what the mean is, how to use mean and groupby on DataFrames, what the standard deviation is and how to use it, what insights describe gives on a DataFrame, how to read box plots, and some insights into correlations.
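The handful of tools mentioned above can be sketched in a few lines, here on a made-up salary dataset:

```python
import pandas as pd

# Hypothetical salary data to illustrate mean, groupby, std, and describe
df = pd.DataFrame({
    "role": ["junior", "junior", "senior", "senior"],
    "salary": [50_000, 54_000, 90_000, 98_000],
})

print(df["salary"].mean())                  # overall mean
print(df.groupby("role")["salary"].mean())  # mean per group
print(df["salary"].std())                   # standard deviation (spread)
print(df.describe())                        # count, mean, std, quartiles
```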
The project will teach you to use statistics to find out what the starting salaries are for Data Scientists and how they evolve over time – and also how to ensure you get the best starting salary as a Data Scientist. Yes, you will use available data to find the best strategy to secure yourself the best starting salary.
See the video and read this guide.
Do you like soccer? Do you want to learn how to predict Soccer Player Ratings?
To do that we need to learn about Linear Regression, which is a Machine Learning model used to predict continuous values. Linear Regression describes the relationship between variables.
We will use Sklearn (scikit-learn) for the Linear Regression and demonstrate how it should be understood and visualize how it works. Also, we will look into how to measure the quality of the Linear Regression model.
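As a minimal sketch with scikit-learn, fitting a line to made-up data (a single hypothetical skill feature against a rating):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: rating increases with a single skill feature
X = np.array([[50.0], [60.0], [70.0], [80.0]])  # e.g. a skill attribute
y = np.array([55.0, 63.0, 71.0, 79.0])          # player rating

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)  # slope and intercept of the fitted line
print(model.predict([[75.0]]))        # predicted rating for a new player
print(model.score(X, y))              # R^2, one measure of model quality
```

With real data the fit is never this clean; `model.score` (the R² value) is one way to quantify how well the line describes the relationship.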
The project will explore the European Soccer Database, which is a very popular dataset on Kaggle. We will make a Linear Regression model to predict Player Ratings, based on the other features.
See the video and read this guide.
What to do with NaN? Just use dropna?
In this lesson you will learn the impact of simply deleting rows with missing data, and better ways to deal with it. You will learn to replace missing data with mean values and how to use interpolation on time series data.
This is part of cleaning data and starts with an understanding of data quality. Common data quality issues include outliers and duplicates. We will also explore and show how to deal with those.
In the project we will look at how we can use interpolation on a weather dataset to restore missing values. This will show you how big an impact it can have on the accuracy of your model.
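The three basic options side by side, on a made-up time series with one missing reading:

```python
import numpy as np
import pandas as pd

# Hypothetical time series with a missing reading
temps = pd.Series([10.0, np.nan, 14.0, 16.0],
                  index=pd.date_range("2023-01-01", periods=4))

print(temps.dropna())              # option 1: drop the row entirely
print(temps.fillna(temps.mean()))  # option 2: replace with the mean
print(temps.interpolate())         # option 3: interpolate between neighbors
```

For time series, interpolation often beats the mean: here it fills the gap with 12.0 (between 10 and 14), while the mean of the remaining values would insert roughly 13.3 and distort the trend.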
See the video and read this guide.
Why is Machine Learning so brilliant?
How would you like to learn something new? Would you like to get a list of 100 specific instructions to follow exactly? Or would you prefer to figure out for yourself how to get the desired outcome?
The first approach is actually classical computing, while the second is the Machine Learning approach. It seems that computers are better at figuring out how to solve some problems than we humans are at describing how to do it.
In this lesson you will learn how Machine Learning works. We will dive specifically into Supervised Learning and classify the Iris Flower Dataset.
In the project we will classify a dataset with little knowledge about it. How well can we do? Well, let’s find out.
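A minimal sketch of supervised learning on the Iris Flower Dataset. The k-nearest-neighbors classifier here is one simple choice for illustration; the lesson may use a different model:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris Flower Dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train on labeled examples, then check accuracy on unseen data
model = KNeighborsClassifier()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The pattern – split, fit, score – is the same for essentially every supervised model in scikit-learn.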
See the video and read this guide.
What is Feature Scaling? And how can it help you?
In this lesson you will learn about the two types of Feature Scaling: Normalization and Standardization. You will see the difference visually to understand the different approaches. Also, you will learn when you need to feature scale.
Feature scaling is also a great idea when you want to compare results.
You will see box plots of the different approaches and see the impact on real-life weather data. The data will demonstrate the problem clearly and visually, and finally you will see the impact on the accuracy of the models you create in this lesson.
In the project you will look at the European Soccer Database and see if you can predict whether a player is left-footed. This will require using the two types of Feature Scaling to improve the accuracy of the prediction.
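The two approaches side by side, on made-up weather-like features that live on very different scales:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: pressure (hPa), temperature (°C)
X = np.array([[990.0, 5.0], [1010.0, 15.0], [1030.0, 25.0]])

# Normalization squeezes each feature into the range [0, 1]
X_norm = MinMaxScaler().fit_transform(X)
print(X_norm)

# Standardization rescales each feature to mean 0 and unit variance
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

Without scaling, the pressure column would dominate any distance-based model simply because its numbers are bigger.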
See the video and read this guide.
Feature Selection seems like an advanced topic that most beginners skip when training Machine Learning models.
That is the wrong approach.
First of all, Feature Selection is not difficult once you know how to do it. Second, it will give you higher accuracy, simpler models, and a reduced risk of overfitting.
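As one simple illustration of the idea (scikit-learn offers several selection techniques; `SelectKBest` is just one), keeping the two features that best separate the classes in the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Keep the k features whose scores best separate the classes
X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # 4 features reduced to 2
print(selector.get_support())           # boolean mask of the chosen features
```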
See the video and read this guide.
How do you know which model to use for your Machine Learning project? If you choose the wrong one, will it make you look inexperienced?
Most want to find the BEST model for the project, but we need to understand that all models have predictive errors.
What we should seek is a model that is good enough. At the top level, the type of problem we want to solve guides which category of models we should choose from. At the next level, we use a model selection technique to find the right one from that subset of models.
In this video you will learn about probabilistic measures and resampling methods for selecting models.
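As a sketch of the resampling idea, here is 5-fold cross-validation used to compare two candidate models on the Iris dataset (the candidates are illustrative choices, not necessarily the ones used in the video):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Resampling (here: 5-fold cross-validation) scores each candidate model
X, y = load_iris(return_X_y=True)
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Because each model is scored on data it never trained on, the comparison reflects generalization rather than memorization.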
In the project you will help a real estate dealer find out which 10 features matter most for houses divided into 3 categories: cheap, mid-range, and expensive.
See the video and read the guide here.
When it comes to creating a good Data Science Project you will need to ensure you cover a great deal of aspects. This template will show you what to cover and where to find more information on a specific topic.
The common pitfall for most junior Data Scientists is to focus only on the technical part of the Data Science Workflow. To add real value for clients you need to cover more steps, which are often neglected.
This guide will walk you through all the steps, elaborating and linking to in-depth content if you need more explanation.
In the project you will try the template and see how to use it on a great example from IMDB.
See the video and read the guide.