Ultimate Guide to the Data Science Career Path

What does it take to become a Data Scientist?

Data Science sits at the intersection of several fields.

This means you need a lot of different skills.

In my experience, most Data Scientists work in teams and do not need to be experts in every area. So treat the list of skills below as a guide to what you need, not as a list you must fully master.

A useful way to look at it is to split the skills into hard and soft skills.

Hard skills

  • Math and Statistics
  • Programming
  • Domain knowledge
  • Data visualization

Soft skills

  • Curiosity
  • Communication
  • Storytelling Skills
  • Structured Thinking

I would say that you pick up the soft skills through experience, but you need an interest in them. The hard skills are the ones you need to get good, or at least decent, at.

Looking at the hard skills, you do not need to master all aspects of them.

Before we dive into the hard skills, let’s also understand what a Data Scientist does.

Math and Statistics

I understand that many people are scared of this one, and if you take a formal education in Data Science, you will learn a lot of statistics.

Experience shows that only a few specialists need a high level of statistics as Data Scientists. That said, you still need to understand some aspects in depth.

What does that include?

The most important concepts are:

  • Count – A descriptive statistic that counts the observations. It is the most used statistic and is very important when evaluating findings.
    • Example: A study draws a conclusion about childhood weights but only included 12 children (observations). Is that trustworthy?
    • The count says something about the quality of the study.
  • Mean – The average value.
  • Standard Deviation – A measure of how dispersed (spread out) the data is in relation to the mean.
    • Low standard deviation means data is close to the mean.
    • High standard deviation means data is spread out.
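To make these three numbers concrete, here is a minimal sketch using Python's built-in `statistics` module. The weights are made-up values for illustration, and the helper name `describe` is my own, not something from a library.

```python
import statistics

def describe(values):
    """Return count, mean and sample standard deviation of a list of numbers."""
    return {
        "count": len(values),               # number of observations
        "mean": statistics.mean(values),    # average value
        "std": statistics.stdev(values),    # spread around the mean (sample std)
    }

# Hypothetical data: weights (kg) of 12 children in a small study.
weights = [18.2, 20.1, 19.5, 22.3, 17.8, 21.0, 19.9, 20.4, 18.7, 23.1, 19.2, 20.8]

summary = describe(weights)
print(f"count={summary['count']}, mean={summary['mean']:.2f}, std={summary['std']:.2f}")
```

With only 12 observations, the mean and standard deviation are easy to compute, but the count reminds you how little data the conclusion rests on.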

You should also understand box plots and what correlation means.
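As a rough sketch of both ideas: a box plot draws five numbers (minimum, first quartile, median, third quartile, maximum), and correlation measures how closely two variables move together, on a scale from -1 to 1. The function names and the data below are hypothetical, and the quartile method used here is only one of several common conventions.

```python
import math
import statistics

def five_number_summary(values):
    """Return the five values a box plot draws: min, Q1, median, Q3, max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: heights (cm) and weights (kg) of the same six people.
heights = [150, 160, 165, 172, 180, 190]
weights = [52, 58, 63, 70, 79, 88]

print(five_number_summary(heights))
print(round(pearson_correlation(heights, weights), 3))
```

A value close to 1 means the two variables rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is no clear linear relationship.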

You can learn more about it here.

Programming

Python is used in scientific communities for several reasons.

  • Ease of use and simple syntax.
  • Easy to adapt without an engineering background.
  • Many libraries.
  • Wide community.
  • General purpose makes it easy to collaborate.

Python is the most popular programming language in the Scientific Community including Data Science. It is a solid choice to learn.

But do you need to master Python programming at a high level?

No, you only need to understand Python well enough to be comfortable with the following.

  • Basic understanding of programming – how Python code works.
  • Variables and Data Types.
  • Calculations with simple types.
  • Looping over Data Objects.
  • How functions can help you work.
  • How methods can be applied to Data Objects.
  • How to read and write data.
  • Master Data Types: Lists, Dicts, NumPy, DataFrames.
  • Use of Machine Learning Models

This sounds like a lot but can be broken down into steps.

Python Basics

Most beginner courses in Python will do fine, although some specialize too much. What you need is to understand, and get a feeling for, how Python code works.

Some common things you learn in a basic Python course:

  • Variables and built-in Data Types like Lists and Dicts.
  • How to calculate with variables.
  • Looping over Data Objects like Lists and Dicts.
  • Built-in Python functions to ease your work.
  • How to work with files.
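The basics listed above fit in a few lines of code. This is only a sketch with made-up names and values, and it writes to a temporary folder so it does not leave files behind.

```python
import json
import tempfile
from pathlib import Path

# Variables and built-in data types
ages = [23, 35, 41]                       # a list
person = {"name": "Ada", "age": 36}       # a dict

# Calculating with variables, using built-in functions
total = sum(ages)
average = total / len(ages)

# Looping over a list
doubled = []
for age in ages:
    doubled.append(age * 2)

# Looping over a dict
for key, value in person.items():
    print(key, "->", value)

# A small function of your own
def describe(p):
    return f"{p['name']} is {p['age']} years old"

# Working with files: write the dict to a file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "person.json"
    path.write_text(json.dumps(person))
    loaded = json.loads(path.read_text())

print(average, doubled, describe(loaded))
```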

Other things you will learn are good to understand, but you do not need to master them.

  • Object Oriented Programming (OOP) – you need to understand the idea of OOP, as it will help you understand how Python works with your Data Objects.

A great source is this free course.

Learn NumPy and DataFrames

For the most part, you get really far with pandas DataFrames as a Data Scientist. If you understand them and can use them to work with data, you have come a long way.

You can think of NumPy as the next step after DataFrames, even though the implementation is the other way around: pandas is built on top of NumPy.

But what are DataFrames and NumPy?

They are data structures used to contain the data you work with as a Data Scientist.
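Here is a small sketch of what that looks like in practice, assuming `pandas` and `numpy` are installed. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three people with height (cm) and weight (kg).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat"],
    "height_cm": [165.0, 180.0, 172.0],
    "weight_kg": [60.0, 82.0, 70.0],
})

# A DataFrame is a table with named columns; you calculate on whole columns.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

# Filtering rows with a condition
tall = df[df["height_cm"] > 170]

# The numbers in a column can be pulled out as a NumPy array
heights = df["height_cm"].to_numpy()

print(df)
print(tall["name"].tolist())
print(type(heights), np.mean(heights))
```

The DataFrame gives you labels and table-like operations, while the NumPy array underneath is what makes the calculations fast.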

A great place to learn about DataFrames is to follow this free course.

Machine Learning

The Machine Learning models you create are what produce the insights that deliver value to your clients. Therefore, you need to know how to use them and understand how they work.

There are a lot of models and you don’t need to be an expert in all of them. But it is a great idea to understand them.

A few examples:

  • k-Nearest-Neighbors Classifier.
  • Linear Classifier
  • Support Vector Machines
  • Linear Regression
  • Q-Learning
  • k-Means clustering
  • Deep Neural Network (DNN)
  • Convolutional Neural Network (CNN)
  • Recurrent Neural Network (RNN)

You should also be familiar with frameworks like:

  • scikit-learn
  • TensorFlow
  • PyTorch
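To show that the idea behind a model can be simple, here is a from-scratch sketch of the first model on the list, a k-Nearest-Neighbors classifier. It is written in plain Python for understanding only; the function name and toy data are my own, and in a real project you would more likely use a framework class such as scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two clearly separated groups of 2D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (1.8, 1.2)))
print(knn_predict(points, labels, (6.2, 6.8)))
```

The model has no training step at all: it simply looks up the closest known examples and lets them vote.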

You can build up your skills in this free course.

Domain Knowledge

This is actually often the key to getting a job as a Data Scientist.

If you know a lot about Windmills, power prediction patterns, and so forth, then it will be easier for you to get a job as a Data Scientist at a company predicting power production by Windmills.

Or, if you are an expert in weather forecasting, you can also get a job as a Data Scientist predicting power production by Windmills.

The point is twofold.

First, if you have worked in an industry for a few years, then you have deep domain knowledge about that field. Is there a cross-field where you can apply Data Science? Well, find those jobs and you will have a great edge in getting them.

Why?

Well, most say that it is easier to train people to do Data Science than to give them 3-4 years of experience in a Domain.

Take advantage of that.

Second, if you have an interest in some specific area of Data Science, focus on it. Become an expert.

Again, having Domain Knowledge is crucial to set yourself apart from the other applicants.

Data Visualization

Data Visualization is often misunderstood by beginners in Data Science.

It is actually crucial in 3 different aspects.

  1. Data Quality: Explore data quality including identifying outliers
  2. Data Exploration: Understand data by visualizing ideas
  3. Data Presentation: Present results

Most people only focus on Data Presentation – presenting your findings. While this is an art in itself, most do not fully grasp the importance of the other two at first.

Our human brain is not wired to understand data as digits, but when we see them visually on a chart, we can immediately see and understand them.

Just look at this one.

What is wrong? Well, it looks like some heights do not fit with the others.

This tells you something about Data Quality. Is there something wrong with it?

The chart would tell you something is wrong no matter how many data points you have. But imagine you had to look through 10,000 data points manually in a table. That would take hours and you might miss it.
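You can try this yourself with a small sketch. It assumes 10,000 simulated heights in centimetres plus three values that were entered in metres by mistake (all made-up data), and prints a simple text histogram. With a plotting library such as matplotlib, `plt.hist` would draw the same picture as a chart.

```python
import random
from collections import Counter

def text_histogram(values, bin_width):
    """Count values per bin and return {bin_start: count}, sorted by bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical data: 10,000 heights in cm, plus 3 entered in metres by mistake.
random.seed(0)
heights = [random.gauss(170, 8) for _ in range(10_000)]
heights += [1.72, 1.65, 1.80]

hist = text_histogram(heights, bin_width=20)
largest = max(hist.values())
for bin_start, count in hist.items():
    # Scale bars to at most 50 characters; always show at least one mark
    bar = "#" * max(1, round(50 * count / largest))
    print(f"{bin_start:>4}-{bin_start + 20:<4} {bar} ({count})")
```

The three wrong values end up in a bin of their own, far away from the rest, which is exactly the kind of thing you would miss when scrolling through a table.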

When it comes to exploring data, seeing it visually on a chart shows you patterns.

Again, you would hardly notice that by looking at the data in a table.

Finally, data presentation is an art in itself.

Does this one tell you a story?

A great resource to learn about Data Visualization can be found here.

Does that map out what you need as a Data Scientist?

This gives you the hard skills you need as a Data Scientist.

A great way to think of it is also to understand the Data Science Workflow.

It gives you an idea of what steps a Data Science Project goes through.

