3 Easy Steps to Get Started With Machine Learning: Understand the Concept and Implement Linear Regression in Python

What will we cover in this article?

  • What is Machine Learning and how it can help you?
  • How does Machine Learning work?
  • A first example of Linear Regression in Python

Step 1: How can Machine Learning help you?

Machine Learning is a hot topic these days and it is easy to get confused when people talk about it. But what is Machine Learning and how can it you?

I found the following explanation quite good.

Classical vs modern (No machine learning vs machine learning) approach to predictions.
Classical vs modern (No machine learning vs machine learning) approach to predictions.

In the classical computing model every thing is programmed into the algorithms. This has the limitation that all decision logic need to be understood before usage. And if things change, we need to modify the program.

With the modern computing model (Machine Learning) this paradigm is changes. We feed the algorithms with data, and based on that data, we do the decisions in the program.

While this can seem abstract, this is a big change in thinking programming. Machine Learning has helped computers to have solutions to problems like:

  • Improved search engine results.
  • Voice recognition.
  • Number plate recognition.
  • Categorisation of pictures.
  • …and the list goes on.

Step 2: How does Machine Learning work?

I’m glad you asked. I was wondering about that myself.

On a high level you can divide Machine Learning into two phases.

  • Phase 1: Learning
  • Phase 2: Prediction

The Learning phase is divided into steps.

Machine Learning: The Learning Phase: Training data, Pre-processing, Learning, Testing
Machine Learning: The Learning Phase: Training data, Pre-processing, Learning, Testing

It all starts with a training set (training data). This data set should represent the type of data that the Machine Learn model should be used to predict from in Phase 2 (predction).

The pre-processing step is about cleaning up data. While the Machine Learning is awesome, it cannot figure out what good data looks like. You need to do the cleaning as well as transforming data into a desired format.

Then for the magic, the learning step. There are three main paradigms in machine learning.

  • Supervised: where you tell the algorithm what categories each data item is in. Each data item from the training set is tagged with the right answer.
  • Unsupervised: is when the learning algorithm is not told what to do with it and it should make the structure itself.
  • Reinforcement: teaches the machine to think for itself based on past action rewards.

Finally, the testing is done to see if the model is good. The training data was divided into a test set and training set. The test set is used to see if the model can predict from it. If not, a new model might be necessary.

After that the Prediction Phase begins.

How Machine Learning predicts new data.
How Machine Learning predicts new data.

When the model has been created it will be used to predict based on it from new data.

Step 3: For our first example of Linear Regression in Python

Installing the libraries

Linear regression is a linear approach to modelling the relationship between a scalar response to one or more variables. In the case we try to model, we will do it for one single variable. Said in another way, we want map points on a graph to a line (y = a*x + b).

To do that, we need to import various libraries.

# Importing matplotlib to make a plot
import matplotlib.pyplot as plt
# work with number as array
import numpy as np
# we want to use linear_model (that uses datasets)
from sklearn import linear_model

The matplotlib library is used to make a plot, but is a comprehensive library for creating static, animated, and interactive visualizations in Python. If you do not have it installed you can do that by typing in the following command in a terminal.

pip install matplotlib

The numpy is a powerful library to calculate with N-dimensional arrays. If needed, you can install it by typing the following command in a terminal.

pip install numpy

Finally, you need the linear_model from the sklearn library, which you can install by typing the following command in a terminal.

pip install scikit-learn

Training data set

This simple example will let you make a linear regression of an input of the following data set.

# data set
prices = [245, 312, 279, 308, 199, 409, 200, 400, 230]
size = [50, 60, 35, 55, 30, 65, 30, 75, 25]

Here some items are sold, but each item has a size. The first item was sold for 245 ($) and had a size of 50 (something). The next item was sold to 312 ($) and had a size of 60 (something).

The sizes needs to be reshaped before we model it.

# Importing matplotlib and numpy and sklearn
import matplotlib.pyplot as plt
# work with number as array
import numpy as np
# we want to use linear_model (that uses datasets)
from sklearn import linear_model

# data set
prices = [245, 312, 279, 308, 199, 409, 200, 400, 230]
size = [50, 60, 35, 55, 30, 65, 30, 75, 25]

# reshape the input for regression ( second argument how many items
size2 = np.array(size).reshape((-1, 1))
print(size2)

Which results in the following output.

[[50]
 [60]
 [35]
 [55]
 [30]
 [65]
 [30]
 [75]
 [25]]

Hence, the reshape((-1, 1)) transforms it from a row to a single array.

Then for the linear regression.

# Importing matplotlib and numpy and sklearn
import matplotlib.pyplot as plt
# work with number as array
import numpy as np
# we want to use linear_model (that uses datasets)
from sklearn import linear_model

# data set
prices = [245, 312, 279, 308, 199, 409, 200, 400, 230]
size = [50, 60, 35, 55, 30, 65, 30, 75, 25]

# reshape the input for regression ( second argument how many items
size2 = np.array(size).reshape((-1, 1))
print(size2)

regr = linear_model.LinearRegression()
regr.fit(size2, prices)
print("Coefficients", regr.coef_)
print("intercepts", regr.intercept_)

Which prints out the coefficient (a) and the intercept (b) of a formula y = a*x + b.

Now you can predict future prices, when given a size.

# How to predict
size_new = 60
price = size_new * regr.coef_ + regr.intercept_
print(price)
print(regr.predict([[size_new]]))

Where you both can compute it directly (2nd line) or use the regression model (4th line).

Finally, you can plot the linear regression as a graph.

# Importing matplotlib and numpy and sklearn
import matplotlib.pyplot as plt
# work with number as array
import numpy as np
# we want to use linear_model (that uses datasets)
from sklearn import linear_model

# data set
prices = [245, 312, 279, 308, 199, 409, 200, 400, 230]
size = [50, 60, 35, 55, 30, 65, 30, 75, 25]

# reshape the input for regression ( second argument how many items
size2 = np.array(size).reshape((-1, 1))
print(size2)

regr = linear_model.LinearRegression()
regr.fit(size2, prices)

# Here we plot the graph
x = np.array(range(20, 100))
y = eval('regr.coef_*x + regr.intercept_')
plt.plot(x, y)
plt.scatter(size, prices, color='black')
plt.ylabel('prices')
plt.xlabel('size')
plt.show()

Which results in the following graph.

Example of linear regression in Python
Example of linear regression in Python

Conclusion

This is obviously a simple example of linear regression, as it only has one variable. This simple example shows you how to setup the environment in Python and how to make a simple plot.

Leave a Reply