## Master Markowitz Portfolio Optimization (Efficient Frontier) in Python using Pandas

The Efficient Frontier takes a portfolio of investments and optimizes the expected return with respect to the risk. That is, it finds the highest expected return for a given level of risk.

According to Investopedia, the return is based on the expected Compound Annual Growth Rate (CAGR), and the risk metric is the standard deviation of the return.

But what does all that mean? We will learn that in this tutorial.

We will use the following portfolio of 4 stocks: Apple (**AAPL**), Microsoft (**MSFT**), IBM (**IBM**) and Nvidia (**NVDA**).

To get the time series we will use the Yahoo! Finance API through the Pandas-datareader.

We will look 5 years back.

```
import pandas_datareader as pdr
import pandas as pd
import datetime as dt
from dateutil.relativedelta import relativedelta

years = 5
end_date = dt.datetime.now()
start_date = end_date - relativedelta(years=years)

close_price = pd.DataFrame()
tickers = ['AAPL','MSFT','IBM','NVDA']
for ticker in tickers:
    # Download the daily prices for one ticker and keep only the closing price
    tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
    close_price[ticker] = tmp['Close']

print(close_price)
```

This results in the following output (only the first few lines are shown).

```
                  AAPL       MSFT         IBM       NVDA
Date
2015-08-25  103.739998  40.470001  140.960007  20.280001
2015-08-26  109.690002  42.709999  146.699997  21.809999
2015-08-27  112.919998  43.900002  148.539993  22.629999
2015-08-28  113.290001  43.930000  147.979996  22.730000
2015-08-31  112.760002  43.520000  147.889999  22.480000
```

It will contain the full daily time series for the last 5 years, counted back from the current date.

To calculate the expected return, we use the Compound Annual Growth Rate (CAGR) based on the last 5 years, as Investopedia suggests. A commonly used alternative is the mean of the returns. The key point is to have a common measure of the return.

The CAGR is calculated as follows.

**CAGR = (end-price/start-price)^(1/years) - 1**
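As a quick sanity check of the formula, here is a minimal sketch with made-up numbers. The function name `compound_annual_growth_rate` and the prices are hypothetical and only serve as illustration.

```python
def compound_annual_growth_rate(start_price, end_price, years):
    """CAGR = (end-price/start-price)^(1/years) - 1"""
    return (end_price / start_price) ** (1 / years) - 1

# Hypothetical example: a price that doubles from 100 to 200 over 5 years
growth = compound_annual_growth_rate(100.0, 200.0, 5)
print(f"{growth:.4f}")  # prints 0.1487, i.e. roughly 14.87% per year

# Compounding that rate for 5 years gets us back to the end price
print(round(100.0 * (1 + growth) ** 5, 6))
```

Note that a doubling over 5 years is not 20% per year: because growth compounds, about 14.87% per year is enough.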

We will also calculate the covariance matrix, as we will use it to calculate the variance of a weighted portfolio. Remember that the standard deviation is given by the following.

**sigma = sqrt(variance)**

A portfolio is a vector **w** holding the share of each stock. For example, **w = [0.2, 0.3, 0.4, 0.1]** means that we have 20% in the first stock, 30% in the second, 40% in the third, and 10% in the final stock. The weights sum up to 100%.

Given the weights **w** of the portfolio, you can calculate the variance of the portfolio by using the covariance matrix.

**variance = w^T Cov w**

Where **Cov** is the covariance matrix.
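To make the formula concrete, here is a small sketch for a two-stock portfolio. The covariance values and weights are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import numpy as np

# Hypothetical covariance matrix for two stocks (not real market data)
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
# 60% in the first stock, 40% in the second
w = np.array([0.6, 0.4])

# variance = w^T Cov w
variance = w @ cov @ w

# The same value written out term by term for two stocks
expanded = (w[0] ** 2 * cov[0, 0]
            + 2 * w[0] * w[1] * cov[0, 1]
            + w[1] ** 2 * cov[1, 1])

sigma = np.sqrt(variance)
print(variance, expanded, sigma)
```

Both expressions evaluate to 0.0336 (0.0144 + 0.0048 + 0.0144). The middle term is where the covariance between the two stocks enters the portfolio risk.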

This results in the following pre-computations.

```
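# Daily price ratio (1 + daily return); the constant 1 does not change the covariance
returns = close_price/close_price.shift(1)
# Compound annual growth rate per ticker over the whole period
cagr = (close_price.iloc[-1]/close_price.iloc[0])**(1/years) - 1
# Covariance matrix of the daily returns
cov = returns.cov()
print(cagr)
print(cov)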
```

The output looks as follows.

```
# CAGR:
AAPL    0.371509
MSFT    0.394859
IBM    -0.022686
NVDA    0.905011
dtype: float64
# Covariance
          AAPL      MSFT       IBM      NVDA
AAPL  0.000340  0.000227  0.000152  0.000297
MSFT  0.000227  0.000303  0.000164  0.000306
IBM   0.000152  0.000164  0.000260  0.000210
NVDA  0.000297  0.000306  0.000210  0.000879

This is where the power of computing comes into the picture. The idea is to simply try a random portfolio and see how it rates with regard to expected return and risk.

It is that simple. Make a random weighted distribution of your portfolio and plot the point given by the expected return (based on our CAGR) and the risk (the standard deviation calculated from the covariance).

```
import matplotlib.pyplot as plt
import numpy as np

def random_weights(n):
    # Random non-negative weights, normalized so they sum to 1
    k = np.random.rand(n)
    return k / sum(k)

exp_return = []
sigma = []
for _ in range(20000):
    w = random_weights(len(tickers))
    # Expected return: weighted sum of the CAGRs
    exp_return.append(np.dot(w, cagr.T))
    # Risk: sqrt(w^T Cov w)
    sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

plt.plot(sigma, exp_return, 'ro', alpha=0.1)
plt.show()
```

We introduce a helper function **random_weights**, which returns a weighted portfolio. That is, it returns a vector with entries that sum up to one. This will give a way to distribute our portfolio of stocks.

Then we iterate 20,000 times (it could be any value, we just want enough points to plot our graph). In each iteration we make a random weight **w**, then calculate the expected return as the dot-product of **w** and the transposed **cagr**. This is done with NumPy's dot-product function.

The dot-product **np.dot(w, cagr.T)** takes the elements of **w** and **cagr** pairwise, multiplies them, and sums up the results. The transpose only mirrors the matrix notation; for one-dimensional data such as **w** and **cagr** it has no effect.

The standard deviation (assigned to sigma) is calculated similarly, using the formula given earlier: **variance = w^T Cov w** (with dot-products between the factors), followed by the square root.
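To see what the dot-product does, the sketch below spells out the pairwise multiply-and-sum by hand and compares it with NumPy. It uses the example weights from earlier and the CAGR and covariance values from the sample output, so your own numbers will differ.

```python
import numpy as np

# Example weights and the values from the sample output (AAPL, MSFT, IBM, NVDA)
w = np.array([0.2, 0.3, 0.4, 0.1])
cagr = np.array([0.371509, 0.394859, -0.022686, 0.905011])
cov = np.array([[0.000340, 0.000227, 0.000152, 0.000297],
                [0.000227, 0.000303, 0.000164, 0.000306],
                [0.000152, 0.000164, 0.000260, 0.000210],
                [0.000297, 0.000306, 0.000210, 0.000879]])

# Expected return: multiply pairwise and sum up ...
manual_return = sum(w_i * c_i for w_i, c_i in zip(w, cagr))
# ... which is exactly what np.dot does for two vectors
dot_return = np.dot(w, cagr)

# Variance: the nested np.dot calls are the same as the @ operator
nested_variance = np.dot(np.dot(w.T, cov), w)
operator_variance = w @ cov @ w

print(round(float(dot_return), 4))  # prints 0.2742
print(np.isclose(manual_return, dot_return))
print(np.isclose(nested_variance, operator_variance))
```

So the example portfolio has an expected return of about 27.4% per year, dominated by the strong CAGRs of AAPL, MSFT and NVDA.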

This results in the following graph.

The graph shows a filled region whose outline resembles a parabola lying on its side. The optimal portfolios lie along the upper half of that outline. Hence, given a risk, the optimal portfolio is the one on the upper border of the filled region.

The Efficient Frontier gives you a way to balance your portfolio. The above code can find such a portfolio by trial and error, but it still leaves out some considerations.
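The trial-and-error search can be sketched as follows: generate random portfolios, discard those above a risk limit, and keep the one with the highest expected return. The risk limit `max_sigma` is a hypothetical value, the fixed random seed is only there to make the run repeatable, and the CAGR and covariance values are copied from the sample output. This is a rough sketch, not an exact optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

tickers = ['AAPL', 'MSFT', 'IBM', 'NVDA']
# Values copied from the sample output; your own download will differ
cagr = np.array([0.371509, 0.394859, -0.022686, 0.905011])
cov = np.array([[0.000340, 0.000227, 0.000152, 0.000297],
                [0.000227, 0.000303, 0.000164, 0.000306],
                [0.000152, 0.000164, 0.000260, 0.000210],
                [0.000297, 0.000306, 0.000210, 0.000879]])

def random_weights(n):
    # Random non-negative weights, normalized so they sum to 1
    k = rng.random(n)
    return k / k.sum()

max_sigma = 0.018  # hypothetical risk limit (daily standard deviation)

best_w, best_return = None, -np.inf
for _ in range(20000):
    w = random_weights(len(tickers))
    risk = np.sqrt(w @ cov @ w)
    expected = w @ cagr
    # Keep the best return among portfolios within the risk limit
    if risk <= max_sigma and expected > best_return:
        best_w, best_return = w, expected

for ticker, weight in zip(tickers, best_w):
    print(f"{ticker}: {weight:.1%}")
print(f"expected return: {best_return:.4f}")
```

With more random samples the result gets closer to the true upper border, but it stays an approximation. A numerical optimizer would find the border directly.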

How often should you rebalance? Each rebalancing comes at a cost.

The underlying theory makes some assumptions that may not hold in reality. As Investopedia points out, it assumes that asset returns follow a normal distribution, but in reality returns can be more than 3 standard deviations away from the mean. The theory also assumes that investors are rational in their investments, which most consider a flawed assumption, as more factors play into investment decisions.

Below you will find the full source code from the tutorial.

```
import pandas_datareader as pdr
import datetime as dt
import pandas as pd
from dateutil.relativedelta import relativedelta
import matplotlib.pyplot as plt
import numpy as np

years = 5
end_date = dt.datetime.now()
start_date = end_date - relativedelta(years=years)

# Collect the daily closing prices of each ticker
close_price = pd.DataFrame()
tickers = ['AAPL', 'MSFT', 'IBM', 'NVDA']
for ticker in tickers:
    tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
    close_price[ticker] = tmp['Close']

# Pre-compute the expected returns (CAGR) and the covariance matrix
returns = close_price / close_price.shift(1)
cagr = (close_price.iloc[-1] / close_price.iloc[0]) ** (1 / years) - 1
cov = returns.cov()

def random_weights(n):
    # Random non-negative weights, normalized so they sum to 1
    k = np.random.rand(n)
    return k / sum(k)

# Evaluate expected return and risk for 20,000 random portfolios
exp_return = []
sigma = []
for _ in range(20000):
    w = random_weights(len(tickers))
    exp_return.append(np.dot(w, cagr.T))
    sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

plt.plot(sigma, exp_return, 'ro', alpha=0.1)
plt.show()
```
