    Master Markowitz Portfolio Optimization (Efficient Frontier) in Python using Pandas

    What is Markowitz Portfolio Optimization (Efficient Frontier)?

    The Efficient Frontier takes a portfolio of investments and optimizes the expected return with respect to the risk. That is, for a given level of risk it finds the portfolio with the highest expected return.

    According to Investopedia, the return is based on the expected Compound Annual Growth Rate (CAGR), and the risk metric is the standard deviation of the return.

    But what does all that mean? We will learn that in this tutorial.

    Step 1: Get the time series of your stock portfolio

    We will use the following portfolio of 4 stocks: Apple (AAPL), Microsoft (MSFT), IBM (IBM), and Nvidia (NVDA).

    To get the time series, we will use the Yahoo! Finance API through pandas-datareader.

    We will look 5 years back.

    import datetime as dt

    import pandas as pd
    import pandas_datareader as pdr
    from dateutil.relativedelta import relativedelta

    # Look back 5 years from today
    years = 5
    end_date = dt.datetime.now()
    start_date = end_date - relativedelta(years=years)

    # Collect the daily closing price of each ticker in one DataFrame
    close_price = pd.DataFrame()
    tickers = ['AAPL', 'MSFT', 'IBM', 'NVDA']
    for ticker in tickers:
        tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
        close_price[ticker] = tmp['Close']
    print(close_price)
    

    This results in the following output (only the first few lines are shown).

                      AAPL        MSFT         IBM        NVDA
    Date                                                      
    2015-08-25  103.739998   40.470001  140.960007   20.280001
    2015-08-26  109.690002   42.709999  146.699997   21.809999
    2015-08-27  112.919998   43.900002  148.539993   22.629999
    2015-08-28  113.290001   43.930000  147.979996   22.730000
    2015-08-31  112.760002   43.520000  147.889999   22.480000
    

    The DataFrame contains the daily closing prices for the last 5 years, counted back from the current date.

    Step 2: Calculate the CAGR, returns, and covariance

    To calculate the expected return, we use the Compound Annual Growth Rate (CAGR) based on the last 5 years, as Investopedia suggests. A common alternative is the mean of the returns. The key thing is to have one common measure of return for all stocks.

    The CAGR is calculated as follows.

    CAGR = (end_price / start_price)^(1/years) - 1
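
    As a quick check of the formula, here is a minimal sketch with made-up prices. The helper name compute_cagr and the prices are assumptions for illustration only; they are not part of the tutorial code.

```python
def compute_cagr(start_price: float, end_price: float, years: float) -> float:
    """Compound annual growth rate between two prices, `years` years apart."""
    return (end_price / start_price) ** (1 / years) - 1


# Hypothetical example: a stock that doubles from 100 to 200 over 5 years.
growth = compute_cagr(start_price=100.0, end_price=200.0, years=5)
print(f"{growth:.4f}")  # 0.1487, i.e. about 14.87% per year

# Compounding that rate for 5 years brings us back to the end price.
print(100.0 * (1 + growth) ** 5)
```

    Note that the CAGR only looks at the first and the last price, so it says nothing about how bumpy the path in between was. That is what the risk measure is for.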

    We will also calculate the covariance matrix, as we will use it to calculate the variance of a weighted portfolio. Remember that the standard deviation is the square root of the variance.

    sigma = sqrt(variance)

    A portfolio is a vector w holding the fraction of the total invested in each stock. For example, w = [0.2, 0.3, 0.4, 0.1] says that we have 20% in the first stock, 30% in the second, 40% in the third, and 10% in the final stock. It all sums to 100%.

    Given the weights w of the portfolio, you can calculate the variance of the portfolio by using the covariance matrix.

    variance = w^T Cov w

    Where Cov is the covariance matrix.
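
    To see what w^T Cov w does, here is a small sketch with a hypothetical two-stock portfolio. The weights and the covariance values are invented for illustration, and the result is compared with the formula written out term by term.

```python
import numpy as np

# Hypothetical two-stock portfolio: 60% in stock A, 40% in stock B.
w = np.array([0.6, 0.4])

# Hypothetical covariance matrix: variances on the diagonal,
# the covariance between A and B off the diagonal.
cov = np.array([
    [0.04, 0.01],
    [0.01, 0.09],
])

# Matrix form: variance = w^T Cov w
variance = w @ cov @ w

# The same calculation written out term by term.
expanded = (w[0] ** 2) * cov[0, 0] + 2 * w[0] * w[1] * cov[0, 1] + (w[1] ** 2) * cov[1, 1]

sigma = np.sqrt(variance)
print(variance, expanded, sigma)
```

    Both expressions give a variance of about 0.0336. The middle term is where the covariance enters: the less the two stocks move together, the lower the risk of the combined portfolio.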

    This results in the following pre-computations.

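    # Daily gross returns (today's close divided by yesterday's close)
    returns = close_price/close_price.shift(1)
    # Annual growth rate of each stock over the whole period
    cagr = (close_price.iloc[-1]/close_price.iloc[0])**(1/years) - 1
    # Covariance matrix of the daily returns
    cov = returns.cov()
    print(cagr)
    print(cov)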
    

    The output looks as follows.

    # CAGR:
    AAPL    0.371509
    MSFT    0.394859
    IBM    -0.022686
    NVDA    0.905011
    dtype: float64
    # Covariance
              AAPL      MSFT       IBM      NVDA
    AAPL  0.000340  0.000227  0.000152  0.000297
    MSFT  0.000227  0.000303  0.000164  0.000306
    IBM   0.000152  0.000164  0.000260  0.000210
    NVDA  0.000297  0.000306  0.000210  0.000879
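
    Notice that the CAGR is an annual figure, while the covariance above is computed from daily returns. The tutorial keeps the daily covariance, which only rescales the risk axis of the plot. If you want both on a yearly scale, one common convention is to multiply the daily covariance by the number of trading days per year. The sketch below assumes 252 trading days and that daily returns are independent from day to day; both are assumptions, not something the tutorial relies on.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumption: approximate number of trading days per year

tickers = ["AAPL", "MSFT", "IBM", "NVDA"]

# Daily covariance matrix copied from the output above.
daily_cov = pd.DataFrame(
    [
        [0.000340, 0.000227, 0.000152, 0.000297],
        [0.000227, 0.000303, 0.000164, 0.000306],
        [0.000152, 0.000164, 0.000260, 0.000210],
        [0.000297, 0.000306, 0.000210, 0.000879],
    ],
    index=tickers,
    columns=tickers,
)

# Variances and covariances scale linearly with the number of periods.
annual_cov = daily_cov * TRADING_DAYS

# Annualized standard deviation of each stock (square root of the diagonal).
annual_sigma = pd.Series(np.sqrt(np.diag(annual_cov.to_numpy())), index=tickers)
print(annual_sigma)
```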
    

    Step 3: Plot the return and risk

    This is where the power of computing comes into the picture. The idea is to try random portfolios and see how each one rates with regard to expected return and risk.

    It is that simple. Make a random weighted distribution of your portfolio, then plot its expected return (based on our CAGR) against its risk (the standard deviation calculated from the covariance matrix).

    import matplotlib.pyplot as plt
    import numpy as np

    def random_weights(n):
        # n random non-negative weights that sum to 1
        k = np.random.rand(n)
        return k / k.sum()

    exp_return = []
    sigma = []
    for _ in range(20000):
        w = random_weights(len(tickers))
        # Expected return: weighted sum of the CAGRs
        exp_return.append(np.dot(w, cagr.T))
        # Risk: sqrt(w^T Cov w)
        sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

    # One semi-transparent red dot per random portfolio
    plt.plot(sigma, exp_return, 'ro', alpha=0.1)
    plt.show()
    

    We introduce a helper function random_weights, which returns a weighted portfolio. That is, it returns a vector with entries that sum up to one. This will give a way to distribute our portfolio of stocks.
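
    Here is a small sketch of what such a helper produces. This variant takes a seeded NumPy random generator as an extra argument so the result can be reproduced; that argument and the seed value are assumptions added for illustration.

```python
import numpy as np


def random_weights(n, rng):
    """Return n random non-negative weights that sum to 1."""
    k = rng.random(n)   # n values drawn uniformly from [0, 1)
    return k / k.sum()  # rescale so the entries sum to 1


rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility
w = random_weights(4, rng)

print(w)        # four fractions, one per stock
print(w.sum())  # 1.0, up to floating-point rounding
```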

    Then we iterate 20,000 times (it could be any value, we just want enough points to fill the graph). In each iteration we make a random weight vector w and calculate the expected return as the dot product of w and cagr, using NumPy's dot function.

    What np.dot(w, cagr.T) does is take the elements of w and cagr pairwise, multiply them, and sum the products. The transpose only concerns orientation; since cagr is a one-dimensional Series here, it has no effect on the result.

    The standard deviation (appended to sigma) is calculated similarly, using the formula from the last step: variance = w^T Cov w, which is evaluated with two dot products, followed by a square root.
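
    The loop is easy to read, but the same calculation can also be done for all portfolios at once. The following is a sketch of one possible vectorized version, not the tutorial's method. To keep it self-contained, it uses the CAGR and covariance values printed in Step 2 instead of downloading data, and the seed is arbitrary.

```python
import numpy as np

# Values copied from the Step 2 output (order: AAPL, MSFT, IBM, NVDA).
cagr = np.array([0.371509, 0.394859, -0.022686, 0.905011])
cov = np.array([
    [0.000340, 0.000227, 0.000152, 0.000297],
    [0.000227, 0.000303, 0.000164, 0.000306],
    [0.000152, 0.000164, 0.000260, 0.000210],
    [0.000297, 0.000306, 0.000210, 0.000879],
])

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# One row per portfolio; each row is rescaled so that it sums to 1.
k = rng.random((20000, len(cagr)))
weights = k / k.sum(axis=1, keepdims=True)

# Expected return of every portfolio: the dot product of each row with cagr.
exp_return = weights @ cagr

# Risk of every portfolio: sqrt(w^T Cov w), evaluated row by row.
sigma = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov, weights))

print(exp_return.shape, sigma.shape)
```

    The resulting exp_return and sigma arrays can be passed to plt.plot in the same way as the lists built by the loop.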

    This results in the following graph.

    Returns vs risks

    The graph shows a filled shape whose outline resembles a sideways parabola. The optimal portfolios lie along the upper half of that outline. Hence, for a given risk, the optimal portfolio is the one on the upper border of the filled shape.
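
    Reading the upper border off the plot by eye works, but it can also be picked out of the random samples in code. The sketch below is one simple way to do it, under the assumption that a sampled portfolio counts as efficient if no sampled portfolio with lower risk has a higher return. It only approximates the true frontier, since it can only choose among the random samples. The inputs are the values printed in Step 2, and the seed is arbitrary.

```python
import numpy as np

# Values copied from the Step 2 output (order: AAPL, MSFT, IBM, NVDA).
cagr = np.array([0.371509, 0.394859, -0.022686, 0.905011])
cov = np.array([
    [0.000340, 0.000227, 0.000152, 0.000297],
    [0.000227, 0.000303, 0.000164, 0.000306],
    [0.000152, 0.000164, 0.000260, 0.000210],
    [0.000297, 0.000306, 0.000210, 0.000879],
])

# Sample random portfolios (arbitrary seed, only for reproducibility).
rng = np.random.default_rng(0)
k = rng.random((20000, len(cagr)))
weights = k / k.sum(axis=1, keepdims=True)
exp_return = weights @ cagr
sigma = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov, weights))

# Walk through the portfolios from lowest to highest risk and keep each one
# whose return is at least as high as that of every lower-risk portfolio.
order = np.argsort(sigma)
sorted_return = exp_return[order]
is_efficient = sorted_return >= np.maximum.accumulate(sorted_return)
frontier = order[is_efficient]  # indices into weights, sorted by risk

print("Portfolios on the sampled frontier:", len(frontier))
print("Lowest-risk weights:", weights[frontier[0]].round(3))
print("Highest-return weights:", weights[frontier[-1]].round(3))
```

    Plotting sigma[frontier] against exp_return[frontier] on top of the red dots would trace the upper border described above.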

    Considerations

    The Efficient Frontier gives you a way to balance your portfolio. The above code can find such a portfolio by trial and error, but it still leaves out some considerations.

    How often should you rebalance? Rebalancing has a cost.

    The theory rests on assumptions that may not hold in reality. As Investopedia points out, it assumes that asset returns follow a normal distribution, but in reality returns are more than 3 standard deviations from the mean more often than a normal distribution would predict. The theory also assumes that investors are rational in their investments, which most consider a flawed assumption, as more factors play into investment decisions.

    The full source code

    Below you will find the full source code from the tutorial.

    import datetime as dt

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import pandas_datareader as pdr
    from dateutil.relativedelta import relativedelta

    # Step 1: daily closing prices for the last 5 years
    years = 5
    end_date = dt.datetime.now()
    start_date = end_date - relativedelta(years=years)
    close_price = pd.DataFrame()
    tickers = ['AAPL', 'MSFT', 'IBM', 'NVDA']
    for ticker in tickers:
        tmp = pdr.get_data_yahoo(ticker, start_date, end_date)
        close_price[ticker] = tmp['Close']

    # Step 2: CAGR per stock and covariance of the daily returns
    returns = close_price / close_price.shift(1)
    cagr = (close_price.iloc[-1] / close_price.iloc[0]) ** (1 / years) - 1
    cov = returns.cov()

    def random_weights(n):
        # n random non-negative weights that sum to 1
        k = np.random.rand(n)
        return k / k.sum()

    # Step 3: expected return and risk of 20,000 random portfolios
    exp_return = []
    sigma = []
    for _ in range(20000):
        w = random_weights(len(tickers))
        exp_return.append(np.dot(w, cagr.T))
        sigma.append(np.sqrt(np.dot(np.dot(w.T, cov), w)))

    plt.plot(sigma, exp_return, 'ro', alpha=0.1)
    plt.show()
    
