Is portfolio diversification possible with S&P500 stocks?

This Python notebook plots the correlations between the daily log returns of 39 constituents of the S&P500 from 1 March 2022 to 1 March 2024. Most pairwise correlations are positive. In theory, a diversified portfolio can still be constructed from positively correlated assets, since combining assets whose correlation is below 1 lowers risk, but such a portfolio is probably not the most diversified one possible.
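To make the claim about positively correlated assets concrete, here is a minimal sketch under a simplifying assumption that is not a property of the 39 stocks: an equally weighted portfolio of n assets that all share the same volatility σ and the same pairwise correlation ρ. Its variance is σ²(1/n + (1 - 1/n)ρ), so adding assets lowers risk whenever ρ < 1, but the volatility can never fall below σ√ρ. The function names and the 20% volatility are illustrative only.

```python
import numpy as np


def equicorrelated_portfolio_vol(sigma: float, rho: float, n: int) -> float:
    """Volatility of an equally weighted portfolio of n assets, assuming every
    asset has volatility `sigma` and every pair has correlation `rho`."""
    # Correlation matrix: ones on the diagonal, rho everywhere else
    corr = np.full((n, n), rho)
    np.fill_diagonal(corr, 1.0)
    # Covariance = D R D with D = sigma * I (same volatility for every asset)
    cov = sigma**2 * corr
    weights = np.full(n, 1.0 / n)
    return float(np.sqrt(weights @ cov @ weights))


def closed_form_vol(sigma: float, rho: float, n: int) -> float:
    # sigma_p^2 = sigma^2 * (1/n + (1 - 1/n) * rho)
    return float(sigma * np.sqrt(1.0 / n + (1.0 - 1.0 / n) * rho))


for rho in (1.0, 0.5, 0.0):
    vol = equicorrelated_portfolio_vol(sigma=0.20, rho=rho, n=39)
    print(f"rho={rho:.1f}: portfolio vol = {vol:.4f}")
```

With ρ = 0.5 and n = 39, volatility falls from 20% to roughly 14.3%, close to the floor of 0.2 × √0.5 ≈ 14.1%.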

First, load the packages.

In [2]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import yfinance as yf

Define the tickers list containing the symbols of the 39 stocks. All of them are constituents of the S&P500.

In [5]:
tickers = ['AXON', 'BA', 'HWM', 'LMT', 'NOC', 'ROST', 'LULU', 'NKE', 'TPR', 'VFC',
           'ADBE', 'FICO', 'INTU', 'ORCL', 'SNPS', 'F', 'GM', 'TSLA', 'ABBV', 'GILD',
           'MRNA', 'REGN', 'VRTX', 'CVX', 'XOM', 'HES', 'NFLX', 'PARA', 'DIS', 'HAL',
           'SLB', 'AAPL', 'HPE', 'HPQ', 'NTAP', 'STX', 'MA', 'PYPL', 'V']

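Download daily prices from Yahoo Finance and compute daily log returns.

In [8]:
# auto_adjust=False keeps the 'Adj Close' column (newer yfinance versions default to True)
data = yf.download(tickers, start="2022-03-01", end="2024-03-01", auto_adjust=False)
data = data['Adj Close']
# Daily log returns ln(P_t / P_{t-1}); the first row is NaN
log_returns = np.log(data / data.shift())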
[*********************100%%**********************]  39 of 39 completed
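The log-return line computes r_t = ln(P_t / P_{t-1}) for every ticker. A useful property is that log returns add up over time, so their sum equals the log of the total price ratio, while the first row is NaN because there is no previous price. The sketch below checks this on a short made-up price series (illustrative numbers, not market data); the demo_ names avoid overwriting the notebook's variables.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for one ticker over five trading days
demo_prices = pd.Series([100.0, 102.0, 99.0, 101.0, 104.0],
                        index=pd.date_range("2022-03-01", periods=5, freq="B"))

# Same formula as the notebook: r_t = ln(P_t / P_{t-1})
demo_log_returns = np.log(demo_prices / demo_prices.shift())

# The first value is NaN because there is no previous price
first_is_nan = bool(np.isnan(demo_log_returns.iloc[0]))

# Log returns are additive: their sum equals ln(P_end / P_start)
total_from_sum = demo_log_returns.sum()  # pandas skips the NaN by default
total_direct = np.log(demo_prices.iloc[-1] / demo_prices.iloc[0])
print(first_is_nan, total_from_sum, total_direct)
```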

Calculate the correlation matrix of the log returns (pandas computes Pearson correlations by default).

In [11]:
corr_m = log_returns.corr()
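Each cell of corr_m holds a Pearson coefficient, ρ_XY = cov(X, Y) / (σ_X σ_Y), which DataFrame.corr() computes for each pair of columns over the rows where both values are present. The sketch below recomputes one entry by hand on made-up returns for two hypothetical tickers, AAA and BBB, to show what the number means.

```python
import numpy as np
import pandas as pd

# Made-up daily log returns for two hypothetical tickers; some values are missing
returns = pd.DataFrame({
    "AAA": [np.nan, 0.010, -0.004, 0.007, 0.002, -0.011],
    "BBB": [np.nan, 0.006, -0.001, 0.009, np.nan, -0.008],
})

# What the notebook does: Pearson correlation on pairwise-complete rows
rho_pandas = returns.corr().loc["AAA", "BBB"]

# Manual Pearson correlation on the rows where both columns are present
both = returns.dropna()
x, y = both["AAA"].to_numpy(), both["BBB"].to_numpy()
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho_manual = cov_xy / (x.std() * y.std())  # ddof cancels in the ratio

print(rho_pandas, rho_manual)
```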

Plot the correlation matrix as a heat map

In [44]:
corr_figure = log_returns.corr()
fig, ax = plt.subplots(figsize=(7, 7))
fig.suptitle('Correlation matrix of daily log returns', fontsize=18)
sns.heatmap(corr_figure, square=True, cmap="YlGnBu", linewidths=0.5, ax=ax)
plt.show()

The heat map suggests that most correlations are below 60%. The diagonal holds the self-correlations, which equal 100% by definition. From a risk diversification perspective this is promising, although more negative correlations would be better.
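Reading dozens of cells off a heat map is error-prone, so a statement such as "most correlations are below 60%" can also be checked numerically from the strict upper triangle of the matrix, which counts each pair once and excludes the diagonal. The helper corr_summary and its threshold parameter are hypothetical additions, and the example matrix is made up.

```python
import numpy as np
import pandas as pd


def corr_summary(corr: pd.DataFrame, threshold: float = 0.6) -> dict:
    """Hypothetical helper: summarise the off-diagonal entries of a
    correlation matrix, counting each pair of assets once."""
    values = corr.to_numpy()
    # Strict upper triangle (k=1) excludes the diagonal of self-correlations
    rows, cols = np.triu_indices_from(values, k=1)
    pairs = values[rows, cols]
    return {
        "n_pairs": int(pairs.size),
        "share_below_threshold": float(np.mean(pairs < threshold)),
        "median": float(np.median(pairs)),
        "min": float(pairs.min()),
        "max": float(pairs.max()),
    }


# Made-up correlation matrix for three hypothetical assets
demo = pd.DataFrame([[1.0, 0.7, -0.1],
                     [0.7, 1.0, 0.3],
                     [-0.1, 0.3, 1.0]],
                    index=list("ABC"), columns=list("ABC"))
print(corr_summary(demo))
```

In the notebook, corr_summary(log_returns.corr()) would give the same figures for the downloaded stocks.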

Let's look at the pairwise correlations as a network, with one node per stock and one edge per correlated pair.

In [17]:
# Plain NumPy array; np.matrix is no longer recommended
matrix_corr = np.asarray(corr_m, dtype=float)

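Set the self-correlations on the diagonal (always equal to 1) to zero so that no stock is linked to itself. Zeroing the diagonal directly is more robust than testing for values equal to 1.0, because a computed self-correlation can differ from 1 by a rounding error.

In [20]:
matrix_corr_filtered = np.array(matrix_corr, dtype=float)
np.fill_diagonal(matrix_corr_filtered, 0.0)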
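Zeroing the diagonal matters because nx.from_numpy_array turns every non-zero entry into an edge, so a diagonal of ones would attach a self-loop to every node. The sketch below shows this on a made-up 3 × 3 correlation matrix.

```python
import networkx as nx
import numpy as np

# Made-up correlation matrix for three hypothetical assets
corr = np.array([[1.0, 0.4, -0.2],
                 [0.4, 1.0, 0.5],
                 [-0.2, 0.5, 1.0]])

# Without filtering: every diagonal entry becomes a self-loop
g_raw = nx.from_numpy_array(corr)

# With the diagonal set to zero: only the three pairwise edges remain
filtered = corr.copy()
np.fill_diagonal(filtered, 0.0)
g_filtered = nx.from_numpy_array(filtered)

print(nx.number_of_selfloops(g_raw), nx.number_of_selfloops(g_filtered))
print(g_filtered.edges(data=True))  # each edge carries its correlation as 'weight'
```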
In [26]:
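# Nodes are numbered 0..n-1 in column order; label them with the matching tickers
Graph_asset = nx.from_numpy_array(matrix_corr_filtered)
Graph_asset = nx.relabel_nodes(Graph_asset, dict(enumerate(log_returns.columns)))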
In [52]:
plt.subplots(figsize=(10, 10))
nx.draw_shell(Graph_asset, with_labels=True, node_color='pink', node_size=500, edge_color='purple')
plt.title('Graph 1: Asset returns with both correlation types', fontsize=18)
# To save the figure, call savefig before plt.show():
#plt.savefig('dow_30_corr_coeff_circular.png')
plt.show()

The network shows the huge number of pairwise correlations that arise even from a small portfolio of 39 assets: 39 × 38 / 2 = 741 pairs, each drawn as an edge. The clutter arises because every correlation is treated as equally important, so the graph is essentially complete. Introducing a hierarchy reduces the number of links that matter; a tree connecting n assets, for example, has only n - 1 edges. See López de Prado (2018) for details.
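One way to introduce such a hierarchy, sketched below, is to turn each correlation into the distance d_ij = sqrt(0.5 * (1 - ρ_ij)) used in López de Prado (2018) and keep only the minimum spanning tree of the resulting complete graph. This is a simpler stand-in, not his full hierarchical risk parity procedure. The returns are synthetic, generated from an assumed one-factor model rather than downloaded, and the tickers S00 to S38 are placeholders.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Synthetic returns from an assumed one-factor model (not market data):
# every asset loads on a common "market" factor, so correlations are mostly positive
rng = np.random.default_rng(seed=0)
n_assets, n_days = 39, 500
market = rng.normal(0.0, 0.01, size=n_days)
betas = rng.uniform(0.5, 1.5, size=n_assets)
noise = rng.normal(0.0, 0.01, size=(n_days, n_assets))
returns = pd.DataFrame(market[:, None] * betas + noise,
                       columns=[f"S{i:02d}" for i in range(n_assets)])

corr = returns.corr()

# Correlation distance: 0 for rho = 1, 1 for rho = -1 (clip guards rounding)
dist = np.sqrt(0.5 * np.clip(1.0 - corr.to_numpy(), 0.0, None))

# Complete graph: one weighted edge per pair of assets
labels = list(corr.columns)
complete = nx.Graph()
for i in range(n_assets):
    for j in range(i + 1, n_assets):
        complete.add_edge(labels[i], labels[j], weight=dist[i, j])

# Minimum spanning tree: the shortest links that still connect every asset
tree = nx.minimum_spanning_tree(complete, weight="weight")

print(complete.number_of_edges(), tree.number_of_edges())
```

With the notebook's data, log_returns.corr() would take the place of corr, and the tree could be drawn with nx.draw in the same way as the graphs in this notebook.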

Let's look at the negative and positive correlations separately. First, split the filtered matrix into two matrices: one keeping only the positive entries and one keeping only the negative entries, with every other entry set to zero.

In [31]:
matrix_new=np.array(matrix_corr_filtered)
positive_corr=np.where(matrix_new>0,matrix_new,0)
negative_corr=np.where(matrix_new<0,matrix_new,0)
In [48]:
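Graph_asset_negative = nx.from_numpy_array(negative_corr)
# Label nodes with tickers (matrix rows follow the column order of log_returns)
Graph_asset_negative = nx.relabel_nodes(Graph_asset_negative, dict(enumerate(log_returns.columns)))
plt.subplots(figsize=(7, 7))
nx.draw_shell(Graph_asset_negative, with_labels=True, node_color='green', node_size=500, edge_color='lightseagreen')
plt.title('Graph 2: Asset returns with negative correlation', fontsize=18)
plt.show()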

Surprisingly, there are only 3 negative correlations in this subsample of the S&P500. The sample may be biased, but the stocks were chosen from very different industries, so in theory more pairs should be negatively correlated.
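Counting and naming the negative pairs directly from the matrix is a useful cross-check on the network plot. The helper negative_pairs below is hypothetical, assumes the correlation matrix is kept as a labelled DataFrame (for example log_returns.corr()), and is demonstrated on a made-up matrix.

```python
import numpy as np
import pandas as pd


def negative_pairs(corr: pd.DataFrame) -> list:
    """Hypothetical helper: list the negatively correlated pairs of a labelled
    correlation matrix, each pair once, from most to least negative."""
    values = corr.to_numpy()
    labels = list(corr.columns)
    # Strict upper triangle so (A, B) and (B, A) are not both reported
    rows, cols = np.triu_indices_from(values, k=1)
    found = [(labels[i], labels[j], float(values[i, j]))
             for i, j in zip(rows, cols) if values[i, j] < 0]
    return sorted(found, key=lambda pair: pair[2])


# Made-up correlation matrix for four hypothetical tickers
demo = pd.DataFrame([[1.0, 0.5, -0.2, 0.1],
                     [0.5, 1.0, 0.3, -0.05],
                     [-0.2, 0.3, 1.0, 0.4],
                     [0.1, -0.05, 0.4, 1.0]],
                    index=list("ABCD"), columns=list("ABCD"))
print(negative_pairs(demo))
```

With the notebook's data, negative_pairs(log_returns.corr()) should return the 3 pairs drawn in Graph 2.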

In [54]:
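Graph_asset_positive = nx.from_numpy_array(positive_corr)
# Label nodes with tickers (matrix rows follow the column order of log_returns)
Graph_asset_positive = nx.relabel_nodes(Graph_asset_positive, dict(enumerate(log_returns.columns)))
plt.subplots(figsize=(10, 10))
nx.draw_shell(Graph_asset_positive, with_labels=True, node_color='teal', node_size=500, edge_color='darkslategrey')
plt.title('Graph 3: Asset returns with positive correlation', fontsize=18)
plt.show()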

This network is virtually the same as Graph 1, since only 3 of the pairwise correlations are negative.
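This can be checked directly: the edges that Graph 1 has and Graph 3 lacks should be exactly the edges of Graph 2. A minimal sketch on a made-up three-asset matrix with one negative pair, using nx.difference, which requires both graphs to share the same node set (graphs built by nx.from_numpy_array from same-sized matrices do):

```python
import networkx as nx
import numpy as np

# Made-up filtered correlation matrix (diagonal already zero), one negative pair
corr = np.array([[0.0, 0.6, -0.1],
                 [0.6, 0.0, 0.2],
                 [-0.1, 0.2, 0.0]])

g_all = nx.from_numpy_array(corr)                          # like Graph 1
g_pos = nx.from_numpy_array(np.where(corr > 0, corr, 0))   # like Graph 3
g_neg = nx.from_numpy_array(np.where(corr < 0, corr, 0))   # like Graph 2

# Edges present in the full graph but missing from the positive-only graph
only_in_all = nx.difference(g_all, g_pos)

print(sorted(only_in_all.edges()), sorted(g_neg.edges()))
```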

Reference

López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.