Is portfolio diversification possible with S&P500 stocks?
This Python notebook plots the correlations of 39 constituents of the S&P500 between 1 March 2022 and 1 March 2024. Most pairwise correlations are positive. In theory, a diversified portfolio can still be constructed from positively correlated assets, as long as the correlations are below 100%. It is probably not the most diversified portfolio possible.
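To see why positive correlation does not rule out diversification, here is a minimal sketch of the standard portfolio variance formula applied to two assets. The weights, volatilities and correlation values are made-up illustrative numbers, not estimates from the data in this notebook, and `portfolio_vol` is a hypothetical helper.

```python
import numpy as np

def portfolio_vol(weights, vols, corr):
    """Portfolio volatility from weights, asset volatilities and a correlation matrix."""
    weights = np.asarray(weights, dtype=float)
    vols = np.asarray(vols, dtype=float)
    # covariance_ij = sigma_i * sigma_j * rho_ij
    cov = np.outer(vols, vols) * np.asarray(corr, dtype=float)
    return float(np.sqrt(weights @ cov @ weights))

weights = [0.5, 0.5]   # assumed equal weights
vols = [0.20, 0.20]    # assumed 20% volatility for both assets

for rho in (1.0, 0.6, 0.0, -0.5):
    corr = [[1.0, rho], [rho, 1.0]]
    print(f"rho={rho:+.1f}  portfolio vol={portfolio_vol(weights, vols, corr):.4f}")
```

Under these assumptions, any correlation below 1 gives a portfolio volatility below the 20% of the individual assets, and lower correlations reduce it further.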
First, load packages
import networkx as nx
import numpy as np
import sys
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import yfinance as yf
Define the tickers variable containing the symbols for the stocks. All 39 are constituents of the S&P500
tickers = ['AXON','BA', 'HWM','LMT','NOC','ROST','LULU','NKE','TPR','VFC','ADBE','FICO','INTU','ORCL','SNPS','F','GM',
'TSLA','ABBV','GILD','MRNA','REGN','VRTX','CVX','XOM','HES','NFLX','PARA','DIS','HAL','SLB','AAPL','HPE',
'HPQ','NTAP','STX','MA','PYPL','V']
Download data from Yahoo Finance
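# auto_adjust=False keeps the 'Adj Close' column; newer yfinance releases may adjust prices by default and omit it
data = yf.download(tickers, start="2022-03-01", end="2024-03-01", auto_adjust=False)
data = data['Adj Close']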
log_returns = np.log(data/data.shift())
[*********************100%%**********************] 39 of 39 completed
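The log-return step can be checked on a small example. The sketch below uses made-up prices for two hypothetical tickers, `AAA` and `BBB`, rather than downloaded data. It illustrates that `shift()` leaves a `NaN` in the first row and that log returns add up to the log of the total price change.

```python
import numpy as np
import pandas as pd

# Made-up prices for two hypothetical tickers
prices = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0, 105.0], "BBB": [50.0, 49.0, 51.0, 52.0]},
    index=pd.date_range("2022-03-01", periods=4, freq="D"),
)

# Same expression as in the notebook: log of price relative to the previous row
toy_log_returns = np.log(prices / prices.shift())
print(toy_log_returns)

# Summing the daily log returns (NaN is skipped) recovers the log of the total price change
total = toy_log_returns.sum()
print(total)
print(np.log(prices.iloc[-1] / prices.iloc[0]))
```

The first row is `NaN` because there is no previous price; `corr()` ignores such missing values pairwise.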
Calculate correlation of returns
corr_m=log_returns.corr()
corr_m = corr_m.to_numpy()  # plain NumPy array; np.matrix is discouraged in current NumPy
Plot the correlation matrix as a heat map
corr_figure=log_returns.corr()
plt.subplots(figsize=(7,7))
plt.suptitle('Visualisation of the matrix',fontsize=18)
sns.heatmap(corr_figure,square=True,cmap="YlGnBu", linewidth=0.5)
<Axes: xlabel='Ticker', ylabel='Ticker'>
The heat map shows that most correlations are below 60%. The diagonal contains the self-correlations, which equal 100% by definition. From a risk diversification perspective this is promising, although more negative correlations would be better.
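The "below 60%" reading can be checked numerically instead of by eye. The sketch below is one possible way to do it, using a made-up 3x3 correlation matrix; `share_below` is a hypothetical helper, and only the upper triangle is used so that each pair is counted once and the diagonal is excluded.

```python
import numpy as np

def share_below(corr, threshold=0.6):
    """Fraction of distinct asset pairs whose correlation is below the threshold."""
    corr = np.asarray(corr, dtype=float)
    upper = corr[np.triu_indices_from(corr, k=1)]  # each pair once, diagonal excluded
    return float(np.mean(upper < threshold))

# Made-up correlation matrix for three assets
toy_corr = np.array([
    [1.0, 0.7, 0.2],
    [0.7, 1.0, -0.1],
    [0.2, -0.1, 1.0],
])
print(share_below(toy_corr))
```

In the notebook the same helper could be applied to `corr_figure.to_numpy()`.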
Let's see the pairwise correlations as a network
matrix_array=np.array(corr_m)
matrix_corr=matrix_array  # a plain 2-D array is enough; NumPy discourages np.matrix
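Remove the self-correlations, i.e. the diagonal entries equal to 1
matrix_corr_filtered=np.array(matrix_corr,dtype=float)  # copy so the original matrix is untouched
np.fill_diagonal(matrix_corr_filtered,0)  # zero the diagonal without relying on an exact ==1.0 comparison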
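nodes_assets=list(log_returns.columns)  # ticker labels in the same order as the matrix rows
Graph_asset=nx.from_numpy_array(matrix_corr_filtered)
Graph_asset=nx.relabel_nodes(Graph_asset,dict(enumerate(nodes_assets)))  # show tickers instead of integer indices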
plt.subplots(figsize=(10,10))
nx.draw_shell(Graph_asset,with_labels=True,node_color='pink',node_size=500,edge_color='purple')
plt.title('Graph 1: Asset price with both correlation types',fontsize=18)
plt.show()
#plt.savefig('dow_30_corr_coeff_circular.png')
The network shows the huge number of pairwise correlations that arise even from a small portfolio of 39 assets: up to 39 x 38 / 2 = 741 pairs. The graph is this dense because all correlations are treated as equally important. If a hierarchy is introduced, it is possible to reduce the number of correlations considered in the portfolio. See Lopez De Prado (2018) for details.
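As a rough illustration of how imposing a structure cuts the number of links, the sketch below builds a minimum spanning tree over a commonly used correlation distance, d = sqrt(0.5 * (1 - rho)). This is not the hierarchical method described in Lopez De Prado (2018); it is a simpler, related filter chosen only to show the reduction from n(n-1)/2 links to n-1. The correlation matrix is made up and `mst_edges` is a hypothetical helper.

```python
import math
import numpy as np

def mst_edges(corr):
    """Prim's algorithm on d_ij = sqrt(0.5 * (1 - rho_ij)); returns n - 1 edges."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    # clip guards against tiny negative values caused by rounding
    dist = np.sqrt(0.5 * np.clip(1.0 - corr, 0.0, None))
    in_tree = [0]  # start from the first asset
    edges = []
    while len(in_tree) < n:
        best = None
        # find the shortest link from the tree to an asset not yet in it
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i, j] < best[2]):
                    best = (i, j, dist[i, j])
        edges.append((best[0], best[1]))
        in_tree.append(best[1])
    return edges

# Made-up correlation matrix for four assets
toy_corr = np.array([
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
])

print(math.comb(39, 2))     # number of distinct pairs among 39 assets
print(mst_edges(toy_corr))  # only n - 1 links survive
```

The tree keeps, for each asset, only its strongest link into the rest of the structure, so four assets need three links instead of six.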
Let's have a look at the negative and positive correlations. First, define two separate matrices that will contain the two different types.
matrix_new=np.array(matrix_corr_filtered)
positive_corr=np.where(matrix_new>0,matrix_new,0)
negative_corr=np.where(matrix_new<0,matrix_new,0)
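Graph_asset_negative=nx.from_numpy_array(negative_corr)
Graph_asset_negative=nx.relabel_nodes(Graph_asset_negative,dict(enumerate(log_returns.columns)))  # label nodes with tickers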
plt.subplots(figsize=(7,7))
nx.draw_shell(Graph_asset_negative,with_labels=True,node_color='green',node_size=500,edge_color='lightseagreen')
plt.title('Graph 2: Asset price with negative correlation',fontsize=18)
plt.show()
Surprisingly, there are only 3 negative correlations in this subsample of the S&P500. The sample may be biased, but the stocks were chosen from very different industries, so in theory more pairs should be negatively correlated.
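The number of negative correlations can also be counted directly rather than read off the graph. The sketch below is one way to do this on a made-up matrix; `list_negative_pairs` is a hypothetical helper. It looks only at the upper triangle, because the matrix is symmetric and counting all entries would count every pair twice.

```python
import numpy as np

def list_negative_pairs(corr):
    """Return (row, column) index pairs with negative correlation, each pair once."""
    corr = np.asarray(corr, dtype=float)
    rows, cols = np.triu_indices_from(corr, k=1)  # upper triangle, diagonal excluded
    mask = corr[rows, cols] < 0
    return [(int(i), int(j)) for i, j in zip(rows[mask], cols[mask])]

# Made-up correlation matrix for three assets
toy_corr = np.array([
    [1.0, -0.2, 0.4],
    [-0.2, 1.0, 0.3],
    [0.4, 0.3, 1.0],
])
negative_pairs = list_negative_pairs(toy_corr)
print(len(negative_pairs), negative_pairs)
```

Applied to the notebook's `negative_corr` array, the returned indices can be mapped to tickers through `log_returns.columns`.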
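Graph_asset_positive=nx.from_numpy_array(positive_corr)
Graph_asset_positive=nx.relabel_nodes(Graph_asset_positive,dict(enumerate(log_returns.columns)))  # label nodes with tickers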
plt.subplots(figsize=(10,10))
nx.draw_shell(Graph_asset_positive,with_labels=True,node_color='teal',node_size=500,edge_color='darkslategrey')
plt.title('Graph 3: Asset price with positive correlation',fontsize=18)
plt.show()
This network is virtually the same as the first graph, because almost all pairwise correlations are positive.
Reference
Lopez De Prado, M. (2018), Advances in financial machine learning, Wiley