Can Twitter Predict the Outcome of the US Presidential Election?

By Andrew Hamlet |  andrewshamlet@gmail.com

This case was presented at The New York Chapter of the American Association for Public Opinion Research in early 2016 and published in Data Visualization Made Simple: Insights into Becoming Visual, Sosulski, K, Routledge: New York.

In November 2015, a group of NYU Stern MBA students (Troy Manos, Keita Shimizu, Tarang Dawer, and Andrew Hamlet) set out to study whether Twitter can predict the outcome of the U.S. presidential election. Preliminary research into the opinions of major news outlets did not yield a clear answer. Viewpoints differ on the value of social media, specifically Twitter, for predicting election outcomes. Some see any publicity as good publicity.

“What people say on Twitter or Facebook is a very good indicator of how they will vote.”

“In 2010, …Twitter data predicted the winner in 404 out of 435 competitive [congressional] races.”

“If people must talk about you, even in negative ways, it is a signal that a candidate is on the verge of victory”

The Washington Post

Others questioned whether social media could provide a direct measurement of voter intention. To what extent could interactions on Twitter signal a specific outcome? News sources commented:

“…Twitter is a notably non-representative sample of people.”

“At last count, eight percent of American adults use Twitter daily; only 15 percent are on it at all.”

“In the best of circumstances it is possible to detect the online projections and manifestations of existing offline phenomena that tend to coincide with particular outcomes or events.”

The Atlantic

What do you think? How would you begin to explore these questions?

To even begin to answer whether Twitter can predict the outcome of a presidential election, you would have to look at the data. What data would you need? You could begin with a social graph of each presidential candidate and their interactions and following on Twitter, as seen in Table 1.

Date     Favorites  Followers  Mentions  Party     Politician       Retweets
2/25/16  2353       6435       5899      Democrat  Bernie Sanders   1106
2/26/16  4455       7209       5016      Democrat  Bernie Sanders   2243
2/27/16  3469       5147       7551      Democrat  Bernie Sanders   1520
2/28/16  0          6144       2901      Democrat  Bernie Sanders   0
6/16/15  1711       7262       7681      Democrat  Hillary Clinton  813
6/17/15  586        6789       6329      Democrat  Hillary Clinton  312
6/18/15  2618       6646       5380      Democrat  Hillary Clinton  1688
6/19/15  1337       6882       4734      Democrat  Hillary Clinton  674
6/20/15  1259       5787       5333      Democrat  Hillary Clinton  959
6/21/15  1915       6464       3950      Democrat  Hillary Clinton  655

Table 1. A sample of Twitter data collected on the presidential candidates and the volume of tweets, the audience engagement, and Twitter followers

This is exactly what the team did. However, the data alone did not present any interesting findings. They needed to make sense of the data and developed a methodology for analyzing the Twitter data. They created three key metrics: 1) volume 2) engagement and 3) followers, see Figure 1.

Figure 1. Methodology for analyzing Twitter data, outlining the three key metrics.

Volume was measured using two inputs: the number of tweets, defined as the daily count of tweets from the respective profile, and the number of mentions, defined as the daily count of tweets referencing the respective profile.

Engagement was measured using two inputs: the average retweets per tweet, defined as the daily average number of retweets per tweet from the respective candidate profile, and the average favorites per tweet, defined as the daily average number of favorites per tweet from the respective candidate profile.

Followers were measured by the number of followers, defined as the daily number of followers gained by the respective candidate profile.

These metrics were gathered daily from 11/1/2015 to 11/30/2015. Each metric was averaged by month, normalized across the candidates, equally weighted, summed for a Total Composite Score, and sorted in descending order to produce a ranking.

  • Averaged by month

[22, 20, 25, 15, 15, 37, 25, 38, 10, 23, 16, 20, 13, 22, 25, 12, 13, 7, 31, 28, 45, 19, 18, 17, 20, 9, 8, 18, 17, 13] / [30]

[20] 

  • Normalized across candidates, divided by the maximum for each metric

[20, 10, 10, 12] / [20]

[1.00, 0.50, 0.50, 0.60] 

  • Equally weighted

number of tweets + number of mentions + avg. retweet per tweet + avg. favorite per tweet + number of followers

  • Summed for Total Composite Score

1.00 + 0.93 + 1.00 + 1.00 + 1.00 = 4.93

  • Sorted in descending order to produce a ranking
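The five steps above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the candidate names and the monthly averages in `monthly_avg` are made up (they are not the study's data), and `composite_ranking` is a hypothetical helper name. Only the 30 daily values are taken from the averaging example above.

```python
import pandas as pd

# Daily values from the averaging example above (30 days).
daily = [22, 20, 25, 15, 15, 37, 25, 38, 10, 23, 16, 20, 13, 22, 25,
         12, 13, 7, 31, 28, 45, 19, 18, 17, 20, 9, 8, 18, 17, 13]
monthly_mean = sum(daily) / len(daily)  # step 1: average by month
print(round(monthly_mean, 2))

# Made-up monthly averages for three hypothetical candidates.
monthly_avg = pd.DataFrame(
    {
        "tweets": [20.0, 10.0, 12.0],
        "mentions": [5000.0, 4000.0, 2500.0],
        "avg_retweets": [1100.0, 800.0, 550.0],
        "avg_favorites": [2300.0, 1500.0, 1150.0],
        "followers": [6400.0, 6000.0, 1600.0],
    },
    index=["Candidate A", "Candidate B", "Candidate C"],
)


def composite_ranking(averages):
    # Step 2: divide each metric by its maximum, so the leader scores 1.00.
    normalized = averages / averages.max()
    # Steps 3 and 4: equal weights, so the composite is a plain row sum.
    normalized["composite"] = normalized.sum(axis=1)
    # Step 5: sort descending so the first row is the top-ranked candidate.
    return normalized.sort_values("composite", ascending=False)


ranking = composite_ranking(monthly_avg)
print(ranking.round(2))
```

Because every metric is divided by its own maximum, a candidate who leads all five metrics reaches the ceiling of 5.00, which is why a score such as 4.93 indicates near-total leadership.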

The team appended the original data table with their new metrics TW Followers, TW Mentions, TW Retweets, TW Tweets, and Composite Score.

Based on the methodology, the team conducted an analysis of leading candidates from both political parties for the 2016 Presidential Election. They showed what the presidential race looked like on Twitter based on their analysis (see Figure 2). The heat map suggests Donald Trump, when compared to the other candidates, maximized his Twitter presence. Additionally, the heat map shows that the race (as viewed on Twitter) was closer between Democrats than it was among Republicans. 


Figure 2. A heat map displaying Twitter activity across the presidential candidates with Donald Trump leading during the timeframe 11/1/2015 – 11/30/2015.

The heat map presents the rank of each candidate according to the calculated metrics. Darker shading indicates relative leadership in a category compared to the other candidates. The metrics are divided into four subgroups that correspond to the ranges 1 to .75, .74 to .50, .49 to .25, and .24 to 0. Each subgroup was assigned a shade of green.
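The four shading subgroups could be assigned with a small function like the one below. It is a sketch, not the team's code: `shade_bucket` is a hypothetical name, and treating the boundaries as "0.75 and above", "0.50 and above", and so on is an assumption about how values between the listed ranges (for example 0.745) were handled.

```python
def shade_bucket(score):
    """Return 1 (darkest green) to 4 (lightest) for a normalized score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score >= 0.75:
        return 1  # range 1 to .75
    if score >= 0.50:
        return 2  # range .74 to .50
    if score >= 0.25:
        return 3  # range .49 to .25
    return 4      # range .24 to 0


# Normalized scores from the normalization example earlier in the case.
print([shade_bucket(s) for s in [1.00, 0.50, 0.50, 0.60]])
```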

After analyzing the data from November 2015, Andrew observed how the model would adjust over time. To do this, he categorized the candidates by political party and applied the methodology through the presidential primaries. Andrew showed how the two leading candidates from the Republican and Democratic parties ranked according to the model from June 2015 through February 2016 (see Figure 3).

Figure 3. Time series of social media behaviors on Twitter showing Donald Trump leading Ted Cruz and Hillary Clinton more narrowly leading Bernie Sanders through February 2016.

As of February 2016, Donald Trump was leading Ted Cruz by 1.23 and Hillary Clinton was leading Bernie Sanders by 0.31, both in terms of Total Composite Score. Thus, the line chart presents a smaller gap between Hillary Clinton and Bernie Sanders than between Donald Trump and Ted Cruz.

Around March 2016, as it became clear the race would be between Donald Trump and Hillary Clinton, the methodology was applied to the two nominees. The following time series display (see Figure 4) illustrates how Donald Trump and Hillary Clinton ranked according to the model by month from June 2015 through October 2016.

Figure 4.  Time series of social media behaviors on Twitter showing Donald Trump leading Hillary Clinton throughout the primary and general campaigns.

Donald Trump would go on to win the 2016 Presidential Election. Many cite social media as a contributing factor in the outcome. Or perhaps the insights presented are more indicative of Clinton’s defeat than Trump’s win?

August 6, 2017

“Everyone understands that what gets shared online matters now.”

The New York Times


July 8, 2017

“Given the role that Twitter played in the presidential campaign, we analyzed Mr. Trump’s and Mrs. Clinton’s Twitter accounts in the six months before the election. We found that Mr. Trump benefited by using moral-emotional language (a 15 percent increase in retweets) but Mrs. Clinton did not.”

The New York Times

Visualization simplifies the complex, which is the beauty of the medium. When performed well, it presents information clearly; however, interpreting the information is not always so clear. This was the case for the Twitter prediction project. During the election, the prevailing belief was that Trump would not win. The model told a different story. The insight lay between the two perspectives. Arriving at that insight, though, often involves more questions, such as:

  • Who engaged on Twitter with Donald Trump?
  • Is there something about the content of the tweets that leads the audience to engage?
  • If Twitter represents mass public opinion, why did the engagement rates not translate to the popular vote?
  • What is the relationship between social and traditional media?



Python Tutorial: Getting Started

Contact: andrewshamlet@gmail.com // @andrewshamlet

Getting Started

Congratulations, and welcome to Stock Technical Analysis in Python!

You have taken your first step towards making smarter, more disciplined trading decisions.

Before diving in, let’s make sure you have everything you may need.

Anaconda 4.4.0

We recommend downloading Anaconda 4.4.0, with Python 2.7.

– http://www.continuum.io/downloads

Pandas, Numpy, and MatPlotLib

Anaconda comes pre-loaded with the three modules you will use throughout the course.

– http://www.pandas.pydata.org/

– http://www.numpy.org/

– http://www.matplotlib.org/

Quantopian

You will backtest your strategy using the Quantopian platform.

– http://www.quantopian.com/

StockCharts, Investopedia, and Google Finance

StockCharts, Investopedia, and Google Finance are great resources for financial knowledge.

– http://www.stockcharts.com

– http://www.investopedia.com/

– http://www.google.com/finance

Stack Overflow

Stack Overflow is a great resource for coding questions.

– http://www.stackoverflow.com/

You are now ready to dive in!

Python Tutorial: Stochastic Oscillator

Download the accompanying IPython Notebook for this Tutorial from Github. 

Last Tutorial, we outlined steps for calculating the Mass Index.

In this Tutorial, we introduce a new technical indicator, the Stochastic Oscillator.

Developed by George C. Lane in the late 1950s, the Stochastic Oscillator is a momentum indicator that shows the location of the close relative to the high-low range over a set number of periods.

The Stochastic Oscillator is calculated as follows:

%K = (Current Close - Lowest Low)/(Highest High - Lowest Low) * 100
%D = 3-day SMA of %K

Lowest Low = lowest low for the look-back period
Highest High = highest high for the look-back period

The default setting for the Stochastic Oscillator is 14 periods, which can be days, weeks, months or an intraday timeframe. A 14-period %K would use the most recent close, the highest high over the last 14 periods and the lowest low over the last 14 periods. %D is a 3-day simple moving average of %K.

As a bound oscillator, the Stochastic Oscillator makes it easy to identify overbought and oversold levels. The oscillator ranges from zero to one hundred. No matter how fast a security advances or declines, the Stochastic Oscillator will always fluctuate within this range. Traditional settings use 80 as the overbought threshold and 20 as the oversold threshold. These levels can be adjusted to suit analytical needs and security characteristics. Readings above 80 for the 20-day Stochastic Oscillator would indicate that the underlying security was trading near the top of its 20-day high-low range. Readings below 20 occur when a security is trading at the low end of its high-low range.

Before looking at some chart examples, it is important to note that overbought readings are not necessarily bearish. Securities can become overbought and remain overbought during a strong uptrend. Closing levels that are consistently near the top of the range indicate sustained buying pressure. In a similar vein, oversold readings are not necessarily bullish. Securities can also become oversold and remain oversold during a strong downtrend. Closing levels consistently near the bottom of the range indicate sustained selling pressure. It is, therefore, important to identify the bigger trend and trade in the direction of this trend. Look for occasional oversold readings in an uptrend and ignore frequent overbought readings. Similarly, look for occasional overbought readings in a strong downtrend and ignore frequent oversold readings.

Let’s use Python to compute the Stochastic Oscillator.

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.a.) Define function for querying daily close.

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

2.b.) Define function for querying daily high.

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

2.c.) Define function for querying daily low.

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']
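The three helpers above each request the same price history and keep one column, and the 'google' source may no longer be available in current pandas_datareader releases. As a hedged alternative, the sketch below loads close, high, and low in one pass from CSV data. The helper name `load_prices`, the column layout, and the sample rows are all assumptions made for illustration; the prices are made up.

```python
import io

import pandas as pd

# Made-up sample rows standing in for a downloaded price file.
SAMPLE_CSV = """Date,High,Low,Close
2016-01-04,102.2,99.8,102.0
2016-01-05,103.7,101.7,102.7
2016-01-06,103.8,100.3,102.9
"""


def load_prices(source):
    # Parse the Date column and use it as the index, as DataReader does.
    frame = pd.read_csv(source, parse_dates=["Date"], index_col="Date")
    # Keep only the columns the indicator functions need.
    return frame[["Close", "High", "Low"]]


df = load_prices(io.StringIO(SAMPLE_CSV))
print(df)
```

Passing a file path instead of the `io.StringIO` object works the same way, since `pd.read_csv` accepts either.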

3.) Define function for the Stochastic Oscillator, both %K and %D.

def STOK(close, low, high, n): 
 STOK = ((close - pd.rolling_min(low, n)) / (pd.rolling_max(high, n) - pd.rolling_min(low, n))) * 100
 return STOK

def STOD(close, low, high, n):
 STOK = ((close - pd.rolling_min(low, n)) / (pd.rolling_max(high, n) - pd.rolling_min(low, n))) * 100
 STOD = pd.rolling_mean(STOK, 3)
 return STOD

How does the Stochastic Oscillator function work?

3.a.) To calculate %K, we find the difference between the current close and the lowest low for the look-back period, n. We then find the difference between the highest high for the look-back period, n, and the lowest low for the same look-back period. Dividing these two values and multiplying the result by 100, we arrive at %K, which we set to variable STOK.

#STOK = ((close - pd.rolling_min(low, n)) / (pd.rolling_max(high, n) - pd.rolling_min(low, n))) * 100

3.b.) Function returns STOK.

#return STOK

3.c.) To calculate %D, we first calculate %K.

#STOK = ((close - pd.rolling_min(low, n)) / (pd.rolling_max(high, n) - pd.rolling_min(low, n))) * 100

3.d.) Then we take the 3 day moving average of %K, and set the value to variable STOD.

#STOD = pd.rolling_mean(STOK, 3) 

3.e.) Function returns STOD.

#return STOD 
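The top-level functions pd.rolling_min, pd.rolling_max, and pd.rolling_mean used above were removed from later pandas releases. If you are running a newer pandas than the one bundled with Anaconda 4.4.0, a sketch of the same calculation with the .rolling() method might look like this. The name `stochastic_oscillator` and the sample prices are illustrative, and returning both lines in one DataFrame is a design choice, not part of the original tutorial.

```python
import pandas as pd


def stochastic_oscillator(close, low, high, n=14, d=3):
    # Lowest low and highest high over the look-back period n.
    lowest_low = low.rolling(n).min()
    highest_high = high.rolling(n).max()
    # %K: position of the close within the high-low range, scaled to 0-100.
    k = (close - lowest_low) / (highest_high - lowest_low) * 100
    # %D: simple moving average of %K over d periods.
    d_line = k.rolling(d).mean()
    return pd.DataFrame({"%K": k, "%D": d_line})


# Made-up prices, short look-back so the result is easy to check by hand.
high = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
low = pd.Series([8.0, 9.0, 10.0, 11.0, 12.0])
close = pd.Series([9.0, 10.0, 11.0, 12.0, 13.0])

result = stochastic_oscillator(close, low, high, n=3, d=2)
print(result)
```

Computing %K once and deriving %D from it avoids repeating the rolling min and max, which the separate STOK and STOD functions both perform.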

4.) Query daily close, high, and low for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')

5.) Run daily close, low, and high through %K and %D functions. Save series to new columns in dataframe.

df['%K'] = STOK(df['Close'], df['Low'], df['High'], 14)
df['%D'] = STOD(df['Close'], df['Low'], df['High'], 14)
df.tail()

6.) Plot daily close, %K, and %D.

df.plot(y=['Close'], figsize = (20, 5))
df.plot(y=['%K', '%D'], figsize = (20, 5))

There you have it! We created our Stochastic Oscillator indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

def STOK(close, low, high, n): 
 STOK = ((close - pd.rolling_min(low, n)) / (pd.rolling_max(high, n) - pd.rolling_min(low, n))) * 100
 return STOK

def STOD(close, low, high, n):
 STOK = ((close - pd.rolling_min(low, n)) / (pd.rolling_max(high, n) - pd.rolling_min(low, n))) * 100
 STOD = pd.rolling_mean(STOK, 3)
 return STOD

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')
df['%K'] = STOK(df['Close'], df['Low'], df['High'], 14)
df['%D'] = STOD(df['Close'], df['Low'], df['High'], 14)
df.tail()

Python Tutorial: Mass Index

Download the accompanying IPython Notebook for this Tutorial from Github. 

Python streamlines tasks requiring multiple steps in a single block of code. For this reason, it is a great tool for querying and performing analysis on data.

Last Tutorial, we outlined steps for calculating Commodity Channel Index (CCI).

In this Tutorial, we introduce a new technical indicator, the Mass Index.

Developed by Donald Dorsey, the Mass Index uses the high-low range to identify trend reversals based on range expansions. In this sense, the Mass Index is a volatility indicator that does not have a directional bias. Instead, the Mass Index identifies range bulges that can foreshadow a reversal of the current trend.

The Mass Index is calculated as follows:

Single EMA = 9-period exponential moving average (EMA) of the high-low differential 

Double EMA = 9-period EMA of the 9-period EMA of the high-low differential

EMA Ratio = Single EMA divided by Double EMA

Mass Index = 25-period sum of the EMA Ratio

First, the Single EMA provides the average for the high-low range.

Second, the Double EMA provides a second smoothing of this volatility measure.

Using a ratio of these two exponential moving averages normalizes the data series. This ratio shows when the Single EMA becomes large relative to the Double EMA.

The final step, a 25-period summation, acts like a moving average to further smooth the data series.

Overall, the Mass Index rises as the high-low range widens and falls as the high-low range narrows.

Donald Dorsey looked for “reversal bulges” to signal a trend reversal. According to Dorsey, a bulge occurs when the Mass Index moves above 27. This initial bulge does not complete the signal though. Dorsey waited for this bulge to reverse with a move back below 26.50. Once the reversal bulge is complete, traders should use other analysis techniques to determine the direction of the next move. Ideally, a downtrend followed by a reversal bulge would suggest a bullish trend reversal. Conversely, an uptrend followed by a reversal bulge would suggest a bearish trend reversal.
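Dorsey's two-stage rule (a move above 27, then a move back below 26.50) can be sketched as a small state machine. This is an illustrative reading of the rule as described above, not Dorsey's own code; `reversal_bulges` is a hypothetical name and the Mass Index values are made up.

```python
import pandas as pd


def reversal_bulges(mass_index, upper=27.0, lower=26.5):
    armed = False  # True once the index has moved above the upper level
    signals = []
    for value in mass_index:
        signal = False
        if pd.isna(value):
            pass  # not enough data yet; leave the state unchanged
        elif value > upper:
            armed = True  # initial bulge, signal not complete yet
        elif armed and value < lower:
            signal = True  # bulge has reversed: signal complete
            armed = False
        signals.append(signal)
    return pd.Series(signals, index=mass_index.index)


mi = pd.Series([25.0, 26.8, 27.2, 26.9, 26.4, 26.0, 27.5, 26.7])
signals = reversal_bulges(mi)
print(signals.tolist())
```

The signal only says that a reversal may be near. As the text notes, the direction of the next move has to come from other analysis of the prevailing trend.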

Let’s use Python to compute the Mass Index.

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.a.) Define function for querying daily close.

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

2.b.) Define function for querying daily high.

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

2.c.) Define function for querying daily low.

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

3.) Define function for Mass Index.

def MassIndex(high, low): 
 Range = high - low 
 EX1 = pd.ewma(Range, span = 9, min_periods = 8) 
 EX2 = pd.ewma(EX1, span = 9, min_periods = 8) 
 Mass = EX1 / EX2 
 MassIndex = pd.Series(pd.rolling_sum(Mass, 25), name = 'Mass Index') 
 return MassIndex

How does the Mass Index function work?

3.a.) Function calculates the difference between the high and the low, and sets this value to variable Range.

#Range = high - low  

3.b.) Function takes a 9 period Exponential Moving Average of the Range, and sets this value to variable EX1.

#EX1 = pd.ewma(Range, span = 9, min_periods = 8)  

3.c.) Function takes a 9 period Exponential Moving Average of the EX1, to smooth volatility, and sets this value to variable EX2.

#EX2 = pd.ewma(EX1, span = 9, min_periods = 8)  

3.d.) Function takes the ratio of EX1 to EX2, and sets this value to variable Mass.

#Mass = EX1 / EX2  

3.e.) Function calculates the 25 period rolling sum of Mass, and sets this value to variable MassIndex.

#MassIndex = pd.Series(pd.rolling_sum(Mass, 25), name = 'Mass Index')  

3.f.) Function returns MassIndex.

#return MassIndex
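pd.ewma and pd.rolling_sum were removed from later pandas releases. On a newer pandas, a sketch of the same function using the .ewm() and .rolling() methods might look like the following. The name `mass_index` and the constant sample data are illustrative; min_periods is set to span - 1 to mirror the min_periods = 8 used above.

```python
import pandas as pd


def mass_index(high, low, ema_span=9, sum_window=25):
    price_range = high - low
    # Single EMA of the high-low range.
    single_ema = price_range.ewm(span=ema_span, min_periods=ema_span - 1).mean()
    # Double EMA: a second smoothing of the single EMA.
    double_ema = single_ema.ewm(span=ema_span, min_periods=ema_span - 1).mean()
    ratio = single_ema / double_ema
    # 25-period sum of the EMA ratio.
    return ratio.rolling(sum_window).sum().rename("Mass Index")


# With a constant range the EMA ratio is 1, so the index settles at 25.
high = pd.Series([11.0] * 60)
low = pd.Series([10.0] * 60)
mi = mass_index(high, low)
print(mi.tail())
```

The constant-range case is a handy sanity check: a Mass Index near 25 means the range is neither expanding nor contracting, so readings above 27 represent a clear widening.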

4.) Query daily close, high, and low for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')

5.) Run daily high and low through Mass Index function. Save series to new column in dataframe.

df['MassIndex'] = MassIndex(df['High'], df['Low'])
df.tail()

6.) Plot daily close and Mass Index.

df.plot(y=['Close'])
df.plot(y=['MassIndex'])

There you have it! We created our Mass Index indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']
 
def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']
 
def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']
 
def MassIndex(high, low): 
 Range = high - low 
 EX1 = pd.ewma(Range, span = 9, min_periods = 8) 
 EX2 = pd.ewma(EX1, span = 9, min_periods = 8) 
 Mass = EX1 / EX2 
 MassIndex = pd.Series(pd.rolling_sum(Mass, 25), name = 'Mass Index') 
 return MassIndex
 
df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')
df['MassIndex'] = MassIndex(df['High'], df['Low'])
df.tail()

Python Tutorial: CCI

Download the accompanying IPython Notebook for this Tutorial from Github. 

Python streamlines tasks requiring multiple steps in a single block of code. For this reason, it is a great tool for querying and performing analysis on data.

Last Tutorial, we outlined steps for calculating Rate of Change (ROC).

In this Tutorial, we introduce a new technical indicator, the Commodity Channel Index (CCI).

Developed by Donald Lambert, the Commodity Channel Index (CCI) is a versatile indicator that can be used to identify a new trend or warn of extreme conditions. CCI measures the current price level relative to an average price level over a given period of time. CCI is relatively high when prices are far above their average. CCI is relatively low when prices are far below their average. In this manner, CCI can be used to identify overbought and oversold levels.

The Commodity Channel Index (CCI) is calculated as follows:

CCI = (Typical Price  -  n-period SMA of TP) / (Constant x Mean Deviation)

Typical Price (TP) = (High + Low + Close)/3

Constant = .015

 Lambert set the Constant at .015 to ensure that approximately 70 to 80 percent of CCI values would fall between -100 and +100. This percentage also depends on the look-back period. A shorter CCI (10 periods) will be more volatile with a smaller percentage of values between +100 and -100. Conversely, a longer CCI (40 periods) will have a higher percentage of values between +100 and -100.

The Commodity Channel Index (CCI) can be used as either a coincident or leading indicator. As a coincident indicator, surges above +100 reflect strong price action that can signal the start of an uptrend. Plunges below -100 reflect weak price action that can signal the start of a downtrend.

As a leading indicator, chartists can look for overbought or oversold conditions that may foreshadow a mean reversion. Similarly, bullish and bearish divergences can be used to detect early momentum shifts and anticipate trend reversals.

Let’s use Python to compute the Commodity Channel Index (CCI).

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.a.) Define function for querying daily close.

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

2.b.) Define function for querying daily high.

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

2.c.) Define function for querying daily low.

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

3.) Define function for Commodity Channel Index (CCI).

def CCI(close, high, low, n, constant): 
 TP = (high + low + close) / 3 
 CCI = pd.Series((TP - pd.rolling_mean(TP, n)) / (constant * pd.rolling_std(TP, n)), name = 'CCI_' + str(n)) 
 return CCI

How does the CCI function work?

3.a.) Function calculates Typical Price as the sum of the High, Low, and Close divided by three. The function sets this value to variable TP.

#TP = (high + low + close) / 3  

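3.b.) Function subtracts the n period simple moving average of the Typical Price from the current Typical Price. The difference is divided by the constant multiplied by the n period standard deviation of the Typical Price. Note that the code uses the rolling standard deviation where the formula above specifies the Mean Deviation, so its values will differ somewhat from a textbook CCI. The function sets this value to variable CCI.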

#CCI = pd.Series((TP - pd.rolling_mean(TP, n)) / (constant * pd.rolling_std(TP, n)), name = 'CCI_' + str(n))  

3.c.) Function returns CCI

#return CCI
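For comparison, here is a sketch of a CCI that divides by the Mean Deviation, as in the formula given earlier, rather than by the standard deviation. It assumes a recent pandas (the raw argument to rolling().apply is not available in the version bundled with Anaconda 4.4.0). The name `cci_mean_deviation` and the sample prices are illustrative.

```python
import numpy as np
import pandas as pd


def cci_mean_deviation(close, high, low, n=20, constant=0.015):
    tp = (high + low + close) / 3  # Typical Price
    sma = tp.rolling(n).mean()     # n-period SMA of Typical Price
    # Mean Deviation: average absolute distance from the window's mean.
    mean_dev = tp.rolling(n).apply(
        lambda window: np.mean(np.abs(window - np.mean(window))), raw=True
    )
    return ((tp - sma) / (constant * mean_dev)).rename("CCI_" + str(n))


# Made-up prices with high = low = close, so Typical Price equals the price.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
cci = cci_mean_deviation(prices, prices, prices, n=3)
print(cci)
```

In the third row the Typical Price is 3, the 3-period average is 2, and the Mean Deviation is 2/3, giving 1 / (0.015 x 2/3) = 100.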

4.) Query daily close, high, and low for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')

5.) Run daily close, high, and low through CCI function. Save series to new column in dataframe.

df['CCI'] = CCI(df['Close'], df['High'], df['Low'], 20, 0.015)
df.tail()

6.) Plot daily close and CCI.

df.plot(y=['Close'])
df.plot(y=['CCI'])

There you have it! We created our CCI indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

def CCI(close, high, low, n, constant): 
 TP = (high + low + close) / 3 
 CCI = pd.Series((TP - pd.rolling_mean(TP, n)) / (constant * pd.rolling_std(TP, n)), name = 'CCI_' + str(n)) 
 return CCI

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')
df['CCI'] = CCI(df['Close'], df['High'], df['Low'], 20, 0.015)
df.tail()

Python Tutorial: ROC

Download the accompanying IPython Notebook for this Tutorial from Github. 

Python streamlines tasks requiring multiple steps in a single block of code. For this reason, it is a great tool for querying and performing analysis on data.

Last Tutorial, we outlined steps for calculating Relative Strength Index (RSI).
In this Tutorial, we introduce a new technical indicator, the Rate of Change (ROC).

‘The only thing constant is change’

The Rate of Change (ROC) is a technical indicator of momentum that measures the percentage change in price between the current price and the price n periods in the past.

The Rate of Change (ROC) is calculated as follows:

ROC = ((Most recent closing price - Closing price n periods ago) / Closing price n periods ago) x 100

The Rate of Change (ROC) is classed as a momentum indicator because it measures strength of price momentum. For example, if a stock’s price at the close of trading today is 10, and the closing price five trading days prior was 7, then the Rate of Change (ROC) over that time frame is approximately 43, calculated as ((10 – 7) / 7) x 100 ≈ 42.86.

Positive values indicate upward buying pressure or momentum, while negative values indicate selling pressure or downward momentum. Increasing values in either direction, positive or negative, indicate increasing momentum, and decreasing values indicate waning momentum.

The Rate of Change (ROC) is also sometimes used to indicate overbought or oversold conditions for a security. Positive values that are greater than 30 are generally interpreted as indicating overbought conditions, while negative values lower than negative 30 indicate oversold conditions.

 Let’s use Python to compute the Rate of Change (ROC).

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.) Define function for querying daily close.

def get_stock(stock,start,end):
     return web.DataReader(stock,'google',start,end)['Close']

3.) Define function for Rate of Change (ROC).

def ROC(df, n):  
    M = df.diff(n - 1)  
    N = df.shift(n - 1)  
    ROC = pd.Series(((M / N) * 100), name = 'ROC_' + str(n))   
    return ROC

How does the ROC function work?

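3.a.) Function calculates the difference between the most recent closing price and the closing price n - 1 rows earlier. Sets the value to variable M. Because the offset is n - 1, ROC(df['Close'], 12) compares each close with the close 11 trading days before it, one fewer than the n periods in the formula above.

#M = df.diff(n - 1)

3.b.) Function retrieves the closing price n - 1 rows earlier. Sets the value to variable N.

#N = df.shift(n - 1)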

3.c.) Function creates series called ROC that is ((M/N) * 100)

#ROC = pd.Series(((M / N) * 100), name = 'ROC_' + str(n))

3.d.) Function returns ROC

#return ROC
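If you prefer the look-back to match the formula exactly (closing price n periods ago), a sketch might shift by n instead of n - 1. The name `rate_of_change` and the sample prices are illustrative; the first and last prices are chosen to reproduce the worked example of a close of 10 against a close of 7 five days earlier.

```python
import pandas as pd


def rate_of_change(close, n):
    # Closing price exactly n periods ago.
    previous = close.shift(n)
    # Percentage change between the current close and that earlier close.
    return ((close - previous) / previous * 100).rename("ROC_" + str(n))


close = pd.Series([7.0, 8.0, 9.0, 9.5, 9.8, 10.0])
roc = rate_of_change(close, 5)
print(round(roc.iloc[-1], 2))
```

The first n values are NaN because there is no close n periods earlier to compare against.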

4.) Query daily close for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))

5.) Run daily close through ROC function. Save series to new column in dataframe.

df['ROC'] = ROC(df['Close'], 12)
df.tail()

6.) Plot daily close and ROC.

df.plot(y=['Close'])
df.plot(y=['ROC'])

There you have it! We created our ROC indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
     return web.DataReader(stock,'google',start,end)['Close']
    
def ROC(df, n):  
    M = df.diff(n - 1)  
    N = df.shift(n - 1)  
    ROC = pd.Series(((M / N) * 100), name = 'ROC_' + str(n))   
    return ROC
    
df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['ROC'] = ROC(df['Close'], 12)
df.tail()

Quantopian: RSI Strategy Backtest

Relative Strength Index (RSI) Strategy Backtest

Contact: andrewshamlet@gmail.com // @andrewshamlet

Download the IPython Notebook that accompanies this Tutorial from Github. 

View the Quantopian Backtest here. 

Summary

  • The Relative Strength Index (RSI) is a momentum indicator that compares the magnitude of recent gains and losses over a specified time period. RSI values range from 0 to 100.
  • For this strategy, we buy $FB when the RSI is less than 30 and sell $FB when the RSI is greater than 70. The RSI is calculated every minute, as opposed to once a day.
  • During 01/01/16 – 12/31/16,
    • The RSI Strategy produces 32.2% return, resulting in $3,220 pre-tax return.
    • FB Buy & Hold produces 10.0% return, resulting in $1,000 pre-tax return.
    • SPY Buy & Hold produces 12.0% return, resulting in $1,200 pre-tax return.
    • Compared to the SPY Buy & Hold, the RSI Strategy produces $2,020 Alpha whereas FB Buy & Hold produces ($200) Alpha, both on $10,000 principal.
  • During 05/19/12 – 12/31/16,
    • The RSI Strategy produces 147.4% return, resulting in $14,740 pre-tax return.
    • FB Buy & Hold produces 238.5% return, resulting in $23,850 pre-tax return.
    • SPY Buy & Hold produces 89.6% return, resulting in $8,960 pre-tax return.
    • Compared to SPY Buy & Hold, the RSI Strategy produces $5,780 Alpha whereas FB Buy & Hold produces $14,890 Alpha, both on $10,000 principal.
  • Thus, on the broader time horizon, FB Buy & Hold outperforms the RSI Strategy.
  • The question still stands: what about 2016 makes the RSI Strategy superior in performance to FB Buy & Hold?

 

Introduction 

In this post, we use Quantopian to build and backtest a Relative Strength Index (RSI) trading strategy.

 

Quantopian

About Quantopian:

“Quantopian provides capital, education, data, a research environment, and a development platform to algorithm authors (quants). Quantopian provides everything a quant needs to create a strategy and profit from it.

Quantopian’s members include finance professionals, scientists, developers, and students from more than 180 countries from around the world. The members collaborate in our forums and in person at regional meetups, workshops, and QuantCon, Quantopian’s flagship annual event.”

In other words, Quantopian is a website where one can build, test, and deploy trading strategies, using Python.

 

Relative Strength Index

To review, the Relative Strength Index (RSI) is a momentum indicator that compares the magnitude of recent gains and losses over a specified time period to measure speed and change of price movements of a security. It is primarily used to identify overbought or oversold conditions in the trading of an asset.

RSI values range from 0 to 100.

The Relative Strength Index (RSI) is calculated as follows:

RSI = 100 - 100 / (1 + RS)

RS = Average gain of last 14 trading days / Average loss of last 14 trading days
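
As a quick check of the formula, here is a minimal sketch that turns an average gain and an average loss into an RSI value. The function name `rsi_from_averages` and the sample averages are hypothetical, chosen only for illustration. Treating a window with no losses as RSI = 100 is an assumption of this sketch.

```python
def rsi_from_averages(avg_gain, avg_loss):
    """Return the RSI for a given average gain and average loss."""
    if avg_loss == 0:
        # Assumption: with no losses in the window, RS is unbounded,
        # so the RSI is pinned at its ceiling of 100.
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical averages over the last 14 trading days
print(rsi_from_averages(1.0, 1.0))  # gains equal losses -> 50.0
print(rsi_from_averages(3.0, 1.0))  # gains are 3x losses -> 75.0
```

When gains and losses balance out, the RSI sits at the midpoint of 50. The further the average gain exceeds the average loss, the closer the RSI moves toward 100.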

 

Strategy

For this strategy, we buy $FB when the RSI is less than 30 and sell $FB when the RSI is greater than 70. The RSI is recalculated every minute, as opposed to once per day.

Trading Strategy

Buy - RSI < 30

Sell - RSI > 70
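
Outside of Quantopian, the two rules above can be sketched as a small function that maps an RSI reading to a signal. The name `rsi_signal` and the "hold" label for readings between the thresholds are assumptions made for this illustration.

```python
LOW_RSI = 30
HIGH_RSI = 70


def rsi_signal(rsi, low=LOW_RSI, high=HIGH_RSI):
    """Map an RSI reading to 'buy', 'sell', or 'hold'."""
    if rsi < low:
        return "buy"
    if rsi > high:
        return "sell"
    # Readings from 30 through 70 trigger neither rule.
    return "hold"


print(rsi_signal(25))  # buy
print(rsi_signal(75))  # sell
print(rsi_signal(50))  # hold
```

Both comparisons are strict, so a reading of exactly 30 or exactly 70 does not trigger a trade.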

 

Code

Here is the Python code for the RSI Strategy.

import talib
import numpy as np
import pandas as pd

def initialize(context):
    context.stocks = symbols('FB')
    context.pct_per_stock = 1.0 / len(context.stocks)
    context.LOW_RSI = 30
    context.HIGH_RSI = 70
    
    set_benchmark(sid(42950))  
    
def handle_data(context, data):
    prices = data.history(context.stocks, 'price', 40, '1d')

    rsis = {}
    
    for stock in context.stocks:
        rsi = talib.RSI(prices[stock], timeperiod=14)[-1]
        rsis[stock] = rsi
        
        current_position = context.portfolio.positions[stock].amount
        
        if rsi > context.HIGH_RSI and current_position > 0 and data.can_trade(stock):
            order_target(stock, 0)

        elif rsi < context.LOW_RSI and current_position == 0 and data.can_trade(stock):
            order_target_percent(stock, context.pct_per_stock)

    record(FB_rsi=rsis[symbol('FB')])

At its foundation, Quantopian code is made up of three chunks: import modules, initialize, and handle_data.

1.) First we import the TA-Lib (talib), NumPy, and pandas modules. As we’ll see, TA-Lib streamlines the calculation of technical indicators.

import talib 
import numpy as np 
import pandas as pd

2.)  The initialize function:

def initialize(context): 
     context.stocks = symbols('FB') 
     context.pct_per_stock = 1.0 / len(context.stocks) 
     context.LOW_RSI = 30 
     context.HIGH_RSI = 70 

     set_benchmark(sid(42950)) 

2.a.) Define the security to trade, $FB.

context.stocks = symbols('FB') 

2.b.) Define the weight of each security. Since the RSI Strategy trades one security, the weight is 1.0. If there were two securities, the weight would be 0.5.

context.pct_per_stock = 1.0 / len(context.stocks) 

2.c.) Define the LOW_RSI value as 30

context.LOW_RSI = 30 

2.d.) Define the HIGH_RSI value as 70

context.HIGH_RSI = 70 

2.e.) Define the benchmark against which we compare our strategy. In this example, the benchmark is set to $FB, which is essentially a buy and hold strategy. Remove ‘set_benchmark()’ to fall back to the standard benchmark, ‘SPY’, which stands in for the market rate.

set_benchmark(sid(42950)) 
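
To illustrate the weighting in step 2.b, here is a minimal sketch in plain Python that does not depend on Quantopian. The helper `equal_weights` is hypothetical, and ‘AAPL’ is included only as an example of a second security.

```python
def equal_weights(tickers):
    """Split the portfolio evenly across the given tickers."""
    pct_per_stock = 1.0 / len(tickers)
    return {ticker: pct_per_stock for ticker in tickers}


print(equal_weights(["FB"]))          # {'FB': 1.0}
print(equal_weights(["FB", "AAPL"]))  # {'FB': 0.5, 'AAPL': 0.5}
```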

3.)  The handle_data function:

def handle_data(context, data): 
     prices = data.history(context.stocks, 'price', 40, '1d') 

     rsis = {} 

     for stock in context.stocks: 
          rsi = talib.RSI(prices[stock], timeperiod=14)[-1] 
          rsis[stock] = rsi 

          current_position = context.portfolio.positions[stock].amount 

          if rsi > context.HIGH_RSI and current_position > 0 and data.can_trade(stock): 
               order_target(stock, 0) 

          elif rsi < context.LOW_RSI and current_position == 0 and data.can_trade(stock): 
               order_target_percent(stock, context.pct_per_stock) 

     record(FB_rsi=rsis[symbol('FB')])

3.a.) Query the ‘FB’ historical price data for the past 40 trading days.

prices = data.history(context.stocks, 'price', 40, '1d') 

3.b.) Create dictionary of RSI values.

rsis = {} 

3.c.) Create for loop for RSI calculation and order logic.

for stock in context.stocks: 

3.d.) Use Talib to calculate Relative Strength Index.

rsi = talib.RSI(prices[stock], timeperiod=14)[-1]

3.e.) Save Talib output to dictionary.

rsis[stock] = rsi 

3.f.) Save current portfolio positions in order to not execute too many/few orders.

current_position = context.portfolio.positions[stock].amount 

3.g.) Order logic: if RSI is greater than 70 and positions are greater than 0, then sell all positions.

if rsi > context.HIGH_RSI and current_position > 0 and data.can_trade(stock):
    order_target(stock, 0)

3.h.) Order logic: if RSI is less than 30 and positions are equal to 0, then buy positions equal to weight defined in initialize function.

elif rsi < context.LOW_RSI and current_position == 0 and data.can_trade(stock):
    order_target_percent(stock, context.pct_per_stock)

3.i.) Chart RSI data for $FB.

record(FB_rsi=rsis[symbol('FB')])
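
The position checks in steps 3.g and 3.h keep the strategy from placing repeated orders while a signal persists. The sketch below replays that logic in plain Python on hypothetical price and RSI readings. It is not Quantopian code. It assumes fractional shares, fills at the observed price, and no commissions or slippage, and the function name `simulate` is made up for this example.

```python
def simulate(prices, rsis, cash=10000.0, low=30, high=70):
    """Replay the buy/sell rules over paired price and RSI readings.

    Returns the final cash, the final share count, and the list of trades.
    """
    shares = 0.0
    trades = []
    for price, rsi in zip(prices, rsis):
        if rsi > high and shares > 0:
            # Overbought while holding: sell the entire position.
            cash += shares * price
            shares = 0.0
            trades.append(("sell", price))
        elif rsi < low and shares == 0:
            # Oversold while flat: put all available cash into the stock.
            shares = cash / price
            cash = 0.0
            trades.append(("buy", price))
    return cash, shares, trades


# Hypothetical readings: two oversold bars in a row, then two overbought bars.
prices = [100, 95, 90, 110, 120]
rsis = [50, 28, 25, 72, 75]
cash, shares, trades = simulate(prices, rsis)
print(trades)  # [('buy', 95), ('sell', 110)]
```

The second oversold reading (25) does not trigger another buy because a position is already open, and the second overbought reading (75) does not trigger another sell because the position is already closed.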

1 Year Performance

For the time period, 01/01/16 – 12/31/16

Strategy          % Return   Principal   Pre-Tax Return   Alpha
RSI Strategy      32.2%      $10,000     $3,220           $2,020
FB Buy & Hold     10.0%      $10,000     $1,000           ($200)
SPY Buy & Hold    12.0%      $10,000     $1,200           N/A

 

We backtest the RSI Strategy with a $10,000 principal for the time period, 01/01/16 – 12/31/16. 

During 01/01/16 – 12/31/16,

  • The RSI Strategy produces 32.2% return, resulting in $3,220 pre-tax return.
  • FB Buy & Hold produces 10.0% return, resulting in $1,000 pre-tax return.
  • SPY Buy & Hold produces 12.0% return, resulting in $1,200 pre-tax return.
  • Compared to SPY Buy & Hold, the RSI Strategy produces $2,020 Alpha ($3,220 less $1,200) whereas FB Buy & Hold produces ($200) Alpha, both on $10,000 principal.

 

 

Beyond 1 Year Performance

Yes, $2,020 Alpha on $10,000 principal is impressive.

Before we go and bet the farm, let’s see how the RSI Strategy performs over a longer time period.

Since the ‘FB’ IPO occurred on 05/18/12, we will backtest for the period 05/19/12 – 12/31/16.

For the time period, 05/19/12 – 12/31/16

Strategy          % Return   Principal   Pre-Tax Return   Alpha
RSI Strategy      147.4%     $10,000     $14,740          $5,780
FB Buy & Hold     238.5%     $10,000     $23,850          $14,890
SPY Buy & Hold    89.6%      $10,000     $8,960           N/A

During 05/19/12 – 12/31/16,

  • The RSI Strategy produces 147.4% return, resulting in $14,740 pre-tax return.
  • FB Buy & Hold produces 238.5% return, resulting in $23,850 pre-tax return.
  • SPY Buy & Hold produces 89.6% return, resulting in $8,960 pre-tax return.
  • Compared to SPY Buy & Hold, the RSI Strategy produces $5,780 Alpha whereas FB Buy & Hold produces $14,890 Alpha, both on $10,000 principal.

Thus, on the broader time horizon, FB Buy & Hold outperforms the RSI Strategy.
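
The dollar figures for this period follow directly from the percentage returns. In this post, Alpha is the pre-tax dollar return of a strategy minus the pre-tax dollar return of SPY Buy & Hold on the same principal. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
def pre_tax_return(pct_return, principal):
    """Dollar return for a percentage return on the given principal."""
    return round(principal * pct_return / 100.0, 2)


def alpha_vs_benchmark(pct_return, benchmark_pct_return, principal):
    """Dollar return in excess of the benchmark on the same principal."""
    strategy = pre_tax_return(pct_return, principal)
    benchmark = pre_tax_return(benchmark_pct_return, principal)
    return round(strategy - benchmark, 2)


PRINCIPAL = 10000
SPY_RETURN = 89.6  # % return of SPY Buy & Hold, 05/19/12 – 12/31/16

print(pre_tax_return(147.4, PRINCIPAL))                  # 14740.0
print(alpha_vs_benchmark(147.4, SPY_RETURN, PRINCIPAL))  # 5780.0  (RSI Strategy)
print(alpha_vs_benchmark(238.5, SPY_RETURN, PRINCIPAL))  # 14890.0 (FB Buy & Hold)
```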

 

Concluding Thought

Over the long term, money would go further with the FB Buy & Hold strategy.

The question still stands: what about 2016 makes the RSI Strategy superior in performance to FB Buy & Hold?

Until next time!

Python Tutorial: RSI

Download the accompanying IPython Notebook for this Tutorial from Github. 

Python streamlines tasks requiring multiple steps in a single block of code. For this reason, it is a great tool for querying and performing analysis on data.

In the last Tutorial, we outlined steps for calculating Price Channels.

In this Tutorial, we introduce a new technical indicator, the Relative Strength Index (RSI).

The Relative Strength Index (RSI) is a momentum indicator, developed by noted technical analyst Welles Wilder, that compares the magnitude of recent gains and losses over a specified time period to measure the speed and change of price movements of a security. It is primarily used to identify overbought or oversold conditions in the trading of an asset.

The Relative Strength Index (RSI) is calculated as follows:

RSI = 100 - 100 / (1 + RS)

RS = Average gain of last 14 trading days / Average loss of last 14 trading days

RSI values range from 0 to 100.
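
To see why the range is 0 to 100, the formula can be rearranged. Here G and L are shorthand introduced for this derivation: G is the average gain and L is the average loss, with G ≥ 0 and L > 0.

```latex
\begin{aligned}
RS  &= \frac{G}{L} \\
RSI &= 100 - \frac{100}{1 + RS}
     = 100 - \frac{100\,L}{G + L}
     = \frac{100\,G}{G + L}
\end{aligned}
```

The RSI is therefore the share of the combined average movement that came from gains, scaled to 100. It equals 0 when there are no gains (G = 0), equals 50 when G = L, and approaches 100 as the average loss shrinks toward zero.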

Traditional interpretation and usage of the RSI is that RSI values of 70 or above indicate that a security is becoming overbought or overvalued, and therefore may be primed for a trend reversal or corrective pullback in price. On the other side, an RSI reading of 30 or below is commonly interpreted as indicating an oversold or undervalued condition that may signal a trend change or corrective price reversal to the upside.

Let’s use Python to compute the Relative Strength Index (RSI).

1.) Import modules (numpy included).

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.) Define function for querying daily close.

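def get_stock(stock,start,end):
 # The 'google' source has been retired in later pandas-datareader releases;
 # substitute a source that your installed version supports.
 return web.DataReader(stock,'google',start,end)['Close']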

3.) Define function for RSI.

def RSI(series, period):
    # Daily price changes; the first row has no prior close, so drop it.
    delta = series.diff().dropna()
    # Split the changes into gains (u) and losses (d), both non-negative.
    u = delta * 0
    d = u.copy()
    u[delta > 0] = delta[delta > 0]
    d[delta < 0] = -delta[delta < 0]
    # Seed each series with the simple average of its first `period` values.
    u[u.index[period-1]] = np.mean(u[:period])
    u = u.drop(u.index[:(period-1)])
    d[d.index[period-1]] = np.mean(d[:period])
    d = d.drop(d.index[:(period-1)])
    # Exponential moving averages with com = period - 1 (alpha = 1 / period).
    rs = u.ewm(com=period-1, adjust=False).mean() / \
         d.ewm(com=period-1, adjust=False).mean()
    return 100 - 100 / (1 + rs)

How does the RSI function work?

– 3.a.) The function computes the daily differences and creates two series from them.

– 3.b.) One series holds the daily positive differences, i.e. gains.

– 3.c.) One series holds the daily negative differences, i.e. losses, stored as positive values.

– 3.d.) The gains series is seeded with the average of the daily positive differences for the period specified.

– 3.e.) The losses series is seeded with the average of the daily negative differences for the period specified.

– 3.f.) RS is set equal to the Exponential Moving Average of daily positive differences for the period specified / the Exponential Moving Average of daily negative differences for the period specified.

– 3.g.) Return 100 – 100 / (1 + RS)
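
The steps above can also be written without pandas, which makes the smoothing recursion explicit. This is a minimal sketch rather than the tutorial's function: the name `wilder_rsi` is hypothetical, and returning 100 when the average loss is zero is an assumption of the sketch. An exponential moving average with com = period - 1 and adjust=False uses a smoothing factor of 1 / period, which is the update applied in the loop.

```python
def wilder_rsi(prices, period=14):
    """Return RSI values, one per price starting at index `period`."""
    # 3.a-3.c: daily differences split into gains and losses (both >= 0).
    deltas = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    if len(deltas) < period:
        return []

    # 3.d-3.e: seed with the simple average of the first `period` values.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        if loss == 0:
            return 100.0  # assumption: no losses pins the RSI at 100
        return 100.0 - 100.0 / (1.0 + gain / loss)

    values = [to_rsi(avg_gain, avg_loss)]
    # 3.f-3.g: exponential smoothing with factor 1 / period, then RSI.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        values.append(to_rsi(avg_gain, avg_loss))
    return values


# Hypothetical prices that alternate up and down by 1, with a 2-day period.
print(wilder_rsi([10, 11, 10, 11, 10], period=2))  # approximately [50.0, 75.0, 37.5]
```

With a 2-day period, the first value averages one gain and one loss of equal size, giving 50. Each later value leans toward the most recent move.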

 4.) Query daily close for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))

5.) Run daily close through RSI function. Save series to new column in dataframe.

df['RSI'] = RSI(df['Close'], 14)
df.tail()

6.) Plot daily close and RSI.

df.plot(y=['Close'])
df.plot(y=['RSI'])

There you have it! We created our RSI indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

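def get_stock(stock,start,end):
 # The 'google' source has been retired in later pandas-datareader releases;
 # substitute a source that your installed version supports.
 return web.DataReader(stock,'google',start,end)['Close']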
 
def RSI(series, period):
    # Daily price changes; the first row has no prior close, so drop it.
    delta = series.diff().dropna()
    # Split the changes into gains (u) and losses (d), both non-negative.
    u = delta * 0
    d = u.copy()
    u[delta > 0] = delta[delta > 0]
    d[delta < 0] = -delta[delta < 0]
    # Seed each series with the simple average of its first `period` values.
    u[u.index[period-1]] = np.mean(u[:period])
    u = u.drop(u.index[:(period-1)])
    d[d.index[period-1]] = np.mean(d[:period])
    d = d.drop(d.index[:(period-1)])
    # Exponential moving averages with com = period - 1 (alpha = 1 / period).
    rs = u.ewm(com=period-1, adjust=False).mean() / \
         d.ewm(com=period-1, adjust=False).mean()
    return 100 - 100 / (1 + rs)
 
df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['RSI'] = RSI(df['Close'], 14)
df.tail()