Can Twitter Predict the Outcome of the US Presidential Election?

By Andrew Hamlet |  andrewshamlet@gmail.com

This case was presented at The New York Chapter of the American Association for Public Opinion Research in early 2016 and published in Data Visualization Made Simple: Insights into Becoming Visual, Sosulski, K, Routledge: New York.

In November 2015, a group of NYU Stern MBA students (Troy Manos, Keita Shimizu, Tarang Dawer, and Andrew Hamlet) set out to study whether Twitter can predict the outcome of the U.S. presidential election. Based on preliminary research into the opinions of major news outlets, the answer was not clear. There are many different viewpoints on the value of social media, specifically Twitter, for predicting election outcomes. Some see any publicity as good publicity.

“What people say on Twitter or Facebook is a very good indicator of how they will vote.”

“In 2010, …Twitter data predicted the winner in 404 out of 435 competitive [congressional] races.”

“If people must talk about you, even in negative ways, it is a signal that a candidate is on the verge of victory.”

The Washington Post

Others questioned whether social media could provide a direct measurement of voter intention. To what extent could interactions on Twitter signal a specific outcome? News sources commented:

“…Twitter is a notably non-representative sample of people.”

“At last count, eight percent of American adults use Twitter daily; only 15 percent are on it at all.”

“In the best of circumstances it is possible to detect the online projections and manifestations of existing offline phenomena that tend to coincide with particular outcomes or events.”

The Atlantic

What do you think? How would you begin to explore these questions?

To even begin to answer whether Twitter can predict the outcome of a presidential election, you would have to look at the data. What data would you need? You could begin with a social graph of each presidential candidate, covering their interactions and following on Twitter, as seen in Table 1.

Date     Favorites  Followers  Mentions  Party     Politician       Retweets
2/25/16  2353       6435       5899      Democrat  Bernie Sanders   1106
2/26/16  4455       7209       5016      Democrat  Bernie Sanders   2243
2/27/16  3469       5147       7551      Democrat  Bernie Sanders   1520
2/28/16  0          6144       2901      Democrat  Bernie Sanders   0
6/16/15  1711       7262       7681      Democrat  Hillary Clinton  813
6/17/15  586        6789       6329      Democrat  Hillary Clinton  312
6/18/15  2618       6646       5380      Democrat  Hillary Clinton  1688
6/19/15  1337       6882       4734      Democrat  Hillary Clinton  674
6/20/15  1259       5787       5333      Democrat  Hillary Clinton  959
6/21/15  1915       6464       3950      Democrat  Hillary Clinton  655

Table 1. A sample of Twitter data collected on the presidential candidates and the volume of tweets, the audience engagement, and Twitter followers

This is exactly what the team did. However, the data alone did not reveal any interesting findings. To make sense of it, the team developed a methodology for analyzing the Twitter data based on three key metrics: 1) volume, 2) engagement, and 3) followers (see Figure 1).

Figure 1. Methodology for analyzing Twitter data, outlining the three key metrics.

Volume was measured using two inputs. The first was the number of tweets, defined as the daily count of tweets from the respective candidate profile. The second was the number of mentions, defined as the daily count of tweets referencing the respective candidate profile.

Engagement was measured using two inputs. The first was the average retweets per tweet, defined as the daily average number of retweets per tweet from the respective candidate profile. The second was the average favorites per tweet, defined as the daily average number of favorites per tweet from the respective candidate profile.

Followers were measured by the number of followers, defined as the daily number of followers gained by the respective candidate profile.

These metrics were gathered daily from 11/1/2015 to 11/30/2015. Each metric was then processed in five steps: averaged by month, normalized across the candidates, equally weighted, summed for a Total Composite Score, and sorted in descending order to produce a ranking.

  • Averaged by month: the 30 daily values of a metric are summed and divided by 30

(22 + 20 + 25 + 15 + 15 + 37 + 25 + 38 + 10 + 23 + 16 + 20 + 13 + 22 + 25 + 12 + 13 + 7 + 31 + 28 + 45 + 19 + 18 + 17 + 20 + 9 + 8 + 18 + 17 + 13) / 30 = 601 / 30 ≈ 20

  • Normalized across candidates: each candidate's monthly average is divided by the maximum for that metric

[20, 10, 10, 12] / 20 = [1.00, 0.50, 0.50, 0.60]

  • Equally weighted: each of the five normalized metrics counts the same

number of tweets + number of mentions + avg. retweets per tweet + avg. favorites per tweet + number of followers

  • Summed for Total Composite Score: the five normalized metrics are added, so the maximum possible score is 5.00

1.00 + 0.93 + 1.00 + 1.00 + 1.00 = 4.93

  • Sorted in descending order to produce a ranking
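The five steps above can be sketched in pandas as follows. This is a minimal illustration, not the team's actual code. The following are all hypothetical:

- the column names (Tweets, Mentions, Retweets per Tweet, Favorites per Tweet, Followers);
- the helper composite_ranking;
- the three made-up candidates A, B, and C and their daily values.

```python
import pandas as pd

# Hypothetical column names for the five inputs described above
METRICS = ["Tweets", "Mentions", "Retweets per Tweet", "Favorites per Tweet", "Followers"]


def composite_ranking(daily, metrics=METRICS):
    """Rank candidates by an equally weighted sum of normalized monthly averages."""
    # Step 1: average each metric over the month for every candidate
    monthly = daily.groupby("Politician")[metrics].mean()
    # Step 2: normalize across candidates by dividing by each metric's maximum
    normalized = monthly / monthly.max()
    # Steps 3-4: equal weights, so the Total Composite Score is a plain sum
    normalized["Composite Score"] = normalized[metrics].sum(axis=1)
    # Step 5: sort in descending order to produce the ranking
    return normalized.sort_values("Composite Score", ascending=False)


# Hypothetical daily values: two days each for three made-up candidates
daily = pd.DataFrame(
    [
        ("A", 22, 900, 10, 25, 1200),
        ("A", 18, 1100, 10, 15, 800),
        ("B", 10, 500, 5, 10, 500),
        ("B", 10, 500, 5, 10, 500),
        ("C", 20, 250, 2, 4, 100),
        ("C", 20, 250, 2, 4, 100),
    ],
    columns=["Politician"] + METRICS,
)

ranking = composite_ranking(daily)
print(ranking["Composite Score"])
```

With equal weights the composite is simply the sum of the normalized columns. A candidate who leads every metric, like A here, therefore reaches the maximum of 5.00.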

The team then added their new metrics (TW Followers, TW Mentions, TW Retweets, TW Tweets, and Composite Score) to the original data table.

Based on the methodology, the team conducted an analysis of leading candidates from both political parties for the 2016 Presidential Election. They showed what the presidential race looked like on Twitter based on their analysis (see Figure 2). The heat map suggests Donald Trump, when compared to the other candidates, maximized his Twitter presence. Additionally, the heat map shows that the race (as viewed on Twitter) was closer between Democrats than it was among Republicans. 


Figure 2. A heat map displaying Twitter activity across the presidential candidates with Donald Trump leading during the timeframe 11/1/2015 – 11/30/2015.

The heat map presents the rank of each candidate according to the calculated metrics. Darker shading indicates relative leadership in a category compared with the other candidates. The metrics are divided into four subgroups that correspond to the ranges 1 to 0.75, 0.74 to 0.50, 0.49 to 0.25, and 0.24 to 0. Each subgroup was assigned a shade of green.
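As a small illustration of the four shading subgroups, the hypothetical helper below maps a normalized metric to a group number, with 1 being the darkest shade. It assumes that a value falling between two stated ranges (for example 0.745) belongs to the lower group, a detail the ranges above leave open.

```python
def shade_group(score):
    """Return the shading group for a normalized metric: 1 (darkest) to 4 (lightest)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("normalized metrics lie between 0 and 1")
    if score >= 0.75:
        return 1  # 1 to 0.75
    if score >= 0.50:
        return 2  # 0.74 to 0.50
    if score >= 0.25:
        return 3  # 0.49 to 0.25
    return 4      # 0.24 to 0


# Normalized values taken from the worked example of the methodology
print([shade_group(s) for s in [1.00, 0.93, 0.60, 0.50]])  # [1, 1, 2, 2]
```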

After analyzing the data from November 2015, Andrew examined how the model's rankings would change over time. To do this, he categorized the candidates by political party and applied the methodology through the presidential primaries. Figure 3 shows how the two leading candidates from the Republican and Democratic parties ranked according to the model from June 2015 through February 2016.

Figure 3. Time series of social media behaviors on Twitter showing Donald Trump leading Ted Cruz and Hillary Clinton more narrowly leading Bernie Sanders through February 2016.

As of February 2016, Donald Trump was leading Ted Cruz by 1.23 and Hillary Clinton leading Bernie Sanders by 0.31, both in terms of Total Composite Score. Thus, the line chart presents a smaller gap between Hillary Clinton and Bernie Sanders than between Donald Trump and Ted Cruz.

Around March 2016, as it became clear the race would be between Donald Trump and Hillary Clinton, the methodology was applied to the two nominees. The following time series display (see Figure 4) illustrates how Donald Trump and Hillary Clinton ranked according to the model by month from June 2015 through October 2016.

Figure 4.  Time series of social media behaviors on Twitter showing Donald Trump leading Hillary Clinton throughout the primary and general campaigns.

Donald Trump would go on to win the 2016 Presidential Election. Many cite social media as a contributing factor in the outcome. Or perhaps the insights presented are more indicative of Clinton’s defeat than Trump’s win?

August 6, 2017

“Everyone understands that what gets shared online matters now.”

The New York Times


July 8, 2017

“Given the role that Twitter played in the presidential campaign, we analyzed Mr. Trump’s and Mrs. Clinton’s Twitter accounts in the six months before the election. We found that Mr. Trump benefited by using moral-emotional language (a 15 percent increase in retweets) but Mrs. Clinton did not.”

The New York Times

Visualization simplifies the complex, which is the beauty of the medium. When performed well, it presents information clearly; interpreting that information, however, is not always so clear. This was the case for the Twitter prediction project. During the election, the prevailing belief was that Trump would not win, yet the model told a different story. The insight lay between these two perspectives. Arriving at that insight, though, often raises more questions, such as:

  • Who engaged on Twitter with Donald Trump?
  • Is there something about the content of the tweets that leads the audience to engage?
  • If Twitter represents mass public opinion, why did the engagement rates not translate to the popular vote?
  • What is the relationship between social and traditional media?



Python Tutorial: Getting Started

Contact: andrewshamlet@gmail.com // @andrewshamlet

Getting Started

Congratulations, and welcome to Stock Technical Analysis in Python!

You have taken your first step towards making smarter, more disciplined trading decisions.

Before diving in, let’s make sure you have everything you may need.

Anaconda 4.4.0

We recommend downloading Anaconda 4.4.0, with Python 2.7.

– http://www.continuum.io/downloads

Pandas, Numpy, and MatPlotLib

Anaconda comes pre-loaded with the three modules you will use throughout the course.

– http://www.pandas.pydata.org/

– http://www.numpy.org/

– http://www.matplotlib.org/

Quantopian

You will backtest your strategy using the Quantopian platform.

– http://www.quantopian.com/

StockCharts, Investopedia, and Google Finance

StockCharts, Investopedia, and Google Finance are great resources for financial knowledge.

– http://www.stockcharts.com

– http://www.investopedia.com/

– http://www.google.com/finance

Stack Overflow

Stack Overflow is a great resource for coding questions.

– http://www.stackoverflow.com/

You are now ready to dive in!

Update – Python Tutorial: MACD (Moving Average Convergence/Divergence)

Download the accompanying IPython Notebook for this Tutorial from Github. 

I received a question from Sam Khorsand about applying the Python Tutorial: MACD (Moving Average Convergence/Divergence) to multiple stocks. The code below produces yesterday's MACD Crossover for a list of stocks. You can also clean up the date formatting by passing index=False to to_string(), that is, to_string(index=False). Enjoy!

import pandas as pd
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime as dt
%matplotlib inline

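def MACD(stock, start, end):
    # Note: the 'google' source no longer works in current pandas_datareader
    # releases; substitute a supported data source if this call fails.
    df = pd.DataFrame(web.DataReader(stock, 'google', start, end)['Close'])
    df = df.reset_index()
    # 30-day simple moving average (not used in the crossover below)
    df['30 mavg'] = df['Close'].rolling(30).mean()
    # 26- and 12-day exponential moving averages of the close
    df['26 ema'] = df['Close'].ewm(span=26).mean()
    df['12 ema'] = df['Close'].ewm(span=12).mean()
    # MACD line, its 9-day signal line, and the difference between them
    df['MACD'] = df['12 ema'] - df['26 ema']
    df['Signal'] = df['MACD'].ewm(span=9).mean()
    df['Crossover'] = df['MACD'] - df['Signal']
    # Ticker, most recent date (as a string), and most recent crossover value
    return stock, df['Date'][-1:].to_string(), df['Crossover'].iloc[-1]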
    

stocks = ['FB', 'AAPL', 'GOOG', 'AMZN', 'TSLA']

d = []

for stock in stocks:
    stock, date, macd = MACD(stock, '1/1/2016', dt.datetime.today())
    d.append({'Stock':stock, 'Date':date, 'MACD':macd})
    
df2 = pd.DataFrame(d)
df2[['Date', 'Stock', 'MACD']]
               Date Stock      MACD
0  249    2017-09-20    FB -0.211440
1  249    2017-09-20  AAPL -0.828956
2  249    2017-09-20  GOOG -0.069812
3  249    2017-09-20  AMZN  1.028655
4  249    2017-09-20  TSLA  2.287354
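The sketch below lets you try the crossover calculation without a market-data connection. The tickers AAA, BBB, and CCC and their random-walk closes are hypothetical stand-ins for downloaded quotes. The helper macd_crossover is illustrative; it uses the same 12/26/9 spans as the MACD function above.

```python
import numpy as np
import pandas as pd


def macd_crossover(close, fast=12, slow=26, signal=9):
    """MACD line minus its signal line, as in the 'Crossover' column above."""
    macd = close.ewm(span=fast).mean() - close.ewm(span=slow).mean()
    signal_line = macd.ewm(span=signal).mean()
    return macd - signal_line


# Hypothetical random-walk closes standing in for downloaded quotes
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2017-01-02", periods=180)
closes = {
    ticker: pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)
    for ticker in ["AAA", "BBB", "CCC"]
}

# One row per ticker: the latest date and the latest crossover value
latest = pd.DataFrame(
    [
        {"Date": s.index[-1].date(), "Stock": ticker, "MACD": macd_crossover(s).iloc[-1]}
        for ticker, s in closes.items()
    ]
)
print(latest.to_string(index=False))
```

A positive value means the MACD line sits above its signal line; a negative value means it sits below. Printing with to_string(index=False) drops the row index, in the same spirit as the date clean-up mentioned above.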

 

Python Tutorial – Future Returns

Download the accompanying IPython Notebook for this Tutorial from Github. 

Last Tutorial, we outlined steps for calculating the Stochastic Oscillator.

In this Tutorial, we walk through calculating 5-day, 10-day, and 20-day future returns, from historical data.

Calculating 5-day, 10-day, and 20-day future returns will allow us to identify relationships between current technical indicators and future returns.
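Here is a minimal sketch of the idea, assuming a plain pandas Series of closing prices. The hypothetical helper future_return generalizes the fiveday, tenday, and twentyday functions in steps 3 to 5 to any horizon. The seven prices are made up for illustration.

```python
import pandas as pd


def future_return(close, days):
    """Percent change from each close to the close `days` rows later.

    The last `days` rows are NaN because their future close is not yet known.
    """
    return (close.shift(-days) - close) / close * 100


# Made-up closing prices for illustration
close = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0, 99.0, 104.5])
returns = pd.DataFrame({f"{n} day": future_return(close, n) for n in (1, 5)})
print(returns)
```

Because shift(-days) looks forward in time, these columns belong on the outcome side of an analysis; they should never be used as inputs to an indicator. The trailing NaN rows are what dropna() removes in step 7.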

Let’s use Python to compute the 5-day, 10-day, and 20-day future returns.

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.) Define function for querying daily close.

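def get_stock(stock, start, end):
    # Note: the 'google' source no longer works in current pandas_datareader
    # releases; substitute a supported data source if this call fails.
    return web.DataReader(stock, 'google', start, end)['Close']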

3.) Define function for calculating the 5-day future return.

def fiveday(close): 
 fiveday = ((close.shift(-5) - close) / close) * 100
 return fiveday

4.) Define function for calculating 10-day future return.

def tenday(close): 
 tenday = ((close.shift(-10) - close) / close) * 100
 return tenday

5.) Define function for calculating 20-day future return.

def twentyday(close): 
 twentyday = ((close.shift(-20) - close) / close) * 100
 return twentyday

6.) Query daily close.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))

7.) Run daily close through the fiveday, tenday, and twentyday functions. Save each series to a new column in the dataframe.

df['5 day'] = fiveday(df['Close'])
df['10 day'] = tenday(df['Close'])
df['20 day'] = twentyday(df['Close'])
df = df.dropna()
df.tail()

8.) Plot average 5-day, 10-day, and 20-day future returns.

df2 = df[['5 day', '10 day', '20 day']].mean()
df2.plot(kind='bar')

There you have it! We calculated 5-day, 10-day, and 20-day future returns. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

def fiveday(close): 
 fiveday = ((close.shift(-5) - close) / close) * 100
 return fiveday

def tenday(close): 
 tenday = ((close.shift(-10) - close) / close) * 100
 return tenday

def twentyday(close): 
 twentyday = ((close.shift(-20) - close) / close) * 100
 return twentyday

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['5 day'] = fiveday(df['Close'])
df['10 day'] = tenday(df['Close'])
df['20 day'] = twentyday(df['Close'])
df = df.dropna()
df.tail()

Python Tutorial: Stochastic Oscillator

Download the accompanying IPython Notebook for this Tutorial from Github. 

Last Tutorial, we outlined steps for calculating the Mass Index.

In this Tutorial, we introduce a new technical indicator, the Stochastic Oscillator.

Developed by George C. Lane in the late 1950s, the Stochastic Oscillator is a momentum indicator that shows the location of the close relative to the high-low range over a set number of periods.

The Stochastic Oscillator is calculated as follows:

%K = (Current Close - Lowest Low)/(Highest High - Lowest Low) * 100
%D = 3-day SMA of %K

Lowest Low = lowest low for the look-back period
Highest High = highest high for the look-back period

The default setting for the Stochastic Oscillator is 14 periods, which can be days, weeks, months or an intraday timeframe. A 14-period %K would use the most recent close, the highest high over the last 14 periods and the lowest low over the last 14 periods. %D is a 3-day simple moving average of %K.
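To make the formula concrete, the sketch below computes %K and %D with pandas rolling windows on five made-up bars. It uses a 3-period look-back instead of the default 14 so the arithmetic can be checked by hand. The helper name stochastic and the price values are assumptions for illustration. For example, the first complete %K, at the third bar, is (11 - 8) / (12 - 8) * 100 = 75.

```python
import pandas as pd


def stochastic(close, low, high, n=14, d=3):
    """Return (%K, %D) for an n-period look-back and a d-period %D average."""
    lowest_low = low.rolling(n).min()      # lowest low over the look-back
    highest_high = high.rolling(n).max()   # highest high over the look-back
    k = (close - lowest_low) / (highest_high - lowest_low) * 100
    return k, k.rolling(d).mean()          # %D is a simple moving average of %K


# Five made-up bars and a 3-period look-back so the numbers can be checked by hand
high = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
low = pd.Series([8.0, 9.0, 10.0, 11.0, 12.0])
close = pd.Series([9.0, 10.0, 11.0, 12.0, 12.5])

k, d = stochastic(close, low, high, n=3)
print(pd.DataFrame({"%K": k, "%D": d}))
```

If the high and the low are equal across a whole window, the denominator is zero and %K is undefined. Real price data rarely hits this case, but it is worth guarding against in production code.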

As a bounded oscillator, the Stochastic Oscillator makes it easy to identify overbought and oversold levels. The oscillator ranges from zero to one hundred. No matter how fast a security advances or declines, the Stochastic Oscillator will always fluctuate within this range. Traditional settings use 80 as the overbought threshold and 20 as the oversold threshold. These levels can be adjusted to suit analytical needs and security characteristics. Readings above 80 for the 20-day Stochastic Oscillator would indicate that the underlying security was trading near the top of its 20-day high-low range. Readings below 20 occur when a security is trading at the low end of its high-low range.
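Here is a minimal sketch of the traditional thresholds, assuming %K is already available as a pandas Series. The hypothetical helper stochastic_zone labels readings above 80 as overbought and readings below 20 as oversold. Both thresholds are parameters, so they can be adjusted as described above. Missing readings from the look-back warm-up fall through to "neutral".

```python
import pandas as pd


def stochastic_zone(k, overbought=80.0, oversold=20.0):
    """Label each %K reading as 'overbought', 'oversold', or 'neutral'."""
    zone = pd.Series("neutral", index=k.index, dtype=object)
    zone[k > overbought] = "overbought"   # near the top of the high-low range
    zone[k < oversold] = "oversold"       # near the bottom of the range
    return zone                           # NaN readings compare False, so stay 'neutral'


readings = pd.Series([85.0, 50.0, 15.0, 80.0, 20.0, float("nan")])
print(stochastic_zone(readings).tolist())
```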

Before looking at some chart examples, it is important to note that overbought readings are not necessarily bearish. Securities can become overbought and remain overbought during a strong uptrend. Closing levels that are consistently near the top of the range indicate sustained buying pressure. In a similar vein, oversold readings are not necessarily bullish. Securities can also become oversold and remain oversold during a strong downtrend. Closing levels consistently near the bottom of the range indicate sustained selling pressure. It is, therefore, important to identify the bigger trend and trade in the direction of this trend. Look for occasional oversold readings in an uptrend and ignore frequent overbought readings. Similarly, look for occasional overbought readings in a strong downtrend and ignore frequent oversold readings.

Let’s use Python to compute the Stochastic Oscillator.

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.a.) Define function for querying daily close.

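def get_stock(stock, start, end):
    # Note: the 'google' source no longer works in current pandas_datareader
    # releases; substitute a supported data source if this call fails.
    return web.DataReader(stock, 'google', start, end)['Close']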

2.b.) Define function for querying daily high.

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

2.c.) Define function for querying daily low.

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

3.) Define function for the Stochastic Oscillator, both %K and %D.

def STOK(close, low, high, n):
    # %K: position of the close within the n-period high-low range, in percent
    STOK = ((close - low.rolling(n).min()) / (high.rolling(n).max() - low.rolling(n).min())) * 100
    return STOK

def STOD(close, low, high, n):
    # %D: 3-period simple moving average of %K
    STOK = ((close - low.rolling(n).min()) / (high.rolling(n).max() - low.rolling(n).min())) * 100
    STOD = STOK.rolling(3).mean()
    return STOD

How does the Stochastic Oscillator function work?

3.a.) To calculate %K, we find the difference between the current close and the lowest low for the look-back period, n. We then find the difference between the highest high for the look-back period, n, and the lowest low for the same look-back period. Dividing these two values and multiplying the result by 100, we arrive at %K, which we set to variable STOK.

#STOK = ((close - low.rolling(n).min()) / (high.rolling(n).max() - low.rolling(n).min())) * 100

3.b.) Function returns STOK.

#return STOK

3.c.) To calculate %D, we first calculate %K.

#STOK = ((close - low.rolling(n).min()) / (high.rolling(n).max() - low.rolling(n).min())) * 100

3.d.) Then we take the 3-day moving average of %K and set the value to variable STOD.

#STOD = STOK.rolling(3).mean()

3.e.) Function returns STOD.

#return STOD 

4.) Query daily close, high, and low for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')

5.) Run daily close, low, and high through %K and %D functions. Save series to new columns in dataframe.

df['%K'] = STOK(df['Close'], df['Low'], df['High'], 14)
df['%D'] = STOD(df['Close'], df['Low'], df['High'], 14)
df.tail()

6.) Plot daily close, %K, and %D.

df.plot(y=['Close'], figsize = (20, 5))
df.plot(y=['%K', '%D'], figsize = (20, 5))

There you have it! We created our Stochastic Oscillator indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

def STOK(close, low, high, n):
    STOK = ((close - low.rolling(n).min()) / (high.rolling(n).max() - low.rolling(n).min())) * 100
    return STOK

def STOD(close, low, high, n):
    STOK = ((close - low.rolling(n).min()) / (high.rolling(n).max() - low.rolling(n).min())) * 100
    STOD = STOK.rolling(3).mean()
    return STOD

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')
df['%K'] = STOK(df['Close'], df['Low'], df['High'], 14)
df['%D'] = STOD(df['Close'], df['Low'], df['High'], 14)
df.tail()

Python Tutorial: Mass Index

Download the accompanying IPython Notebook for this Tutorial from Github. 

Python can condense a task that requires multiple steps into a single block of code. For this reason, it is a great tool for querying and analyzing data.

Last Tutorial, we outlined steps for calculating Commodity Channel Index (CCI).

In this Tutorial, we introduce a new technical indicator, the Mass Index.

Developed by Donald Dorsey, the Mass Index uses the high-low range to identify trend reversals based on range expansions. In this sense, the Mass Index is a volatility indicator that does not have a directional bias. Instead, the Mass Index identifies range bulges that can foreshadow a reversal of the current trend.

The Mass Index is calculated as follows:

Single EMA = 9-period exponential moving average (EMA) of the high-low differential 

Double EMA = 9-period EMA of the 9-period EMA of the high-low differential

EMA Ratio = Single EMA divided by Double EMA

Mass Index = 25-period sum of the EMA Ratio

First, the Single EMA provides the average for the high-low range.

Second, the Double EMA provides a second smoothing of this volatility measure.

Using a ratio of these two exponential moving averages normalizes the data series. This ratio shows when the Single EMA becomes large relative to the Double EMA.

The final step, a 25-period summation, acts like a moving average to further smooth the data series.

Overall, the Mass Index rises as the high-low range widens and falls as the high-low range narrows.
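The calculation can be sketched in a few lines of pandas. This is an illustrative version, not Dorsey's or this tutorial's reference code; the helper name mass_index and the synthetic high/low bars are assumptions. It mirrors the 9-period spans, the min_periods=8 warm-up, and the 25-period sum used in step 3.

With a constant high-low range, the single and double EMAs are equal and their ratio is 1, so the 25-period sum settles at 25. A steadily widening range pushes the single EMA above the double EMA, and the sum rises above 25.

```python
import numpy as np
import pandas as pd


def mass_index(high, low, span=9, window=25):
    """Sum over `window` bars of the single-EMA / double-EMA ratio of the high-low range."""
    hl_range = high - low
    single = hl_range.ewm(span=span, min_periods=span - 1).mean()   # Single EMA
    double = single.ewm(span=span, min_periods=span - 1).mean()     # Double EMA
    return (single / double).rolling(window).sum()                  # 25-period sum


bars = 60
low = pd.Series(np.full(bars, 100.0))
steady = mass_index(low + 2.0, low)                              # constant range of 2
widening = mass_index(low + np.linspace(1.0, 5.0, bars), low)    # range widens each bar
print(round(steady.iloc[-1], 6), round(widening.iloc[-1], 6))
```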

Donald Dorsey looked for “reversal bulges” to signal a trend reversal. According to Dorsey, a bulge occurs when the Mass Index moves above 27. This initial bulge does not complete the signal though. Dorsey waited for this bulge to reverse with a move back below 26.50. Once the reversal bulge is complete, traders should use other analysis techniques to determine the direction of the next move. Ideally, a downtrend followed by a reversal bulge would suggest a bullish trend reversal. Conversely, an uptrend followed by a reversal bulge would suggest a bearish trend reversal.
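Dorsey's two-step rule can be expressed as a small state machine. In this sketch, a reading above 27 arms the signal and a later reading below 26.50 completes it. The function name reversal_bulges and the sample readings are hypothetical. The sketch only flags completed bulges; as noted above, the direction of the next move has to come from other analysis.

```python
import pandas as pd


def reversal_bulges(mass, setup=27.0, trigger=26.5):
    """Return the index labels where a reversal bulge completes."""
    signals = []
    armed = False  # True once the Mass Index has moved above `setup`
    for label, value in mass.items():
        if pd.isna(value):
            continue  # skip warm-up periods with no Mass Index yet
        if value > setup:
            armed = True
        elif armed and value < trigger:
            signals.append(label)  # bulge reversed: move back below 26.50
            armed = False
    return signals


readings = pd.Series([25.0, 26.8, 27.3, 27.1, 26.7, 26.4, 26.0, 27.2, 26.6, 26.3])
print(reversal_bulges(readings))
```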

Let’s use Python to compute the Mass Index.

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.a.) Define function for querying daily close.

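def get_stock(stock, start, end):
    # Note: the 'google' source no longer works in current pandas_datareader
    # releases; substitute a supported data source if this call fails.
    return web.DataReader(stock, 'google', start, end)['Close']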

2.b.) Define function for querying daily high.

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

2.c.) Define function for querying daily low.

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

3.) Define function for Mass Index.

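def MassIndex(high, low):
    Range = high - low
    # Single EMA: 9-period EMA of the high-low range
    EX1 = Range.ewm(span=9, min_periods=8).mean()
    # Double EMA: 9-period EMA of the single EMA
    EX2 = EX1.ewm(span=9, min_periods=8).mean()
    Mass = EX1 / EX2
    # Mass Index: 25-period rolling sum of the EMA ratio
    MassIndex = pd.Series(Mass.rolling(25).sum(), name='Mass Index')
    return MassIndex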

How does the Mass Index function work?

3.a.) Function calculates the difference between the high and the low, and sets this value to variable Range.

#Range = high - low  

3.b.) Function takes a 9-period Exponential Moving Average of the Range, and sets this value to variable EX1.

#EX1 = Range.ewm(span=9, min_periods=8).mean()

3.c.) Function takes a 9-period Exponential Moving Average of EX1, to smooth volatility, and sets this value to variable EX2.

#EX2 = EX1.ewm(span=9, min_periods=8).mean()

3.d.) Function takes the ratio of EX1 to EX2, and sets this value to variable Mass.

#Mass = EX1 / EX2

3.e.) Function calculates the 25-period rolling sum of Mass, and sets this value to variable MassIndex.

#MassIndex = pd.Series(Mass.rolling(25).sum(), name='Mass Index')

3.f.) Function returns MassIndex.

#return MassIndex

4.) Query daily close, high, and low for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')

5.) Run daily high and low through Mass Index function. Save series to new column in dataframe.

df['MassIndex'] = MassIndex(df['High'], df['Low'])
df.tail()

6.) Plot daily close and Mass Index.

df.plot(y=['Close'])
df.plot(y=['MassIndex'])

There you have it! We created our Mass Index indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']
 
def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']
 
def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']
 
def MassIndex(high, low):
    Range = high - low
    EX1 = Range.ewm(span=9, min_periods=8).mean()
    EX2 = EX1.ewm(span=9, min_periods=8).mean()
    Mass = EX1 / EX2
    MassIndex = pd.Series(Mass.rolling(25).sum(), name='Mass Index')
    return MassIndex
 
df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')
df['MassIndex'] = MassIndex(df['High'], df['Low'])
df.tail()

Python Tutorial: CCI

Download the accompanying IPython Notebook for this Tutorial from Github. 

Python can condense a task that requires multiple steps into a single block of code. For this reason, it is a great tool for querying and analyzing data.

Last Tutorial, we outlined steps for calculating Rate of Change (ROC).

In this Tutorial, we introduce a new technical indicator, the Commodity Channel Index (CCI).

Developed by Donald Lambert, the Commodity Channel Index (CCI) is a versatile indicator that can be used to identify a new trend or warn of extreme conditions. CCI measures the current price level relative to an average price level over a given period of time. CCI is relatively high when prices are far above their average. CCI is relatively low when prices are far below their average. In this manner, CCI can be used to identify overbought and oversold levels.

The Commodity Channel Index (CCI) is calculated as follows:

CCI = (Typical Price - n-period SMA of TP) / (Constant x Mean Deviation)

Typical Price (TP) = (High + Low + Close)/3

Constant = .015


Lambert set the Constant at .015 to ensure that approximately 70 to 80 percent of CCI values would fall between -100 and +100. This percentage also depends on the look-back period. A shorter CCI (10 periods) will be more volatile with a smaller percentage of values between +100 and -100. Conversely, a longer CCI (40 periods) will have a higher percentage of values between +100 and -100.
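The sketch below follows the formula literally, using the mean absolute deviation of the Typical Price over the look-back window. The helper name cci and the sample prices are hypothetical. The CCI function in step 3 divides by the rolling standard deviation instead, so its values will generally differ from this version. The sample sets High = Low = Close, so the Typical Price equals the price, and uses a 3-period look-back.

```python
import numpy as np
import pandas as pd


def cci(high, low, close, n=20, constant=0.015):
    """CCI = (TP - n-period SMA of TP) / (constant * n-period mean deviation of TP)."""
    tp = (high + low + close) / 3                      # Typical Price
    sma = tp.rolling(n).mean()                         # n-period SMA of TP
    mean_dev = tp.rolling(n).apply(
        lambda window: np.mean(np.abs(window - window.mean())), raw=True
    )                                                  # mean absolute deviation of TP
    return (tp - sma) / (constant * mean_dev)


# Made-up prices with High = Low = Close, so the Typical Price equals the price
price = pd.Series([10.0, 11.0, 12.0, 14.0, 9.0, 13.0])
values = cci(price, price, price, n=3)
print(values.round(2).tolist())
```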

The Commodity Channel Index (CCI) can be used as either a coincident or leading indicator. As a coincident indicator, surges above +100 reflect strong price action that can signal the start of an uptrend. Plunges below -100 reflect weak price action that can signal the start of a downtrend.
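For a coincident signal, what matters is the bar on which CCI crosses a level, not every bar spent beyond it. The hypothetical helper below flags crossings above +100 and below -100 in a pandas Series of CCI values. The sample readings are made up.

```python
import pandas as pd


def cci_crossings(cci_values, upper=100.0, lower=-100.0):
    """Flag bars where CCI surges above `upper` or plunges below `lower`."""
    previous = cci_values.shift(1)
    return pd.DataFrame(
        {
            # Crossed above +100 on this bar (was at or below it on the previous bar)
            "surge": (cci_values > upper) & (previous <= upper),
            # Crossed below -100 on this bar (was at or above it on the previous bar)
            "plunge": (cci_values < lower) & (previous >= lower),
        }
    )


readings = pd.Series([50.0, 120.0, 130.0, 90.0, -50.0, -110.0, -120.0, -80.0])
flags = cci_crossings(readings)
print(flags[flags.any(axis=1)])
```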

As a leading indicator, chartists can look for overbought or oversold conditions that may foreshadow a mean reversion. Similarly, bullish and bearish divergences can be used to detect early momentum shifts and anticipate trend reversals.

Let’s use Python to compute the Commodity Channel Index (CCI).

1.) Import modules.

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

2.a.) Define function for querying daily close.

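def get_stock(stock, start, end):
    # Note: the 'google' source no longer works in current pandas_datareader
    # releases; substitute a supported data source if this call fails.
    return web.DataReader(stock, 'google', start, end)['Close']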

2.b.) Define function for querying daily high.

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

2.c.) Define function for querying daily low.

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

3.) Define function for Commodity Channel Index (CCI).

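def CCI(close, high, low, n, constant):
    # Typical Price
    TP = (high + low + close) / 3
    # Deviation of TP from its n-period SMA, scaled by constant x n-period standard deviation
    CCI = pd.Series((TP - TP.rolling(n).mean()) / (constant * TP.rolling(n).std()), name='CCI_' + str(n))
    return CCI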

How does the CCI function work?

3.a.) Function calculates the Typical Price as the sum of the High, Low, and Close divided by three. The function sets this value to variable TP.

#TP = (high + low + close) / 3

3.b.) Function subtracts the n-period simple moving average of the Typical Price from the current Typical Price. The difference is divided by the n-period standard deviation of the Typical Price multiplied by the constant. The function sets this value to variable CCI.

#CCI = pd.Series((TP - TP.rolling(n).mean()) / (constant * TP.rolling(n).std()), name='CCI_' + str(n))

3.c.) Function returns CCI.

#return CCI

4.) Query daily close, high, and low for ‘FB’ during 2016.

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')

5.) Run daily close, high, and low through CCI function. Save series to new column in dataframe.

df['CCI'] = CCI(df['Close'], df['High'], df['Low'], 20, 0.015)
df.tail()

6.) Plot daily close and CCI.

df.plot(y=['Close'])
df.plot(y=['CCI'])

There you have it! We created our CCI indicator. Here’s the full code:

import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
%matplotlib inline

def get_stock(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Close']

def get_high(stock,start,end):
 return web.DataReader(stock,'google',start,end)['High']

def get_low(stock,start,end):
 return web.DataReader(stock,'google',start,end)['Low']

def CCI(close, high, low, n, constant):
    TP = (high + low + close) / 3
    CCI = pd.Series((TP - TP.rolling(n).mean()) / (constant * TP.rolling(n).std()), name='CCI_' + str(n))
    return CCI

df = pd.DataFrame(get_stock('FB', '1/1/2016', '12/31/2016'))
df['High'] = get_high('FB', '1/1/2016', '12/31/2016')
df['Low'] = get_low('FB', '1/1/2016', '12/31/2016')
df['CCI'] = CCI(df['Close'], df['High'], df['Low'], 20, 0.015)
df.tail()