Up to +0.9 Correlation between Twitterwonk Scores and National Polls

I have found, on average, a +0.6 correlation between Twitterwonk Scores and national polls. Of the four leading candidates, Bernie Sanders (displayed) shows the highest correlation, +0.9. The national poll data used in the analysis comes from FiveThirtyEight. The findings suggest Twitterwonk can serve as a substitute for traditional national polling methods, which are resource-intensive and slow to report. While national polls are not always predictive of state primary results, they play a big role in media coverage.
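For readers who want to run a similar comparison, here is a minimal sketch of a correlation calculation. It assumes the Pearson coefficient, which the post does not specify, and the function name `pearson_r` and the two short series are hypothetical illustration values, not the Twitterwonk or FiveThirtyEight data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (NOT the data behind this post).
scores = [1.0, 2.0, 3.0, 4.0]      # stand-in for daily Twitterwonk Scores
polls = [10.0, 20.0, 30.0, 40.0]   # stand-in for national poll averages
r = pearson_r(scores, polls)
```

Both series need to cover the same dates in the same order; a value near +1 means the score and the poll tend to rise and fall together.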

Gauging US Politics with Reddit

Reddit is an entertainment, social networking, and news site where registered users can vote submissions up or down in a bulletin board-like fashion. Content entries are organized by areas of interest called “subreddits.” This post uses the subreddits /r/Republican and /r/Democrats to analyze US politics as of July 22, 2015.

Thanks to Dr. Randal Olson and his reddit-analysis script, we crawled /r/Republican and /r/Democrats. The word clouds below visualize word frequency, with words sized from largest to smallest by count.

/r/Republican

[Image: /r/Republican word cloud]

/r/Democrats

[Image: /r/Democrats word cloud]

The word clouds provide a high-level view of the subreddits. Now let’s dive in to gain insight!

/r/Republican has 16,942 readers, and /r/Democrats has 15,152.

During the timespan 6/22/15 – 7/22/15, 86,609 words appeared in /r/Republican and 73,156 words appeared in /r/Democrats. We will compare word frequency as % of total. In the event of significant difference, the greater of the two will be bolded.

Word            /r/Republican % of total    /r/Democrats % of total
“Good”          0.11                        0.20
“Bad”           0.06                        0.10

“Love”          0.05                        0.05
“Hate”          0.05                        0.05

“GOP”           0.07                        0.27
“Fox”           0.02                        0.08

“Trump”         0.30                        0.15
“Hillary”       0.07                        0.28

“Obama”         0.12                        0.18
“Bush”          0.07                        0.13

“Country”       0.11                        0.14
“States”        0.14                        0.07

“Students”      0.04                        0.00
“School”        0.04                        0.02

“Gay”           0.06                        0.09
“Marriage”      0.10                        0.13

“Inequality”    0.01                        0.02
“Equality”      0.00                        0.03

“White”         0.06                        0.10
“Black”         0.04                        0.04

“Health”        0.02                        0.10
“Insurance”     0.03                        0.05

“Workers”       0.02                        0.04
“Unions”        0.05                        0.01

“Gun”           0.05                        0.04
“Control”       0.03                        0.10

“Minimum”       0.02                        0.06
“Wage”          0.02                        0.08

“Church”        0.06                        0.01
“Religion”      0.01                        0.01
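The “% of total” figures are a word's count divided by the total number of words in that subreddit, times 100. Here is a minimal sketch of that calculation. The tokenization rule (lowercased runs of letters and apostrophes) and the function name `word_share` are assumptions for illustration; the reddit-analysis script may split and filter words differently.

```python
import re
from collections import Counter


def word_share(text, word):
    """Return how often `word` appears in `text`, as a percentage of all words."""
    # Assumed tokenization: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return 100.0 * counts[word.lower()] / len(tokens)


# Hypothetical sample text, not taken from either subreddit.
sample = "Trump said the GOP and Trump agree"
share = word_share(sample, "Trump")  # 2 of 7 words
```

Dividing by each subreddit's own total is what makes the two columns comparable even though /r/Republican had more words over the month than /r/Democrats.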

While we’ll let you come to your own conclusions, here are the insights we found surprising:

  • Greater Frequency of “GOP” and “Fox” in /r/Democrats
  • Greater Frequency of “Students” in /r/Republican
  • Greater Frequency of “White” in /r/Democrats
  • Greater Frequency of “Unions” in /r/Republican

That’s it for now. Please comment with additional insights or reach out directly at:

andrewshamlet@gmail.com