Saturday, November 12, 2016

Presidential Election and Statistics

Tuesday’s election results were full of surprises. They indicated that almost all of the polls were wrong this time. I view polling as a form of survey statistics, and the election results exposed how badly survey statistics can go wrong.

Before the election, each poll is treated as an independent survey: it takes a sample from the overall population and then tries to make a prediction about that population. For a presidential election, we calculate the sample proportion (the proportion of polled subjects who say they will vote for a candidate) and use it to predict the population proportion. Usually in statistics, when we sample in order to make a prediction about a population, we never learn whether the prediction was correct, because the truth about the population is never revealed.
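To make the step from sample proportion to population proportion concrete, here is a minimal sketch. It assumes simple random sampling and uses the usual normal approximation for the margin of error. The function name `proportion_estimate` and the poll numbers are hypothetical and are not taken from any real poll.

```python
import math

def proportion_estimate(votes_for, sample_size, z=1.96):
    """Return the sample proportion and an approximate 95% confidence interval.

    Assumes simple random sampling; z=1.96 is the normal quantile for 95%.
    """
    p_hat = votes_for / sample_size
    # Standard error of a sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    margin = z * se
    return p_hat, (p_hat - margin, p_hat + margin)

# Hypothetical poll: 520 of 1,000 respondents say they will vote for the candidate.
p_hat, (low, high) = proportion_estimate(520, 1000)
print(f"sample proportion = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

In this made-up poll the interval covers 50%, so the poll by itself cannot call the race. The interval also reflects sampling error only; it says nothing about errors from who answers the poll or how truthfully they answer.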

A presidential election is different. After the election, we know the truth, and the truth shows whether the polls were right or wrong. Unfortunately, the polls were mostly wrong this time.

We still remember Nate Silver, the famous statistician (even though he is actually not a statistician), and his website fivethirtyeight.com. He became famous after correctly predicting the winner in 49 out of 50 states in the 2008 presidential election and in 50 out of 50 states in the 2012 presidential election. In 2013, he was invited to give a keynote speech at the annual Joint Statistical Meetings (the largest conference in the field of statistics).

Predicting 49 out of 50 and 50 out of 50 states correctly sounds like a great feat. However, for the majority of states, anybody who pays a little attention to the presidential election can predict the result correctly. For example, it is pretty safe to put Texas, Indiana, Kentucky, ... into the category of red states, and New York and California into the category of blue states. In probability terms, I would be willing to bet that Hillary had a 100% chance of winning California and Trump had a 100% chance of winning Mississippi.

There are actually fewer than 10 (or maybe even fewer than 5) states, the so-called battleground states, where the polling and prediction are critical. Predicting 49 out of 50 states correctly may essentially amount to predicting 4 out of 5 states correctly. For the 2016 election, the prediction came down to several battleground states such as Florida, North Carolina, Ohio, Michigan, Virginia, ... and he got many of them wrong, especially Michigan, Wisconsin, and Pennsylvania.

In the final forecast before the November 8 election, Nate Silver and fivethirtyeight.com were reported as:
"giving Clinton a 71.4% chance of winning, and predicting the former Secretary of State would end up with 302 electoral votes (270 are required for victory) and a 3.6 percentage point margin–48.5% to 44.9%–in the popular vote." 

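Here is a comparison of the final predictions from fivethirtyeight.com with the final results for all 50 states. States marked with an asterisk (*) are discordant (i.e., the predicted probability of Trump winning was less than 50% but Trump won, or the predicted probability was greater than 50% but Trump lost):

| State | Abbr. | Probability of Trump Winning | Actual Result of Trump Winning |
|---|---|---|---|
| Alabama | AL | greater than 99.9% | Yes |
| Alaska | AK | 76.4% | Yes |
| Arizona | AZ | 66.6% | Yes |
| Arkansas | AR | 99.6% | Yes |
| California | CA | less than 0.1% | No |
| Colorado | CO | 22.4% | No |
| Connecticut | CT | 2.7% | No |
| Delaware | DE | 8.5% | No |
| Florida | FL | 44.9% | Yes* |
| Georgia | GA | 79.1% | Yes |
| Hawaii | HI | 1.1% | No |
| Idaho | ID | 99% | Yes |
| Illinois | IL | 1.7% | No |
| Indiana | IN | 97.5% | Yes |
| Iowa | IA | 69.8% | Yes |
| Kansas | KS | 97.3% | Yes |
| Kentucky | KY | 99.6% | Yes |
| Louisiana | LA | 99.5% | Yes |
| Maine | ME | 17.3% | No |
| Maryland | MD | less than 0.1% | No |
| Massachusetts | MA | less than 0.1% | No |
| Michigan | MI | 21.1% | Yes* |
| Minnesota | MN | 15.0% | No |
| Mississippi | MS | 97.8% | Yes |
| Missouri | MO | 96.1% | Yes |
| Montana | MT | 95.9% | Yes |
| Nebraska | NE | 97.7% | Yes |
| Nevada | NV | 41.7% | No |
| New Hampshire | NH | 30.2% | No |
| New Jersey | NJ | 3.1% | No |
| New Mexico | NM | 17.2% | No |
| New York | NY | 0.2% | No |
| North Carolina | NC | 44.5% | Yes* |
| North Dakota | ND | 97.7% | Yes |
| Ohio | OH | 64.6% | Yes |
| Oklahoma | OK | greater than 99.9% | Yes |
| Oregon | OR | 6.3% | No |
| Pennsylvania | PA | 23.0% | Yes* |
| Rhode Island | RI | 6.8% | No |
| South Carolina | SC | 89.7% | Yes |
| South Dakota | SD | 93.9% | Yes |
| Tennessee | TN | 97.3% | Yes |
| Texas | TX | 94.0% | Yes |
| Utah | UT | 83.2% | Yes |
| Vermont | VT | 1.9% | No |
| Virginia | VA | 14.5% | No |
| Washington | WA | 1.6% | No |
| West Virginia | WV | 99.7% | Yes |
| Wisconsin | WI | 16.5% | Yes* |
| Wyoming | WY | 98.9% | Yes |

If we look only at the discordance: in 5 out of 50 states (10%), the predicted probability of Trump winning was less than 50% but Trump won; in 0 out of 50 states, the predicted probability of Trump winning was greater than 50% but Trump lost.

| Probability of Trump Winning according to fivethirtyeight.com | Actual result: Trump won (Yes) | Actual result: Trump lost (No) |
|---|---|---|
| greater than 50% | 25 | 0 |
| less than 50% | 5 | 20 |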
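The cross-tabulation can be reproduced with a few lines of code. The sketch below uses only a handful of battleground rows copied from the state table above, not all 50 states, so its counts are smaller than those in the summary table. The function name `cross_tabulate` and the 50% threshold argument are illustrative choices.

```python
from collections import Counter

# (state, forecast probability of Trump winning in %, did Trump win?)
# A subset of rows copied from the state table, not the full 50 states.
rows = [
    ("FL", 44.9, True),
    ("NC", 44.5, True),
    ("OH", 64.6, True),
    ("MI", 21.1, True),
    ("PA", 23.0, True),
    ("WI", 16.5, True),
    ("VA", 14.5, False),
    ("NV", 41.7, False),
    ("NH", 30.2, False),
]

def cross_tabulate(rows, threshold=50.0):
    """Count states by (forecast above threshold?, Trump actually won?)."""
    counts = Counter()
    for _state, prob, won in rows:
        counts[(prob > threshold, won)] += 1
    return counts

counts = cross_tabulate(rows)
# A state is discordant when the forecast favorite and the actual winner differ.
discordant = [state for state, prob, won in rows if (prob > 50.0) != won]

print("forecast > 50%:", counts[(True, True)], "won,", counts[(True, False)], "lost")
print("forecast < 50%:", counts[(False, True)], "won,", counts[(False, False)], "lost")
print("discordant states:", discordant)
```

Even in this small subset, every discordant state falls in the same cell: the forecast gave Trump less than 50% and he won.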

Nate Silver is a Democrat, and he has indicated that his party affiliation does not have any impact on his predictions. However, there may be unconscious bias in the predictions. At least this seems to be true based on the predictions and the actual results for this election: in every discordant state, the forecast underestimated the probability of Trump winning.

I am not sure what model Nate Silver and fivethirtyeight.com use for their predictions. The predictions by fivethirtyeight.com are still better than those of other polls (even though the predictions this time are not as good as in previous elections). I see similarities between his analysis and meta-analysis: data analysis or modeling based on polling data from various sources. The prediction is expressed as each candidate's probability of winning, based on the aggregate data from the meta-analysis.
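As a rough illustration of the meta-analysis idea, here is a minimal sketch that pools several polls with inverse-variance weights and converts the pooled share into a probability of winning. This is not fivethirtyeight.com's model, which is not described here. The poll numbers, the function names `pool_polls` and `win_probability`, and the `extra_sd` allowance for an error shared by all polls are assumptions made purely for illustration.

```python
import math

def pool_polls(polls):
    """Fixed-effect (inverse-variance) pooling of poll proportions.

    polls: list of (share, sample_size), where share is the candidate's
    two-way share in that poll. Returns (pooled share, standard error).
    """
    # Weight = 1 / variance of each poll's sample proportion.
    weights = [n / (p * (1 - p)) for p, n in polls]
    total = sum(weights)
    pooled = sum(w * p for w, (p, _n) in zip(weights, polls)) / total
    se = math.sqrt(1 / total)
    return pooled, se

def win_probability(pooled, se, extra_sd=0.0):
    """P(true share > 0.5) under a normal approximation.

    extra_sd is a hypothetical allowance for error common to all polls
    (for example, the same kind of voter being missed by every pollster).
    """
    sd = math.sqrt(se ** 2 + extra_sd ** 2)
    z = (pooled - 0.5) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical polls: (two-way share for the candidate, sample size).
polls = [(0.52, 800), (0.51, 1000), (0.53, 600)]
pooled, se = pool_polls(polls)
p_sampling_only = win_probability(pooled, se)
p_with_shared_error = win_probability(pooled, se, extra_sd=0.03)

print(f"pooled share = {pooled:.3f} (se = {se:.4f})")
print(f"win probability, sampling error only: {p_sampling_only:.2f}")
print(f"win probability, with shared error:   {p_with_shared_error:.2f}")
```

Pooling shrinks the sampling error, which makes the win probability look very confident. Allowing for an error shared by all polls pulls the probability back toward 50%, because averaging more polls cannot remove an error they all have in common.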


Across three consecutive presidential elections (2008, 2012, and 2016), Nate Silver and fivethirtyeight.com got the first two right but the third one totally wrong. This reminds us that replication and verification are needed to tell whether a model is robustly correct. Remember what George Box said: "Essentially, all models are wrong, but some are useful."


