When to Sweat it: The Real Accuracy of Election Polling
Behind the stats: An introduction to confidence intervals and how they impact the validity of the trends reported
What a Sunday. In a historic moment, Joe Biden became the first sitting president since 1968 to withdraw from the race, endorsing his VP, Kamala Harris, as the Democratic nominee. Between volatile voter sentiment stemming from this chain of events, an invigorated Democratic party, and last-minute changes of heart, this presidential race will be quite interesting to follow. From here through the November presidential election, expect to see polls, lots of polls. Also, expect to see wide variability among them.
The overestimation of Democratic support in the 2016 election cycle has led to significant interest in statistical bias, particularly regarding whether polls are now systematically biased against Republicans. The 2016 election harshly highlighted that polls may lack both accuracy and precision, and in today’s newsletter we cover some of the reasons why.
First, each poll is designed differently, is subject to different biases, and consequently applies different weighting corrections for them. Sampling biases, question wording and order, respondent demographics, survey mode (phone, internet, or in-person), and response rates all influence the results. For reference, sampling is a process used to select a subset of individuals, items, or observations from a larger population to estimate the characteristics or parameters of that population. This allows researchers to make inferences about the entire population without surveying every single member, which would be impractical, if not impossible.
Second, voter attitudes shift over time, frequently in response to major news events. Non-response bias results if certain groups of people are less likely to respond to polls and this is not adjusted for.
Lastly, every poll has a margin of error (MOE), meaning the true value lies within a range. Different polls might fall within different points of this range, leading to variations. Even with proper sampling techniques, there is always some level of random error due to the natural variability in samples. It is important to understand the MOE within the context of the same poll and different polls, as it ultimately tells us how reliable the poll results are. In the end, we care to know about the difference between two proportions (or percentages) or levels of support between the two leading candidates and whether we can trust the difference is real and not just statistical noise.
How accurate are the polls?
Polls conducted early in a campaign are often less predictive of the outcome. Even those taken just days before an election can struggle to account for late momentum shifts and the decisions of undecided voters, who can be numerous enough to alter the results. I expect to see this if Kamala becomes the Democratic nominee. We shouldn’t underestimate how many voters might swing left just to see the first woman president in their lifetime (and no, they aren’t just voting for her because she is a woman, but rather because she is apt to lead and isn’t a convicted felon) or sadly, the contrary sentiment.
A study analyzing over 1,400 polls from 11 election cycles discovered that only 60% of these polls correctly included the actual outcome a week before elections. Why is that? To answer that we must understand confidence intervals.
A simple example
Let’s suppose we want to examine the heights of 6th-grade girls. We randomly sample the population and find an average height of 59 inches.
This 59-inch average is a point estimate of the population mean. However, a point estimate alone does not indicate the uncertainty of this estimate; it does not provide a sense of how close this sample mean might be to the actual population mean. What is missing is an understanding of the uncertainty in this single sample.
Confidence intervals help give us that insight. A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. It is used in statistics to indicate the reliability of an estimate. It helps us answer the question of “How likely is it that our population parameter (height) is within the interval if we randomly measure samples many times?” Unlike a point estimate, an interval estimate provides a range of values within which the parameter is expected to lie.
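The claim that an interval "covers" the true parameter a stated percentage of the time can be checked by simulation. Below is a minimal sketch, assuming a hypothetical population of 6th-grade girls' heights with a true mean of 59 inches and a standard deviation of 2.5 inches (both numbers are illustrative): we repeatedly draw samples, build a 95% interval from each, and count how often the interval contains the true mean.

```python
import random
import statistics

random.seed(42)

TRUE_MEAN, TRUE_SD = 59.0, 2.5   # assumed population values (hypothetical)
N, TRIALS = 100, 2000            # girls measured per sample, repeated samples

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5          # standard error of the mean
    lo, hi = mean - 1.96 * se, mean + 1.96 * se       # 95% confidence interval
    covered += lo <= TRUE_MEAN <= hi

print(f"Intervals containing the true mean: {covered / TRIALS:.1%}")
```

Run this repeatedly with different seeds and the coverage hovers near 95%, which is exactly what the confidence level promises.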
Definitions:
A confidence interval is a range of values, computed from sample data, that is likely to contain an unknown population parameter.
Confidence level refers to the probability, expressed as a percentage, that the confidence interval would contain the true population parameter if you drew a random sample many times.
Much real-world data approximately follows a normal distribution. A normal distribution, often referred to as a bell curve, is a way to describe how a set of data is spread out. Here's a simple way to understand it:
Following our example above, imagine you're looking at the heights of a large group of people. Most people are of average height, but some are shorter and some are taller. If you were to chart everyone's height, you'd see most of the data cluster around the middle range (the average height), with fewer people being extremely tall or extremely short. This pattern of clustering around the middle and tapering off at the ends forms a shape that looks like a bell, which is why it's called a bell curve.
Key Characteristics of a Normal Distribution:
Symmetry: The left side of the distribution is a mirror image of the right side. If you split the bell curve down the middle, both halves would match perfectly.
Peak in the Middle: The highest point of the bell curve represents the most common value, or the average. This is where you find the typical case or the norm.
Spread: The width of the bell curve shows the range of variability or how much the data points differ from the average. A wider curve means more variation in the data, while a narrower curve means the data is more tightly clustered around the average.
Tails: The ends of the curve, known as tails, extend infinitely in both directions, indicating that there are always a few values that are much higher or lower than average, although they are very rare.
This distribution is very common in nature and statistics because it accurately represents many real-world phenomena where most of the observations cluster around a central value with fewer and fewer occurrences toward the "extremes" or the tails of the distribution.
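This clustering is also where the 1.96 multiplier used later for 95% intervals comes from: about 95% of normally distributed values fall within 1.96 standard deviations of the mean. A quick sketch to verify this empirically:

```python
import random

random.seed(0)

# Draw from a standard normal distribution (mean 0, standard deviation 1)
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Fraction of draws within ±1.96 standard deviations of the mean
within = sum(abs(x) <= 1.96 for x in draws) / len(draws)
print(f"Within ±1.96 sd: {within:.3f}")
```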
Within the context of election poll results, the bell curve is centered at the mean level of support for a candidate, based on poll data. For example, if a candidate has an average support of 40%, this would be the center of the bell curve. The width of the bell curve is determined by the standard deviation, which measures the variability or spread of the data. A narrow bell curve indicates that most poll results are close to the mean, while a wider curve shows more variation in voter preferences. The bell curve illustrates the probability distribution of possible outcomes. The area under the curve represents the probability that a candidate's support will fall within a certain range.
Confidence intervals provide a range around the mean poll result within which the actual level of voter support is likely to fall. They offer a measure of the uncertainty or margin of error in the poll results. A 95% confidence level means that if the same poll were conducted 100 times, the true support level would fall within the confidence interval in about 95 of those polls. For example, if a poll shows a candidate has 52% support with a 95% confidence interval of ±3%, the true support is likely between 49% and 55%. The confidence interval gives a range of likely outcomes, providing a clearer picture than a single-point estimate. For example, knowing that a candidate's support is between 49% and 55% is more informative than just knowing it is 52%.
The usual “margin of error” or confidence interval (CI) for a poll represents the 95% confidence level for an individual percentage of candidate support. First, let’s calculate the variance for a given level of support for a candidate. In statistics, variance is a measure of the dispersion or spread of a set of data points around their mean (average) value. The formula for the variance of a proportion (here, the percentage level of support), p, is given as:

var(p) = p × (1 − p) / (n − 1)

where:
p = true proportion of population individuals with the property (supporting the candidate)
n = sample size.
Typically n in the denominator (number of people polled) is rather large across survey samples. As such, the difference between n and n − 1 is trivial, and one reduces the denominator to n. The standard error for the proportion is therefore:

se(p) = √( p × (1 − p) / n )
Notice that the proportion variance, p × (1 − p), appears in the standard error formula. As the proportion p moves from 0.5 toward zero or 1, the proportion variance decreases: it is 0.25 at p = 0.5, 0.21 at p = 0.3 or 0.7, and 0.09 at p = 0.1 or 0.9.
This means that the standard error will also decrease, because the variance is in the numerator. The 95% confidence interval or MOE of the poll is ±1.96 × se(p), using the normal distribution approximation for large samples. The standard error depends on the proportion, p, and is at a maximum for p = 0.5, so a quick approximation of the widest confidence interval for a single proportion is:

MOE ≈ ±1.96 × √( 0.25 / n ) ≈ ±1 / √n
This is usually what is reported as the margin of error for a poll. For example, if n = 400, 1/√400 = .050, a MOE of ±5%. For n = 625, the MOE is 1/√625 = .040, and for n = 1111, the MOE is 1/√1111 = .030. Most polls have at least 1,000 participants, putting the MOE at roughly ±3%.
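The arithmetic above can be packaged into two small helper functions (names are mine, purely illustrative): the exact 95% margin of error for a proportion, and the quick 1/√n shortcut, which agree closely at p = 0.5.

```python
def moe_exact(p: float, n: int) -> float:
    """Exact 95% margin of error for a single proportion p from n respondents."""
    return 1.96 * (p * (1 - p) / n) ** 0.5

def moe_quick(n: int) -> float:
    """Worst-case shortcut: 1.96 * sqrt(0.25 / n) is approximately 1 / sqrt(n)."""
    return 1 / n ** 0.5

for n in (400, 625, 1111):
    print(f"n = {n:4d}: quick ±{moe_quick(n):.3f}, exact ±{moe_exact(0.5, n):.3f}")
```

For n = 1000 the exact value is about ±3.1%, matching the ~3% figure quoted for typical polls.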
Now, here is where the polls may be misleading. Professor Don Moore from the Haas School of Business studies confidence and the calibration of statistical confidence judgments. He found that people often provide confidence intervals that are too narrow (leading to overconfidence), with true values falling within these intervals far less frequently than expected. A typical election poll with a 95% confidence interval and a sample size of 1,000 estimates a candidate’s vote share with a margin of error of plus or minus three percentage points. But if a poll indicates Kamala Harris will receive 52% of the popular vote and she gets 56%, the actual result would fall outside this poll's margin of error.
More importantly, however, headlines fail to discuss how much the results can be trusted. In horse race polls, it's important to determine the difference in support between the top two candidates (and potentially other pairs). We need the confidence interval for this difference to assess whether the lead is statistically significant or "outside the margin of error." Confidence intervals allow for meaningful comparisons between candidates. If the intervals for two candidates overlap significantly, it suggests that the race is close and the difference in support is within the margin of error, and thus not statistically significant.
News reports often state a candidate’s lead is “outside the margin of error” to signify a lead greater than expected sampling error. However, simply leading by more than the margin of error for individual candidates (e.g., 3 points) is insufficient. We must calculate a new margin of error for the difference between candidates' support levels, typically about twice the individual margin (e.g., 6 points). This accounts for potential overestimations in one candidate's share likely causing underestimations in the other's.
The correct formula for the variance of the difference of two multinomial proportions for candidates 1 and 2, p1 and p2, in one poll is actually given by:

var(p1 − p2) = ( p1 + p2 − (p1 − p2)² ) / n
The 95% confidence interval (“margin of error”) for the difference of proportions is therefore:

MOE = ±1.96 × √( ( p1 + p2 − (p1 − p2)² ) / n )
When there are few undecided or third-party supporters, this margin of error is close to twice the individual margin of error. However, as the number of "other" responses increases, the correct margin of error becomes slightly smaller.
For example, in Poll A, a 3-point margin of error per candidate translates to roughly a 6-point margin for the difference. Hence, a 4-point Republican lead (50 vs 46) means their true lead could range from −2 to +10 points (4 ± 6). To be confident the lead isn’t due to sampling error, it would need to be 6 points or more, so this lead is not statistically significant.
In Poll B, also with a 3-point individual and 6-point difference margin of error, an 8-point Republican lead exceeds the difference margin and is therefore unlikely to be due to sampling error alone.
If we use the formulas above to calculate the exact MOE between candidates for Polls A and B, we find values of 6.1% for Poll A and 5.9% for Poll B. Quite close to twice the 3-point margin of error per candidate!
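These exact figures follow directly from the multinomial variance formula above. A minimal sketch, assuming n = 1000 respondents per poll and a 50-42 split in Poll B (my assumption, consistent with the stated 8-point lead):

```python
def moe_diff_same_poll(p1: float, p2: float, n: int) -> float:
    """95% MOE for the difference between two candidates' shares in one poll,
    using var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n."""
    return 1.96 * ((p1 + p2 - (p1 - p2) ** 2) / n) ** 0.5

poll_a = moe_diff_same_poll(0.50, 0.46, 1000)   # Poll A: 50 vs 46
poll_b = moe_diff_same_poll(0.50, 0.42, 1000)   # Poll B: 50 vs 42 (assumed split)

print(f"Poll A difference MOE: ±{poll_a:.1%}")   # ±6.1%
print(f"Poll B difference MOE: ±{poll_b:.1%}")   # ±5.9%
```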
Confidence intervals reported by most political pollsters typically account for statistical errors like sampling issues. However, non-statistical errors can also explain the gap between the reported 95% confidence and the actual 60% accuracy. Gabriel Lenz, a professor at Berkeley specializing in American politics, suggests that one such error could be herding, where pollsters adjust their results to align with other recent polls and avoid outliers, often by mishandling them. Additionally, Lenz points out that likely voter screenings, which determine if someone is likely to vote and should be included in the survey, can introduce errors since some individuals labeled as unlikely to vote end up voting anyway.
With poll numbers changing constantly, how can we distinguish real change from statistical noise?
With new polling numbers released daily, media reports often describe a candidate’s lead as changing from poll to poll. But distinguishing real change from statistical noise is challenging, especially because different polls are designed differently and likely have different biases, along with different weighting adjustments to correct for them (see below).
What we care to determine is if there has been a statistically significant change in opinion from one poll to the next. That is, how has voter sentiment changed? Unlike comparing results within a single poll, the percentage of support for a candidate in one poll is independent of their support in the other poll.
The difference of interest is now p2 − p1, where subscripts 1 and 2 refer to polls 1 and 2, respectively, and we are evaluating the support for the same candidate in both polls. The variance of this difference with independent samples is:

var(p2 − p1) = p1 × (1 − p1) / n1 + p2 × (1 − p2) / n2
The margin of error now becomes:

MOE = ±1.96 × √( p1 × (1 − p1) / n1 + p2 × (1 − p2) / n2 )
Now consider the difference between the two independent polls above, A and B. Is the change in Democratic support from 46% to 42% statistically significant? The margin of error for this difference, given support of 46% in Poll A and 42% in Poll B with 1,000 respondents each, is approximately 4.3%. If the change in percentage points observed between polls is less than this margin of error, as in this example (4 < 4.3), the change in voter support or approval rating is not statistically significant. Breathe.
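The same check can be sketched in code using the independent-samples formula above (the function name is mine, purely illustrative):

```python
def moe_between_polls(p1: float, n1: int, p2: float, n2: int) -> float:
    """95% MOE for the change in one candidate's support across two
    independent polls, using var = p1(1 - p1)/n1 + p2(1 - p2)/n2."""
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return 1.96 * variance ** 0.5

moe = moe_between_polls(0.46, 1000, 0.42, 1000)
change = 0.46 - 0.42

print(f"MOE for the change: ±{moe:.1%}")   # ±4.3%
print(f"Significant? {change > moe}")      # False: 4 < 4.3
```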
What else impacts the validity and reliability of polls (and the headlines around them)?
Because it is impractical to poll every citizen, a poll must sample from the population. This means that a poll's validity and reliability are impacted by how well it samples and adjusts for demographic differences between those polled and the population at large. Gauging voter sentiment is extremely challenging. Below is a comprehensive list of factors that can skew a poll outcome. It is worth noting that these factors become even more critical when looking at polls in battleground states, where small polling errors can shift the results.
Sampling Errors
Small Sample Size: Smaller samples can lead to greater variability and less reliable results. This may mean statistics from sparsely populated counties are unreliable. I explain why here.
Non-Random Sampling: If the sample is not truly random, it may not be representative of the population and will exhibit a sampling bias, where certain groups may be overrepresented or underrepresented due to the sampling method or based on polling method (internet, versus phone or landlines).
Inadequate Coverage: Failure to include certain segments of the population, such as those without internet access or phone lines.
Non-Response Bias
Low Response Rate: A low response rate can result in a non-representative sample if the respondents differ significantly from non-respondents.
Voluntary Response Bias: Individuals with strong opinions are more likely to respond, skewing the results.
Questionnaire Design
Leading Questions: Questions that suggest a particular answer can bias responses.
Complex Wording: Difficult or confusing questions can lead to misunderstanding and inaccurate responses.
Order of Questions: The sequence in which questions are asked can influence responses.
Timing of the Poll and Longitudinal Changes
Election Dynamics: Polls conducted too early may not capture late changes in voter sentiment. Voter preferences can change rapidly, making it difficult to capture accurate trends.
External Events: Significant events occurring after the poll can render its results outdated. This just happened!
Weighting and Adjustment
Incorrect Weighting: Applying incorrect weights to adjust for demographic imbalances can skew results.
Post-Stratification: Adjustments made after data collection can introduce new biases if not done correctly.
Social Desirability Bias
Honest Responses: Respondents may give socially desirable answers rather than their true opinions, particularly on sensitive issues. In 2016 many attributed the polls’ lack of accuracy to people not willingly admitting they would vote for Donald Trump. I wouldn’t be surprised if we see the opposite - folks in deep Republican rural areas voting for the Democratic Nominee (even more so now).
Mode of Data Collection
Different Modes: Telephone, online, and face-to-face polls can yield different results due to mode-specific biases.
Interviewer Effect: The presence of an interviewer can influence responses.
Geographic Variability
Regional Differences: Differences in voter preferences across regions may not be accurately captured if the sample is not geographically representative.
Urban vs. Rural: Differences in political views between urban and rural areas can affect overall poll results. Rural areas are more sparsely populated and thus more likely to report extreme results due to smaller sample sizes.
Misleading Reporting
Headline Sensationalism: Media may focus on sensational aspects or extreme results, ignoring the nuances of the data.
Cherry-Picking Data: Selectively reporting results that support a particular narrative or agenda.
External Influences
Media Coverage: Media coverage of polls can influence public opinion and subsequent polling results.
Strategic Voting: Voters may change their preferences based on the perceived viability of candidates as reported in polls.
Which polls to trust?
While polls aren’t perfect, I do like FiveThirtyEight’s assessment that the best way to gauge a poll’s accuracy is by examining its absolute error, which is the difference between the poll’s predicted margin and the actual election margin (for the top two finishers). For instance, if a poll showed the Democratic candidate leading by 3 percentage points but the Republican won by 2 points, the poll would have a 5-point error.
The real value of polls lies not in predicting the winner, but in indicating how competitive a race is and, consequently, how confident we should be about the outcome. Historically, candidates with a lead of 20 points or more in polls have won 99% of the time, whereas those with a lead of less than 3 points have won only 55% of the time. Essentially, races with less than a 3-point lead in the polls are nearly as uncertain as a coin toss.
Lastly, for the presidential election, we want to pay close attention to voter attitudes and voter turnout around key battleground states and across subgroups of populations, which a National Average may not reflect. Survey estimates for subgroups of the population have larger margins of error due to the smaller number of cases, and in some instances, these margins can be significantly larger.
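The subgroup effect is easy to quantify with the worst-case MOE formula from earlier. A sketch, with the subgroup size of 150 chosen purely for illustration:

```python
def worst_case_moe(n: int) -> float:
    """Widest 95% margin of error for a proportion (at p = 0.5)."""
    return 1.96 * (0.25 / n) ** 0.5

print(f"Full sample, n = 1000: ±{worst_case_moe(1000):.1%}")   # ±3.1%
print(f"Subgroup,    n = 150:  ±{worst_case_moe(150):.1%}")    # ±8.0%
```

Cutting the sample to a subgroup of 150 more than doubles the margin of error, which is why state- and subgroup-level estimates are so much noisier than national toplines.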
Next Up: I will discuss voter turnout, population statistics in key battleground states, and gulp…predicting the outcome (on election day).
If you would like more information on election polling statistics, please check this out.