When the Numbers Lie
Should we have been quite so stunned by Trump’s victory?
Not only is it a brave new world in presidential politics, but the polling profession is facing an overhaul as well.
The headlines say it all:
- Donald Trump is Elected in Stunning Repudiation of the Establishment (New York Times)
- Trump Confounds the Pros, Connects with Just the Right Voters (NPR)
- Donald Trump Victory Shocks World, Pleases Putin (CNN)
- World Gasps in Collective Disbelief Following Trump’s Election (Washington Post)
- Trump’s Win Turns Stock Market into Shock Market (CBS News)
Here’s the thing: U.S. presidential elections are some of the most-polled events in the world. Weekly polls ran for more than 18 months, not to mention custom studies conducted by specialist firms. These polls measure from every angle: age, geography, gender, race, socioeconomic status, education level, profession. The level of detail is impressive.
Except that the polling misled us all.
Where were the numbers derailed? What did the polling actually measure? Millions of Americans didn’t make an offhand decision to vote Trump as they pulled up to voting stations, so why did this groundswell of support go undetected?
Here are the leading factors that can undermine the strength of polling models, and how the polling profession should think about shoring them up.
Measuring the absence of something
In brand or reputation analytics, one of the most difficult tasks is quantifying the absence of data. To what degree are people not talking about something?
Take care to look for false indicators and deliberately include data sets that can provide a “gut check” measure.
Strength of conviction
How strongly do people feel about the statements you are tracking? Is it “hell or high water” support, or “if I don’t have to wash my hair” support?
Including threshold questions that serve to weight responses can go a long way toward assessing strength of conviction.
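One way to picture how a threshold question might weight responses is the minimal sketch below. Everything in it is an assumption made for illustration: the conviction labels, the weights attached to them, and the ten sample responses are invented, and the function name is hypothetical rather than any pollster's actual method.

```python
def conviction_weighted_share(responses, weights):
    """Return each option's share after weighting every response by conviction.

    responses: list of (choice, conviction_label) pairs
    weights:   mapping of conviction_label -> weight between 0 and 1
    """
    totals = {}
    for choice, conviction in responses:
        totals[choice] = totals.get(choice, 0.0) + weights[conviction]
    grand_total = sum(totals.values())
    return {choice: value / grand_total for choice, value in totals.items()}


# Hypothetical weights: how much a response counts at each conviction level.
weights = {"certain": 1.0, "likely": 0.6, "maybe": 0.2}

# Hypothetical responses: option A leads 6 to 4 on a raw count,
# but most of A's support is of the "maybe" variety.
responses = (
    [("A", "maybe")] * 5
    + [("A", "certain")] * 1
    + [("B", "certain")] * 4
)

shares = conviction_weighted_share(responses, weights)
for choice, share in sorted(shares.items()):
    print(f"{choice}: {share:.1%}")
```

In this made-up sample, the raw count favors A, while the weighted view favors B, because B's supporters are the "hell or high water" kind. The weights themselves are a judgment call and would need to be validated against real turnout data.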
Sourcing and representation
True representation is difficult to achieve. We are looking at a widening gap in the ways that different generations, regions and occupations are reachable for polling. When the pool you are sampling is less and less representative, there is already bias built into your model.
On this issue, the polling profession must take a hard look at how survey models are constructed and develop new ways to ensure that they capture a true cross-section of the population.
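A common way to correct for an unrepresentative pool is to reweight each group by its share of the population, a technique often called post-stratification. The sketch below is only an illustration of that idea: the two groups, the sample counts, the support levels, and the population shares are all invented numbers, and the function name is hypothetical.

```python
def poststratify(sample_counts, support_by_group, population_shares):
    """Compare raw support with support reweighted to population shares.

    sample_counts:     group -> number of respondents in the sample
    support_by_group:  group -> share of that group backing the option
    population_shares: group -> that group's share of the real population
    """
    total_respondents = sum(sample_counts.values())
    raw = sum(
        sample_counts[group] * support_by_group[group] for group in sample_counts
    ) / total_respondents
    weighted = sum(
        population_shares[group] * support_by_group[group] for group in sample_counts
    )
    return raw, weighted


# Hypothetical sample that over-represents urban respondents.
sample_counts = {"urban": 600, "rural": 400}
support_by_group = {"urban": 0.40, "rural": 0.60}
population_shares = {"urban": 0.50, "rural": 0.50}

raw, weighted = poststratify(sample_counts, support_by_group, population_shares)
print(f"Raw support:      {raw:.1%}")
print(f"Weighted support: {weighted:.1%}")
```

Reweighting only helps when the people you did reach in a group resemble the people you did not. If a group is reachable only through its least typical members, the bias survives the correction.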
Sampling error
It is important to accurately assess your error margins. People forget that there is room for error and take polling numbers at face value.
One suggestion is to report a range that includes the error margins along with the percentages. Overlaying the ranges provides a quick assessment of how close the poll results for the given options might really be.
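To make the idea of reporting a range concrete, here is a minimal sketch using the standard margin-of-error formula for a proportion at roughly 95% confidence. It assumes a simple random sample, which real polls rarely are, and the poll figures (1,000 respondents, 46% versus 43%) are invented for illustration.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a proportion (z=1.96 is about 95% confidence)."""
    return z * math.sqrt(share * (1.0 - share) / sample_size)


def reported_range(share, sample_size, z=1.96):
    """Return the (low, high) range implied by the margin of error."""
    moe = margin_of_error(share, sample_size, z)
    return (share - moe, share + moe)


def ranges_overlap(range_a, range_b):
    """True when the two ranges share at least one point."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


# Hypothetical poll: 1,000 respondents, 46% for one candidate and 43% for the other.
candidate_a = reported_range(0.46, 1000)
candidate_b = reported_range(0.43, 1000)

print(f"Candidate A: {candidate_a[0]:.1%} to {candidate_a[1]:.1%}")
print(f"Candidate B: {candidate_b[0]:.1%} to {candidate_b[1]:.1%}")
print("Too close to call" if ranges_overlap(candidate_a, candidate_b) else "Clear lead")
```

A three-point lead sounds decisive in a headline, but with a sample of this size the two ranges overlap, which is exactly the nuance that gets lost when only the percentages are reported.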
The human factor
People: you just can’t predict what they will do. In this case it appears that people were thinking something different from what they were saying out loud or admitting in polling.
This may be the trickiest factor, and there is no foolproof method. At the very least, questions should be constructed in a way that makes the respondent feel that any answer is equally valid.
The past performance fallacy
There is a reason why financial prospectuses always use the caveat “past performance does not guarantee future results.” Much of the groundwork of polling models relies on past performance data.
While some past data is necessary to construct forward-looking projections, polling creators must go the extra mile and build alternative scenarios for variables that may behave differently now than they did in past elections.
The “everyone is like you” bias
Here’s where we look at the punditry and media personalities and hope that they understand the degree to which they have become disconnected from significant segments of Americans. When you mainly talk to people who are like you, your environment becomes a false sample of how the larger group thinks.
Talk to people who are different from you (we’re looking at you, social media!). But seriously, take polling numbers as an objective assessment. It is not appropriate to dismiss numbers you don’t agree with simply because you do not believe they can be true.
Missed indicators
This is not to say anyone should have seen it coming, but it would be a good exercise to look at proxy indicators from other sources that could build a more complete picture of voter activity.
Any data set should be matched with other data sources to ensure that they tell the same story.
As the polling profession faces an uphill battle to regain its credibility, this is a great opportunity to adopt new analytics techniques that can help echo-chamber-proof polling models as we look to the future.
What do you think about the future of polling?