During the general election season, one poll has stood apart from all the others, consistently showing Donald Trump leading Hillary Clinton: the U.S.C. Dornsife/Los Angeles Times Daybreak poll.
There are two good reasons, both rooted in the poll’s weighting, why that poll has favored Trump while the other polls showed Clinton with a small or moderate lead, as Nate Cohn explained in detail. Cohn based his analysis on the data set and the replication documentation that the pollsters made available. As Cohn pointed out, “Despite falling behind by double digits in some national surveys, Mr. Trump has generally led in the U.S.C./LAT poll. He held the lead for a full month until Wednesday, when Hillary Clinton took a nominal lead.”
Here are the two reasons for the poll’s seeming inaccuracy in its weighting:
1. The poll weights for many tiny categories, and weighting for very small groups produces big weights. For example, instead of weighting for a small category such as 18-to-21-year-olds, the poll weights for 18-to-21-year-old men, a group it acknowledges comprises roughly 3.3% of the general population. As Cohn points out, “But for those voters to make up 3.3 percent of the weighted sample, these 15 voters have to count as much as 86 people — an average weight of 5.7.” One 19-year-old black man in Illinois was “weighted as much as 30 times more than the average respondent, and as much as 300 times more than the least-weighted respondent. Alone, he has been enough to put Mr. Trump in double digits of support among black voters. He can improve Mr. Trump’s margin by 1 point in the survey, even though he is one of around 3,000 panelists.” Cohn added that Clinton finally took the lead over Trump on Wednesday because the young man was not included in the poll that day.
2. The poll weights by past vote. Cohn notes that the poll weights the sample according to how people said they voted in the 2012 election. This can be inaccurate because, as Cohn explains, “People don’t report their past vote very accurately. They tend to over-report three things: voting, voting for the winner and voting for some other candidate. They underreport voting for the loser.” Thus the weighted sample included 27% Barack Obama voters and 25% Mitt Romney voters. Cohn states, “If the survey didn’t include a past vote weight, the past vote of its respondents would be Obama 38, Romney 30. This is a lot like national surveys that were published around the same time as the U.S.C./LAT poll, like those from NBC/WSJ or the NYT/CBS News. By emphasizing past vote, they might significantly underweight those who claim to have voted for Mr. Obama and give much more weight to people who say they didn’t vote.”
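The arithmetic behind both points can be checked with a short sketch. It uses only the figures quoted above, plus a few labeled assumptions: that the roughly 3,000 panelists have an average weight of 1, and that everyone who is not an Obama or Romney voter falls into a single "other" group (nonvoters and third-party voters) making up the rest of 100%. The function names are hypothetical and this is not the pollsters' actual procedure.

```python
# Illustrative arithmetic only; not the U.S.C./LAT weighting code.

def average_weight(target_count, respondents):
    """Average weight each respondent needs so the group counts as target_count people."""
    return target_count / respondents

def weight_share_points(weight, total_weight):
    """One respondent's share of the total weight, in percentage points.

    This is roughly how far that one person can move a candidate's margin.
    """
    return 100.0 * weight / total_weight

def adjustment_factors(unweighted, target):
    """Factor by which each group is scaled to move from its unweighted share to the target share."""
    return {group: target[group] / unweighted[group] for group in unweighted}

# 15 young men must count as much as 86 people.
cell_weight = average_weight(86, 15)

# Assumption: about 3,000 panelists with an average weight of 1.
shift = weight_share_points(30, 3000)

# Assumption: "other" is whatever remains after Obama and Romney voters.
factors = adjustment_factors(
    {"obama": 38, "romney": 30, "other": 32},  # without a past-vote weight
    {"obama": 27, "romney": 25, "other": 48},  # with the past-vote weight
)

print(round(cell_weight, 1))                            # 5.7
print(shift)                                            # 1.0
print({g: round(f, 2) for g, f in factors.items()})     # obama 0.71, romney 0.83, other 1.5
```

Under these assumptions, self-reported Obama voters are scaled down to about 0.71 of their unweighted share, while the "other" group is scaled up by about 1.5, which matches Cohn's description of the direction of the effect.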
Cohn explains, “If the poll was weighted to a generic set of census categories like most surveys (four categories of age, five categories of education, gender and four categories of race and Hispanic origin), Mrs. Clinton would have led in every iteration of the survey except the period immediately after the Republican convention.”
Cohn explains that the poll eschewed “trimming” the weights, a practice in which a poll prevents any one person from being weighted up by more than some amount, such as five or 10. In 2012, Gallup trimmed its weights, and nonwhite voters were underrepresented. Cohn lays out the risks on both sides: trimming the weights might leave the poll with too few of the voters who tend to be underrepresented, while eschewing trimming could let a few heavily weighted respondents unbalance the poll.
But then he notes that the U.S.C./LAT poll is a panel — which means it recontacts the same voters repeatedly, thus leaving the poll at risk of both problems.
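To make the trade-off concrete, here is a minimal sketch of trimming as a simple cap on each weight. The five-person sample, the cap of 5 and the function names are all hypothetical, chosen only to show the mechanics; real polls typically use more elaborate procedures.

```python
# Toy illustration of weight trimming; the data below are made up.

def trim_weights(weights, cap):
    """Cap each weight at `cap`; weights at or below the cap are unchanged."""
    return [min(w, cap) for w in weights]

def weighted_share(supports, weights):
    """Weighted share of respondents with a 1 in `supports`."""
    total = sum(weights)
    return sum(s * w for s, w in zip(supports, weights)) / total

# Hypothetical sample: 1 = supports candidate A, 0 = does not.
supports = [1, 0, 0, 0, 1]
# One respondent carries 30 times the weight of the others.
weights = [30.0, 1.0, 1.0, 1.0, 1.0]

untrimmed = weighted_share(supports, weights)
trimmed = weighted_share(supports, trim_weights(weights, cap=5.0))

print(round(untrimmed, 2))  # 0.91
print(round(trimmed, 2))    # 0.67
```

In this toy sample the heavily weighted respondent pulls the estimate to 91% without trimming and 67% with it, against an unweighted 40%. The cap limits one person's influence, but it also shrinks the voice of the group that person was weighted up to represent, which is the underrepresentation risk Cohn describes.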
Cohn concludes of the single young black voter unbalancing the poll: “The U.S.C./LAT poll had terrible luck: The single most overweighted person in the survey was unrepresentative of his demographic group. The people running the poll basically got stuck at the extreme of the added variance. By design, the U.S.C./LAT poll is stuck with the respondents it has. If it had a slightly too Republican sample from the start — and it seems it did, regardless of weighting — there was little it could do about it.”