The paradox of big data spoils vaccination surveys
Turns out that big data is often poor-quality data.
From phys.org:
When Delphi-Facebook and the U.S. Census Bureau provided near-real time estimates of COVID-19 vaccine uptake last spring, their weekly surveys drew on responses from as many as 250,000 people.
The large data sets provided statistically tiny margins of error, a key measure of a poll's accuracy, and raised confidence that the numbers were correct. But when the Centers for Disease Control and Prevention later provided figures of actual reported vaccination rates, the two polls were off by a lot. By the end of May, the Delphi-Facebook study overestimated vaccine uptake by 17 percentage points (70 percent versus 53 percent, according to the CDC), and the Census Bureau's Household Pulse Survey did the same by 14 percentage points.
A comparative analysis by statisticians and political scientists from Harvard, Oxford, and Stanford universities concludes that the surveys fell victim to the "Big Data Paradox," the mathematical tendency of big data sets to minimize one type of error (that due to small sample size) but to magnify another that tends to get less attention: errors due to systematic biases that make the surveyed sample a poor representation of the larger population.
The "Big Data Paradox" was identified and coined by one of the study's authors, Harvard statistician Xiao-Li Meng, the Whipple V.N. Jones Professor of Statistics, in his 2018 analysis of polling during the 2016 presidential election. Those election polls, famous for predicting a Hillary Clinton presidency, were skewed by what is termed "nonresponse bias," which in this case was the tendency of Trump voters to either not respond or describe themselves as "undecided."
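The mechanism is easy to see in a toy simulation. The sketch below assumes an entirely hypothetical nonresponse pattern (vaccinated people answering the survey twice as often as unvaccinated people); the population size and response rates are made up for illustration, with only the 53 percent true rate taken from the CDC figure above. Even with hundreds of thousands of responses and a margin of error of a fraction of a point, the estimate lands far from the truth:

```python
import random

random.seed(0)

N = 1_000_000      # illustrative population size
true_rate = 0.53   # CDC-reported uptake from the article

# 1 = vaccinated, 0 = not vaccinated
population = [1] * int(N * true_rate) + [0] * (N - int(N * true_rate))

# Hypothetical nonresponse bias: vaccinated people respond twice as often.
p_respond = {1: 0.30, 0: 0.15}
sample = [y for y in population if random.random() < p_respond[y]]

n = len(sample)
est = sum(sample) / n
# Classic 95% margin of error for a proportion -- it only "sees" sample size,
# not selection bias, so a huge n makes it look reassuringly tiny.
moe = 1.96 * (est * (1 - est) / n) ** 0.5

print(f"n = {n:,}  estimate = {est:.1%} (+/- {moe:.2%})  true rate = {true_rate:.0%}")
```

With these made-up numbers the survey collects well over 200,000 responses and reports a margin of error under a quarter of a percentage point, yet the estimate comes out around 69 percent instead of 53 — roughly the size of the Delphi-Facebook miss. The margin of error shrinks with sample size; the bias does not.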