I was being conservative when I included ABC, Pew, and USA/Gallup, realizing that they were very suspicious late outliers.

This is satire, right? Heck, just pick your five favorite polls from the bunch; today's "very suspicious" data was last week's "1 in 76 billion". You can reach an infinite number of conclusions by shrinking your sample-of-samples on bizarre, subjective grounds ("realizing they were very suspicious") after the fact, which you've already tried:
After selectively accusing CNN, Gallup, AP, Pew, USAToday and ABC of bias, you proceeded to use their data when it fit
your conclusion, and dropped some of it when you wanted a
different conclusion. If that isn't the definition of cherry-picking, what is?
Test the hypothesis by collecting
more data to see if the hypothesis continues to show the assumed pattern. If the data does not support the hypothesis, it must be changed, or rejected in favor of a better one.
In collecting data, one must NOT ignore data that contradicts the hypothesis in favor of only supportive data. (That is called "cherry-picking" and is commonly used by pseudo-scientists attempting to scam people unfamiliar with the scientific method. <snipped crack at creationism> )
http://servercc.oakton.edu/~billtong/eas100/scientificmethod.htm
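To see how much damage "pick your five favorite polls" does, here is a minimal simulation in which every number is invented for illustration (a hypothetical true margin, a hypothetical per-poll spread); it draws 116 polls that all measure the same thing, then keeps only the five most favorable:

import random

random.seed(1)

TRUE_MARGIN = 1.0   # hypothetical "true" lead, in points (invented)
POLL_SD = 1.5       # hypothetical per-poll sampling spread (invented)

# 116 simulated polls, all measuring the same underlying margin.
polls = [random.gauss(TRUE_MARGIN, POLL_SD) for _ in range(116)]

full_mean = sum(polls) / len(polls)

# "Pick your five favorite polls": keep only the 5 most favorable results.
favorites = sorted(polls)[-5:]
cherry_mean = sum(favorites) / len(favorites)

print("mean of all 116 polls:    %+.1f" % full_mean)
print("mean of 5 favorites only: %+.1f" % cherry_mean)

The trimmed subset tells you about the trimming rule, not about the electorate, even when the underlying polls are perfectly well behaved.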
Which brings us to the real Central Limit Theorem:

The CLT states that if the sum of the variables has a finite variance, then it will be approximately normally distributed. Since many real processes yield distributions with finite variance, this explains the ubiquity of the normal distribution.

From your Wiki link:
They all express the fact that any sum of many
independent identically distributed random variables will tend to be distributed according to a particular "attractor distribution". The most important and famous result is called The Central Limit Theorem which states that if the sum of the variables has a finite variance, then it will be approximately normally distributed.
http://en.wikipedia.org/wiki/Central_limit_theorem

A smattering of polls with unrelated wording and methodology aren't the independent, identically distributed Bernoulli trials (read: "fixed-size") that the CLT covers, especially when n is a transient number that depends on the day's cherry-picking algorithm. Disregard that "statistical fine print" and you have your dilemma: a billions-to-one probability of this OP being consistent with the assumptions you presented last week.
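Here is what that fine print means in practice, as a minimal sketch in which every number is invented (TRUE_P, N_RESP, HOUSE_BIAS_SD are all assumptions, not estimates from the actual polls): simulate 116 polls where each house carries its own small systematic offset, and compare the spread you actually get with the sampling-only spread the binomial/CLT arithmetic assumes.

import math
import random

random.seed(2)

TRUE_P = 0.50          # hypothetical true support level (invented)
N_RESP = 1000          # respondents per poll (invented)
HOUSE_BIAS_SD = 0.015  # hypothetical spread of house effects, 1.5 points (invented)

# Sampling-only standard deviation assumed by the binomial/CLT arithmetic.
nominal_sd = math.sqrt(TRUE_P * (1.0 - TRUE_P) / N_RESP)

# 116 simulated polls, each from a house with its own systematic offset.
results = []
for _ in range(116):
    p = TRUE_P + random.gauss(0.0, HOUSE_BIAS_SD)
    hits = sum(random.random() < p for _ in range(N_RESP))
    results.append(hits / N_RESP)

mean = sum(results) / len(results)
observed_sd = math.sqrt(sum((r - mean) ** 2 for r in results) / (len(results) - 1))

print("sampling-only sd assumed by the binomial math: %.4f" % nominal_sd)
print("sd actually observed across the 116 polls:     %.4f" % observed_sd)

Once per-house offsets are in the mix the polls are no longer identical draws from one binomial, the observed spread is noticeably wider than the nominal one, and tail probabilities computed against the wrong yardstick inflate into "1 in 76 billion" territory.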
Do you have a problem with that? And I was being conservative when I used a 1.5% MoE. As for those "house effects", why don't you try to quantify them?

I realize you were "being conservative"; I got the string of zeroes by trying your less conservative assumptions. Quantifying the house effects of these 116 polls (or however many aren't "very suspicious") would be a major undertaking, but treating house effects in general as a conspiracy theory is easy to refute:
When combining polls from different survey organizations, house effects also are a problem. These effects represent the consequences of survey houses employing different methodologies, including survey design itself. Indeed, much of the observed difference across survey houses may reflect underlying differences in screening and weighting procedures. Results can differ across houses for other reasons, including data collection mode, interviewer training, procedures for coping with refusals, and the like (see Converse and Traugott, 1986; Lau, 1994; also see Crespi, 1988). Whatever the source, poll results can vary from day to day because polls reported on different days are conducted by different houses.
http://www.nuffield.ox.ac.uk/Politics/papers/2002/w27/wlezien.pdf
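If anyone does want a first pass at quantifying them, the crudest estimate is each house's average deviation from the average of all polls. The sketch below uses invented margins attached to a few of the pollster names already mentioned, purely to show the arithmetic; a serious version would presumably also control for field dates, mode, and likely-voter screens before calling the residual a "house effect".

# Invented numbers purely to show the arithmetic: (pollster, reported margin).
polls = [
    ("Gallup",   3.0), ("Gallup",   4.0), ("Gallup",   2.5),
    ("Pew",      1.0), ("Pew",      0.5),
    ("ABC",      2.0), ("ABC",      1.5),
    ("USAToday", 0.0), ("USAToday", 1.0),
]

grand_mean = sum(margin for _, margin in polls) / len(polls)

by_house = {}
for house, margin in polls:
    by_house.setdefault(house, []).append(margin)

for house, margins in sorted(by_house.items()):
    house_mean = sum(margins) / len(margins)
    print("%-9s house effect: %+.1f points over %d polls"
          % (house, house_mean - grand_mean, len(margins)))

Even this toy version makes the point: different houses sit at systematically different levels, which is exactly why polls reported on different days by different houses bounce around more than a single outfit's MoE would suggest.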