I was being conservative when I included ABC, Pew, and USA/Gallup, realizing that they were very suspicious late outliers.

This is satire, right? Heck, just pick your five favorite polls from the bunch; today's "very suspicious" data was last week's "1 in 76 billion". You can reach an infinite number of conclusions by shrinking your sample-of-samples on bizarre, subjective grounds ("realizing they were very suspicious") after the fact, which you've already tried:
After selectively accusing CNN, Gallup, AP, Pew, USAToday and ABC of bias, you proceeded to use their data when it fit
your conclusion, and dropped some of it when you wanted a
different conclusion. If that isn't the definition of cherry-picking, what is?
Test the hypothesis by collecting
more data to see if the hypothesis continues to show the assumed pattern. If the data does not support the hypothesis, it must be changed, or rejected in favor of a better one.
In collecting data, one must NOT ignore data that contradicts the hypothesis in favor of only supportive data. (That is called "cherry-picking" and is commonly used by pseudo-scientists attempting to scam people unfamiliar with the scientific method. <snipped crack at creationism> )
http://servercc.oakton.edu/~billtong/eas100/scientificmethod.htm
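To see how much damage "pick your five favorite polls" does, here is a minimal simulation in which every number is invented for illustration (a hypothetical true margin, a hypothetical per-poll spread); it draws 116 polls that all measure the same thing, then keeps only the five most favorable:

import random

random.seed(1)

TRUE_MARGIN = 1.0   # hypothetical "true" lead, in points (invented)
POLL_SD = 1.5       # hypothetical per-poll sampling spread (invented)

# 116 simulated polls, all measuring the same underlying margin.
polls = [random.gauss(TRUE_MARGIN, POLL_SD) for _ in range(116)]

full_mean = sum(polls) / len(polls)

# "Pick your five favorite polls": keep only the 5 most favorable results.
favorites = sorted(polls)[-5:]
cherry_mean = sum(favorites) / len(favorites)

print("mean of all 116 polls:    %+.1f" % full_mean)
print("mean of 5 favorites only: %+.1f" % cherry_mean)

The trimmed subset tells you about the trimming rule, not about the electorate, even when the underlying polls are perfectly well behaved.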
Which brings us to the real Central Limit Theorem:

The CLT states that if the sum of the variables has a finite variance, then it will be approximately normally distributed. Since many real processes yield distributions with finite variance, this explains the ubiquity of the normal distribution.

From your Wiki link:
They all express the fact that any sum of many
independent identically distributed random variables will tend to be distributed according to a particular "attractor distribution". The most important and famous result is called The Central Limit Theorem which states that if the sum of the variables has a finite variance, then it will be approximately normally distributed.
http://en.wikipedia.org/wiki/Central_limit_theorem

A smattering of polls with unrelated wording and methodology aren't the independent, identically distributed Bernoulli trials (read: "fixed-size") that the CLT covers, especially when n is a transient number that depends on the day's cherry-picking algorithm. Disregard that "statistical fine print" and you have your dilemma: a billions-to-one probability of this OP being consistent with the assumptions you presented last week.
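Here is what that fine print means in practice, as a minimal sketch in which every number is invented (TRUE_P, N_RESP, HOUSE_BIAS_SD are all assumptions, not estimates from the actual polls): simulate 116 polls where each house carries its own small systematic offset, and compare the spread you actually get with the sampling-only spread the binomial/CLT arithmetic assumes.

import math
import random

random.seed(2)

TRUE_P = 0.50          # hypothetical true support level (invented)
N_RESP = 1000          # respondents per poll (invented)
HOUSE_BIAS_SD = 0.015  # hypothetical spread of house effects, 1.5 points (invented)

# Sampling-only standard deviation assumed by the binomial/CLT arithmetic.
nominal_sd = math.sqrt(TRUE_P * (1.0 - TRUE_P) / N_RESP)

# 116 simulated polls, each from a house with its own systematic offset.
results = []
for _ in range(116):
    p = TRUE_P + random.gauss(0.0, HOUSE_BIAS_SD)
    hits = sum(random.random() < p for _ in range(N_RESP))
    results.append(hits / N_RESP)

mean = sum(results) / len(results)
observed_sd = math.sqrt(sum((r - mean) ** 2 for r in results) / (len(results) - 1))

print("sampling-only sd assumed by the binomial math: %.4f" % nominal_sd)
print("sd actually observed across the 116 polls:     %.4f" % observed_sd)

Once per-house offsets are in the mix the polls are no longer identical draws from one binomial, the observed spread is noticeably wider than the nominal one, and tail probabilities computed against the wrong yardstick inflate into "1 in 76 billion" territory.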
Do you have a problem with that? And I was being conservative when I used a 1.5% MoE. As for those "house effects", why don't you try to quantify them?

I realize you were "being conservative"; I got the string of zeroes by trying your less conservative assumptions. Quantifying the house effects of these 116 polls (or however many aren't "very suspicious") would be a major undertaking, but treating house effects in general as a conspiracy theory is easy to refute:
When combining polls from different survey organizations, house effects also are a problem. These effects represent the consequences of survey houses employing different methodologies, including survey design itself. Indeed, much of the observed difference across survey houses may reflect underlying differences in screening and weighting procedures. Results can differ across houses for other reasons, including data collection mode, interviewer training, procedures for coping with refusals, and the like (see Converse and Traugott, 1986; Lau, 1994; also see Crespi, 1988). Whatever the source, poll results can vary from day to day because polls reported on different days are conducted by different houses.
http://www.nuffield.ox.ac.uk/Politics/papers/2002/w27/wlezien.pdf
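If anyone does want a first pass at quantifying them, the crudest estimate is each house's average deviation from the average of all polls. The sketch below uses invented margins attached to a few of the pollster names already mentioned, purely to show the arithmetic; a serious version would presumably also control for field dates, mode, and likely-voter screens before calling the residual a "house effect".

# Invented numbers purely to show the arithmetic: (pollster, reported margin).
polls = [
    ("Gallup",   3.0), ("Gallup",   4.0), ("Gallup",   2.5),
    ("Pew",      1.0), ("Pew",      0.5),
    ("ABC",      2.0), ("ABC",      1.5),
    ("USAToday", 0.0), ("USAToday", 1.0),
]

grand_mean = sum(margin for _, margin in polls) / len(polls)

by_house = {}
for house, margin in polls:
    by_house.setdefault(house, []).append(margin)

for house, margins in sorted(by_house.items()):
    house_mean = sum(margins) / len(margins)
    print("%-9s house effect: %+.1f points over %d polls"
          % (house, house_mean - grand_mean, len(margins)))

Even this toy version makes the point: different houses sit at systematically different levels, which is exactly why polls reported on different days by different houses bounce around more than a single outfit's MoE would suggest.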