Whew, I'm going to try this in the hope of helping. If not, then you can always delete it!

-----------------------------------------------------
What do statistical probabilities tell us?
Usually, they tell us the odds that something happened, so that we can make an informed decision.
If I had 10 people in a room, and everyone put in a check for all they had in the bank, and we drew one name to receive all the money - would you participate?
The odds are easy to compute IF we ASSUME some things. Each person gets their name in the hat ONCE. We will only draw a name out ONCE. Given the assumptions, your odds are 1 in 10 that you will win the hat of checks in one drawing.
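To make the hat-game arithmetic concrete, here's a minimal sketch in Python (my own illustration, not from anyone in this debate) that simulates the drawing many times; the function name and trial count are made up for the example:

```python
import random

def hat_game_win_rate(n_players=10, trials=100_000, seed=1):
    """Each of n_players puts their name in the hat exactly ONCE,
    and exactly ONE name is drawn. Returns how often player 0 wins."""
    rng = random.Random(seed)
    wins = sum(rng.randrange(n_players) == 0 for _ in range(trials))
    return wins / trials
```

With enough trials, the simulated rate settles near the exact answer of 1 in 10 - and notice that both assumptions (one name each, one draw) are baked right into the code.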
Would you play? That depends on whether you are overdrawn or your account includes money that Uncle B. Gates just left you ($100,000,000). The ODDS, even if they meet the assumptions and are calculated precisely, don't change what you think is important...they just inform the decision.
Let's say that we could not compute the exact odds because we didn't meet the assumptions: maybe you had the possibility of putting your name in TWICE, or maybe there were THREE drawings and the first two names drawn were set aside, but the THIRD name won the money. Even if you weren't exactly sure what the odds of winning were, your decision to play probably depends as much on the value to you as on the exact odds!

------------------------------------------------------
What about TIA, Febble, OTOH, and similar debates?
More complicated statistical methods come with lots of fancy assumptions, similar to the hat example. In actual social science research, "quasi-experimental" designs (polls where the data comes from those who chose to participate) and lower-quality measures ("Do you agree?" questions, which are not interval-level like a ruler with inch marks) virtually NEVER meet ALL the assumptions. Fortunately, an approximate computation of the odds is often the best we can do, and we think it is good enough (a property called robustness) to make an informed judgment with confidence. TIA depends somewhat on robustness when he invokes the "law of large numbers" or the "central limit theorem".
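As a hedged illustration of why the central limit theorem lends some robustness, here is a small Python sketch (mine, not from any of the analyses discussed): averages of draws from a badly skewed, decidedly non-normal distribution still cluster tightly around the true mean.

```python
import random
import statistics

def sample_mean_spread(n=50, samples=2000, seed=2):
    """Draw `samples` batches of n values from a skewed exponential
    distribution (true mean 1.0) and summarize the batch averages."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(samples)]
    return statistics.fmean(means), statistics.stdev(means)
```

The individual draws look nothing like a bell curve, yet the batch averages center on 1.0 with a spread near 1/sqrt(50), about 0.14 - which is the sense in which a violated normality assumption can still leave a usable computation.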
IF a conservative analysis (Febble indicates that the evidence is 12 to 1 that pre-election polls differed from the actual election by more than chance) or a liberal analysis (TIA says the odds are 65,000,000 to 1) uses different assumptions that are actually unknown, THEN it may sway your level of confidence. On the other hand, IF you don't think ANY difference of the kind that indicates a stolen election should ever occur, then even the lower odds are good enough to raise hell. It depends on your values. If Febble or OTOH want to wait until there is absolute compliance with the assumptions, it will likely never happen with the methodology used in surveys and polls. They may get closer with more sophisticated (powerful) techniques (meta-analysis for effect sizes, SEMs, and multivariate correlations), but even then, it's throwing out the baby with the bathwater to expect social science research to follow all the conventions of laboratory math - and Febble correctly suggests this on many occasions! She is taking a "conservative" approach to the math, but admits that some leeway is granted to logical experience and observation.

-------------------------------------------------------
What can be done to meet the "assumptions" and use polls to indicate problems in the actual votes IF they are there?
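To show how the assumptions alone can move the stated odds, here is a rough Python sketch of a two-sided z-test; every number in it (a 3% gap, 1,000 respondents, the design effects) is hypothetical and chosen only for illustration - these are NOT Febble's or TIA's actual figures or methods.

```python
import math

def odds_against_chance(gap=0.03, n=1000, design_effect=1.0):
    """Odds ("X to 1") that a poll-vs-result gap this large arose by
    pure chance. design_effect > 1 models clustered sampling, which
    inflates the real variance and shrinks the effective sample."""
    se = math.sqrt(0.25 / (n / design_effect))   # worst-case p = 0.5
    z = gap / se
    p = math.erfc(z / math.sqrt(2))              # two-sided tail prob
    return 1.0 / p
```

Under simple random sampling (design effect 1), the same 3% gap works out to around 17-to-1 odds against chance; assume clustered interviewing with a design effect of 2 and it drops to around 5.5-to-1. Same data, different assumptions, very different headline.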
The best way (in Sancho's opinion) is not to argue about which assumptions of which statistical technique are "met". Control over the questions asked and the sampling designs is in the hands of the various pollsters. The pollsters COULD help improve the process by increasing data collection for key elections, asking the most valid questions on pre-election polls and exit polls, and sharing profusely. They harm the process by being secretive, using manipulations that appear to meet assumptions they really didn't, and failing to broadcast the changes in the process that would improve accuracy. IF there appears to be an issue with poll data, the pollsters should try to fix it, tell us what they intend to do to improve the process, and see if it works. Maybe the pollsters say they are doing that, but it doesn't seem like a sincere effort to me at this point when you lock yourself up and don't oversample in Florida's District 13 in 2006!
OTOH (pun intended), those performing analysis would benefit from keeping it simple and avoiding debates over "sophisticated" theories that are hard to confirm (reluctant responders, gender-based interviewing, or vague questions). Those performing analysis may want to be conservative about the probabilities they claim, but make a serious effort to describe up front what the observed discrepancy indicates and why it logically shows something that could not be happening by chance. If an analysis is supposed to meet some assumptions, they can report that, but most people don't care. Even the pure statisticians realize that the assumptions are often violated and there is little we can do about it. We do know that fancy attempts to guess what would have happened if there had been a perfect analysis rarely work well. I think EDA's report is closer to this style than TIA's, but there's always room for improvement.
We can set up our own DU polls, but often the infrastructure and experience isn't there, so it might be easier to convince the pollsters to do MORE than describe why people voted - to also help out with the evidence that the election was consistent with the reported outcome! Then the assumptions would not matter, because the evidence would likely become overwhelmingly obvious.
------------------------------------------------------
What do we know now about the elections because of the debates over polls and assumptions?
Even though the exact odds and mathematical assumptions have not been available, we've had 6 years of one analysis after another suggesting a difference between various polls and the posted election results. If the odds were computed that you would win the hat game 1 out of 10 times, but you weren't sure whether those odds were precise, and you played 200 times without ever winning, you might wonder whether the exact odds even mattered...it's time to quit playing this game!
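That "played 200 times and never won" intuition is easy to check with a one-line sketch (the numbers come straight from the hat example above):

```python
def prob_never_winning(p_win=0.1, plays=200):
    """Chance of losing every single play, assuming independent plays
    each with win probability p_win."""
    return (1.0 - p_win) ** plays
```

At 1-in-10 odds per play, 200 straight losses happen by chance less than once in a billion runs - so even if the true per-play odds were off by quite a bit, the overall pattern would still scream that something is wrong.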
It's not that TIA or EDA have met the "assumptions" that demonstrate a particular confidence...it's the fact that many polls, precincts, and questions asked on the polls seem inconsistent with the actual election in many unlikely ways - favoring a particular direction in systematic patterns in some cases, or falling outside any possible expected error in others. Febble and I also agree that going after the obvious problems would be a good use of time and energy. In many cases, TIA- or EDA-type reports help focus on important targets to investigate.
If you are a pure statistician at heart, then jump in there and inform others how to "fix" the assumptions WITHOUT a process change by the pollsters. If it can't be "fixed" to satisfy the assumptions from today's available data, then direct your email at the pollsters to do better, or let's find a poll process for 2008 that we can rely on!