One approach to dealing with the systematic error would be to ask the respondents how they voted in a previous election. In the run-up to the 2020 Presidential election, the most relevant past election would be the 2016 Presidential election. We know the result of that past election, so if the pollster were to use the answers from his respondents to estimate it, he could compare the estimate with the actual result and see whether there is a big error. If a methodology can't predict the past, then it will struggle to predict the future.
We can make an adjustment to the weighting of each respondent so that we (more or less) recover the previous election result. We can then apply that same adjusted weighting to the respondents' answers about who they will vote for in the next election. This adjustment will work well if, for example, there are many people who are reluctant to admit that they vote for Trump, but who do so when they have anonymity in the real vote.
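To make the reweighting idea concrete, here is a minimal sketch of one simple way to do it: weight each respondent by the ratio of the actual past share to the sample's recalled share for the candidate they say they voted for. This is a generic past-vote weighting, not necessarily the adjustment developed later in this post, and the function names and the toy data are made up for illustration.

```python
from collections import Counter

def past_vote_weights(recalled_votes, actual_shares):
    """Weight respondents so the weighted recalled-vote shares match actual_shares.

    Assumes every recalled choice appears as a key in actual_shares.
    """
    counts = Counter(recalled_votes)
    n = len(recalled_votes)
    # weight = actual share / sample share of the respondent's recalled choice
    return [actual_shares[v] / (counts[v] / n) for v in recalled_votes]

def weighted_shares(choices, weights):
    """Share of the total weight going to each choice."""
    total = sum(weights)
    shares = {}
    for choice, w in zip(choices, weights):
        shares[choice] = shares.get(choice, 0.0) + w / total
    return shares

# Made-up toy data: the sample recalls 60% A, but the real result was 50/50.
recalled = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B"]
intended = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "A"]
actual_past = {"A": 0.5, "B": 0.5}

weights = past_vote_weights(recalled, actual_past)
recovered = weighted_shares(recalled, weights)   # past result under the weights
adjusted = weighted_shares(intended, weights)    # same weights, next election
```

In this toy sample the raw intention for A is 60%; with the weights that recover the 50/50 past result, it drops to roughly 54%.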
A pollster may have his own methodology to obtain his results from his data, but the approach outlined here may be useful when working out the confidence interval. Rather than using one model, with its assumptions, to work out the confidence interval, it may be wise to consider more than one set of modeling assumptions and then report a confidence interval that is wide enough to include the results from all the different models.
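As a small illustration of reporting an interval wide enough to cover several models, the sketch below takes the lowest lower bound and the highest upper bound. The three intervals are hypothetical numbers, not results from any real poll, and `envelope_interval` is a name chosen here for illustration.

```python
def envelope_interval(intervals):
    """Return the narrowest (low, high) interval that contains every given interval."""
    lows = [low for low, _ in intervals]
    highs = [high for _, high in intervals]
    return min(lows), max(highs)

# Hypothetical intervals for one candidate's share under three sets of assumptions
model_intervals = [(0.46, 0.52), (0.44, 0.50), (0.47, 0.53)]
combined = envelope_interval(model_intervals)
```

The combined interval is wider than any single model's, which is the price of not committing to one set of assumptions.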
Now, to delve into some maths. I'm going to use the same notation as I used in my earlier post. After we've carried out a poll, we convert the answers to probabilities. We'll use the notation that P(r,c) is the probability that respondent r voted for candidate c in the given election.
So
When we ask voters about how they voted in the past, most will give a definitive response: either that they voted for a particular candidate or that they didn't vote at all. However, in some cases respondents will say (in many cases with honesty) that they can't remember how they voted, or indeed whether they voted. In the model we'll work with here, we'll use the same probabilities as before (P(r,c)), even though in many cases there will be certainty, in which case we'll have 0's and 1's. But we can stick with the general case, where there may be some uncertainty.
Using our data, we'll have a raw election result prediction. The proportion of votes for candidate c will be:
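A minimal sketch of this calculation follows. It assumes that a respondent's probabilities may sum to less than 1 (the remainder being the chance that they did not vote), that N is the sum of all the P(r,c), and that a candidate's share is their expected votes divided by N. The dictionary layout and the name `expected_shares` are illustrative choices, and the data is made up.

```python
def expected_shares(P):
    """P maps respondent -> {candidate: probability}. Returns (shares, N)."""
    totals = {}
    for probs in P.values():
        for c, p in probs.items():
            totals[c] = totals.get(c, 0.0) + p
    N = sum(totals.values())  # expected number of votes cast
    return {c: t / N for c, t in totals.items()}, N

P = {
    "r1": {"A": 1.0, "B": 0.0},   # certain: voted A
    "r2": {"A": 0.0, "B": 1.0},   # certain: voted B
    "r3": {"A": 0.5, "B": 0.25},  # unsure, with a 0.25 chance of not having voted
    "r4": {"A": 0.0, "B": 0.0},   # certain: did not vote
}
shares, N = expected_shares(P)
```

Here the expected number of votes is 2.75 out of four respondents, and candidate A's raw share is 1.5 / 2.75.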
For a given candidate say
We can introduce an adjustment and use the parameter
When we have no adjustment
and we'll have a full vote allocation to candidate
We can achieve this with adjustments that are linear in
We'll deal with those two cases separately.
1: When we want to increase the estimate for candidate
In this case
2: When we want to decrease the estimate for candidate
In this case we set:
We have chosen to adjust the probabilities in such a way that the expected number of votes (N) does not change, i.e. N' = N.
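To illustrate what an adjustment that keeps N fixed can look like, here is a sketch of one possible linear scheme for the case where a candidate's estimate is increased. It is an assumption made for illustration, not necessarily the exact formulas used in this post: a hypothetical parameter `alpha` between 0 and 1 moves that fraction of each respondent's vote probability from the other candidates to candidate x, so `alpha = 0` means no adjustment and `alpha = 1` gives x every vote. The decrease case is not covered here.

```python
def shift_towards(P, x, alpha):
    """Move a fraction alpha of each respondent's vote probability to candidate x."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    adjusted = {}
    for r, probs in P.items():
        row_total = sum(probs.values())  # probability that r voted at all
        p_x = probs.get(x, 0.0)
        new = {c: (1.0 - alpha) * p for c, p in probs.items()}
        new[x] = p_x + alpha * (row_total - p_x)  # row total is unchanged
        adjusted[r] = new
    return adjusted

def totals(P, x):
    """Return (expected votes for x, expected total votes N)."""
    votes_x = sum(probs.get(x, 0.0) for probs in P.values())
    N = sum(sum(probs.values()) for probs in P.values())
    return votes_x, N

def alpha_for_target(share_x, target_x):
    # Under this scheme share'(x) = share(x) + alpha * (1 - share(x)); solve for alpha.
    return (target_x - share_x) / (1.0 - share_x)

# Made-up data: raise A's share from 1.5/2.75 to a target of 0.6
P = {
    "r1": {"A": 1.0, "B": 0.0},
    "r2": {"A": 0.0, "B": 1.0},
    "r3": {"A": 0.5, "B": 0.25},
    "r4": {"A": 0.0, "B": 0.0},
}
votes_before, N_before = totals(P, "A")
alpha = alpha_for_target(votes_before / N_before, 0.6)
votes_after, N_after = totals(shift_towards(P, "A", alpha), "A")
share_after = votes_after / N_after
```

Because each respondent's probabilities still sum to the same value after the shift, the expected number of votes is unchanged, while candidate A's share lands on the target.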
We can choose the adjustment
In that case:
This approach could be used as follows when we have the responses from an opinion poll which includes questions about both a historical poll and a future poll:
For each candidate in the historical poll, choose the adjustment