Are over-dispersion tests in GLMs actually *useful*?














The phenomenon of 'over-dispersion' in a GLM arises whenever we use a model that restricts the variance of the response variable, and the data exhibits greater variance than the model restriction allows. This occurs commonly when modelling count data using a Poisson GLM, and it can be diagnosed by well-known tests. If tests show that there is statistically significant evidence of over-dispersion then we usually generalise the model by using a broader family of distributions that free the variance parameter from the restriction occurring under the original model. In the case of a Poisson GLM it is common to generalise either to a negative-binomial or quasi-Poisson GLM.



This situation is pregnant with an obvious objection. Why start with a Poisson GLM at all? One can start directly with the broader distributional forms, which have a (relatively) free variance parameter, and allow the variance parameter to be fit to the data, ignoring over-dispersion tests completely. In other situations when we are doing data analysis we almost always use distributional forms that allow freedom of at least the first two moments, so why make an exception here?



My Question: Is there any good reason to start with a distribution that fixes the variance (e.g., the Poisson distribution) and then perform an over-dispersion test? How does this procedure compare with skipping this exercise completely and going straight to the more general models (e.g., negative-binomial, quasi-Poisson, etc.)? In other words, why not always use a distribution with a free variance parameter?
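For concreteness, the procedure in question looks roughly like this (a minimal sketch in Python with statsmodels, on simulated illustrative data; the Pearson-based dispersion statistic is one of the usual rough diagnostics rather than a specific formal test):

    # Sketch of the two-stage procedure (illustrative, simulated data).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    mu = np.exp(0.5 + 0.3 * x)
    y = rng.poisson(mu * rng.gamma(2.0, 0.5, size=n))   # over-dispersed counts (gamma-Poisson mixture)

    # Step 1: Poisson GLM, which forces Var[y] = E[y].
    pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()

    # Step 2: rough over-dispersion diagnostic: Pearson chi-square / residual df,
    # which should be near 1 if the Poisson variance restriction holds.
    print("dispersion:", pois.pearson_chi2 / pois.df_resid)

    # Step 3: if the diagnostic (or a formal test) indicates over-dispersion,
    # refit with a model whose variance is not tied to the mean:
    quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")   # quasi-Poisson
    negbin = sm.NegativeBinomial(y, X).fit(disp=0)                       # negative binomial
    print(pois.bse[1], quasi.bse[1], negbin.bse[1])

The question is whether the first two steps add anything over going straight to the last two fits.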










overdispersion






asked 3 hours ago
Ben












  • My guess is that, if the underlying process truly is Poisson, then the fit from the more general GLM will not have the well-known good properties: the estimates will be less efficient, in the sense that their variance is greater than it needs to be had the correct model been used, and they are probably not even unbiased or the MLEs. But that's just my intuition and I could be wrong. I'd be curious what a good answer is.
    – mlofton
    3 hours ago








  • In my experience, testing for over-dispersion is (paradoxically) mainly of use when you know (from a knowledge of the data generation process) that over-dispersion can't be present. In this context, testing for over-dispersion tells you whether the linear model is picking up all the signal in the data. If it isn't, then adding more covariates to the model should be considered. If it is, then more covariates cannot help.
    – Gordon Smyth
    2 hours ago












  • @GordonSmyth: I think that's a good answer. If you don't want to turn that into its own answer, I'll fold it into mine.
    – Cliff AB
    2 hours ago










  • @CliffAB Feel free to incorporate my comment into your answer as I don't have time to compose a full answer myself.
    – Gordon Smyth
    1 hour ago










  • @GordonSmyth That gets at one thing that has always bothered me about analysis of deviance as a goodness-of-fit test: missing covariates are confounded with over-dispersion. It suggests some problems with how the material is often taught. I teach a class on categorical data analysis and the textbooks don't make this point very strongly.
    – guy
    19 mins ago




























2 Answers



















In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.



(1) Less flexible means more efficient estimates. Given that variance parameters tend to be less stable than mean parameters, assuming a fixed mean-variance relation can give more stable standard errors than estimating the dispersion from the data.
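To illustrate this with a sketch (simulated data only; quasi-Poisson stands in here for "estimating a free dispersion"): when the data really are Poisson, the plain Poisson standard errors depend only on the fitted means, while the quasi-Poisson ones are also multiplied by an estimated dispersion and therefore tend to bounce around more from sample to sample.

    # Sketch: when the data truly are Poisson, the Poisson fit's standard errors
    # are typically less variable than quasi-Poisson ones, which carry the extra
    # noise of an estimated dispersion.  Illustrative simulation only.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n, reps = 50, 500
    se_pois, se_quasi = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        X = sm.add_constant(x)
        y = rng.poisson(np.exp(0.2 + 0.4 * x))                              # truth: Poisson
        fit_p = sm.GLM(y, X, family=sm.families.Poisson()).fit()            # dispersion fixed at 1
        fit_q = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")  # dispersion estimated
        se_pois.append(fit_p.bse[1])
        se_quasi.append(fit_q.bse[1])

    print("spread of Poisson SEs:      ", np.std(se_pois))
    print("spread of quasi-Poisson SEs:", np.std(se_quasi))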



(2) Model checking. I've worked with physicists who believe that various measurements can be described by Poisson distributions due to theoretical physics. If we reject the hypothesis that mean = variance, we have evidence against the Poisson-distribution hypothesis. As pointed out in a comment by @GordonSmyth, if you have reason to believe that a given measurement should follow a Poisson distribution, then evidence of over-dispersion is evidence that you are missing important factors.



(2.5) Proper distribution. While the negative binomial regression comes from a valid statistical distribution, it's my understanding that the quasi-Poisson does not. That means you can't really simulate count data if you believe $\text{Var}[y] = \alpha E[y]$ for $\alpha \neq 1$. That might be annoying for some use cases. Likewise, you can't use probabilities to test for outliers, etc.
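To make the simulation point concrete: the negative-binomial (NB2) variance $\text{Var}[y] = \mu + \alpha\mu^2$ corresponds to a genuine distribution one can draw from, whereas the quasi-Poisson relation only pins down the first two moments. A sketch (the $(n, p)$ conversion below is the standard NB2 one):

    # Sketch: negative-binomial (NB2) counts with mean mu and Var = mu + alpha * mu**2
    # can be simulated directly; numpy parameterises the distribution by (n, p).
    import numpy as np

    def rnegbin(mu, alpha, size, rng):
        n = 1.0 / alpha          # NB2 "size" parameter
        p = n / (n + mu)
        return rng.negative_binomial(n, p, size=size)

    rng = np.random.default_rng(0)
    y = rnegbin(mu=3.0, alpha=0.5, size=100_000, rng=rng)
    print(y.mean(), y.var())     # roughly 3 and 3 + 0.5 * 9 = 7.5

    # There is no analogous sampler for the quasi-Poisson specification
    # Var = alpha * E[y]: it fixes the first two moments but not a distribution.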






answered 3 hours ago (edited 1 hour ago)
Cliff AB













  • On 2.5: There are of course negative-binomial models and GLMMs with random effects that don't have that limitation.
    – Björn
    1 hour ago










  • @Björn: that's why it's only half an argument; it only applies to quasi-likelihood methods. As far as I know, there are no likelihood-based methods for under-dispersion, even though it can be analyzed with a quasi-likelihood model.
    – Cliff AB
    1 hour ago












  • Also on 2.5: my understanding is that there is no exponential dispersion family that satisfies the desired relation. Meaning, the quasi-score does not correspond to a genuine score. That doesn't mean there are no families of distributions for count data which satisfy the desired relation; there should be many such families.
    – guy
    26 mins ago






















Although this is my own question, I'm going to post my own two cents as an answer, to add to the number of perspectives on this question. The issue here is whether or not it is sensible to initially fit a one-parameter distribution to data. When you use a one-parameter distribution (such as the Poisson GLM, or a binomial GLM with fixed trial parameter), the variance is not a free parameter, and is instead constrained to be some function of the mean. This means that it is ill-advised to fit a one-parameter distribution to data in any situation where you are not absolutely sure that the variance follows the structure of that distribution.
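To spell out what "constrained to be some function of the mean" amounts to, here are the usual mean-variance relationships (standard parameterisations, with success probability $\pi$, and with $\phi$ and $\theta$ denoting the free dispersion parameters of the two-parameter forms):

$$\text{Poisson:}\ \operatorname{Var}[Y] = \mu, \qquad \text{binomial ($n$ fixed):}\ \operatorname{Var}[Y] = n\pi(1-\pi),$$
$$\text{quasi-Poisson:}\ \operatorname{Var}[Y] = \phi\,\mu, \qquad \text{negative-binomial:}\ \operatorname{Var}[Y] = \mu + \frac{\mu^{2}}{\theta}.$$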





Fitting one-parameter distributions to data is almost always a bad idea: Data is often messier than proposed models indicate, and even when there are theoretical reasons to believe that a particular one-parameter model may obtain, it is often the case that the data actually come from a mixture of that one-parameter distribution, with a range of parameter values. This is often equivalent to a broader model, such as a two-parameter distribution that allows greater freedom for the variance. As discussed below, this is true for the Poisson GLM in the case of count data.



As stated in the question, in most applications of statistics, it is standard practice to use distributional forms that at least allow the first two moments to vary freely. This ensures that the fitted model allows the data to dictate the inferred mean and variance, rather than having these artificially constrained by the model. Having this second parameter only loses one degree-of-freedom in the model, which is a tiny loss compared to the benefit of allowing the variance to be estimated from the data. One can of course extend this reasoning and add a third parameter to allow fitting of skewness, a fourth to allow fitting of kurtosis, etc. The reason that these higher-order moments are not usually as important is that asymptotic theorems for estimators usually show that they converge to a normal distribution (regardless of the higher-order moments of the underlying data) and in this case estimates of the mean and variance are sufficient to get good estimates of the asymptotic distribution of the parameter estimators.





With some extremely minor exceptions, a Poisson GLM is a bad model: In my experience, fitting a Poisson distribution to count data is almost always a bad idea. For count data it is extremely common for the variance in the data to be 'over-dispersed' relative to the Poisson distribution. Even in situations where theory points to a Poisson distribution, often the best model is a mixture of Poisson distributions, where the variance becomes a free parameter. Indeed, in the case of count data the negative-binomial distribution is a Poisson mixture with a gamma distribution for the rate parameter, so even when there are theoretical reasons to think that the counts arrive according to the process of a Poisson distribution, it is often the case that there is 'over-dispersion' and the negative-binomial distribution fits much better.
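For reference, the mixture fact used here, with the gamma taken to have shape $\theta$ and rate $\theta/\mu$ (the standard NB2 set-up):

$$Y \mid \lambda \sim \operatorname{Poisson}(\lambda), \quad \lambda \sim \operatorname{Gamma}(\theta,\ \theta/\mu) \ \Longrightarrow\ \operatorname{E}[Y] = \mu, \quad \operatorname{Var}[Y] = \operatorname{E}[\lambda] + \operatorname{Var}[\lambda] = \mu + \frac{\mu^{2}}{\theta} > \mu,$$

and the marginal distribution of $Y$ is exactly the negative-binomial, so the 'over-dispersion' is just the extra variability in the underlying rate.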



The practice of fitting a Poisson GLM to count data and then doing a statistical test to check for 'over-dispersion' is an anachronism, and it is hardly ever a good practice. In other forms of statistical analysis, we do not start with a two-parameter distribution, arbitrarily choose a variance restriction, and then test for this restriction to try to eliminate a parameter from the distribution. By doing things this way, we actually create an awkward hybrid procedure, consisting of an initial hypothesis test used for model selection, and then the actual model (either Poisson, or a broader distribution). It has been shown in many contexts that this kind of practice of creating hybrid models from an initial model selection test leads to bad overall models.



An analogous situation, where a similar hybrid method has been used, is in T-tests of mean difference. It used to be the case that statistics courses would recommend first using Levene's test (or even just some much crappier "rules of thumb") to check for equality of variances between two populations, and then if the data "passed" this test you would use the Student T-test that assumes equal variance, and if the data "failed" the test then you would instead use Welch's T-test. This is actually a really bad procedure (see e.g., here and here). It is much better just to use the latter test, which makes no assumption on the variance, rather than creating an awkward compound test that jams together a preliminary hypothesis test and then uses this to choose the model.
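A minimal sketch of that recommendation (scipy, placeholder data): skip the preliminary variance test and use the test that makes no equal-variance assumption from the start.

    # Sketch: use Welch's t-test directly, rather than gating the choice of test
    # on a preliminary Levene test of equal variances.  Placeholder data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=40)
    b = rng.normal(0.3, 2.0, size=60)

    print(stats.ttest_ind(a, b, equal_var=False))   # Welch: no equal-variance assumption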



For count data, you will generally get good initial results by fitting a two-parameter model such as a negative-binomial or quasi-Poisson model. (Note that the latter is not a real distribution, but it still gives a reasonable two-parameter model.) If any further generalisation is needed at all, it is usually the addition of zero-inflation, where there are an excessive number of zeroes in the data. Restricting to a Poisson GLM is an artificial and senseless model choice, and this is not made much better by testing for over-dispersion.
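A sketch of that starting point (statsmodels, simulated data; the zero-probability formula is the standard NB2 one, and the estimated dispersion is read off the last fitted parameter, which is where statsmodels' NegativeBinomial stores it):

    # Sketch: fit the negative binomial directly, then compare observed zeroes with
    # the zeroes expected under the fitted model as a rough zero-inflation check.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 500
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    y = rng.poisson(np.exp(0.3 + 0.5 * x) * rng.gamma(1.5, 1 / 1.5, size=n))  # over-dispersed counts

    nb = sm.NegativeBinomial(y, X).fit(disp=0)
    mu_hat = nb.predict(X)        # fitted means
    alpha_hat = nb.params[-1]     # estimated dispersion (last parameter in statsmodels)

    # P(Y = 0) under NB2 with mean mu and dispersion alpha.
    p_zero = (1.0 / (1.0 + alpha_hat * mu_hat)) ** (1.0 / alpha_hat)
    print("observed zeroes:", int((y == 0).sum()), " expected zeroes:", round(p_zero.sum(), 1))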





Okay, now here are the minor exceptions: The only real exceptions to the above are two situations:



(1) You have extremely strong a priori theoretical reasons for believing that the assumptions for the one parameter distribution are satisfied, and part of the analysis is to test this theoretical model against the data; or



(2) For some other (strange) reason, the purpose of your analysis is to conduct a hypothesis test on the variance of the data, and so you actually want to restrict this variance to this hypothesised restriction, and then test this hypothesis.



These situations are very rare. They tend to arise only when there is strong a priori theoretical knowledge about the data-generating mechanism, and the purpose of the analysis is to test this underlying theory. This may be the case in an extremely limited range of applications where data is generated under tightly controlled conditions (e.g., in physics).



















    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "65"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f392591%2fare-over-dispersion-tests-in-glms-actually-useful%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    5












    $begingroup$

    In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.



    (1) Less flexible means more efficient estimates. Given that variance parameters tend to be less stable than mean parameters, your assumption of fixed mean-variance relation may stabilize standard errors more.



    (2) Model checking. I've worked with physicists who believe that various measurements can be described by Poisson distributions due to theoretical physics. If we reject the hypothesis that mean = variance, we have evidence against the Poisson distribution hypothesis. As pointed out in a comment by @GordonSmyth, if you have reason to believe that a given measurement should follow a Poisson distribution, if you have evidence of over dispersion, you have evidence that you are missing important factors.



    (2.5) Proper distribution. While the negative binomial regression comes from a valid statistical distribution, it's my understanding that the Quasi-Poisson does not. That means you can't really simulate count data if you believe $Var[y] = alpha E[y]$ for $alpha neq 1$. That might be annoying for some use cases. Likewise, you can't use probabilities to test for outliers, etc.






    share|cite|improve this answer











    $endgroup$













    • $begingroup$
      On 2.5: There's of course negative binomial and GLMM with random effects that don't have that limitation.
      $endgroup$
      – Björn
      1 hour ago










    • $begingroup$
      @Björn: that's why it's only half an argument; only applies to Quasi-Likelihood methods. As far as I know, there are no likelihood based methods for under dispersion, even though this can be analyzed with a Quasi-Likelihood model.
      $endgroup$
      – Cliff AB
      1 hour ago












    • $begingroup$
      Also on 2.5: my understanding is that there is no exponential dispersion family that satisfies the desired relation. Meaning, the quasi score does not correspond to a genuine score. That doesn’t mean there are no families of distributions for count data which satisfy the desired relation; there should be many such families.
      $endgroup$
      – guy
      26 mins ago


















    5












    $begingroup$

    In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.



    (1) Less flexible means more efficient estimates. Given that variance parameters tend to be less stable than mean parameters, your assumption of fixed mean-variance relation may stabilize standard errors more.



    (2) Model checking. I've worked with physicists who believe that various measurements can be described by Poisson distributions due to theoretical physics. If we reject the hypothesis that mean = variance, we have evidence against the Poisson distribution hypothesis. As pointed out in a comment by @GordonSmyth, if you have reason to believe that a given measurement should follow a Poisson distribution, if you have evidence of over dispersion, you have evidence that you are missing important factors.



    (2.5) Proper distribution. While the negative binomial regression comes from a valid statistical distribution, it's my understanding that the Quasi-Poisson does not. That means you can't really simulate count data if you believe $Var[y] = alpha E[y]$ for $alpha neq 1$. That might be annoying for some use cases. Likewise, you can't use probabilities to test for outliers, etc.






    share|cite|improve this answer











    $endgroup$













    • $begingroup$
      On 2.5: There's of course negative binomial and GLMM with random effects that don't have that limitation.
      $endgroup$
      – Björn
      1 hour ago










    • $begingroup$
      @Björn: that's why it's only half an argument; only applies to Quasi-Likelihood methods. As far as I know, there are no likelihood based methods for under dispersion, even though this can be analyzed with a Quasi-Likelihood model.
      $endgroup$
      – Cliff AB
      1 hour ago












    • $begingroup$
      Also on 2.5: my understanding is that there is no exponential dispersion family that satisfies the desired relation. Meaning, the quasi score does not correspond to a genuine score. That doesn’t mean there are no families of distributions for count data which satisfy the desired relation; there should be many such families.
      $endgroup$
      – guy
      26 mins ago
















    5












    5








    5





    $begingroup$

    In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.



    (1) Less flexible means more efficient estimates. Given that variance parameters tend to be less stable than mean parameters, your assumption of fixed mean-variance relation may stabilize standard errors more.



    (2) Model checking. I've worked with physicists who believe that various measurements can be described by Poisson distributions due to theoretical physics. If we reject the hypothesis that mean = variance, we have evidence against the Poisson distribution hypothesis. As pointed out in a comment by @GordonSmyth, if you have reason to believe that a given measurement should follow a Poisson distribution, if you have evidence of over dispersion, you have evidence that you are missing important factors.



    (2.5) Proper distribution. While the negative binomial regression comes from a valid statistical distribution, it's my understanding that the Quasi-Poisson does not. That means you can't really simulate count data if you believe $Var[y] = alpha E[y]$ for $alpha neq 1$. That might be annoying for some use cases. Likewise, you can't use probabilities to test for outliers, etc.






    share|cite|improve this answer











    $endgroup$



    In principle, I actually agree that 99% of the time, it's better to just use the more flexible model. With that said, here are two and a half arguments for why you might not.



    (1) Less flexible means more efficient estimates. Given that variance parameters tend to be less stable than mean parameters, your assumption of fixed mean-variance relation may stabilize standard errors more.



    (2) Model checking. I've worked with physicists who believe that various measurements can be described by Poisson distributions due to theoretical physics. If we reject the hypothesis that mean = variance, we have evidence against the Poisson distribution hypothesis. As pointed out in a comment by @GordonSmyth, if you have reason to believe that a given measurement should follow a Poisson distribution, if you have evidence of over dispersion, you have evidence that you are missing important factors.



    (2.5) Proper distribution. While the negative binomial regression comes from a valid statistical distribution, it's my understanding that the Quasi-Poisson does not. That means you can't really simulate count data if you believe $Var[y] = alpha E[y]$ for $alpha neq 1$. That might be annoying for some use cases. Likewise, you can't use probabilities to test for outliers, etc.







    share|cite|improve this answer














    share|cite|improve this answer



    share|cite|improve this answer








    edited 1 hour ago

























    answered 3 hours ago









    Cliff ABCliff AB

    12.8k12363




    12.8k12363












    • $begingroup$
      On 2.5: There's of course negative binomial and GLMM with random effects that don't have that limitation.
      $endgroup$
      – Björn
      1 hour ago










    • $begingroup$
      @Björn: that's why it's only half an argument; only applies to Quasi-Likelihood methods. As far as I know, there are no likelihood based methods for under dispersion, even though this can be analyzed with a Quasi-Likelihood model.
      $endgroup$
      – Cliff AB
      1 hour ago












    • $begingroup$
      Also on 2.5: my understanding is that there is no exponential dispersion family that satisfies the desired relation. Meaning, the quasi score does not correspond to a genuine score. That doesn’t mean there are no families of distributions for count data which satisfy the desired relation; there should be many such families.
      $endgroup$
      – guy
      26 mins ago




















    • $begingroup$
      On 2.5: There's of course negative binomial and GLMM with random effects that don't have that limitation.
      $endgroup$
      – Björn
      1 hour ago










    • $begingroup$
      @Björn: that's why it's only half an argument; only applies to Quasi-Likelihood methods. As far as I know, there are no likelihood based methods for under dispersion, even though this can be analyzed with a Quasi-Likelihood model.
      $endgroup$
      – Cliff AB
      1 hour ago












    • $begingroup$
      Also on 2.5: my understanding is that there is no exponential dispersion family that satisfies the desired relation. Meaning, the quasi score does not correspond to a genuine score. That doesn’t mean there are no families of distributions for count data which satisfy the desired relation; there should be many such families.
      $endgroup$
      – guy
      26 mins ago


















    $begingroup$
    On 2.5: There's of course negative binomial and GLMM with random effects that don't have that limitation.
    $endgroup$
    – Björn
    1 hour ago




    $begingroup$
    On 2.5: There's of course negative binomial and GLMM with random effects that don't have that limitation.
    $endgroup$
    – Björn
    1 hour ago












    $begingroup$
    @Björn: that's why it's only half an argument; only applies to Quasi-Likelihood methods. As far as I know, there are no likelihood based methods for under dispersion, even though this can be analyzed with a Quasi-Likelihood model.
    $endgroup$
    – Cliff AB
    1 hour ago






    $begingroup$
    @Björn: that's why it's only half an argument; only applies to Quasi-Likelihood methods. As far as I know, there are no likelihood based methods for under dispersion, even though this can be analyzed with a Quasi-Likelihood model.
    $endgroup$
    – Cliff AB
    1 hour ago














    $begingroup$
    Also on 2.5: my understanding is that there is no exponential dispersion family that satisfies the desired relation. Meaning, the quasi score does not correspond to a genuine score. That doesn’t mean there are no families of distributions for count data which satisfy the desired relation; there should be many such families.
    $endgroup$
    – guy
    26 mins ago






    $begingroup$
    Also on 2.5: my understanding is that there is no exponential dispersion family that satisfies the desired relation. Meaning, the quasi score does not correspond to a genuine score. That doesn’t mean there are no families of distributions for count data which satisfy the desired relation; there should be many such families.
    $endgroup$
    – guy
    26 mins ago















    0












    $begingroup$

    Although this is my own question, I'm also going to post my own two-cents as an answer, so that we add to the number of perspectives on this question. The issue here is whether or not it is sensible to initially fit a one-parameter distribution to data. When you use a one-parameter distribution (such as the Poisson GLM, or a binomial GLM with fixed trial parameter), the variance is not a free parameter, and is instead constrained to be some function of the mean. This means that it is ill-advised to fit a one-parameter distribution to data in any situation where you are not absolutely sure that the variance follows the structure of that distribution.





    Fitting one-parameter distributions to data is almost always a bad idea: Data is often messier than proposed models indicate, and even when there are theoretical reasons to believe that a particular one-parameter model may obtain, it is often the case that the data actually come from a mixture of that one-parameter distribution, with a range of parameter values. This is often equivalent to a broader model, such as a two-parameter distribution that allows greater freedom for the variance. As discussed below, this is true for the Poisson GLM in the case of count data.



    As stated in the question, in most applications of statistics, it is standard practice to use distributional forms that at least allow the first two moments to vary freely. This ensures that the fitted model allows the data to dictate the inferred mean and variance, rather than having these artificially constrained by the model. Having this second parameter only loses one degree-of-freedom in the model, which is a tiny loss compared to the benefit of allowing the variance to be estimated from the data. One can of course extend this reasoning and add a third parameter to allow fitting of skewness, a fourth to allow fitting of kurtosis, etc. The reason that these higher-order moments are not usually as important is that asymptotic theorems for estimators usually show that they converge to a normal distribution (regardless of the higher-order moments of the underlying data) and in this case estimates of the mean and variance are sufficient to get good estimates of the asymptotic distribution of the parameter estimators.





    With some extremely minor exceptions, a Poisson GLM is a bad model: In my experience, fitting a Poisson distribution to count data is almost always a bad idea. For count data it is extremely common for the variance in the data to be 'over-dispersed' relative to the Poisson distribution. Even in situations where theory points to a Poisson distribution, often the best model is a mixture of Poisson distributions, where the variance becomes a free parameter. Indeed, in the case of count data the negative-binomial distribution is a Poisson mixture with a gamma distribution for the rate parameter, so even when there are theoretical reasons to think that the counts arrive according to the process of a Poisson distribution, it is often the case that there is 'over-dispersion' and the negative-binomial distribution fits much better.



    The practice of fitting a Poisson GLM to count data and then doing a statistical test to check for 'over-dispersion' is an anachronism, and it is hardly ever a good practice. In other forms of statistical analysis, we do not start with a two-parameter distribution, arbitrarily choose a variance restriction, and then test for this restriction to try to eliminate a parameter from the distribution. By doing things this way, we actually create an awkward hybrid procedure, consisting of an initial hypothesis test used for model selection, and then the actual model (either Poisson, or a broader distribution). It has been shown in many contexts that this kind of practice of creating hybrid models from an initial model selection test leads to bad overall models.



    An analogous situation, where a similar hybrid method has been used, is in T-tests of mean difference. It used to be the case that statistics courses would recommend first using Levene's test (or even just some much crappier "rules of thumb") to check for equality of variances between two populations, and then if the data "passed" this test you would use the Student T-test that assumes equal variance, and if the data "failed" the test then you would instead use Welch's T-test. This is actually a really bad procedure (see e.g., here and here). It is much better just to use the latter test, which makes no assumption on the variance, rather than creating an awkward compound test that jams together a preliminary hypothesis test and then uses this to choose the model.



    For count data, you will generally get good initial results by fitting a two-parameter model such as a negative-binomial or quasi-Poisson model. (Note that the latter is not a real distribution, but it still gives a reasonable two-parameter model.) If any further generalisation is needed at all, it is usually the addition of zero-inflation, where there are an excessive number of zeroes in the data. Restricting to a Poisson GLM is an artificial and senseless model choice, and this is not made much better by testing for over-dispersion.





    Okay, now here are the minor exceptions: The only real exceptions to the above are two situations:



    (1) You have extremely strong a priori theoretical reasons for believing that the assumptions for the one parameter distribution are satisfied, and part of the analysis is to test this theoretical model against the data; or



    (2) For some other (strange) reason, the purpose of your analysis is to conduct a hypothesis test on the variance of the data, and so you actually want to restrict this variance to this hypothesised restriction, and then test this hypothesis.



    These situations are very rare. They tend to arise only when there is strong a priori theoretical knowledge about the data-generating mechanism, and the purpose of the analysis is to test this underlying theory. This may be the case in an extremely limited range of applications where data is generated under tightly controlled conditions (e.g., in physics).






    share|cite|improve this answer









    $endgroup$


















      0












      $begingroup$

      Although this is my own question, I'm also going to post my own two-cents as an answer, so that we add to the number of perspectives on this question. The issue here is whether or not it is sensible to initially fit a one-parameter distribution to data. When you use a one-parameter distribution (such as the Poisson GLM, or a binomial GLM with fixed trial parameter), the variance is not a free parameter, and is instead constrained to be some function of the mean. This means that it is ill-advised to fit a one-parameter distribution to data in any situation where you are not absolutely sure that the variance follows the structure of that distribution.





      Fitting one-parameter distributions to data is almost always a bad idea: Data is often messier than proposed models indicate, and even when there are theoretical reasons to believe that a particular one-parameter model may obtain, it is often the case that the data actually come from a mixture of that one-parameter distribution, with a range of parameter values. This is often equivalent to a broader model, such as a two-parameter distribution that allows greater freedom for the variance. As discussed below, this is true for the Poisson GLM in the case of count data.



      As stated in the question, in most applications of statistics, it is standard practice to use distributional forms that at least allow the first two moments to vary freely. This ensures that the fitted model allows the data to dictate the inferred mean and variance, rather than having these artificially constrained by the model. Having this second parameter only loses one degree-of-freedom in the model, which is a tiny loss compared to the benefit of allowing the variance to be estimated from the data. One can of course extend this reasoning and add a third parameter to allow fitting of skewness, a fourth to allow fitting of kurtosis, etc. The reason that these higher-order moments are not usually as important is that asymptotic theorems for estimators usually show that they converge to a normal distribution (regardless of the higher-order moments of the underlying data) and in this case estimates of the mean and variance are sufficient to get good estimates of the asymptotic distribution of the parameter estimators.





      With some extremely minor exceptions, a Poisson GLM is a bad model: In my experience, fitting a Poisson distribution to count data is almost always a bad idea. For count data it is extremely common for the variance in the data to be 'over-dispersed' relative to the Poisson distribution. Even in situations where theory points to a Poisson distribution, often the best model is a mixture of Poisson distributions, where the variance becomes a free parameter. Indeed, in the case of count data the negative-binomial distribution is a Poisson mixture with a gamma distribution for the rate parameter, so even when there are theoretical reasons to think that the counts arrive according to the process of a Poisson distribution, it is often the case that there is 'over-dispersion' and the negative-binomial distribution fits much better.



      The practice of fitting a Poisson GLM to count data and then doing a statistical test to check for 'over-dispersion' is an anachronism, and it is hardly ever a good practice. In other forms of statistical analysis, we do not start with a two-parameter distribution, arbitrarily choose a variance restriction, and then test for this restriction to try to eliminate a parameter from the distribution. By doing things this way, we actually create an awkward hybrid procedure, consisting of an initial hypothesis test used for model selection, and then the actual model (either Poisson, or a broader distribution). It has been shown in many contexts that this kind of practice of creating hybrid models from an initial model selection test leads to bad overall models.



      An analogous situation, where a similar hybrid method has been used, is in T-tests of mean difference. It used to be the case that statistics courses would recommend first using Levene's test (or even just some much crappier "rules of thumb") to check for equality of variances between two populations, and then if the data "passed" this test you would use the Student T-test that assumes equal variance, and if the data "failed" the test then you would instead use Welch's T-test. This is actually a really bad procedure (see e.g., here and here). It is much better just to use the latter test, which makes no assumption on the variance, rather than creating an awkward compound test that jams together a preliminary hypothesis test and then uses this to choose the model.



      For count data, you will generally get good initial results by fitting a two-parameter model such as a negative-binomial or quasi-Poisson model. (Note that the latter is not a real distribution, but it still gives a reasonable two-parameter model.) If any further generalisation is needed at all, it is usually the addition of zero-inflation, where there are an excessive number of zeroes in the data. Restricting to a Poisson GLM is an artificial and senseless model choice, and this is not made much better by testing for over-dispersion.





      Okay, now here are the minor exceptions: The only real exceptions to the above are two situations:



      (1) You have extremely strong a priori theoretical reasons for believing that the assumptions for the one parameter distribution are satisfied, and part of the analysis is to test this theoretical model against the data; or



      (2) For some other (strange) reason, the purpose of your analysis is to conduct a hypothesis test on the variance of the data, and so you actually want to restrict this variance to this hypothesised restriction, and then test this hypothesis.



      These situations are very rare. They tend to arise only when there is strong a priori theoretical knowledge about the data-generating mechanism, and the purpose of the analysis is to test this underlying theory. This may be the case in an extremely limited range of applications where data is generated under tightly controlled conditions (e.g., in physics).






      share|cite|improve this answer









      $endgroup$
















        0












        0








        0





        $begingroup$

        Although this is my own question, I'm also going to post my own two-cents as an answer, so that we add to the number of perspectives on this question. The issue here is whether or not it is sensible to initially fit a one-parameter distribution to data. When you use a one-parameter distribution (such as the Poisson GLM, or a binomial GLM with fixed trial parameter), the variance is not a free parameter, and is instead constrained to be some function of the mean. This means that it is ill-advised to fit a one-parameter distribution to data in any situation where you are not absolutely sure that the variance follows the structure of that distribution.








Although this is my own question, I'm also going to post my own two-cents as an answer, to add another perspective. The issue here is whether or not it is sensible to initially fit a one-parameter distribution to data. When you use a one-parameter distribution (such as the Poisson GLM, or a binomial GLM with a fixed number of trials), the variance is not a free parameter: it is constrained to be some function of the mean. This means that it is ill-advised to fit a one-parameter distribution to data in any situation where you are not absolutely sure that the variance follows the structure of that distribution.





Fitting one-parameter distributions to data is almost always a bad idea: Data are often messier than our proposed models suggest, and even when there are theoretical reasons to believe that a particular one-parameter model may obtain, it is often the case that the data actually come from a mixture of that one-parameter distribution over a range of parameter values. This is often equivalent to a broader model, such as a two-parameter distribution that allows greater freedom for the variance. As discussed below, this is true for the Poisson GLM in the case of count data.



As stated in the question, in most applications of statistics it is standard practice to use distributional forms that allow at least the first two moments to vary freely. This ensures that the fitted model lets the data dictate the inferred mean and variance, rather than having these artificially constrained by the model. Having this second parameter costs only one degree-of-freedom, which is a tiny loss compared to the benefit of allowing the variance to be estimated from the data. One can of course extend this reasoning and add a third parameter to allow fitting of skewness, a fourth to allow fitting of kurtosis, and so on. The reason these higher-order moments are usually less important is that the asymptotic theorems for estimators usually show convergence to a normal distribution (regardless of the higher-order moments of the underlying data), in which case estimates of the mean and variance are sufficient to get good estimates of the asymptotic distribution of the parameter estimators.





With some extremely minor exceptions, a Poisson GLM is a bad model: In my experience, fitting a Poisson distribution to count data is almost always a bad idea. For count data it is extremely common for the variance in the data to be 'over-dispersed' relative to the Poisson distribution. Even in situations where theory points to a Poisson distribution, the best model is often a mixture of Poisson distributions, in which the variance becomes a free parameter. Indeed, in the case of count data the negative-binomial distribution arises as a Poisson mixture with a gamma distribution on the rate parameter, so even when there are theoretical reasons to think that the counts arise from a Poisson process, it is often the case that there is 'over-dispersion' and the negative-binomial distribution fits much better.
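To make the mixture claim concrete, here is a minimal simulation sketch in Python (using only numpy; the gamma shape and scale values are arbitrary illustrative choices, not anything implied by the discussion above). Counts generated as Poisson draws with gamma-distributed rates are over-dispersed relative to a plain Poisson model, with variance approximately mean + mean²/shape, exactly as a negative-binomial model would predict:

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary illustrative values for the gamma mixing distribution on the rate.
shape, scale, n = 2.0, 3.0, 100_000

# Each observation gets its own Poisson rate drawn from a gamma distribution,
# and then a count drawn from a Poisson distribution with that rate.
rates = rng.gamma(shape, scale, size=n)
counts = rng.poisson(rates)

mean = counts.mean()
print(f"sample mean:     {mean:.2f}")          # roughly shape * scale = 6
print(f"sample variance: {counts.var():.2f}")  # roughly mean + mean**2 / shape = 24
print(f"a Poisson model forces variance = mean = {mean:.2f}")
print(f"the negative-binomial mixture predicts {mean + mean**2 / shape:.2f}")
```

With these particular values the sample variance should come out around four times the sample mean, which a Poisson GLM simply cannot accommodate.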



The practice of fitting a Poisson GLM to count data and then doing a statistical test to check for 'over-dispersion' is an anachronism, and it is hardly ever good practice. In other forms of statistical analysis, we do not start with a two-parameter distribution, arbitrarily choose a variance restriction, and then test for this restriction in order to eliminate a parameter from the distribution. By doing things this way we actually create an awkward hybrid procedure, consisting of an initial hypothesis test used for model selection, followed by the actual model (either Poisson, or something broader). It has been shown in many contexts that creating hybrid models from an initial model-selection test in this way leads to bad overall models.
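For reference, the preliminary check being criticised here is often carried out informally by looking at the Pearson dispersion statistic (the Pearson chi-square divided by the residual degrees of freedom) from the fitted Poisson GLM. Below is a rough sketch using statsmodels on simulated over-dispersed counts; the simulated data, the coefficient values, and the informal "well above 1" reading are illustrative assumptions on my part, not anything prescribed above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated over-dispersed counts: Poisson draws with gamma-mixed rates
# depending on a single covariate (all values are illustrative).
n = 500
x = rng.normal(size=n)
y = rng.poisson(rng.gamma(shape=2.0, scale=np.exp(0.3 * x)))

# Fit the restrictive one-parameter Poisson GLM first...
X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# ...then the usual quick check: Pearson chi-square over residual df.
# Values well above 1 are read as evidence of over-dispersion.
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid
print(f"Pearson dispersion statistic: {dispersion:.2f}")
```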



An analogous situation, where a similar hybrid method has been used, is in t-tests of mean difference. It used to be the case that statistics courses would recommend first using Levene's test (or even just some much crappier "rules of thumb") to check for equality of variances between two populations, and then, if the data "passed" this test, using the Student t-test that assumes equal variances, and, if the data "failed" the test, using Welch's t-test instead. This is actually a really bad procedure (see e.g., here and here). It is much better just to use the latter test, which makes no assumption of equal variances, rather than creating an awkward compound procedure that jams a preliminary hypothesis test onto the choice of model.
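In code, the recommended side of that analogy is just a one-liner: use the unequal-variance (Welch) test directly, rather than gating the choice on a preliminary variance test. A minimal sketch with scipy, using two simulated samples with deliberately different variances:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two illustrative samples with deliberately unequal variances.
a = rng.normal(loc=0.0, scale=1.0, size=40)
b = rng.normal(loc=0.5, scale=3.0, size=60)

# Welch's t-test: equal_var=False drops the equal-variance assumption,
# so no preliminary Levene's test is needed.
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```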



For count data, you will generally get good initial results by fitting a two-parameter model such as a negative-binomial or quasi-Poisson model. (Note that the latter is not a true probability distribution, but it still gives a reasonable two-parameter model.) If any further generalisation is needed at all, it is usually the addition of zero-inflation, to deal with an excessive number of zeroes in the data. Restricting to a Poisson GLM is an artificial and senseless model choice, and this is not made much better by testing for over-dispersion.
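As a sketch of what going straight to a two-parameter model can look like in practice, here is one way to fit a negative-binomial regression (dispersion estimated from the data) and a quasi-Poisson fit (Poisson mean model with Pearson-rescaled standard errors) using statsmodels. The simulated data are illustrative, and the particular function choices are mine rather than anything prescribed in the answer:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated over-dispersed counts, as in the earlier sketch (illustrative only).
n = 500
x = rng.normal(size=n)
y = rng.poisson(rng.gamma(shape=2.0, scale=np.exp(0.3 * x)))
X = sm.add_constant(x)

# Negative-binomial regression: the dispersion parameter alpha is estimated
# from the data rather than being fixed in advance.
nb_fit = sm.NegativeBinomial(y, X).fit(disp=False)
print(nb_fit.summary())

# Quasi-Poisson: same mean model as the Poisson GLM, but standard errors are
# rescaled using the Pearson chi-square estimate of the dispersion.
qp_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print(qp_fit.summary())
```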





Okay, now here are the minor exceptions: there are really only two situations in which the above does not apply:



(1) You have extremely strong a priori theoretical reasons for believing that the assumptions of the one-parameter distribution are satisfied, and part of the analysis is to test this theoretical model against the data; or



(2) For some other (strange) reason, the purpose of your analysis is to conduct a hypothesis test on the variance of the data, so you actually want to impose the hypothesised restriction on the variance and then test that hypothesis.



        These situations are very rare. They tend to arise only when there is strong a priori theoretical knowledge about the data-generating mechanism, and the purpose of the analysis is to test this underlying theory. This may be the case in an extremely limited range of applications where data is generated under tightly controlled conditions (e.g., in physics).

















answered 31 mins ago by Ben