Gamma GLM - Derive prediction intervals for new x_i
In a Gamma GLM, the statistical model for each observation $i$ is assumed to be $Y_i \sim \mathrm{Gamma}(\mathrm{shape}, \mathrm{scale})$, where $E(Y_i) = \mu_i = f(X_i\beta)$ and $f$ is the inverse of the link function.
I've used MLE to estimate $\hat\beta$ and $\widehat{\mathrm{scale}}$, and wish to produce a 90% prediction interval for a new point $Y'$ given $X'$.
I can produce a confidence interval for $E(Y|X') = \mu'$ by applying $f$ to the endpoints of the normally distributed confidence interval for $X'\hat\beta$. Let's say $\hat\mu' = 10$ and the 90% confidence interval is $[5, 30]$.
However, we want an interval for the distribution of $Y'$, not $\mu'$. Intuitively, this interval should be much wider than the confidence interval for $\mu'$. I think it should also be wider than the 5th-to-95th-percentile range of a single gamma distribution with $\mu = \hat\mu'$, since the uncertainty around $\hat\mu'$ should translate into increased uncertainty in the final distribution, rather like a vague prior widening a Bayesian posterior distribution.
What is the correct way to construct a prediction interval for the new point $Y'$?
The schema below shows how uncertainty in $\mu'$ translates into many possible gamma distributions and hence a wide prediction interval for $Y'$.
References:
https://www.rocscience.com/help/swedge/swedge/Gamma_Distribution.htm
https://www.statsmodels.org/stable/glm.html
Tags: generalized-linear-model, gamma-distribution, prediction-interval, statsmodels
– whuber♦ (8 hours ago): The interval you are asking for is called a prediction interval. A confidence interval is something else.
– Nelson (7 hours ago): Thanks @whuber, fixed.
asked 8 hours ago by Nelson, edited 7 hours ago
1 Answer
The prediction interval for a new observation depends on both the assumed inherent randomness, in this case given by the gamma distribution, and the uncertainty coming from the parameters, which are estimated rather than assumed to be known.
In general there is no analytical or closed-form expression for the combination of the two effects. The two main options are to ignore parameter uncertainty or to use simulation methods.
Ignoring parameter uncertainty: If we ignore that the parameters are estimated with some sampling uncertainty, then the distribution of a new observation is simply the assumed gamma distribution. We can use the mean and scale estimates to compute the relevant prediction intervals using, e.g., the distribution methods in scipy.stats. The parameterization might have to be transformed from the GLM parameterization to the scipy.stats parameterization.
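To make the plug-in approach concrete, here is a minimal sketch with made-up values for the predicted mean and the dispersion (in statsmodels the dispersion estimate is reported as `results.scale`); the only real content is the mapping from the GLM (mean, dispersion) parameterization to scipy's (shape, scale):

```python
import numpy as np
from scipy import stats

# Hypothetical plug-in estimates from a fitted Gamma GLM:
# mu_hat is the predicted mean at X', phi_hat the estimated dispersion.
mu_hat = 10.0
phi_hat = 0.25

# Translate GLM (mean, dispersion) into scipy's (shape, scale):
# shape k = 1/phi, scale theta = mu * phi, so that k * theta = mu
# and the variance is phi * mu**2, matching the Gamma GLM variance function.
k = 1.0 / phi_hat
theta = mu_hat * phi_hat

# Naive 90% prediction interval that ignores parameter uncertainty.
lo, hi = stats.gamma.ppf([0.05, 0.95], a=k, scale=theta)
```

This interval reflects only the gamma randomness around the fitted mean, so it will be too narrow whenever the estimation uncertainty in $\hat\mu'$ is non-negligible.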
Simulation methods: One possibility is a full bootstrap of the original sample to simulate new observations. A simpler method is to assume that the asymptotic normal distribution for the parameter estimates is appropriate and to simulate the parameters of the gamma distribution from the mean and covariance of the parameter estimates. For each sampled parameter vector we can draw a new observation and compute an interval from the simulated observations.
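A minimal sketch of the simpler simulation method, assuming a log link and treating the dispersion as fixed (see the caveat below about the missing joint covariance). The coefficient vector, covariance matrix, and dispersion here are hypothetical stand-ins for `results.params`, `results.cov_params()`, and `results.scale`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted quantities from a Gamma GLM with a log link.
beta_hat = np.array([1.5, 0.4])
cov_beta = np.array([[0.02, 0.0],
                     [0.0, 0.01]])
phi_hat = 0.25          # dispersion; treated as known/fixed in this sketch
x_new = np.array([1.0, 2.0])

n_sim = 20_000
# 1) Draw coefficient vectors from their asymptotic normal distribution.
betas = rng.multivariate_normal(beta_hat, cov_beta, size=n_sim)
# 2) Map each draw to a mean via the (assumed) log link's inverse.
mus = np.exp(betas @ x_new)
# 3) Draw one new observation per sampled mean.
y_sim = rng.gamma(shape=1.0 / phi_hat, scale=mus * phi_hat)

pred_int = np.percentile(y_sim, [5, 95])
```

The resulting interval mixes gamma noise with coefficient uncertainty, so it is wider than the plug-in interval at the same mean.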
One problem with this is that GLM only estimates the mean parameters directly; the scale is estimated separately using the deviance or Pearson chi-squared statistic. That is, GLM in statsmodels and in other packages does not provide a joint covariance for the mean and scale parameters.
Because of these problems, statsmodels currently provides prediction intervals for new observations that take parameter uncertainty into account only in the linear normal case, i.e. OLS.
$begingroup$
Thanks @Josef. How about sampling the underlying data with replacement, fitting the full Gamma GLM on each new sample and predicting y_i, and then take the distribution of those y_i s ? That way, we get some natural variation in the scale parameter by differing deviances
$endgroup$
– Nelson
7 hours ago
– Josef (2 hours ago): Note that "predicting $y_i$" usually means predicting the mean or expected value, not an observation. You would need a draw from the predictive distribution for each new observation. Something like that is an option that includes all estimation uncertainty, and it is what I meant by a full bootstrap. The disadvantage is that it is costly to compute, because the model has to be re-estimated for each bootstrap sample. Using the original parameter estimates as starting values will speed up the computations, but there is still a large amount of overhead.
answered 7 hours ago by Josef