Waiting time distribution parameters given expected mean
$begingroup$
I have a set of healthcare providers serving patients. In a given period, a specific provider can see only a certain number of patients, depending on the medical procedure and other variables.
After booking a procedure, a patient is seen by the doctor after a certain number of days, again specific to the provider, procedure, etc.
I have a data set in which, for each provider and a given period, I know how many patients were seen and what the average waiting time was.
I want to simulate more patients for each provider, along with their individual waiting times.
I assumed that the number of patients seen in the given period can be modeled as $\sim Poisson(\lambda)$, with $\lambda$ depending on provider and procedure characteristics and on the length of the period.
I modeled the average waiting time as $\sim lognormal(\mu_{global}, \sigma)$, with parameters that are functions of the same variables as before plus the log of the number of patients.
Finally, I'm modeling the waiting times of the simulated new patients as $\sim Gamma(\mu/\theta, \theta)$, with $\mu$ predicted from the model above and $\theta$ chosen using domain knowledge, since I don't have past information on individual waiting times.
I would like to know whether I chose the right distributions for the problem at hand.
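For concreteness, the three-stage simulation described above could be sketched in R roughly as follows. This is only an illustration for a single provider/procedure context: all numeric values (`lambda`, `mu_log`, `sigma`, `theta`) are made-up placeholders, and the gamma is read as shape $\mu/\theta$ and scale $\theta$, which is an assumption about the intended parameterization. Under that reading the individual waiting times have mean $\mu$ and variance $\mu\theta$.

```r
set.seed(1)  # for reproducibility

# Hypothetical values for one provider/procedure context (all assumed)
lambda <- 40       # expected number of patients seen in the period
mu_log <- log(12)  # log-scale location of the average waiting time (days)
sigma  <- 0.3      # log-scale spread of the average waiting time
theta  <- 5        # gamma scale (days), standing in for domain knowledge

# Stage 1: number of patients seen in the period
n_patients <- rpois(1, lambda)

# Stage 2: average waiting time for this context
mu <- rlnorm(1, meanlog = mu_log, sdlog = sigma)

# Stage 3: individual waiting times, shape = mu/theta and scale = theta,
# so that the theoretical mean is (mu/theta) * theta = mu
waits <- rgamma(n_patients, shape = mu / theta, scale = theta)

c(target_mean = mu, simulated_mean = mean(waits))
```

With only a few dozen simulated patients the sample mean will wander around `mu`; a larger `theta` makes the individual times more dispersed for the same mean.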
probability distributions bayesian modeling
$endgroup$
$begingroup$
The number of patients seen in the given time cannot be modeled as Poisson: this is not a Poisson process because the events are not independent of each other. If a provider sees one patient, it becomes much less likely that they will see the next one right away, because they have to spend time with the patient. In other words, the time between events cannot be modeled by an exponential distribution. Instead, the time between events should be modeled as a gamma distribution where $k\theta \approx 20$ (if 20 min is the average time spent seeing a patient).
$endgroup$
– Matthew Anderson
8 hours ago
$begingroup$
Moreover, could you please explain the derivation of $\lambda$ with respect to $Poisson(\lambda)$?
$endgroup$
– Matthew Anderson
8 hours ago
$begingroup$
Wait, wait. This is not a queueing system in which patients are seen in the same order in which they book. I can get my visit booked urgently (in less than one day) or months from today. Then there is the personnel at each provider (there can be more than one doctor). Providers always keep some room for unforeseen visits that are more urgent. Furthermore, operators move patients up and down (sometimes with no clear criteria). Therefore I thought about treating the patients as independent, with a different Poisson process for each situation (provider, procedure, emergency class), each with a different $\lambda$.
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
Also, maybe I wasn't clear, but I don't have individual patient data; I only have the average waiting time for a certain number of patients seen during a certain period of provider activity (i.e. 4 and 12 months).
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
Still doesn't sound Poisson to me, because the waiting time would be Gamma with $\mu \gtrsim 10$, as opposed to exponential.
$endgroup$
– Matthew Anderson
4 hours ago
asked 8 hours ago
Bakaburg
1 Answer
$begingroup$
It is best to look at the distribution of waiting times for a particular provider. My first thought would be that if the process is anything like a queueing process, then the distribution should be nearly exponential. So I would check to see if the sample mean and standard deviation are approximately equal. If so, I would look to see if an empirical CDF (ECDF) of the data
roughly fits the CDF of $\mathsf{Exp}(\text{rate} = \lambda),$ where $\mu = 1/\lambda$ is estimated as $\hat \mu = \bar X,$ the sample mean.
Only if that doesn't seem to work well would I pursue fitting the data to a gamma distribution. This is also a plausible possibility, partly because the sum of $k$ independent exponential waiting times (of the same rate) is gamma-distributed with shape parameter $k.$ [If you use a 'distribution ID' program, a 'gamma' distribution will almost always win out over 'exponential' because the family $\mathsf{Exp}$ is a sub-family of $\mathsf{Gamma}.$]
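The remark about sums of exponentials can be checked by a quick simulation. The sketch below is illustrative only; the values `k = 3`, `rate = 0.1`, and `m = 10000` are arbitrary choices, not anything taken from the data. It adds up $k$ independent exponential waits and compares the sums with $\mathsf{Gamma}(\text{shape} = k, \text{rate} = 0.1),$ whose mean is $k/\lambda = 30$ and whose variance is $k/\lambda^2 = 300.$

```r
set.seed(2)   # for reproducibility
k    <- 3     # number of exponential stages (assumed)
rate <- 0.1   # common rate of each stage (assumed)
m    <- 10000 # number of simulated sums

# Each row holds k independent exponential waits; row sums are the totals
s <- rowSums(matrix(rexp(m * k, rate), nrow = m, ncol = k))

mean(s); var(s)  # compare with k/rate = 30 and k/rate^2 = 300

# ECDF of the sums against the gamma CDF with shape k and the same rate
plot(ecdf(s), main = "ECDF of sums with GAMMA(3, .1) CDF")
curve(pgamma(x, shape = k, rate = rate), add = TRUE, col = "red")
```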
Exponential data. As an initial example, let's use data randomly sampled from $\mathsf{Exp}(\text{rate} = 1/10),$ so that the average waiting time is (a perhaps optimistic) 10 days. Suppose we
have waiting times for $n = 500$ patients.
set.seed(714) # for reproducibility
x = rexp(500, 0.1)
mean(x); sd(x)
[1] 9.909112
[1] 10.36662
So the sample mean and SD are about the same. In practice, I would not know
the true rate $\lambda,$ so I will estimate it as $\hat \lambda = 1/9.9 = 0.101.$
In the plot below, the boxplot shows many high 'outliers', as is typical of
an exponential sample. The density function of $\mathsf{Exp}(0.101)$ is a
reasonable fit to the histogram of the data. Also (usually more revealing),
the ECDF plot of the sample is well-approximated by the CDF of this distribution. [The ECDF is a 'stairstep' plot that jumps up by $1/500$ at each of the $500$ observed
values.]
par(mfrow=c(1,3)) # three panels side by side
boxplot(x, col="skyblue2", pch=19, main="Boxplot of DATA")
hist(x, prob=TRUE, breaks=20, col="skyblue2", main="Histogram with EXP(.101) Density")
rug(x); curve(dexp(x, .101), add=TRUE, col="red") # fitted density
plot(ecdf(x), main="ECDF with EXP(.101) CDF")
curve(pexp(x, .101), add=TRUE, col="red") # fitted CDF
par(mfrow=c(1,1)) # restore single-panel layout
These favorable results are hardly surprising because the data were sampled from an exponential population. If real data perform as well, then you could simulate additional data from a similar population with R code rexp(n, 0.101), where parameter n is the desired number of simulated values.
However, you must realize that you are not gaining additional information about actual patient waiting times by doing that. All the 'information' you have is given by the sample and the assumption that the population is exponential.
Gamma data. If the exponential model does not seem to fit, perhaps the next step is to assume that data are gamma-distributed, to estimate the parameters, and make similar plots to see if you get a better fit to the data.
[Several pages on this site and online discuss estimation of gamma
parameters; one recent page discusses both MMEs and MLEs.]
Just to see what happens if we have gamma data and try to fit an exponential model, I repeat the simulation above, but using x = rgamma(500, 2, .2).
set.seed(714) # for reproducibility
x = rgamma(500, 2, 0.2)
mean(x); var(x)
[1] 10.62662
[1] 59.49749
Pretending that these data are exponential and estimating the rate as
$\hat \lambda = 1/10.63 = 0.0941,$ R code similar to that above gives
graphs with noticeably unsatisfactory fits.
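A sketch of what that 'similar' code might look like is given here; it simply reuses the earlier plotting commands with the exponential rate estimated as the reciprocal of the sample mean (the name `rate_hat` is introduced only for this illustration). Note that the sample SD is about 7.7 (the square root of the variance printed above) against a mean of about 10.6, which is already a hint that an exponential model is doubtful.

```r
set.seed(714)                # same seed as in the gamma example
x <- rgamma(500, 2, 0.2)     # gamma data: shape 2, rate 0.2
rate_hat <- 1 / mean(x)      # exponential rate estimate (wrong model on purpose)

par(mfrow = c(1, 3))         # three panels side by side
boxplot(x, col = "skyblue2", pch = 19, main = "Boxplot of DATA")
hist(x, prob = TRUE, breaks = 20, col = "skyblue2",
     main = "Histogram with EXP Density")
rug(x); curve(dexp(x, rate_hat), add = TRUE, col = "red")  # fitted density
plot(ecdf(x), main = "ECDF with EXP CDF")
curve(pexp(x, rate_hat), add = TRUE, col = "red")          # fitted CDF
par(mfrow = c(1, 1))         # restore single-panel layout
```

In the histogram panel the exponential density is highest at zero, whereas gamma data with shape 2 have few observations very near zero; the ECDF panel shows the same mismatch as a gap between the stairstep and the red curve at small waiting times.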
Using a gamma model with method-of-moments estimators (MMEs) from the link above, I estimate
the shape parameter as $\hat \alpha = 1.90$ and the rate parameter as
$\hat \lambda = 0.178.$ [Maximum likelihood estimators (MLEs) would likely be
more accurate, but MMEs are good enough to give an idea how the graphing procedure works.]
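For readers without the linked page, the MMEs can be computed directly from the sample mean and variance. The sketch below assumes the shape/rate parameterization, in which the mean is $\alpha/\lambda$ and the variance is $\alpha/\lambda^2,$ so that $\hat\alpha = \bar X^2/S^2$ and $\hat\lambda = \bar X/S^2.$ The variable names are chosen only for this illustration, and the plots mirror the earlier ones with the gamma density and CDF in place of the exponential ones.

```r
set.seed(714)                # same seed as in the gamma example
x <- rgamma(500, 2, 0.2)     # gamma data: shape 2, rate 0.2

# Method-of-moments estimates (shape/rate parameterization)
alpha_hat  <- mean(x)^2 / var(x)   # shape
lambda_hat <- mean(x) / var(x)     # rate
c(shape = alpha_hat, rate = lambda_hat)

par(mfrow = c(1, 2))         # two panels side by side
hist(x, prob = TRUE, breaks = 20, col = "skyblue2",
     main = "Histogram with GAMMA Density")
rug(x)
curve(dgamma(x, shape = alpha_hat, rate = lambda_hat), add = TRUE, col = "red")
plot(ecdf(x), main = "ECDF with GAMMA CDF")
curve(pgamma(x, shape = alpha_hat, rate = lambda_hat), add = TRUE, col = "red")
par(mfrow = c(1, 1))         # restore single-panel layout
```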
$endgroup$
$begingroup$
Thank you for the elaborate and informative answer! I need to be sure we are on the same page here: your answer was about modeling the average waiting time per provider, wasn't it? Not the single-patient waiting time. I only have the former as data, while the latter (my goal) has to be simulated based only on theoretical information (is this or that distribution good for that process?).
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
Furthermore, I'm afraid I cannot even replicate your analysis on the average waiting times, because it would be wrong to assume that the distribution parameters are the same for each context (provider, procedure, urgency, etc.). That's why I used a hierarchical model in which I model the average waiting time as coming from lognormal (is that OK?) distributions whose parameters (e.g. $\mu_{global}$) are predicted from the context characteristics. Therefore, in my data there's a lot more variance than one would expect from canonical distributions. I don't know if I explained myself.
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
I think you need to try to understand the distributions of waiting times for individual providers first. Then you might be able to simulate individual patient waiting times depending on provider, diagnosis, and so on. Not knowing whether waiting times for different providers are even governed by the same distribution families, I don't see how you can make sense out of a more comprehensive model. // I don't understand your rationale for using the lognormal distribution. Perhaps that is useful as an overall descriptive tactic, but I don't see how it can help modeling for individual patients.
$endgroup$
– BruceET
4 hours ago
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f417425%2fwaiting-time-distribution-parameters-given-expected-mean%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
It is best to look at the distribution of waiting times for a particular provider. My first thought would be that if the process is anything like a queueing If process that the distribution should be nearly exponential. So I would check to see if the sample mean and standard deviation are approximately equal. If so, I would look to see if an empirical CDF (ECDF) of the data
roughly fits the CDF of $mathsfExp(textrate = lambda),$ where $mu = 1/lambda$ is estimated as $hat mu = bar X,$ the sample mean.
Only if that doesn't seem to work well, would I pursue fitting the data to a gamma distribution. This is also a plausible possibility, partly because the sum of $k$ exponential waiting times (of the same rate) is gamma-distributed with shape parameter $k.$ [If you use a 'distribution ID' program, a 'gamma' distribution will almost always win out over 'exponential' because the family $mathsfExp$ is a sub-family of $mathsfGamma.$]
Exponential data. As an initial example, let's pursue data randomly sampled from $mathsfExp(textrate = 1/10),$ so that the average waiting time is (a perhaps optimistic) 10 days. Suppose we
have waiting times for $n = 500$ patients.
set.seed(714) # for reproducibility
x = rexp(500, 0.1)
mean(x); sd(x)
[1] 9.909112
[1] 10.36662
So the sample mean and SD are about the same. In practice, I would not know
the true rate $lambda$ so I will estimate it as $hat lambda = 1/9.9 = 0.101.$
In the plot below, the boxplot shows many high 'outliers', as is typical of
an exponential sample. The Density function of $mathsfExp(0.101),$ is a
reasonable fit to the histogram of the data. Also (usually more revealing),
the ECDF plot of the sample is well-approximated by the CDF of this distribution. [The ECDF is a 'stairstep' plot that jumps up by $1/500$ at each of the $500$ observed
values.]
par(mfrow=c(1,3))
boxplot(x, col="skyblue2", pch=19, main="Boxplot of DATA")
hist(x, prob=T, br=20, col="skyblue2", main="Histogram with EXP(.101) Density")
rug(x); curve(dexp(x, .101), add=T, col="red")
plot(ecdf(x), main="ECDF with EXP(.101) CDF")
curve(pexp(x, .101), add=T, col="red")
par(mfrow=c(1,1))
These favorable results are hardly surprising because data were sampled from an exponential population. If real data performs as well, then you could simulate additional data from a similar population with R code rexp(n, 0.101)
, where parameter n
is the desired number of simulated values.
However, you must realize that you are not gaining additional information about actual patient waiting times by doing that. All the 'information' you have is given by the sample and the assumption that the population is exponential.
Gamma data. If the exponential model does not seem to fit, perhaps the next step is to assume that data are gamma-distributed, to estimate the parameters, and make similar plots to see if you get a better fit to the data.
[Several pages on this site and online discuss estimation of gamma
parameters; one recent page discusses both MMEs and MLEs.]
Just to see what happens if we have gamma data and try to fit an exponential model, I repeat the simulation above, but using x = rgamma(500, 2, .2)
.
set.seed(714) # for reproducibility
x = rgamma(500, 2, 0.2)
mean(x); var(x)
[1] 10.62662
[1] 59.49749
Pretending that these data are exponential and estimating the rate as
$hat lambda = 0.1062,$ R code similar to that above gives the following
graphs--with noticeably unsatisfactory fits.
Using a gamma model with method-of-moments estimators (MMEs) from the link above, I estimate
the shape parameter as $hat alpha = 1.90$ and the rate parameter as
$hat lambda = 0.178.$ [Maximum likelihood estimators (MLEs) would likely be
more accurate, but MMEs are good enough to give an idea how the graphing procedure works.]
$endgroup$
$begingroup$
Thank you for the elaborate and formative answer! I need to be sure we are on the same page here: your answer was about modeling the average waiting time per provider isn't it? not the single patient wating time. Because I got only the first as data, while the second (my goal) are to be simulated based only on theoretical information (is this or that distribution good for that process?).
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
Furthermore I'm afraid I cannot even replicate your analysis on the average watiting times, because it would be wrong to assume that distribution parameters are the same for each context (provider, procedure, urgency, etc). That's why I used a hierarchical model in which I model the average waiting time as from lognormal (is it ok?) distributions whose parameter (eg. $mu_global$) are predicted from the context characteristics. Therefore in my data there's a lot more variance than what to expect from canonical distributions. I don't know if I explained myself.
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
I think you need to try to understand the distributions of waiting times for individual providers first. Then you might be able to simulate individual patient waiting times depending on provider, diagnosis, and so on. Not knowing whether waiting times for different providers are even governed by the same distribution families, I don't see how you can make sense out of a more comprehensive model. // I don't understand your rationale for using the lognormal distribution. Perhaps that is useful as an overall descriptive tactic, but I don't see how it can help modeling for individual patients.
$endgroup$
– BruceET
4 hours ago
add a comment |
$begingroup$
It is best to look at the distribution of waiting times for a particular provider. My first thought would be that if the process is anything like a queueing If process that the distribution should be nearly exponential. So I would check to see if the sample mean and standard deviation are approximately equal. If so, I would look to see if an empirical CDF (ECDF) of the data
roughly fits the CDF of $mathsfExp(textrate = lambda),$ where $mu = 1/lambda$ is estimated as $hat mu = bar X,$ the sample mean.
Only if that doesn't seem to work well, would I pursue fitting the data to a gamma distribution. This is also a plausible possibility, partly because the sum of $k$ exponential waiting times (of the same rate) is gamma-distributed with shape parameter $k.$ [If you use a 'distribution ID' program, a 'gamma' distribution will almost always win out over 'exponential' because the family $mathsfExp$ is a sub-family of $mathsfGamma.$]
Exponential data. As an initial example, let's pursue data randomly sampled from $mathsfExp(textrate = 1/10),$ so that the average waiting time is (a perhaps optimistic) 10 days. Suppose we
have waiting times for $n = 500$ patients.
set.seed(714) # for reproducibility
x = rexp(500, 0.1)
mean(x); sd(x)
[1] 9.909112
[1] 10.36662
So the sample mean and SD are about the same. In practice, I would not know
the true rate $\lambda$ so I will estimate it as $\hat \lambda = 1/9.9 = 0.101.$
In the plot below, the boxplot shows many high 'outliers', as is typical of
an exponential sample. The density function of $\mathsf{Exp}(0.101)$ is a
reasonable fit to the histogram of the data. Also (usually more revealing),
the ECDF plot of the sample is well-approximated by the CDF of this distribution. [The ECDF is a 'stairstep' plot that jumps up by $1/500$ at each of the $500$ observed
values.]
par(mfrow=c(1,3))   # three panels side by side
boxplot(x, col="skyblue2", pch=19, main="Boxplot of DATA")
hist(x, prob=TRUE, breaks=20, col="skyblue2", main="Histogram with EXP(.101) Density")
rug(x); curve(dexp(x, .101), add=TRUE, col="red")   # fitted density
plot(ecdf(x), main="ECDF with EXP(.101) CDF")
curve(pexp(x, .101), add=TRUE, col="red")           # fitted CDF
par(mfrow=c(1,1))   # restore the single-panel layout
These favorable results are hardly surprising because the data were sampled from an exponential population. If real data perform as well, then you could simulate additional data from a similar population with R code rexp(n, 0.101), where parameter n is the desired number of simulated values.
However, you must realize that you are not gaining additional information about actual patient waiting times by doing that. All the 'information' you have is given by the sample and the assumption that the population is exponential.
Gamma data. If the exponential model does not seem to fit, perhaps the next step is to assume that data are gamma-distributed, to estimate the parameters, and make similar plots to see if you get a better fit to the data.
[Several pages on this site and online discuss estimation of gamma
parameters; one recent page discusses both MMEs and MLEs.]
Just to see what happens if we have gamma data and try to fit an exponential model, I repeat the simulation above, but using x = rgamma(500, 2, .2).
set.seed(714) # for reproducibility
x = rgamma(500, 2, 0.2)
mean(x); var(x)
[1] 10.62662
[1] 59.49749
Pretending that these data are exponential and estimating the rate as
$\hat \lambda = 1/10.63 = 0.0941,$ R code similar to that above gives the following
graphs, with noticeably unsatisfactory fits.
Using a gamma model with method-of-moments estimators (MMEs) from the link above, I estimate
the shape parameter as $\hat \alpha = 1.90$ and the rate parameter as
$\hat \lambda = 0.178.$ [Maximum likelihood estimators (MLEs) would likely be
more accurate, but MMEs are good enough to give an idea how the graphing procedure works.]
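For reference, the usual method-of-moments formulas for a gamma distribution with shape $\alpha$ and rate $\lambda$ follow from $E(X) = \alpha/\lambda$ and $\mathrm{Var}(X) = \alpha/\lambda^2,$ which give $\hat\alpha = \bar X^2/S^2$ and $\hat\lambda = \bar X/S^2.$ The sketch below applies them to the sample mean and variance printed above. The helper name `gamma_mme` is hypothetical, and it is an assumption that these are the formulas used on the linked page.

```r
# Hypothetical helper: method-of-moments estimates for Gamma(shape, rate),
# derived from E(X) = shape / rate and Var(X) = shape / rate^2.
gamma_mme <- function(m, v) {
  c(shape = m^2 / v, rate = m / v)
}

# Sample mean and variance as printed in the R output above
est <- gamma_mme(10.62662, 59.49749)
round(est, 3)
```

With these estimates, the earlier plotting code could be reused by replacing dexp and pexp with dgamma and pgamma, passing the estimated shape and rate.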
$endgroup$
$begingroup$
Thank you for the elaborate and informative answer! I need to be sure we are on the same page here: your answer was about modeling the average waiting time per provider, wasn't it? Not the single-patient waiting time. I have only the former as data, while the latter (my goal) is to be simulated based only on theoretical information (is this or that distribution good for that process?).
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
Furthermore, I'm afraid I cannot even replicate your analysis on the average waiting times, because it would be wrong to assume that the distribution parameters are the same for each context (provider, procedure, urgency, etc.). That's why I used a hierarchical model in which I model the average waiting time as coming from lognormal (is that OK?) distributions whose parameters (e.g. $\mu_{global}$) are predicted from the context characteristics. Therefore my data have a lot more variance than would be expected from canonical distributions. I don't know if I explained myself.
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
I think you need to try to understand the distributions of waiting times for individual providers first. Then you might be able to simulate individual patient waiting times depending on provider, diagnosis, and so on. Not knowing whether waiting times for different providers are even governed by the same distribution families, I don't see how you can make sense out of a more comprehensive model. // I don't understand your rationale for using the lognormal distribution. Perhaps that is useful as an overall descriptive tactic, but I don't see how it can help modeling for individual patients.
$endgroup$
– BruceET
4 hours ago
edited 6 hours ago
answered 7 hours ago
BruceET
$begingroup$
The number of patients seen in the given time cannot be modeled as Poisson: this is not a Poisson process because the events are not independent of each other. If a provider sees one patient, it becomes much less likely that they will see the next, because they have to spend time with the patient. In other words, the time between events cannot be modeled by an exponential distribution. Instead, the time between events should be modeled as a gamma distribution where $k\theta \approx 20$ (if 20 min is the average time spent seeing a patient).
$endgroup$
– Matthew Anderson
8 hours ago
$begingroup$
Moreover, could you please explain the derivation of $\lambda$ with respect to $\mathrm{Poisson}(\lambda)$?
$endgroup$
– Matthew Anderson
8 hours ago
$begingroup$
Wait, wait. This is not a queueing system in which patients are seen in the same order in which they book. I can get my visit booked urgently (in less than one day) or months from today. Then there is the personnel at each provider (there can be more than one doctor). Providers always keep some room for unforeseen visits that are more urgent. Furthermore, operators move patients up and down (sometimes with no clear criteria). Therefore I thought about treating the patients as independent, with a different Poisson process for each situation (provider, procedure, emergency class), each with a different $\lambda$.
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
Then, maybe I wasn't clear, but I don't have individual patient data; I only have the average waiting time for a certain number of patients seen in a certain period of provider activity (i.e. 4 and 12 months).
$endgroup$
– Bakaburg
5 hours ago
$begingroup$
That still doesn't sound Poisson to me, because the waiting time would be gamma with $\mu >\approx 10$, as opposed to exponential.
$endgroup$
– Matthew Anderson
4 hours ago