Waiting time distribution parameters given expected mean

I have a set of healthcare providers serving patients. In a given amount of time, a specific provider can see only a certain number of patients, depending on the medical procedure and other variables.

After booking a procedure, a patient is seen by the doctor after a certain number of days, again specific to the provider, procedure, etc.

I have a data set in which, for each provider and a given amount of time, I know how many patients were seen and what the average waiting time was.
I want to simulate more patients for each provider, along with their individual waiting times.

I assumed that the number of patients seen in the given time can be modeled as $\sim Poisson(\lambda)$, with $\lambda$ depending on provider and procedure characteristics and on the amount of time.

I modeled the average waiting time as $\sim lognormal(\mu_{global}, \sigma)$, with parameters that are functions of the same variables as before plus the log of the number of patients.

Finally, I'm modeling the waiting times of the simulated new patients as $\sim Gamma(\mu/\theta, \theta)$, with $\mu$ predicted from the model above and $\theta$ chosen using domain knowledge, since I don't have past information on individual waiting times.

I would like to know whether I chose the right distributions for the problem at hand.
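A minimal sketch of the last modeling step, assuming that $Gamma(\mu/\theta, \theta)$ is meant in the shape/scale parametrization, so that the mean is shape times scale, i.e. $\mu$. The values of `mu` and `theta` below are hypothetical and only illustrate how the parametrization behaves; they are not taken from the question.

```r
# Hypothetical values, for illustration only
mu    <- 10   # predicted mean waiting time (days)
theta <- 5    # assumed scale, e.g. chosen from domain knowledge

set.seed(1)
# Shape/scale parametrization: mean = shape * scale = mu
w <- rgamma(10000, shape = mu / theta, scale = theta)

c(mean = mean(w), theoretical = mu)
# SD of a gamma variable is sqrt(shape) * scale = sqrt(mu * theta)
c(sd = sd(w), theoretical = sqrt(mu * theta))
```

Under this reading, $\theta$ controls the spread for a fixed mean: the standard deviation is $\sqrt{\mu\theta}$, and $\theta = \mu$ gives back the exponential distribution.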











  • The number of patients seen in the given time cannot be modeled as Poisson; this is not a Poisson process because the events are not independent of each other. If a provider sees one patient, it becomes much less likely that they will see the next one right away, because they have to spend time with the patient. In other words, the time between events cannot be modeled by an exponential distribution. Instead, the time between events should be modeled as a gamma distribution where $k\theta \approx 20$ (if 20 min is the average time spent seeing a patient).
    – Matthew Anderson
    8 hours ago

  • Moreover, could you please explain the derivation of $\lambda$ with respect to $Poisson(\lambda)$?
    – Matthew Anderson
    8 hours ago

  • Wait, wait. This is not a queueing system in which patients are seen in the same order in which they book. I can get my visit booked as urgent (less than one day) or months from today. Then there is the personnel at each provider (there can be more than one doctor). Providers always keep some room for unforeseen visits that are more urgent. Furthermore, operators move patients up and down (sometimes with no clear criteria). Therefore I thought about treating the patients as independent, with a different Poisson process for each situation (provider, procedure, emergency class), each with a different $\lambda$.
    – Bakaburg
    5 hours ago

  • Also, maybe I wasn't clear, but I don't have individual patient data. I only have the average waiting time for a certain number of patients seen in a certain period of provider activity (i.e. 4 and 12 months).
    – Bakaburg
    5 hours ago

  • That still doesn't sound Poisson to me, because the waiting time would be Gamma with $\mu \gtrapprox 10$ as opposed to exponential.
    – Matthew Anderson
    4 hours ago

















probability distributions bayesian modeling






asked 8 hours ago









Bakaburg











1 Answer
$begingroup$

It is best to look at the distribution of waiting times for a particular provider. My first thought would be that if the process is anything like a queueing process, the distribution should be nearly exponential. So I would check whether the sample mean and standard deviation are approximately equal. If so, I would look to see whether an empirical CDF (ECDF) of the data
roughly fits the CDF of $\mathsf{Exp}(\text{rate} = \lambda),$ where $\mu = 1/\lambda$ is estimated as $\hat\mu = \bar X,$ the sample mean.



Only if that doesn't seem to work well would I pursue fitting the data to a gamma distribution. This is also a plausible possibility, partly because the sum of $k$ independent exponential waiting times (of the same rate) is gamma-distributed with shape parameter $k.$ [If you use a 'distribution ID' program, a 'gamma' distribution will almost always win out over 'exponential' because the family $\mathsf{Exp}$ is a sub-family of $\mathsf{Gamma}.$]
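As a quick illustration of the sum-of-exponentials fact, here is a small simulation sketch. The values `k = 3` and `rate = 0.1` are hypothetical choices made only for this example; the simulated quantiles of the sums should land close to the quantiles of a gamma distribution with shape `k` and the same rate.

```r
set.seed(1)
k    <- 3     # number of exponential stages (hypothetical)
rate <- 0.1   # common rate of each stage (hypothetical)

# Each column holds k independent exponential waiting times;
# summing down the columns gives 10000 simulated totals
s <- colSums(matrix(rexp(k * 10000, rate), nrow = k))

# Compare simulated quantiles with those of Gamma(shape = k, rate = rate)
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated = quantile(s, p),
            gamma     = qgamma(p, shape = k, rate = rate)), 2)
```

With `k = 1` the same code reduces to the exponential case, which is the sub-family relationship mentioned above.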



Exponential data. As an initial example, let's consider data randomly sampled from $\mathsf{Exp}(\text{rate} = 1/10),$ so that the average waiting time is (a perhaps optimistic) 10 days. Suppose we
have waiting times for $n = 500$ patients.



set.seed(714) # for reproducibility
x = rexp(500, 0.1)
mean(x); sd(x)
[1] 9.909112
[1] 10.36662


So the sample mean and SD are about the same. In practice, I would not know
the true rate $\lambda,$ so I will estimate it as $\hat\lambda = 1/9.9 = 0.101.$



In the plot below, the boxplot shows many high 'outliers', as is typical of
an exponential sample. The density function of $\mathsf{Exp}(0.101)$ is a
reasonable fit to the histogram of the data. Also (usually more revealing),
the ECDF plot of the sample is well approximated by the CDF of this distribution. [The ECDF is a 'stairstep' plot that jumps up by $1/500$ at each of the $500$ observed
values.]



par(mfrow=c(1,3))  # three panels side by side
boxplot(x, col="skyblue2", pch=19, main="Boxplot of DATA")
hist(x, prob=TRUE, breaks=20, col="skyblue2", main="Histogram with EXP(.101) Density")
rug(x); curve(dexp(x, .101), add=TRUE, col="red")  # fitted density
plot(ecdf(x), main="ECDF with EXP(.101) CDF")
curve(pexp(x, .101), add=TRUE, col="red")          # fitted CDF
par(mfrow=c(1,1))  # restore the single-panel layout


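[Figure: three panels for the exponential sample: "Boxplot of DATA", "Histogram with EXP(.101) Density", and "ECDF with EXP(.101) CDF"]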



These favorable results are hardly surprising because the data were sampled from an exponential population. If real data perform as well, then you could simulate additional data from a similar population with the R code rexp(n, 0.101), where the parameter n is the desired number of simulated values.



However, you must realize that you are not gaining additional information about actual patient waiting times by doing that. All the 'information' you have is given by the sample and the assumption that the population is exponential.
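The exponential workflow described above can be sketched in a few lines. This is only an illustration: `x` here is simulated as a stand-in for observed waiting times, the choice of 200 new values is arbitrary, and the Kolmogorov-Smirnov test is my addition as a numerical companion to the ECDF plot (its p-value is only approximate, because the rate is estimated from the same data).

```r
set.seed(714)
x <- rexp(500, 0.1)             # stand-in for observed waiting times

rate_hat <- 1 / mean(x)         # estimate the rate as 1 / sample mean
x_new <- rexp(200, rate_hat)    # 200 additional simulated waiting times

# Rough goodness-of-fit check against Exp(rate_hat);
# treat the p-value as indicative only, since rate_hat was fitted to x
ks.test(x, "pexp", rate_hat)
```

As the paragraph above stresses, `x_new` carries no information beyond what is already in `x` and the exponential assumption.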



Gamma data. If the exponential model does not seem to fit, perhaps the next step is to assume that data are gamma-distributed, to estimate the parameters, and make similar plots to see if you get a better fit to the data.
[Several pages on this site and online discuss estimation of gamma
parameters; one recent page discusses both MMEs and MLEs.]



Just to see what happens if we have gamma data and try to fit an exponential model, I repeat the simulation above, but using x = rgamma(500, 2, .2).



set.seed(714) # for reproducibility
x = rgamma(500, 2, 0.2)
mean(x); var(x)
[1] 10.62662
[1] 59.49749


Pretending that these data are exponential and estimating the rate as
$\hat\lambda = 1/10.63 = 0.0941,$ R code similar to that above gives the following
graphs, with noticeably unsatisfactory fits.



[Figure: boxplot, histogram with fitted exponential density, and ECDF with fitted exponential CDF for the gamma sample]



Using a gamma model with method-of-moments estimators (MMEs) from the link above, I estimate
the shape parameter as $\hat\alpha = 1.90$ and the rate parameter as
$\hat\lambda = 0.178.$ [Maximum likelihood estimators (MLEs) would likely be
more accurate, but MMEs are good enough to give an idea how the graphing procedure works.]



[Figure: histogram with fitted gamma density and ECDF with fitted gamma CDF for the gamma sample]
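For reference, the method-of-moments estimates for a gamma model can be computed as follows. This sketch assumes the standard moment equations for the shape/rate parametrization, mean $= \alpha/\lambda$ and variance $= \alpha/\lambda^2$, which give $\hat\alpha = \bar X^2/S^2$ and $\hat\lambda = \bar X/S^2$. The variable names are mine.

```r
set.seed(714)
x <- rgamma(500, shape = 2, rate = 0.2)

m <- mean(x)
v <- var(x)

# Solve mean = alpha / lambda and variance = alpha / lambda^2
alpha_hat <- m^2 / v   # method-of-moments estimate of the shape
rate_hat  <- m / v     # method-of-moments estimate of the rate

c(shape = alpha_hat, rate = rate_hat)
```

Plugging the sample mean 10.62662 and variance 59.49749 reported above into these formulas gives a shape of about 1.90 and a rate of about 0.18, in line with the estimates quoted in the answer. For maximum likelihood estimates, `MASS::fitdistr(x, "gamma")` is one commonly used option.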

















$endgroup$












  • Thank you for the elaborate and formative answer! I need to be sure we are on the same page here: your answer was about modeling the average waiting time per provider, wasn't it? Not the single-patient waiting time. I only have the former as data, while the latter (my goal) are to be simulated based only on theoretical information (is this or that distribution good for that process?).
    – Bakaburg
    5 hours ago

  • Furthermore, I'm afraid I cannot even replicate your analysis on the average waiting times, because it would be wrong to assume that the distribution parameters are the same for each context (provider, procedure, urgency, etc.). That's why I used a hierarchical model in which I model the average waiting time as coming from lognormal (is that OK?) distributions whose parameters (e.g. $\mu_{global}$) are predicted from the context characteristics. Therefore my data have a lot more variance than one would expect from canonical distributions. I don't know if I explained myself.
    – Bakaburg
    5 hours ago

  • I think you need to try to understand the distributions of waiting times for individual providers first. Then you might be able to simulate individual patient waiting times depending on provider, diagnosis, and so on. Not knowing whether waiting times for different providers are even governed by the same distribution families, I don't see how you can make sense of a more comprehensive model. // I don't understand your rationale for using the lognormal distribution. Perhaps that is useful as an overall descriptive tactic, but I don't see how it can help with modeling for individual patients.
    – BruceET
    4 hours ago













Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f417425%2fwaiting-time-distribution-parameters-given-expected-mean%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









3












$begingroup$

It is best to look at the distribution of waiting times for a particular provider. My first thought would be that if the process is anything like a queueing If process that the distribution should be nearly exponential. So I would check to see if the sample mean and standard deviation are approximately equal. If so, I would look to see if an empirical CDF (ECDF) of the data
roughly fits the CDF of $mathsfExp(textrate = lambda),$ where $mu = 1/lambda$ is estimated as $hat mu = bar X,$ the sample mean.



Only if that doesn't seem to work well, would I pursue fitting the data to a gamma distribution. This is also a plausible possibility, partly because the sum of $k$ exponential waiting times (of the same rate) is gamma-distributed with shape parameter $k.$ [If you use a 'distribution ID' program, a 'gamma' distribution will almost always win out over 'exponential' because the family $mathsfExp$ is a sub-family of $mathsfGamma.$]



Exponential data. As an initial example, let's pursue data randomly sampled from $mathsfExp(textrate = 1/10),$ so that the average waiting time is (a perhaps optimistic) 10 days. Suppose we
have waiting times for $n = 500$ patients.



set.seed(714) # for reproducibility
x = rexp(500, 0.1)
mean(x); sd(x)
[1] 9.909112
[1] 10.36662


So the sample mean and SD are about the same. In practice, I would not know
the true rate $lambda$ so I will estimate it as $hat lambda = 1/9.9 = 0.101.$



In the plot below, the boxplot shows many high 'outliers', as is typical of
an exponential sample. The Density function of $mathsfExp(0.101),$ is a
reasonable fit to the histogram of the data. Also (usually more revealing),
the ECDF plot of the sample is well-approximated by the CDF of this distribution. [The ECDF is a 'stairstep' plot that jumps up by $1/500$ at each of the $500$ observed
values.]



par(mfrow=c(1,3))
boxplot(x, col="skyblue2", pch=19, main="Boxplot of DATA")
hist(x, prob=T, br=20, col="skyblue2", main="Histogram with EXP(.101) Density")
rug(x); curve(dexp(x, .101), add=T, col="red")
plot(ecdf(x), main="ECDF with EXP(.101) CDF")
curve(pexp(x, .101), add=T, col="red")
par(mfrow=c(1,1))


enter image description here



These favorable results are hardly surprising because data were sampled from an exponential population. If real data performs as well, then you could simulate additional data from a similar population with R code rexp(n, 0.101), where parameter n is the desired number of simulated values.



However, you must realize that you are not gaining additional information about actual patient waiting times by doing that. All the 'information' you have is given by the sample and the assumption that the population is exponential.



Gamma data. If the exponential model does not seem to fit, perhaps the next step is to assume that data are gamma-distributed, to estimate the parameters, and make similar plots to see if you get a better fit to the data.
[Several pages on this site and online discuss estimation of gamma
parameters; one recent page discusses both MMEs and MLEs.]



Just to see what happens if we have gamma data and try to fit an exponential model, I repeat the simulation above, but using x = rgamma(500, 2, .2).



set.seed(714) # for reproducibility
x = rgamma(500, 2, 0.2)
mean(x); var(x)
[1] 10.62662
[1] 59.49749


Pretending that these data are exponential and estimating the rate as
$hat lambda = 0.1062,$ R code similar to that above gives the following
graphs--with noticeably unsatisfactory fits.



enter image description here



Using a gamma model with method-of-moments estimators (MMEs) from the link above, I estimate
the shape parameter as $hat alpha = 1.90$ and the rate parameter as
$hat lambda = 0.178.$ [Maximum likelihood estimators (MLEs) would likely be
more accurate, but MMEs are good enough to give an idea how the graphing procedure works.]



enter image description here






share|cite|improve this answer











$endgroup$












  • Thank you for the elaborate and informative answer! I need to be sure we are on the same page here: your answer was about modeling the average waiting time per provider, wasn't it? Not the single-patient waiting time. I have only the former as data, while the latter (my goal) is to be simulated based only on theoretical information (is this or that distribution good for that process?).
    – Bakaburg
    5 hours ago










  • Furthermore, I'm afraid I cannot even replicate your analysis on the average waiting times, because it would be wrong to assume that the distribution parameters are the same for each context (provider, procedure, urgency, etc.). That's why I used a hierarchical model in which I model the average waiting time as coming from lognormal (is it ok?) distributions whose parameters (e.g. $\mu_{global}$) are predicted from the context characteristics. Therefore my data have a lot more variance than one would expect from canonical distributions. I don't know if I explained myself.
    – Bakaburg
    5 hours ago










  • I think you need to try to understand the distributions of waiting times for individual providers first. Then you might be able to simulate individual patient waiting times depending on provider, diagnosis, and so on. Not knowing whether waiting times for different providers are even governed by the same distribution families, I don't see how you can make sense out of a more comprehensive model. // I don't understand your rationale for using the lognormal distribution. Perhaps that is useful as an overall descriptive tactic, but I don't see how it can help modeling for individual patients.
    – BruceET
    4 hours ago



















































edited 6 hours ago

























answered 7 hours ago









BruceET


































