Are the errors in this formulation of the simple linear regression model random variables?
On page 21 of Applied Linear Regression, fourth edition, by Sanford Weisberg, the error $e_i$ for case $i$ under the simple linear regression model is defined to be $y_i - E(Y | X = x_i)$, where $E(Y | X = x_i)$ is assumed to equal $\beta_0 + \beta_1 x_i$ for some unknown $\beta_0, \beta_1 \in \mathbb{R}$. The book says that
"The errors $e_i$ depend on unknown parameters in the mean function and so are not observable quantities. They are random variables and correspond to the vertical distance between the point $y_i$ and the mean function $E(Y | X = x_i)$."
It doesn't seem to me like $e_i$ is a random variable, because it's a function of $y_i$ and $x_i$, which are non-random, observed values. Why can $e_i$ be considered a random variable?
regression random-variable assumptions
asked 8 hours ago by VKV
2 Answers
I looked up your citation (4th edition, page 21) because I found it very alarming, and was relieved to find that the definition is actually given as:
$$ \hat{e}_i = y_i - \widehat{E}(Y | X = x_i) = y_i - (\hat\beta_0 + \hat\beta_1 x_i) \tag{2.3} $$
This is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression. There is a crucial distinction between the "true errors", which are denoted $\epsilon_i$ and are i.i.d. and normally distributed, and the "residuals", which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply that the two are exactly equal, which is not the case.
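To make the gap between the two concrete, here is a short worked identity. It assumes the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ and writes the residual as $e_i$ (the quantity with a hat in equation 2.3); it is a standard substitution, not something quoted from the book:

$$ e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) = \epsilon_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1) x_i $$

So a residual equals the true error only if the estimates happen to coincide exactly with the true parameters. Because every $e_i$ shares the same estimation errors $\hat\beta_0 - \beta_0$ and $\hat\beta_1 - \beta_1$, the residuals are tied to one another even when the $\epsilon_i$ are independent.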
On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"
If you believe the pairs $(x_i, y_i)$ are known and non-random, that is, if you believe that $\forall\; 1 \leq i \leq n,\ (x_i, y_i) \in \mathbb{R} \times \mathbb{R}$, then the residuals $e_i$ are also known and non-random, i.e. $\forall\; 1 \leq i \leq n,\ e_i \in \mathbb{R}$. This is because there is a deterministic function giving the "best" parameters $\hat\beta_0$ and $\hat\beta_1$ from those observations, and then a deterministic function giving the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $\beta$, for example. It is also the most intuitive view to take when you're sitting in front of a concrete, real-world dataset.
However, it kind of puts the cart before the horse and shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $\hat\beta_1$, because it is not a random variable and therefore has no distribution! How can we then talk about something like the Wald test? Likewise, how do we talk about the "distribution" of the residuals so that we can say whether one is an outlier or not?
The way this is done is by treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little pedantic and is often omitted, but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_{X,Y}(\boldsymbol{\beta}, \sigma^2)$ with parameters $\boldsymbol{\beta} = [\beta_0, \beta_1]^T$ and $\sigma^2$. $F_{X,Y}$ is specified by the model $Y = \beta_0 + \beta_1 X + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_{X,Y}$ that we combine into one big joint probability distribution $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$.
Now we can imagine the dataset $(x_i, y_i)$ for $i=1,\ldots,n$ not merely as some known set of numbers, but as a realization sampled from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $\hat\beta$ get new estimates, and we then calculate new residuals $e_i$, right?
Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. The dataset can be expressed as two $n$-dimensional random vectors $\vec{X}$ and $\vec{Y}$ drawn from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Now $\hat\beta_0$ and $\hat\beta_1$ are random variables because they are functions of $(\vec{X}, \vec{Y})$. Likewise, all the $e_i$ are random variables because they are functions of $(\vec{X}, \vec{Y})$.
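The "crude" repeated-sampling picture is easy to try numerically. The following is a minimal sketch in Python with NumPy, not part of the original argument: the parameter values, the uniform distribution chosen for $X$, and the helper names `draw_dataset` and `fit_ols` are all assumptions made purely for illustration.

```python
import numpy as np

def draw_dataset(rng, n, beta0=1.0, beta1=2.0, sigma=0.5):
    """Draw one realization of the random dataset (X_1, Y_1), ..., (X_n, Y_n)."""
    x = rng.uniform(0.0, 10.0, size=n)       # assumed distribution for X
    eps = rng.normal(0.0, sigma, size=n)     # true errors, unobservable in practice
    return x, beta0 + beta1 * x + eps

def fit_ols(x, y):
    """Least-squares intercept, slope and residuals for simple linear regression."""
    xc = x - x.mean()
    b1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1, y - (b0 + b1 * x)

rng = np.random.default_rng(0)
slopes, first_residuals = [], []
for _ in range(2000):                        # 2000 brand-new datasets
    x, y = draw_dataset(rng, n=50)
    b0, b1, resid = fit_ols(x, y)
    slopes.append(b1)
    first_residuals.append(resid[0])
slopes = np.array(slopes)
first_residuals = np.array(first_residuals)

# Each dataset yields a different slope estimate and a different first residual,
# so across datasets both behave like draws from a distribution.
print("mean of slope estimates:", slopes.mean())
print("std of slope estimates: ", slopes.std(ddof=1))
print("std of first residual:  ", first_residuals.std(ddof=1))
```

Within any single dataset the slope and the residuals are plain numbers; the spread only appears when the whole dataset is redrawn, which is exactly the sense in which they are random variables.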
This state of affairs is much better, because now we can make statements like "the residuals $e_i$ cannot be independent because (when the model includes an intercept) they always sum exactly to zero" or "the standardized estimate $(\hat\beta_1 - \beta_1)/\operatorname{se}(\hat\beta_1)$ follows a t-distribution" without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)
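For completeness, here is a brief sketch of why the residuals sum to zero. It assumes the estimates are obtained by least squares with an intercept term. Setting the derivative of the sum of squares with respect to the intercept to zero at the minimum gives

$$ \frac{\partial}{\partial b_0} \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2 \Big|_{(b_0, b_1) = (\hat\beta_0, \hat\beta_1)} = -2 \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i) = -2 \sum_{i=1}^n e_i = 0 . $$

The same step for the slope gives $\sum_{i=1}^n x_i e_i = 0$. The $n$ residuals therefore satisfy two linear constraints, so knowing $n - 2$ of them determines the rest, which rules out independence.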
In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.
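As a rough illustration of the bootstrap idea mentioned here, the sketch below resamples the observed pairs with replacement and refits the slope each time. It is one common variant (the pairs bootstrap), chosen here as an assumption; the simulated "observed" data, the helper name `slope`, and the number of resamples are likewise illustrative.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope estimate for simple linear regression."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)

rng = np.random.default_rng(1)

# One observed dataset (simulated here only so the example is self-contained).
n = 100
x_obs = rng.uniform(0.0, 10.0, size=n)
y_obs = 1.0 + 2.0 * x_obs + rng.normal(0.0, 0.5, size=n)

# Pairs bootstrap: resample (x_i, y_i) rows with replacement and refit.
boot_slopes = np.empty(1000)
for b in range(boot_slopes.size):
    idx = rng.integers(0, n, size=n)         # row indices drawn with replacement
    boot_slopes[b] = slope(x_obs[idx], y_obs[idx])

print("slope on observed data:  ", slope(x_obs, y_obs))
print("bootstrap standard error:", boot_slopes.std(ddof=1))
```

The spread of `boot_slopes` stands in for the spread we would see if we could really collect fresh datasets, using only the one dataset we have.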
You'll note that I did not introduce new notation for $e_i$ and $\hat\beta$ but simply said, "now these things, which we previously thought of as concrete realizations, will be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting (the same kind you found in your textbook) to tell whether symbols refer to random or non-random variables, because while there are conventions (such as using uppercase roman letters for random variables), they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you that he is also viewing $x_i$ and $y_i$ as random variables.
In simple linear regression, we assume that the observations are randomly perturbed from the conditional expected value $E[Y | X = x_i]$, so each of your observations is assumed to be generated from a model of the form $$Y = \beta_0 + \beta_1 X + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$$
This makes each $\epsilon_i$ a random variable by definition. Think of a box to which you give $x_i$ and from which you get $y_i$: you never know what's inside, how much error the box introduces, and so on. Even if we know that the relation really has the form given above, we don't know the true $\beta_0, \beta_1$. If we knew those quantities, we could easily recover $\epsilon_i$. Instead, we estimate them and get residuals.
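The box analogy can be sketched in a few lines of Python with NumPy. Everything specific here (the parameter values, the grid of $x$ values, the seed) is an assumption for illustration; in a real problem the true parameters are exactly what we do not have.

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma = 1.0, 2.0, 0.5          # "inside the box": unknown in practice

x = np.linspace(0.0, 10.0, 20)
eps = rng.normal(0.0, sigma, size=x.size)    # true errors
y = beta0 + beta1 * x + eps                  # what the box hands back

# If the true parameters were known, the errors could be recovered exactly.
recovered = y - (beta0 + beta1 * x)

# In practice we estimate the parameters and get residuals instead.
b1, b0 = np.polyfit(x, y, deg=1)             # coefficients, highest degree first
residuals = y - (b0 + b1 * x)

print("errors recovered exactly:", np.allclose(recovered, eps))
print("largest |residual - error|:", np.max(np.abs(residuals - eps)))
print("sum of residuals:", residuals.sum(), " sum of errors:", eps.sum())
```

The residuals track the errors closely but are not equal to them, and unlike the errors they are forced to sum to (numerically) zero.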
$begingroup$
I looked up your citation (4th edition, page 21) because I found it very alarming and was relieved to find is actually given as:
$$ hate_i = y_i − widehatE(Y|X=x_i) = y_i - (hatbeta_0 + hatbeta_1) tag2.3 $$
Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between "true errors" which are denoted $epsilon_i$ and are normally distributed and i.i.d., and "residuals which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal which is not the case.
On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"
If you believe the pairs $(x_i, y_i)$ are known and not-random, e.g. that is, if you believe that $forall; 1 leq i leq n,, (x_i, y_i) in mathbbR times mathbbR $, then the residuals $e_i$ are also known and non-random, e.g. $forall; 1 leq i leq n,, e_i in mathbbR$. This is because there is a deterministic function for the "best" parameters $hatbeta_0$ and $hatbeta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $beta$, for example. It is also the most intuitive view to take when your sitting in front of a concrete, real-world dataset.
However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $hatbeta_1$ because it is not a random variable and therefore has no distribution! How can we then talk something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?
The way this is done is treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic but and is often omitted but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_X,Y(mathbfbeta, sigma^2)$ with parameters $mathbfbeta = [beta_0, beta_1]^T $ and $sigma$. $F_X,Y$ is specified by the model $Y = Xbeta_0 + beta_1 + epsilon, epsilon sim mathcal(0, sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_X,Y$ that we combine into one big joint probability function $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$.
Now we can imagine the dataset $(x_i, y_i)$ for $i=1,...,n$ not merely as some known set of numbers, but as a realization sampled from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $hatbeta$ get new estimates, and we then calculate new residuals $e_i$, right?
Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $vecX$ and $vecY$ drawn from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Now $hatbeta_0$ and $hatbeta_1$ are random variables because they are functions of $(vecX, vecY)$. Likewise, all the $e_i$ are random variables because they are functions of $(vecX, vecY)$.
This state of affairs is much better, because now we can make statements like "The set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "the standard error of $hatbeta_1$ follows a t-distribution." without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)
In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.
You'll note that I did not introduce new notation for $e_i$ and $hatbeta$ but simply said, "now these things, which we previously thought of a concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to indicate whether symbols are referring to random or non-random variables because while there are conventions (such as using uppercase roman letters for random variables) they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you is also viewing $x_i$ and $y_i$ as random variables.
$endgroup$
add a comment |
$begingroup$
I looked up your citation (4th edition, page 21) because I found it very alarming and was relieved to find is actually given as:
$$ hate_i = y_i − widehatE(Y|X=x_i) = y_i - (hatbeta_0 + hatbeta_1) tag2.3 $$
Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between "true errors" which are denoted $epsilon_i$ and are normally distributed and i.i.d., and "residuals which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal which is not the case.
On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"
If you believe the pairs $(x_i, y_i)$ are known and not-random, e.g. that is, if you believe that $forall; 1 leq i leq n,, (x_i, y_i) in mathbbR times mathbbR $, then the residuals $e_i$ are also known and non-random, e.g. $forall; 1 leq i leq n,, e_i in mathbbR$. This is because there is a deterministic function for the "best" parameters $hatbeta_0$ and $hatbeta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $beta$, for example. It is also the most intuitive view to take when your sitting in front of a concrete, real-world dataset.
However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $hatbeta_1$ because it is not a random variable and therefore has no distribution! How can we then talk something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?
The way this is done is treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic but and is often omitted but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_X,Y(mathbfbeta, sigma^2)$ with parameters $mathbfbeta = [beta_0, beta_1]^T $ and $sigma$. $F_X,Y$ is specified by the model $Y = Xbeta_0 + beta_1 + epsilon, epsilon sim mathcal(0, sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_X,Y$ that we combine into one big joint probability function $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$.
Now we can imagine the dataset $(x_i, y_i)$ for $i=1,...,n$ not merely as some known set of numbers, but as a realization sampled from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $hatbeta$ get new estimates, and we then calculate new residuals $e_i$, right?
Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $vecX$ and $vecY$ drawn from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Now $hatbeta_0$ and $hatbeta_1$ are random variables because they are functions of $(vecX, vecY)$. Likewise, all the $e_i$ are random variables because they are functions of $(vecX, vecY)$.
This state of affairs is much better, because now we can make statements like "The set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "the standard error of $hatbeta_1$ follows a t-distribution." without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)
In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.
You'll note that I did not introduce new notation for $e_i$ and $hatbeta$ but simply said, "now these things, which we previously thought of a concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to indicate whether symbols are referring to random or non-random variables because while there are conventions (such as using uppercase roman letters for random variables) they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you is also viewing $x_i$ and $y_i$ as random variables.
$endgroup$
add a comment |
$begingroup$
I looked up your citation (4th edition, page 21) because I found it very alarming and was relieved to find is actually given as:
$$ hate_i = y_i − widehatE(Y|X=x_i) = y_i - (hatbeta_0 + hatbeta_1) tag2.3 $$
Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between "true errors" which are denoted $epsilon_i$ and are normally distributed and i.i.d., and "residuals which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal which is not the case.
On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"
If you believe the pairs $(x_i, y_i)$ are known and not-random, e.g. that is, if you believe that $forall; 1 leq i leq n,, (x_i, y_i) in mathbbR times mathbbR $, then the residuals $e_i$ are also known and non-random, e.g. $forall; 1 leq i leq n,, e_i in mathbbR$. This is because there is a deterministic function for the "best" parameters $hatbeta_0$ and $hatbeta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $beta$, for example. It is also the most intuitive view to take when your sitting in front of a concrete, real-world dataset.
However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $hatbeta_1$ because it is not a random variable and therefore has no distribution! How can we then talk something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?
The way this is done is treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic but and is often omitted but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_X,Y(mathbfbeta, sigma^2)$ with parameters $mathbfbeta = [beta_0, beta_1]^T $ and $sigma$. $F_X,Y$ is specified by the model $Y = Xbeta_0 + beta_1 + epsilon, epsilon sim mathcal(0, sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_X,Y$ that we combine into one big joint probability function $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$.
Now we can imagine the dataset $(x_i, y_i)$ for $i=1,...,n$ not merely as some known set of numbers, but as a realization sampled from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $hatbeta$ get new estimates, and we then calculate new residuals $e_i$, right?
Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $vecX$ and $vecY$ drawn from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Now $hatbeta_0$ and $hatbeta_1$ are random variables because they are functions of $(vecX, vecY)$. Likewise, all the $e_i$ are random variables because they are functions of $(vecX, vecY)$.
This state of affairs is much better, because now we can make statements like "The set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "the standard error of $hatbeta_1$ follows a t-distribution." without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)
In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.
You'll note that I did not introduce new notation for $e_i$ and $hatbeta$ but simply said, "now these things, which we previously thought of a concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to indicate whether symbols are referring to random or non-random variables because while there are conventions (such as using uppercase roman letters for random variables) they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you is also viewing $x_i$ and $y_i$ as random variables.
$endgroup$
I looked up your citation (4th edition, page 21) because I found it very alarming and was relieved to find is actually given as:
$$ hate_i = y_i − widehatE(Y|X=x_i) = y_i - (hatbeta_0 + hatbeta_1) tag2.3 $$
Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between "true errors" which are denoted $epsilon_i$ and are normally distributed and i.i.d., and "residuals which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal which is not the case.
On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"
If you believe the pairs $(x_i, y_i)$ are known and not-random, e.g. that is, if you believe that $forall; 1 leq i leq n,, (x_i, y_i) in mathbbR times mathbbR $, then the residuals $e_i$ are also known and non-random, e.g. $forall; 1 leq i leq n,, e_i in mathbbR$. This is because there is a deterministic function for the "best" parameters $hatbeta_0$ and $hatbeta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $beta$, for example. It is also the most intuitive view to take when your sitting in front of a concrete, real-world dataset.
However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $hatbeta_1$ because it is not a random variable and therefore has no distribution! How can we then talk something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?
The way this is done is treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic but and is often omitted but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_X,Y(mathbfbeta, sigma^2)$ with parameters $mathbfbeta = [beta_0, beta_1]^T $ and $sigma$. $F_X,Y$ is specified by the model $Y = Xbeta_0 + beta_1 + epsilon, epsilon sim mathcal(0, sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_X,Y$ that we combine into one big joint probability function $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$.
Now we can imagine the dataset $(x_i, y_i)$ for $i=1,...,n$ not merely as some known set of numbers, but as a realization sampled from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $hatbeta$ get new estimates, and we then calculate new residuals $e_i$, right?
Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $\vec{X}$ and $\vec{Y}$ drawn from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Now $\hat{\beta}_0$ and $\hat{\beta}_1$ are random variables because they are functions of $(\vec{X}, \vec{Y})$. Likewise, all the $e_i$ are random variables because they are functions of $(\vec{X}, \vec{Y})$.
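The "crude" repeated-sampling picture is still easy to simulate, and it shows why the estimate has a distribution once the dataset is random. This is only a minimal sketch: the helper names `ols_fit` and `sample_dataset`, the standard normal marginal for $X$, and the parameter values are all assumptions I picked for illustration.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form least-squares estimates (beta0_hat, beta1_hat)."""
    x_centered = x - x.mean()
    beta1_hat = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    return beta0_hat, beta1_hat

def sample_dataset(rng, n, beta0, beta1, sigma):
    """Draw one realization of the random dataset: n pairs (x_i, y_i)."""
    x = rng.normal(0.0, 1.0, size=n)      # assumed marginal distribution of X
    eps = rng.normal(0.0, sigma, size=n)  # error term of the model
    y = beta0 + beta1 * x + eps
    return x, y

rng = np.random.default_rng(0)
true_beta0, true_beta1, sigma, n = 1.0, 2.0, 0.5, 50  # illustrative values

# Each draw is a brand-new dataset, so each draw gives a new slope estimate.
slopes = np.array([
    ols_fit(*sample_dataset(rng, n, true_beta0, true_beta1, sigma))[1]
    for _ in range(2000)
])

print("mean of slope estimates:", slopes.mean())
print("spread of slope estimates:", slopes.std(ddof=1))
```

The array `slopes` is an empirical picture of the sampling distribution of $\hat{\beta}_1$: it is centered near the true slope and has a nonzero spread, which is exactly what a fixed number could not have.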
This state of affairs is much better, because now we can make statements like "the set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "the t-statistic for $\hat{\beta}_1$ (the estimate divided by its standard error) follows a t-distribution under the null hypothesis $\beta_1 = 0$" without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)
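The sum-to-zero constraint is easy to check numerically. The sketch below refits the line on many simulated datasets and records the residual sum each time; the helper name `residuals`, the uniform design for $x$, and the parameter values are my own assumptions, and the fit uses NumPy's `np.polyfit` with `deg=1`, which returns the slope first and the intercept second.

```python
import numpy as np

rng = np.random.default_rng(1)

def residuals(x, y):
    """Residuals from the least-squares line with an intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)

residual_sums = []
for _ in range(1000):
    x = rng.uniform(-3.0, 3.0, size=20)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=20)
    residual_sums.append(residuals(x, y).sum())

# Largest deviation from zero across all simulated datasets.
max_abs_sum = np.max(np.abs(residual_sums))
print("largest |sum of residuals|:", max_abs_sum)
```

Up to floating-point rounding, the sum is zero for every dataset, so knowing any $n-1$ residuals pins down the last one. That is why the $e_i$ cannot be independent, even though the underlying errors are.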
In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.
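For reference, here is roughly what the bootstrap approximation looks like for the slope. It is a sketch of the pairs (case-resampling) bootstrap only; the dataset is simulated purely so the example runs on its own, and the function name `slope` and the number of replicates are arbitrary choices of mine.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope for simple linear regression."""
    x_centered = x - x.mean()
    return np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)

rng = np.random.default_rng(2)

# Stand-in for the one dataset we actually have (simulated for illustration).
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

n_boot = 2000
boot_slopes = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)  # resample (x_i, y_i) pairs with replacement
    boot_slopes[b] = slope(x[idx], y[idx])

slope_hat = slope(x, y)
boot_se = boot_slopes.std(ddof=1)  # bootstrap estimate of the standard error
print("slope estimate:", slope_hat)
print("bootstrap standard error:", boot_se)
```

Each resample plays the role of a "new" dataset, so the spread of `boot_slopes` stands in for the sampling variability we cannot observe directly.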
You'll note that I did not introduce new notation for $e_i$ and $\hat{\beta}$ but simply said, "these things, which we previously thought of as concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting (the same kind you found in your textbook) to tell whether symbols refer to random or non-random variables, because while there are conventions (such as using uppercase Roman letters for random variables), they are not consistently applied. If the author tells you $e_i$ is a random variable, they are telling you that they are also viewing $x_i$ and $y_i$ as random variables.
edited 6 hours ago
Tim♦
answered 6 hours ago
olooney
In simple linear regression, we assume that the observations are randomly perturbed from the conditional expected value, i.e. $E[Y|X=x_i]$; so, each of your observations is assumed to be generated from a model of the form: $$Y=\beta_0+\beta_1X+\epsilon , \quad \epsilon\sim N(0,\sigma^2)$$
This makes each $\epsilon_i$ a random variable by definition. Think of a box into which you feed $x_i$ and get out $y_i$: you never know what's inside, how much error the box introduces, etc. Even if we know that the relation really is of the form given above, we don't know the true $\beta_0,\beta_1$. If we knew those quantities, we could easily recover $\epsilon_i = y_i - \beta_0 - \beta_1 x_i$. Instead, we estimate them and get residuals.
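The difference between errors and residuals can be seen in a small simulation where, unlike in real life, we get to look inside the box. This is only an illustrative sketch: the parameter values and the uniform design for $x$ are assumptions of mine, and the fit uses `np.polyfit` with `deg=1`.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n = 1.0, 2.0, 0.5, 30  # "true" values, unknown in practice

x = rng.uniform(0.0, 10.0, size=n)
eps = rng.normal(0.0, sigma, size=n)  # the errors: never observed in real data
y = beta0 + beta1 * x + eps

# If the true parameters were known, the errors could be recovered exactly.
recovered_eps = y - (beta0 + beta1 * x)

# In practice we only have estimates, and what we get are residuals.
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
resid = y - (beta0_hat + beta1_hat * x)

print("errors recovered from true parameters:", np.allclose(recovered_eps, eps))
print("largest |residual - error|:", np.max(np.abs(resid - eps)))
print("sum of errors:", eps.sum(), " sum of residuals:", resid.sum())
```

The residuals track the errors but are not equal to them, and they obey constraints (such as summing to zero) that the errors do not.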
answered 6 hours ago
gunes