Are the errors in this formulation of the simple linear regression model random variables?

3

On page 21 of Applied Linear Regression, fourth edition, by Sanford Weisberg, the error $e_i$ for case $i$ under the simple linear regression model is defined to be $y_i - E(Y | X = x_i)$, where $E(Y | X = x_i)$ is assumed to equal $\beta_0 + \beta_1 x_i$ for some unknown $\beta_0, \beta_1 \in \mathbb{R}$. The book says that

"The errors $e_i$ depend on unknown parameters in the mean function and so are not observable quantities. They are random variables and correspond to the vertical distance between the point $y_i$ and the mean function $E(Y | X = x_i)$."

It doesn't seem to me like $e_i$ is a random variable, because it's a function of $y_i$ and $x_i$, which are non-random, observed values. Why can $e_i$ be considered a random variable?

regression random-variable assumptions

asked 8 hours ago by VKV

2 Answers

4

I looked up your citation (4th edition, page 21) because I found it very alarming, and was relieved to find that it is actually given as:



$$ \hat{e}_i = y_i - \widehat{E}(Y | X = x_i) = y_i - (\hat\beta_0 + \hat\beta_1 x_i) \tag{2.3} $$



This is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by least squares in the context of linear regression (which coincides with the MLE under normal errors), and there is a crucial distinction between "true errors", which are denoted $\epsilon_i$ and are normally distributed and i.i.d., and "residuals", which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal, which is not the case.



On to your real question, which boils down to: "Are the given data $x_i$ and $y_i$ random or not?"



If you believe the pairs $(x_i, y_i)$ are known and non-random, that is, if you believe that $\forall\; 1 \leq i \leq n,\, (x_i, y_i) \in \mathbb{R} \times \mathbb{R}$, then the residuals $e_i$ are also known and non-random, i.e. $\forall\; 1 \leq i \leq n,\, e_i \in \mathbb{R}$. This is because there is a deterministic function for the "best" parameters $\hat\beta_0$ and $\hat\beta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the maximum likelihood estimators of $\beta$, for example. It is also the most intuitive view to take when you're sitting in front of a concrete, real-world dataset.
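To make the "deterministic function" point concrete, here is a minimal sketch in plain Python. The dataset is made up for illustration, and the helper names `ols_fit` and `residuals` are hypothetical, not from the book. Given the same fixed numbers, the fitted parameters and residuals come out the same every time.

```python
def ols_fit(xs, ys):
    """Least-squares estimates (b0_hat, b1_hat) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys):
    """Residuals e_i = y_i - (b0_hat + b1_hat * x_i) for the fitted line."""
    b0, b1 = ols_fit(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical fixed dataset (illustrative numbers only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

first = residuals(xs, ys)
second = residuals(xs, ys)

# Nothing random happens between the two calls, so the results are identical.
print(first == second)
```

Under this view the residuals are just a list of real numbers, with no distribution attached.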



However, it rather puts the cart before the horse and shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $\hat\beta_1$ because it is not a random variable and therefore has no distribution! How can we then talk about something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?



The way this is done is by treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little pedantic and is often omitted, but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_{X,Y}(\mathbf{\beta}, \sigma^2)$ with parameters $\mathbf{\beta} = [\beta_0, \beta_1]^T$ and $\sigma^2$. $F_{X,Y}$ is specified by the model $Y = \beta_0 + \beta_1 X + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $(X, Y)$, each distributed as $F_{X,Y}$, that we combine into one big joint distribution $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$.



Now we can imagine the dataset $(x_i, y_i)$ for $i = 1, \ldots, n$ not merely as some known set of numbers, but as a realization sampled from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $\hat\beta$ get new estimates, and we then calculate new residuals $e_i$, right?
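A small simulation can illustrate this "each sample is a whole new dataset" idea. This is only a sketch under stated assumptions: the true values `TRUE_B0`, `TRUE_B1`, `SIGMA` and the uniform distribution chosen for $X$ are made up for the example, and `sample_dataset` is a hypothetical helper.

```python
import random


def ols_fit(xs, ys):
    """Least-squares estimates (b0_hat, b1_hat) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sample_dataset(rng, n, b0, b1, sigma):
    """Draw n i.i.d. pairs (x, y) with y = b0 + b1 * x + Normal(0, sigma^2) noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]  # assumed marginal for X
    ys = [b0 + b1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is reproducible
TRUE_B0, TRUE_B1, SIGMA = 1.0, 2.0, 1.0  # assumed "true" parameters

slopes = []
for _ in range(5):
    xs, ys = sample_dataset(rng, 30, TRUE_B0, TRUE_B1, SIGMA)
    _, b1_hat = ols_fit(xs, ys)
    slopes.append(b1_hat)

# Five datasets give five different slope estimates scattered around TRUE_B1.
print(slopes)
```

Every pass through the loop plays the role of one realization of the random dataset, and the slope estimate changes with it.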



Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $\vec{X}$ and $\vec{Y}$ drawn from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Now $\hat\beta_0$ and $\hat\beta_1$ are random variables because they are functions of $(\vec{X}, \vec{Y})$. Likewise, all the $e_i$ are random variables because they are functions of $(\vec{X}, \vec{Y})$.
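For concreteness, the standard least-squares formulas (not spelled out in the text above) make this dependence explicit. Writing $\bar{X}$ and $\bar{Y}$ for the sample means of the components of $\vec{X}$ and $\vec{Y}$:

$$ \hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}, \qquad \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}, \qquad e_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i $$

Each right-hand side is a function of the random vectors alone, so each left-hand side is itself a random variable.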



This state of affairs is much better, because now we can make statements like "the residuals $e_i$ cannot be independent because (in a model with an intercept) they always sum exactly to zero" or "the standardized estimate $(\hat\beta_1 - \beta_1)/\widehat{\mathrm{se}}(\hat\beta_1)$ follows a t-distribution" without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)
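Both statements can be checked numerically once the dataset is treated as random. The sketch below is illustrative only: the true parameters, the sample size, and the uniform distribution for $X$ are assumptions, `fit_with_se` is a hypothetical helper, and 2.048 is the tabulated two-sided 5% critical value for a t-distribution with 28 degrees of freedom.

```python
import math
import random


def fit_with_se(xs, ys):
    """Least-squares fit returning (b0_hat, b1_hat, se_b1, residuals)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)  # unbiased estimate of sigma^2
    return b0, b1, math.sqrt(s2 / sxx), resid


rng = random.Random(0)
B0, B1, SIGMA, N, REPS = 1.0, 2.0, 1.0, 30, 2000  # assumed simulation settings
T_CRIT = 2.048  # two-sided 5% critical value, t with N - 2 = 28 df (from tables)

max_abs_resid_sum = 0.0
exceed = 0
for _ in range(REPS):
    xs = [rng.uniform(0.0, 10.0) for _ in range(N)]
    ys = [B0 + B1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    _, b1_hat, se_b1, resid = fit_with_se(xs, ys)
    # Residuals sum to zero (up to rounding) in every simulated dataset.
    max_abs_resid_sum = max(max_abs_resid_sum, abs(sum(resid)))
    # Count how often the standardized slope falls outside the t critical value.
    if abs((b1_hat - B1) / se_b1) > T_CRIT:
        exceed += 1

print(max_abs_resid_sum)  # tiny: floating-point rounding only
print(exceed / REPS)      # should land near 0.05 if the t claim holds
```

The first number reflects a constraint that holds in every realization, while the second only makes sense as a statement about the distribution over realizations.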



In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. Doing it conceptually, however, allows us to think clearly about how randomness during sampling would affect our regression.
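As a rough sketch of the bootstrap idea mentioned here, one common variant (the "pairs" bootstrap, chosen for illustration) resamples the observed $(x_i, y_i)$ pairs with replacement and refits each time. The observed dataset below is simulated once with made-up parameters and then treated as fixed.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Least-squares estimates (b0_hat, b1_hat) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


rng = random.Random(1)

# Hypothetical "observed" dataset: generated once here, then treated as fixed.
xs = [float(i) for i in range(1, 21)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
data = list(zip(xs, ys))

boot_slopes = []
for _ in range(1000):
    # Resample n pairs with replacement to mimic drawing a new dataset.
    resample = [rng.choice(data) for _ in data]
    bx, by = zip(*resample)
    boot_slopes.append(ols_fit(bx, by)[1])

# The spread of the refitted slopes approximates the standard error of b1_hat.
boot_se = statistics.stdev(boot_slopes)
print(boot_se)
```

The resampled datasets stand in for the fresh samples we cannot actually collect.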



You'll note that I did not introduce new notation for $e_i$ and $\hat\beta$ but simply said, "now these things, which we previously thought of as concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to tell whether symbols refer to random or non-random variables, because while there are conventions (such as using uppercase roman letters for random variables) they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you he is also viewing $x_i$ and $y_i$ as random variables.






1

In simple linear regression, we assume that the observations are randomly perturbed from the conditional expected value, i.e. $E[Y|X=x_i]$; so, each of your observations is assumed to be generated from a model of the form: $$Y = \beta_0 + \beta_1 X + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$$

This makes each $\epsilon_i$ a random variable by definition. Think about a box where you give $x_i$ and get $y_i$, and you never know what's inside, how much error is introduced by the box, etc. Even if we really know that the relation is of the form given above, we don't know the true $\beta_0, \beta_1$. If we knew those quantities, we could easily recover $\epsilon_i$. Instead, we estimate them, and get residuals.
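The difference between recovering the errors and computing residuals can be sketched with simulated data, where the "true" parameters are known only because we chose them. The values of `B0`, `B1` and `SIGMA` are assumptions for the example.

```python
import random


def ols_fit(xs, ys):
    """Least-squares estimates (b0_hat, b1_hat) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


rng = random.Random(42)
B0, B1, SIGMA = 1.0, 2.0, 0.5  # assumed true values; unknown in real data

xs = [float(i) for i in range(10)]
errors = [rng.gauss(0.0, SIGMA) for _ in xs]         # what the "box" adds
ys = [B0 + B1 * x + e for x, e in zip(xs, errors)]   # what we observe

# With the true parameters, the errors can be recovered from (x_i, y_i).
recovered = [y - (B0 + B1 * x) for x, y in zip(xs, ys)]

# With estimated parameters, we get residuals instead.
b0_hat, b1_hat = ols_fit(xs, ys)
resid = [y - (b0_hat + b1_hat * x) for x, y in zip(xs, ys)]

for e, r in zip(errors, resid):
    print(f"error = {e: .4f}   residual = {r: .4f}")
```

The residuals track the errors but do not equal them, because the fitted line differs from the true one.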















              Your Answer








              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "65"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader:
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              ,
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













              draft saved

              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f417529%2fare-the-errors-in-this-formulation-of-the-simple-linear-regression-model-random%23new-answer', 'question_page');

              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              4












              $begingroup$

              I looked up your citation (4th edition, page 21) because I found it very alarming and was relieved to find is actually given as:



              $$ hate_i = y_i − widehatE(Y|X=x_i) = y_i - (hatbeta_0 + hatbeta_1) tag2.3 $$



              Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between "true errors" which are denoted $epsilon_i$ and are normally distributed and i.i.d., and "residuals which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal which is not the case.



              On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"



              If you believe the pairs $(x_i, y_i)$ are known and not-random, e.g. that is, if you believe that $forall; 1 leq i leq n,, (x_i, y_i) in mathbbR times mathbbR $, then the residuals $e_i$ are also known and non-random, e.g. $forall; 1 leq i leq n,, e_i in mathbbR$. This is because there is a deterministic function for the "best" parameters $hatbeta_0$ and $hatbeta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $beta$, for example. It is also the most intuitive view to take when your sitting in front of a concrete, real-world dataset.



              However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $hatbeta_1$ because it is not a random variable and therefore has no distribution! How can we then talk something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?



              The way this is done is treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic but and is often omitted but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_X,Y(mathbfbeta, sigma^2)$ with parameters $mathbfbeta = [beta_0, beta_1]^T $ and $sigma$. $F_X,Y$ is specified by the model $Y = Xbeta_0 + beta_1 + epsilon, epsilon sim mathcal(0, sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_X,Y$ that we combine into one big joint probability function $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$.



              Now we can imagine the dataset $(x_i, y_i)$ for $i=1,...,n$ not merely as some known set of numbers, but as a realization sampled from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $hatbeta$ get new estimates, and we then calculate new residuals $e_i$, right?



              Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $vecX$ and $vecY$ drawn from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Now $hatbeta_0$ and $hatbeta_1$ are random variables because they are functions of $(vecX, vecY)$. Likewise, all the $e_i$ are random variables because they are functions of $(vecX, vecY)$.



              This state of affairs is much better, because now we can make statements like "The set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "the standard error of $hatbeta_1$ follows a t-distribution." without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)



              In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.



              You'll note that I did not introduce new notation for $e_i$ and $hatbeta$ but simply said, "now these things, which we previously thought of a concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to indicate whether symbols are referring to random or non-random variables because while there are conventions (such as using uppercase roman letters for random variables) they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you is also viewing $x_i$ and $y_i$ as random variables.






              share|cite|improve this answer











              $endgroup$

















                4












                $begingroup$

                I looked up your citation (4th edition, page 21) because I found it very alarming and was relieved to find is actually given as:



                $$ hate_i = y_i − widehatE(Y|X=x_i) = y_i - (hatbeta_0 + hatbeta_1) tag2.3 $$



                Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between "true errors" which are denoted $epsilon_i$ and are normally distributed and i.i.d., and "residuals which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal which is not the case.



                On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"



                If you believe the pairs $(x_i, y_i)$ are known and not-random, e.g. that is, if you believe that $forall; 1 leq i leq n,, (x_i, y_i) in mathbbR times mathbbR $, then the residuals $e_i$ are also known and non-random, e.g. $forall; 1 leq i leq n,, e_i in mathbbR$. This is because there is a deterministic function for the "best" parameters $hatbeta_0$ and $hatbeta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $beta$, for example. It is also the most intuitive view to take when your sitting in front of a concrete, real-world dataset.



                However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $hatbeta_1$ because it is not a random variable and therefore has no distribution! How can we then talk something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?



                The way this is done is treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic but and is often omitted but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_X,Y(mathbfbeta, sigma^2)$ with parameters $mathbfbeta = [beta_0, beta_1]^T $ and $sigma$. $F_X,Y$ is specified by the model $Y = Xbeta_0 + beta_1 + epsilon, epsilon sim mathcal(0, sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_X,Y$ that we combine into one big joint probability function $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$.



                Now we can imagine the dataset $(x_i, y_i)$ for $i=1,...,n$ not merely as some known set of numbers, but as a realization sampled from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $hatbeta$ get new estimates, and we then calculate new residuals $e_i$, right?



                Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $vecX$ and $vecY$ drawn from $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$. Now $hatbeta_0$ and $hatbeta_1$ are random variables because they are functions of $(vecX, vecY)$. Likewise, all the $e_i$ are random variables because they are functions of $(vecX, vecY)$.



                This state of affairs is much better, because now we can make statements like "The set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "the standard error of $hatbeta_1$ follows a t-distribution." without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)



                In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.



                You'll note that I did not introduce new notation for $e_i$ and $hatbeta$ but simply said, "now these things, which we previously thought of a concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to indicate whether symbols are referring to random or non-random variables because while there are conventions (such as using uppercase roman letters for random variables) they are not consistently applied. If the author tells you $e_i$ is a random variable, he is telling you is also viewing $x_i$ and $y_i$ as random variables.






                share|cite|improve this answer











                $endgroup$















                  4












                  4








                  4





                  $begingroup$

                  I looked up your citation (4th edition, page 21) because I found it very alarming and was relieved to find is actually given as:



                  $$ hate_i = y_i − widehatE(Y|X=x_i) = y_i - (hatbeta_0 + hatbeta_1) tag2.3 $$



                  Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by MLE in the context of linear regression, and there is a crucial distinction between "true errors" which are denoted $epsilon_i$ and are normally distributed and i.i.d., and "residuals which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal which is not the case.



                  On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"



                  If you believe the pairs $(x_i, y_i)$ are known and not-random, e.g. that is, if you believe that $forall; 1 leq i leq n,, (x_i, y_i) in mathbbR times mathbbR $, then the residuals $e_i$ are also known and non-random, e.g. $forall; 1 leq i leq n,, e_i in mathbbR$. This is because there is a deterministic function for the "best" parameters $hatbeta_0$ and $hatbeta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the MLE estimators of $beta$, for example. It is also the most intuitive view to take when your sitting in front of a concrete, real-world dataset.



                  However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $hatbeta_1$ because it is not a random variable and therefore has no distribution! How can we then talk something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?



                  The way this is done is treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little bit pedantic but and is often omitted but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_X,Y(mathbfbeta, sigma^2)$ with parameters $mathbfbeta = [beta_0, beta_1]^T $ and $sigma$. $F_X,Y$ is specified by the model $Y = Xbeta_0 + beta_1 + epsilon, epsilon sim mathcal(0, sigma^2)$. Now, imagine that we have $n$ i.i.d. copies of $F_X,Y$ that we combine into one big joint probability function $F_X_1,Y_1,X_2,Y_2,...,X_n,Y_n$.



                  $endgroup$



I looked up your citation (4th edition, page 21) because I found it very alarming, and was relieved to find it is actually given as:



$$ \hat{e}_i = y_i - \widehat{E}(Y|X=x_i) = y_i - (\hat\beta_0 + \hat\beta_1 x_i) \tag{2.3} $$



Which is still confusing, I grant you, and the difference isn't actually germane to your question, but at least it isn't patently false. I'll explain why I found it alarming before discussing your (unrelated, I think) question. The "hat" indicates "estimated", usually by maximum likelihood in the context of linear regression, and there is a crucial distinction between the "true errors", which are denoted $\epsilon_i$ and are normally distributed and i.i.d., and the "residuals", which are denoted $e_i$ and are not i.i.d. The formula without the hats would imply the two are exactly equal, which is not the case.
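To make the distinction concrete, here is a short sketch in the notation of this answer. It assumes the model is written as $Y = \beta_0 + \beta_1 X + \epsilon$, with $\beta_0$ the intercept and $\beta_1$ the slope (my labeling, chosen to match the hats in equation 2.3):

$$ \epsilon_i = y_i - (\beta_0 + \beta_1 x_i), \qquad e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) $$

Subtracting one from the other gives

$$ e_i = \epsilon_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1)\, x_i $$

so a residual equals the corresponding true error only if the estimates happen to hit the true parameters exactly.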



                  On to your real question, which boils down to, "are the given data $x_i$ and $y_i$ random or not?"



If you believe the pairs $(x_i, y_i)$ are known and non-random, that is, if you believe that $\forall\; 1 \leq i \leq n,\, (x_i, y_i) \in \mathbb{R} \times \mathbb{R}$, then the residuals $e_i$ are also known and non-random, i.e. $\forall\; 1 \leq i \leq n,\, e_i \in \mathbb{R}$. This is because there is a deterministic function for the "best" parameters $\hat\beta_0$ and $\hat\beta_1$ from those observations, and then a deterministic function for the residuals in terms of those parameters. This point of view is useful and allows us to derive the maximum likelihood estimators of $\beta$, for example. It is also the most intuitive view to take when you're sitting in front of a concrete, real-world dataset.
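As a rough illustration of the "deterministic function" view, here is a minimal Python sketch. The function names and the five data points are made up for this example, and it uses the usual closed-form least-squares formulas rather than any particular library:

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for paired data (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i) for fixed estimates."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical, fixed dataset: nothing random happens below this line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(xs, ys)
e = residuals(xs, ys, b0, b1)
print("estimates:", b0, b1)
print("residuals:", e)
```

Running it twice on the same numbers gives the same estimates and the same residuals, which is exactly why, on this view, there is no "distribution" to speak of.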



However, it kind of puts the cart before the horse and basically shuts down certain kinds of statistical analysis. For example, we cannot talk about the "distribution" of $\hat\beta_1$ because it is not a random variable and therefore has no distribution! How can we then talk about something like the Wald test? Likewise, how do we talk about the "distribution" of residuals so that we can say whether one is an outlier or not?



The way this is done is by treating the dataset itself as random. When we want to do statistical inference on a known dataset, we can then treat the known values as a realization of the random dataset. The exact construction is a little pedantic and is often omitted, but it helps to go through it at least once. First, we say that $X$ and $Y$ are two random variables with some joint probability distribution $F_{X,Y}(\boldsymbol{\beta}, \sigma^2)$ with parameters $\boldsymbol{\beta} = [\beta_0, \beta_1]^T$ and $\sigma^2$. $F_{X,Y}$ is specified by the model $Y = \beta_0 + \beta_1 X + \epsilon,\ \epsilon \sim \mathcal{N}(0, \sigma^2)$. Now, imagine that we have $n$ i.i.d. pairs, each distributed according to $F_{X,Y}$, that we combine into one big joint probability distribution $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$.



Now we can imagine the dataset $(x_i, y_i)$ for $i=1,\ldots,n$ not merely as some known set of numbers, but as a realization sampled from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Each time we sample, we don't just get one pair of numbers, we get $n$ pairs of numbers: a brand new dataset. But that means the parameters $\hat\beta$ get new estimates, and we then calculate new residuals $e_i$, right?



Instead of thinking of this as repeated sampling, which is somewhat crude, we can express this entirely in the algebra of random variables. It can be expressed as two $n$-dimensional random vectors $\vec{X}$ and $\vec{Y}$ drawn from $F_{X_1,Y_1,X_2,Y_2,\ldots,X_n,Y_n}$. Now $\hat\beta_0$ and $\hat\beta_1$ are random variables because they are functions of $(\vec{X}, \vec{Y})$. Likewise, all the $e_i$ are random variables because they are functions of $(\vec{X}, \vec{Y})$.
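The repeated-sampling picture, crude as it is, is easy to simulate, and seeing it run may help. The sketch below is only an illustration under assumptions of my own: the "true" values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 1$, the sample size, and a uniform distribution for $X$ are all invented, since the model itself says nothing about them.

```python
import random


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sample_dataset(rng, n, beta0, beta1, sigma):
    """Draw one realization of (X_1, Y_1, ..., X_n, Y_n)."""
    # Assumption: X is uniform on [0, 10]; this choice is purely illustrative.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
slopes = []
worst_residual_sum = 0.0
for _ in range(1000):
    xs, ys = sample_dataset(rng, n=30, beta0=1.0, beta1=2.0, sigma=1.0)
    b0, b1 = fit_simple_ols(xs, ys)
    slopes.append(b1)
    residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))
    worst_residual_sum = max(worst_residual_sum, abs(residual_sum))

mean_slope = sum(slopes) / len(slopes)
print("distinct slope estimates:", len(set(slopes)))
print("mean slope estimate:", mean_slope)
print("largest |sum of residuals|:", worst_residual_sum)
```

Each pass through the loop is one new dataset, one new $\hat\beta_1$, and one new set of residuals, so the collected slopes are draws from the sampling distribution of $\hat\beta_1$ under these assumed settings.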



This state of affairs is much better, because now we can make statements like "the set of residuals $e_i$ cannot be independent because they always sum exactly to zero" or "the t-statistic for $\hat\beta_1$ follows a t-distribution" without talking literal nonsense. (Both of these statements only make sense if their subjects are random variables.)
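For what it's worth, the sum-to-zero claim takes one line to check. This is a sketch that assumes the fitted model includes an intercept, so that the least-squares intercept satisfies $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$, where $\bar{X}$ and $\bar{Y}$ are the sample means:

$$ \sum_{i=1}^n e_i = \sum_{i=1}^n \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right) = n\bar{Y} - n\hat\beta_0 - n\hat\beta_1 \bar{X} = n\left( \bar{Y} - \hat\beta_1 \bar{X} - \hat\beta_0 \right) = 0 $$

Hence $e_n = -\sum_{i=1}^{n-1} e_i$: the last residual is completely determined by the others, which rules out independence.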



                  In the real world we can't always go and get a brand-new, randomly sampled dataset. We can approximate this with something like the bootstrap, of course, but doing it for real isn't usually practical. But doing it conceptually allows us to think clearly about how randomness during sampling would affect our regression.
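Since the bootstrap came up, here is a minimal sketch of one common variant, the pairs (case-resampling) bootstrap, applied to the slope. The data are made up for illustration, the function names are mine, and this is one reasonable way to do it rather than the only one.

```python
import random
import statistics


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def bootstrap_slopes(xs, ys, n_resamples, rng):
    """Refit the slope on datasets resampled with replacement from (xs, ys)."""
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        slopes.append(fit_simple_ols(bx, by)[1])
    return slopes


# Hypothetical observed dataset, invented for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.3, 4.1, 5.8, 8.4, 9.9, 12.2, 13.7, 16.1, 18.0, 19.8]

rng = random.Random(42)  # fixed seed so the run is repeatable
slopes = bootstrap_slopes(xs, ys, n_resamples=2000, rng=rng)
print("slope on the original data:", fit_simple_ols(xs, ys)[1])
print("bootstrap standard deviation of the slope:", statistics.stdev(slopes))
```

The spread of the resampled slopes stands in for the spread we would see if we really could draw fresh datasets from the underlying distribution.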



You'll note that I did not introduce new notation for $e_i$ and $\hat\beta$ but simply said, "now these things, which we previously thought of as concrete realizations, will now be treated as random variables." As far as I can tell, you just have to be on your toes for this kind of signposting - the same kind you found in your textbook - to tell whether symbols refer to random or non-random variables, because while there are conventions (such as using uppercase roman letters for random variables), they are not consistently applied. If an author tells you $e_i$ is a random variable, they are telling you they are also viewing $x_i$ and $y_i$ as random variables.















                  edited 6 hours ago









                  Tim











                  answered 6 hours ago









olooney




































In simple linear regression, we assume that the observations are randomly perturbed from the conditional expected value, i.e. $E[Y|X=x_i]$; so each of your observations is assumed to be generated from a model of the form: $$Y=\beta_0+\beta_1X+\epsilon, \quad \epsilon\sim N(0,\sigma^2)$$

This makes each $\epsilon_i$ a random variable by definition. Think of a box into which you feed $x_i$ and get out $y_i$: you never know what's inside, how much error the box introduces, and so on. Even if we really know that the relation is of the form given above, we don't know the true $\beta_0,\beta_1$. If we knew those quantities, we could easily recover each $\epsilon_i$. Instead, we estimate them, and get residuals.
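A small simulation can show the difference between errors and residuals, because in a simulation (unlike in real life) we get to look inside the box. This is only a sketch: the "true" values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 1$ and the design points are invented for the example.

```python
import random


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical "true" parameters; in practice these are unknown.
beta0, beta1, sigma = 1.0, 2.0, 1.0

xs = [float(i) for i in range(1, 21)]
errors = [rng.gauss(0.0, sigma) for _ in xs]  # the epsilon_i inside the box
ys = [beta0 + beta1 * x + eps for x, eps in zip(xs, errors)]

# Knowing the true parameters, the errors can be recovered from the data.
recovered = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

# Without them, we estimate the parameters and obtain residuals instead.
b0, b1 = fit_simple_ols(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print("sum of true errors:", sum(errors))
print("sum of residuals:  ", sum(resid))
print("largest |residual - error|:",
      max(abs(r - e) for r, e in zip(resid, errors)))
```

The residuals track the errors but are not equal to them, and unlike the errors they are forced to sum to zero by the fitting procedure.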

















                          answered 6 hours ago









gunes



























