LASSO Regression - p-values and coefficientsStandard errors for lasso prediction using RUsing regularization when doing statistical inferenceTesting for coefficients significance in Lasso logistic regressionExplanation for unstable lasso regression coefficients?How to handle with missing values in order to prepare data for feature selection with LASSO?Interpretation of coefficients of glmnet - LASSO/Cox model?Difference between trace plot and cv.glmnetInterpretation of LASSO regression coefficientsTesting for coefficients significance in Lasso logistic regressionA generalized LASSO constraintRelaxed Lasso Logistic Regression: Estimating second penalty parameterWhy Lasso classification results changes in Matlab?Implementing Lasso Regression in Numpy

How can I find where certain bash function is defined?

Rests in pickup measure (anacrusis)

Why are these traces shaped in such way?

Apparent Ring of Craters on the Moon

Is healing by fire possible?

Why is this Simple Puzzle impossible to solve?

Does this degree 12 genus 1 curve have only one point over infinitely many finite fields?

What are these (utility?) boxes at the side of the house?

What is the most important source of natural gas? coal, oil or other?

What is the largest (size) solid object ever dropped from an airplane to impact the ground in freefall?

Is CD audio quality good enough for the final delivery of music?

Canon 70D often overexposing or underexposing shots

Geological aftereffects of an asteroid impact on a large mountain range?

Ticket sales for Queen at the Live Aid

Python program to convert a 24 hour format to 12 hour format

Command to Search for Filenames Exceeding 143 Characters?

How to prevent bad sectors?

How do I align equations in three columns, justified right, center and left?

Where is the logic in castrating fighters?

Full backup on database creation

Are there situations when self-assignment is useful?

Can you heal a summoned creature?

What do different value notes on the same line mean?

When and what was the first 3D acceleration device ever released?



LASSO Regression - p-values and coefficients


Standard errors for lasso prediction using RUsing regularization when doing statistical inferenceTesting for coefficients significance in Lasso logistic regressionExplanation for unstable lasso regression coefficients?How to handle with missing values in order to prepare data for feature selection with LASSO?Interpretation of coefficients of glmnet - LASSO/Cox model?Difference between trace plot and cv.glmnetInterpretation of LASSO regression coefficientsTesting for coefficients significance in Lasso logistic regressionA generalized LASSO constraintRelaxed Lasso Logistic Regression: Estimating second penalty parameterWhy Lasso classification results changes in Matlab?Implementing Lasso Regression in Numpy






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








3












$begingroup$


I've run a LASSO in R using cv.glmnet. I would like to generate p-values for the coefficients that are selected.



I found the boot.lass.proj to produce bootstrapped p-values
https://rdrr.io/rforge/hdi/man/boot.lasso.proj.html



While the boot.lasso.proj program produced p-values, I assume it is doing its own lasso - but I'm not seeing a way to get the coefficients.



Would it be safe to use the p-values from hdi for the coefficients produced by cv.glmnet?










share|cite|improve this question









New contributor



jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$











  • $begingroup$
    What is hdi here?
    $endgroup$
    – Richard Hardy
    9 hours ago










  • $begingroup$
    HDI - the hdi package that contains boot.lass.proj rdrr.io/rforge/hdi
    $endgroup$
    – jpryan28
    4 hours ago

















3












$begingroup$


I've run a LASSO in R using cv.glmnet. I would like to generate p-values for the coefficients that are selected.



I found the boot.lass.proj to produce bootstrapped p-values
https://rdrr.io/rforge/hdi/man/boot.lasso.proj.html



While the boot.lasso.proj program produced p-values, I assume it is doing its own lasso - but I'm not seeing a way to get the coefficients.



Would it be safe to use the p-values from hdi for the coefficients produced by cv.glmnet?










share|cite|improve this question









New contributor



jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$











  • $begingroup$
    What is hdi here?
    $endgroup$
    – Richard Hardy
    9 hours ago










  • $begingroup$
    HDI - the hdi package that contains boot.lass.proj rdrr.io/rforge/hdi
    $endgroup$
    – jpryan28
    4 hours ago













3












3








3





$begingroup$


I've run a LASSO in R using cv.glmnet. I would like to generate p-values for the coefficients that are selected.



I found the boot.lass.proj to produce bootstrapped p-values
https://rdrr.io/rforge/hdi/man/boot.lasso.proj.html



While the boot.lasso.proj program produced p-values, I assume it is doing its own lasso - but I'm not seeing a way to get the coefficients.



Would it be safe to use the p-values from hdi for the coefficients produced by cv.glmnet?










share|cite|improve this question









New contributor



jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






$endgroup$




I've run a LASSO in R using cv.glmnet. I would like to generate p-values for the coefficients that are selected.



I found the boot.lass.proj to produce bootstrapped p-values
https://rdrr.io/rforge/hdi/man/boot.lasso.proj.html



While the boot.lasso.proj program produced p-values, I assume it is doing its own lasso - but I'm not seeing a way to get the coefficients.



Would it be safe to use the p-values from hdi for the coefficients produced by cv.glmnet?







p-value lasso






share|cite|improve this question









New contributor



jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.










share|cite|improve this question









New contributor



jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








share|cite|improve this question




share|cite|improve this question








edited 9 hours ago









Richard Hardy

28.6k647134




28.6k647134






New contributor



jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








asked 12 hours ago









jpryan28jpryan28

162




162




New contributor



jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




New contributor




jpryan28 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.













  • $begingroup$
    What is hdi here?
    $endgroup$
    – Richard Hardy
    9 hours ago










  • $begingroup$
    HDI - the hdi package that contains boot.lass.proj rdrr.io/rforge/hdi
    $endgroup$
    – jpryan28
    4 hours ago
















  • $begingroup$
    What is hdi here?
    $endgroup$
    – Richard Hardy
    9 hours ago










  • $begingroup$
    HDI - the hdi package that contains boot.lass.proj rdrr.io/rforge/hdi
    $endgroup$
    – jpryan28
    4 hours ago















$begingroup$
What is hdi here?
$endgroup$
– Richard Hardy
9 hours ago




$begingroup$
What is hdi here?
$endgroup$
– Richard Hardy
9 hours ago












$begingroup$
HDI - the hdi package that contains boot.lass.proj rdrr.io/rforge/hdi
$endgroup$
– jpryan28
4 hours ago




$begingroup$
HDI - the hdi package that contains boot.lass.proj rdrr.io/rforge/hdi
$endgroup$
– jpryan28
4 hours ago










2 Answers
2






active

oldest

votes


















2












$begingroup$

To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.






share|cite|improve this answer









$endgroup$




















    1












    $begingroup$

    Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



    library(ISLR)
    library(glmnet)
    ds <- na.omit(Hitters)
    X <- as.matrix(ds[,1:10])
    lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
    intercept=TRUE, alpha=1, nfolds=nrow(ds),
    parallel = T)
    opt_lam <- lM_LASSO$
    lambda.min
    lM_LASSO <- glmnet(X,y = log(ds$Salary),
    intercept=TRUE, alpha=1, lambda = opt_lam)
    W <- as.matrix(coef(lM_LASSO))
    W

    1
    (Intercept) 4.5630727825
    AtBat -0.0021567122
    Hits 0.0115095746
    HmRun 0.0055676901
    Runs 0.0003147141
    RBI 0.0001307846
    Walks 0.0069978218
    Years 0.0485039070
    CHits 0.0003636287

    keep_X <- rownames(W)[W!=0]
    keep_X <- keep_X[!keep_X == "(Intercept)"]
    X <- X[,keep_X]
    summary(lm(log(ds$Salary)~X))

    Call:
    lm(formula = log(ds$Salary) ~ X)

    Residuals:
    Min 1Q Median 3Q Max
    -2.23409 -0.45747 0.06435 0.40762 3.02005

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
    XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
    XHits 0.0126216 0.0039645 3.184 0.00164 **
    XHmRun 0.0057538 0.0103619 0.555 0.57919
    XRuns 0.0003510 0.0048428 0.072 0.94228
    XRBI 0.0002455 0.0045771 0.054 0.95727
    XWalks 0.0072372 0.0026936 2.687 0.00769 **
    XYears 0.0487293 0.0206030 2.365 0.01877 *
    XCHits 0.0003622 0.0001564 2.316 0.02138 *
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.6251 on 254 degrees of freedom
    Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
    F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


    Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



    stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
    ===============================================
    Dependent variable:
    ---------------------------
    Salary)
    -----------------------------------------------
    XAtBat -0.003**
    (0.001)

    XHits 0.013***
    (0.004)

    XHmRun 0.006
    (0.010)

    XRuns 0.0004
    (0.005)

    XRBI 0.0002
    (0.005)

    XWalks 0.007***
    (0.003)

    XYears 0.049**
    (0.021)

    XCHits 0.0004**
    (0.0002)

    Constant 4.580***
    (0.156)

    -----------------------------------------------
    Observations 263
    R2 0.521
    Adjusted R2 0.506
    Residual Std. Error 0.625 (df = 254)
    F Statistic 34.521*** (df = 8; 254)
    ===============================================
    Note: *p<0.1; **p<0.05; ***p<0.01


    Bootstrap



    Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



    library(boot)

    W_boot <- function(ds, indices)
    ds_boot <- ds[indices,]
    X <- as.matrix(ds_boot[,1:10])
    y <- log(ds$Salary)
    lM_LASSO <- glmnet(X,y = log(ds$
    Salary),
    intercept=TRUE, alpha=1, lambda = opt_lam)
    W <- as.matrix(coef(lM_LASSO))
    return(W)


    results <- boot(data=ds, statistic=W_boot,
    R=10000)

    se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
    se2 <- apply(results$t,2,sd)
    se2 <- se2[W!=0]
    plot(se2~se1)
    abline(a=0,b=1)


    enter image description here
    There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.






    share|cite|improve this answer











    $endgroup$








    • 4




      $begingroup$
      I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
      $endgroup$
      – Ben Bolker
      7 hours ago










    • $begingroup$
      Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
      $endgroup$
      – majeed simaan
      5 hours ago











    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "65"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );






    jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.









    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f410173%2flasso-regression-p-values-and-coefficients%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2












    $begingroup$

    To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



    These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



    Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.






    share|cite|improve this answer









    $endgroup$

















      2












      $begingroup$

      To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



      These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



      Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.






      share|cite|improve this answer









      $endgroup$















        2












        2








        2





        $begingroup$

        To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



        These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



        Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.






        share|cite|improve this answer









        $endgroup$



        To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



        These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



        Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.







        share|cite|improve this answer












        share|cite|improve this answer



        share|cite|improve this answer










        answered 5 hours ago









        EdMEdM

        23.4k234101




        23.4k234101























            1












            $begingroup$

            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices)
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)


            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.






            share|cite|improve this answer











            $endgroup$








            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago















            1












            $begingroup$

            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices)
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)


            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.






            share|cite|improve this answer











            $endgroup$








            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago













            1












            1








            1





            $begingroup$

            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices)
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)


            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.






            share|cite|improve this answer











            $endgroup$



            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices)
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)


            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited 5 hours ago

























            answered 7 hours ago









            majeed simaanmajeed simaan

            1464




            1464







            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago












            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago







            4




            4




            $begingroup$
            I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
            $endgroup$
            – Ben Bolker
            7 hours ago




            $begingroup$
            I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
            $endgroup$
            – Ben Bolker
            7 hours ago












            $begingroup$
            Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
            $endgroup$
            – majeed simaan
            5 hours ago




            $begingroup$
            Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
            $endgroup$
            – majeed simaan
            5 hours ago










            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.









            draft saved

            draft discarded


















            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.












            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.











            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.














            Thanks for contributing an answer to Cross Validated!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f410173%2flasso-regression-p-values-and-coefficients%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Invision Community Contents History See also References External links Navigation menuProprietaryinvisioncommunity.comIPS Community ForumsIPS Community Forumsthis blog entry"License Changes, IP.Board 3.4, and the Future""Interview -- Matt Mecham of Ibforums""CEO Invision Power Board, Matt Mecham Is a Liar, Thief!"IPB License Explanation 1.3, 1.3.1, 2.0, and 2.1ArchivedSecurity Fixes, Updates And Enhancements For IPB 1.3.1Archived"New Demo Accounts - Invision Power Services"the original"New Default Skin"the original"Invision Power Board 3.0.0 and Applications Released"the original"Archived copy"the original"Perpetual licenses being done away with""Release Notes - Invision Power Services""Introducing: IPS Community Suite 4!"Invision Community Release Notes

            Canceling a color specificationRandomly assigning color to Graphics3D objects?Default color for Filling in Mathematica 9Coloring specific elements of sets with a prime modified order in an array plotHow to pick a color differing significantly from the colors already in a given color list?Detection of the text colorColor numbers based on their valueCan color schemes for use with ColorData include opacity specification?My dynamic color schemes

            Ласкавець круглолистий Зміст Опис | Поширення | Галерея | Примітки | Посилання | Навігаційне меню58171138361-22960890446Bupleurum rotundifoliumEuro+Med PlantbasePlants of the World Online — Kew ScienceGermplasm Resources Information Network (GRIN)Ласкавецькн. VI : Літери Ком — Левиправивши або дописавши її