



Can AIC be used on out-of-sample data in cross-validation to select a model over another?































Following Gelman's 2017 publication entitled "Understanding predictive information criteria for Bayesian models", I understand that cross-validation and information criteria (the Bayesian information criterion and Akaike's information criterion) can be used separately. With a large enough sample, one would usually use cross-validation with some measure of predictive accuracy to select a given model over the others. With smaller samples, AIC and BIC might be preferred, computed on the training data (without cross-validation). My confusion is whether AIC and BIC can be used along with cross-validation: for example, can AIC and BIC be evaluated on the left-out fold in a 10-fold cross-validation? The idea is to use out-of-sample information criteria that account for model complexity (AIC) as well as model fit (BIC).
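To make the two routes I have in mind concrete, here is a minimal sketch (simulated data; Python with statsmodels and scikit-learn, chosen purely for illustration and not taken from the paper): information criteria computed on the full training data versus 10-fold cross-validation with an out-of-sample measure. My question is whether AIC/BIC could instead be computed on the left-out folds.

```python
# Minimal sketch of the two standard selection routes (simulated data;
# statsmodels / scikit-learn used purely for illustration).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)   # only the first predictor matters

def design(x, k):
    """Design matrix with an intercept and the first k predictors."""
    return sm.add_constant(x[:, :k])

# Route 1: information criteria computed once on the full (training) data.
for k in (1, 3):
    fit = sm.OLS(y, design(x, k)).fit()
    print(f"{k} predictors: AIC = {fit.aic:.1f}, BIC = {fit.bic:.1f}")

# Route 2: 10-fold CV with an out-of-sample measure
# (here the predictive log-likelihood of each held-out fold).
for k in (1, 3):
    ll = 0.0
    for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(x):
        fit = sm.OLS(y[train], design(x[train], k)).fit()
        mu = fit.predict(design(x[test], k))
        sigma = np.sqrt(fit.scale)   # residual std. dev. estimated on the training folds
        ll += norm.logpdf(y[test], mu, sigma).sum()
    print(f"{k} predictors: out-of-fold log-likelihood = {ll:.1f}")
```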

















      cross-validation modeling model-selection aic
















      asked 8 hours ago









      Arman
























          1 Answer








































          Can AIC and BIC be used on the left-out fold in a 10-fold cross-validation?




          No, that would not make sense. AIC and cross-validation (CV) offer estimates of the model's log-likelihood* of new, unseen data from the same population from which the current data sample has been drawn. They do it in two different ways.



          1. AIC measures the log-likelihood of the entire sample at once, based on parameters estimated using the entire sample, and subsequently adjusts for overfitting (which occurs when the log-likelihood of new data is estimated by the log-likelihood of the very sample on which the estimation was done) via $p$ in $\text{AIC} = -2(\text{loglik} - p)$. Here $\text{loglik}$ is the log-likelihood of the sample data according to the model and $p$ is the number of the model's degrees of freedom (a measure of the model's flexibility);

          2. CV measures the log-likelihood on hold-out subsamples based on parameters estimated on training subsamples. Hence, unlike in the case of AIC, there is no overfitting.** Therefore, there is no need to replace the CV estimates of the log-likelihood on the hold-out subsamples (folds) with a penalized log-likelihood such as AIC; a small numerical sketch follows the footnotes below.

          Analogous logic holds for BIC.



          *CV can be used for other functions of the data in place of log-likelihood, too, but for comparability with AIC, I keep the discussion focused on log-likelihood.



          **Actually, CV offers a slightly pessimistic estimate of the log-likelihood because the training subsamples are smaller than the entire sample, so the model has somewhat larger estimation variance than it would have had it been estimated on the entire sample. In leave-one-out CV the problem is negligible, as the training subsamples are almost as large as the entire sample; in $K$-fold CV the problem can be noticeable for small $K$ but decreases as $K$ grows.
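As a rough numerical check of this point, consider the following sketch (simulated data; Python with statsmodels and scikit-learn, all names and data purely illustrative). Since $\text{AIC} = -2(\text{loglik} - p)$, the quantity $-\text{AIC}/2$ from the full-sample fit and the summed out-of-fold log-likelihood from 10-fold CV both estimate the same out-of-sample log-likelihood, so they should be of similar size, with the CV figure typically slightly lower for the reason given in the second footnote.

```python
# Rough check: -AIC/2 from the full-sample fit and the summed out-of-fold
# log-likelihood from 10-fold CV estimate the same quantity.
# (Simulated data; statsmodels / scikit-learn used purely for illustration.)
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 2))
y = 0.5 + x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=n)
X = sm.add_constant(x)

full_fit = sm.OLS(y, X).fit()
print(f"AIC-based estimate of the out-of-sample log-likelihood: {-full_fit.aic / 2:.1f}")

cv_ll = 0.0
for train, test in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
    fit = sm.OLS(y[train], X[train]).fit()
    cv_ll += norm.logpdf(y[test], fit.predict(X[test]), np.sqrt(fit.scale)).sum()
print(f"10-fold CV estimate of the out-of-sample log-likelihood: {cv_ll:.1f}")
```

(Note that statsmodels may count the model's parameters slightly differently from the $p$ above, e.g. regarding the error variance, so the two numbers agree only roughly; the point is that they are estimates of the same quantity, not an exact match.)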






                answered 7 hours ago, edited 7 hours ago
                Richard Hardy