Which comes first? Multiple Imputation, Splitting into train/test, or Standardization/Normalization

I am working on a multi-class classification problem with ~65 features and ~150K instances. About 30% of the features are categorical and the rest are numerical (continuous). I understand that standardization or normalization should be done after splitting the data into train and test subsets, but I am still not sure about the imputation step. For the classification task, I plan to use Random Forest, Logistic Regression, and XGBoost (which are not distance-based).

Could someone please explain which should come first: split > imputation, or imputation > split? And if split > imputation is correct, should I impute before standardizing, or standardize before imputing?










Tags: multiclass-classification, normalization, data-imputation

asked by Sarah


2 Answers


Answer by Simon Larsson (score 3)
Always split before you do any data pre-processing. Performing pre-processing before splitting means that information from your test set will be present during training, causing a data leak.

Think of it like this: the test set is supposed to be a way of estimating performance on totally unseen data. If it affects the training, then it is partially seen data.

I don't think the order of scaling/imputing is as strict; I would impute first if the imputation method might throw off the scaling/centering.

Your steps should be:

1. Splitting
2. Imputing
3. Scaling

Here are some related questions to support this (a sketch of the full order follows them):



          Imputation before or after splitting into train and test?



          Imputation of missing data before or after centering and scaling?
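
A minimal sketch of the split > impute > scale order, assuming scikit-learn's SimpleImputer and StandardScaler on synthetic data. These particular choices are illustrative, not prescribed by the answer; the point is only that every fit happens on the training split:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    X[rng.random(X.shape) < 0.1] = np.nan   # inject ~10% missing values
    y = rng.integers(0, 3, size=1000)       # three classes, as in a multi-class task

    # 1. Split first, so nothing below ever sees the test rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )

    # 2. Impute: fit on the training set only, then apply to both splits.
    imputer = SimpleImputer(strategy="median")
    X_train = imputer.fit_transform(X_train)
    X_test = imputer.transform(X_test)

    # 3. Scale: again, learn mean/std from the training set only.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)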






Comments:

– Upper_Case (score 2): Thank you for adding those references, they were very helpful. I am persuaded, and have removed my answer.

– Simon Larsson: Glad it helped, @Upper_Case. I find it odd that ISLR had examples where this was not the case.

– Upper_Case: The copy I have is a first printing, so possibly it was updated later, and the example I referenced doesn't deal with imputation, so details may differ with that element. I'm also not clear on how "bad" it is to do it one way versus the other (I agree that test-training "leakage" is bad, but post-split data transformation lets arbitrary data-segmentation features "leak" into the model, which is also bad). As I'm not sure which is worse, especially in the general case, I'm deferring to the votes from CrossValidated.SE.

– aranglol: Can you elaborate on what "arbitrary data segmentation features" means? Like the training set having a mean/standard deviation that is not reflective of the entire population as a whole?


Answer by aranglol (score 3)
If you impute/standardize before splitting and then split into train/test, you are leaking data from your test set (which is supposed to be completely withheld) into your training set. This will yield badly biased estimates of model performance.

The correct approach is to split your data first and then apply imputation/standardization (the order will depend on whether the imputation method requires standardized inputs).

The key idea is that you learn everything from the training set and then "predict" onto the test set. For normalization/standardization, you learn the sample mean and sample standard deviation from the training set, treat them as constants, and use these learned values to transform the test set. You do not use the test-set mean or standard deviation in any of these calculations.

For imputation the idea is similar: you learn the required parameters from the training set only, and then predict the required test-set values.

This way your performance metrics will not be optimistically biased by your methods inadvertently seeing the test-set observations.
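
A minimal NumPy sketch of this fit-on-train, apply-to-test idea. The mean imputation and z-scoring below are illustrative assumptions standing in for whatever imputer/scaler is actually used:

    import numpy as np

    rng = np.random.default_rng(1)
    X_train = rng.normal(loc=5.0, scale=2.0, size=(800, 3))
    X_test = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
    X_train[rng.random(X_train.shape) < 0.1] = np.nan
    X_test[rng.random(X_test.shape) < 0.1] = np.nan

    # Learn imputation parameters from the training set only.
    col_means = np.nanmean(X_train, axis=0)
    X_train = np.where(np.isnan(X_train), col_means, X_train)
    X_test = np.where(np.isnan(X_test), col_means, X_test)  # training means reused as constants

    # Learn standardization parameters from the (now complete) training set only.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    X_train = (X_train - mu) / sigma
    X_test = (X_test - mu) / sigma  # no test-set statistics used anywhere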





