How valuable is a categorical feature that has a predominant category over all other ones?


Is a categorical feature whose categories are almost equally distributed more important than one in which a single category predominates over all the others?

In the data preprocessing step for the "House Price" competition, I want to decide whether the Street feature is important or whether I can drop it from the data set to avoid overfitting. So I plotted a swarm plot as follows:

[image: swarm plot, not preserved]

How should I interpret this plot? Does it show that the Street feature can be dropped, or that it is valuable for building a model?

machine-learning data-mining feature-selection data-cleaning feature-engineering

asked 8 hours ago by Ali Majed HA

1 Answer

Feature importance is an empirical question. Train one model with the feature and another model without it, then compare how well each predicts new data (the test dataset). The gap between a model's performance on the training and test datasets is one way to measure overfitting.

answered 8 hours ago by Brian Spiering
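
The with/without comparison described in the answer could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: the data is synthetic, the "model" is deliberately trivial (the mean price per category versus the overall mean), and the category labels, prices, 5% rare share, and all function names are made up for the example. A real run would use the competition data and whatever model is actually being considered.

```python
import random
import statistics
from math import sqrt


def make_data(n=2000, rare_share=0.05, seed=0):
    """Synthetic (street, price) rows. 'Grvl' is the rare category.
    Every number here is invented for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        street = "Grvl" if rng.random() < rare_share else "Pave"
        base = 100.0 if street == "Grvl" else 180.0
        rows.append((street, base + rng.gauss(0, 30)))
    return rows


def fit(rows, use_feature):
    """Return a predictor: per-category mean price if use_feature,
    otherwise the overall mean price (the feature is ignored)."""
    global_mean = statistics.fmean(price for _, price in rows)
    if not use_feature:
        return lambda street: global_mean
    by_category = {}
    for street, price in rows:
        by_category.setdefault(street, []).append(price)
    means = {cat: statistics.fmean(vals) for cat, vals in by_category.items()}
    # Categories never seen in training fall back to the overall mean.
    return lambda street: means.get(street, global_mean)


def rmse(predict, rows):
    """Root mean squared error of a predictor on (street, price) rows."""
    return sqrt(statistics.fmean((predict(s) - p) ** 2 for s, p in rows))


def compare(rows, test_share=0.25, seed=0):
    """Shuffle, split into train/test, and score both variants."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_share))
    train, test = rows[:cut], rows[cut:]
    report = {}
    for name, use_feature in (("with_street", True), ("without_street", False)):
        model = fit(train, use_feature)
        report[name] = {
            "train_rmse": rmse(model, train),
            "test_rmse": rmse(model, test),
        }
    return report


if __name__ == "__main__":
    for name, scores in compare(make_data()).items():
        gap = scores["test_rmse"] - scores["train_rmse"]
        print(f"{name}: train={scores['train_rmse']:.1f} "
              f"test={scores['test_rmse']:.1f} gap={gap:.1f}")
```

Here the lower test RMSE indicates which variant generalizes better, and the train/test gap is the rough overfitting signal the answer mentions. With a rare category, the result can depend on how many rare rows land in the test split, so repeating the comparison over several seeds (or using cross-validation) is probably worthwhile before dropping the feature.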


























