How valuable is a categorical feature that has a predominant category over all other ones?
Is a categorical feature whose categories are almost equally distributed more important than one in which a single category predominates over all the others?
In the data preprocessing step for the "House Price" competition, I want to decide whether the Street feature is important or whether I can drop it from the data set to avoid overfitting. So I have plotted a swarm plot as follows:
How should I interpret this plot? Does it show that the Street feature can be dropped, or does it say that the feature is valuable for building a model?
machine-learning data-mining feature-selection data-cleaning feature-engineering
asked 8 hours ago
Ali Majed HA
1 Answer
Feature importance is an empirical question. Train one model with the feature in it and another model without it. Then see which model does better when predicting new data, i.e. the test dataset. The difference between a model's performance on the train and test datasets is one way to measure overfitting.
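The comparison described in the answer can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: the data is synthetic, the category labels `Pave` and `Grvl`, the 99% share of the dominant category, and the price levels are all made up, and the "model" is a deliberately simple per-category mean predictor so that the example needs nothing beyond the standard library. With real data you would substitute your own model and a proper train/test split.

```python
import random
from statistics import mean


def make_data(n, seed):
    """Synthetic stand-in for the housing data: a list of (street, price) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption: roughly 99% of rows fall into one dominant category.
        street = "Pave" if rng.random() < 0.99 else "Grvl"
        base = 180_000 if street == "Pave" else 130_000  # made-up price levels
        rows.append((street, base + rng.gauss(0, 40_000)))
    return rows


def rmse(preds, targets):
    """Root mean squared error between predictions and targets."""
    return mean((p - t) ** 2 for p, t in zip(preds, targets)) ** 0.5


def fit_without_feature(train):
    """Baseline model: always predict the overall training mean."""
    overall = mean(price for _, price in train)
    return lambda street: overall


def fit_with_feature(train):
    """Model that uses the feature: predict the training mean of each category."""
    overall = mean(price for _, price in train)
    by_category = {}
    for street, price in train:
        by_category.setdefault(street, []).append(price)
    category_means = {c: mean(v) for c, v in by_category.items()}
    # Fall back to the overall mean for a category never seen in training.
    return lambda street: category_means.get(street, overall)


def evaluate(model, rows):
    """RMSE of a fitted model on a set of rows."""
    return rmse([model(s) for s, _ in rows], [p for _, p in rows])


train = make_data(2000, seed=0)
test = make_data(1000, seed=1)

results = {}
for name, fit in [("without Street", fit_without_feature),
                  ("with Street", fit_with_feature)]:
    model = fit(train)
    results[name] = {"train": evaluate(model, train),
                     "test": evaluate(model, test)}
    print(f"{name}: train RMSE = {results[name]['train']:.0f}, "
          f"test RMSE = {results[name]['test']:.0f}")
```

The number to compare between the two models is the test RMSE. The gap between the train and test RMSE of the same model is the overfitting signal the answer refers to. Because the rare category covers only a small fraction of rows, any improvement from the feature is likely to be small in aggregate, which is why it is worth measuring on held-out data rather than judging from the plot alone.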
answered 8 hours ago
Brian Spiering