Pandas aggregate with dynamic column names
I have a script that generates a pandas data frame with a varying number of value columns. As an example, this df might be
    import pandas as pd

    df = pd.DataFrame({
        'group': ['A', 'A', 'A', 'B', 'B'],
        'group_color': ['green', 'green', 'green', 'blue', 'blue'],
        'val1': [5, 2, 3, 4, 5],
        'val2': [4, 2, 8, 5, 7]
    })

      group group_color  val1  val2
    0     A       green     5     4
    1     A       green     2     2
    2     A       green     3     8
    3     B        blue     4     5
    4     B        blue     5     7
My goal is to get the grouped mean for each of the value columns. In this specific case (with 2 value columns), I can use
    df.groupby('group').agg({"group_color": "first", "val1": "mean", "val2": "mean"})

          group_color      val1      val2
    group
    A           green  3.333333  4.666667
    B            blue  4.500000  6.000000
but that does not work when the data frame in question has more value columns (val3, val4 etc.).
Is there a way to dynamically take the mean of "all the other columns" or "all columns containing val in their names"?
python pandas aggregate pandas-groupby
asked 9 hours ago by MartijnVanAttekum; edited 16 mins ago by John Conde
Is group_color always the same for one group? – Quang Hoang, 9 hours ago
@QuangHoang: yes, that is the case, but I would still like to retain it. – MartijnVanAttekum, 9 hours ago
5 Answers
A simpler way:

    df.groupby('group').agg(lambda x : x.head(1) if x.dtype=='object' else x.mean())

          group_color      val1      val2
    group
    A           green  3.333333  4.666667
    B            blue  4.500000  6.000000
answered 9 hours ago by WeNYoBen
Nice solution! Can you explain why the dtype of the non-numeric columns is object? – MartijnVanAttekum, 8 hours ago
@MartijnVanAttekum object is a dtype in pandas; strings and other non-numeric values are all classified as object. – WeNYoBen, 8 hours ago
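The dtype check discussed above can also be written with pandas' own type helper instead of comparing against 'object'. This is only a sketch: the function name mean_or_first and the extra val3 column are hypothetical, added to show that a new value column needs no code change.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
    'val3': [1, 1, 1, 2, 2],  # hypothetical extra value column
})

def mean_or_first(col):
    # Numeric columns are averaged; any other column keeps its first value.
    return col.mean() if is_numeric_dtype(col) else col.iloc[0]

# agg calls mean_or_first once per column and per group.
result = df.groupby('group').agg(mean_or_first)
print(result)
```

Testing for "numeric" rather than "object" means the check does not depend on how a given pandas version stores strings.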
If your group_color is always the same within one group, you can do:

    df.pivot_table(index=['group','group_color'],aggfunc='mean')

Output:

                           val1      val2
    group group_color
    A     green        3.333333  4.666667
    B     blue         4.500000  6.000000

Otherwise, you can build the dictionary and pass it to agg:

    agg_dict = {f: 'first' if f=='group_color' else 'mean' for f in df.columns[1:]}
    df.groupby('group').agg(agg_dict)

Which outputs:

          group_color      val1      val2
    group
    A           green  3.333333  4.666667
    B            blue  4.500000  6.000000
answered 9 hours ago by Quang Hoang
Your pivot_table answer is the way to go. I used almost the same thing but added a reset_index. – piRSquared, 6 hours ago
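For reference, the pivot_table variant combined with the reset_index mentioned in the comment could look roughly like this. It is a sketch that assumes group_color is constant within each group; the val3 column is hypothetical and only shows that extra value columns are picked up automatically.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
    'val3': [1, 1, 1, 2, 2],  # hypothetical extra value column
})

result = (
    # Every column that is not part of the index is aggregated with the mean.
    df.pivot_table(index=['group', 'group_color'], aggfunc='mean')
      # Move group_color out of the index so it becomes a regular column again.
      .reset_index(level='group_color')
)
print(result)
```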
Unfortunately, you will have to apply the two aggregation functions separately (either that, or repeat "valn": "mean" once for every valx column). GroupBy.agg can take a dictionary, but its keys must be individual columns.
The way I'd do this is to use DataFrame.filter to select the columns whose names follow the valx pattern, aggregate them with the mean, and then assign the aggregated result of the other column as a new column:

    (df.filter(regex=r'^val').groupby(df.group).mean()
       .assign(color = df.group_color.groupby(df.group).first()))

               val1      val2  color
    group
    A      3.333333  4.666667  green
    B      4.500000  6.000000   blue
answered 9 hours ago by yatu (edited 9 hours ago)
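One detail of the filter step is worth illustrating: the anchored regex only matches names that start with val, whereas filter(like='val') matches any name containing it. The sketch below uses a hypothetical interval column to show the difference, and uses join instead of assign so that group_color keeps its original name.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
    'interval': [10, 20, 30, 40, 50],  # hypothetical column that merely contains "val"
})

strict = df.filter(regex=r'^val')  # only names starting with "val"
loose = df.filter(like='val')      # any name containing "val", so "interval" too

means = strict.groupby(df['group']).mean()
colors = df.groupby('group')['group_color'].first()
result = means.join(colors)        # align on the shared "group" index
print(result)
```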
You can use two dictionaries and combine them like this:

    df.groupby('group').agg({**{'group_color': 'first'}, **{c: 'mean' for c in df.columns if c.startswith('val')}})

Here one dict holds the fixed aggregations and the other selects the columns dynamically.
answered 9 hours ago by zipa
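If the same pattern is needed in several places, the two-dictionary idea can be wrapped in a small helper. The sketch below is one possible shape; the name build_agg_spec and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd

def build_agg_spec(columns, fixed, prefix='val', func='mean'):
    """Merge fixed aggregations with one aggregation per prefixed column."""
    dynamic = {c: func for c in columns if c.startswith(prefix)}
    # Keys in `dynamic` override `fixed` if a column appears in both.
    return {**fixed, **dynamic}

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
    'val3': [1, 1, 1, 2, 2],  # hypothetical extra value column
})

spec = build_agg_spec(df.columns, {'group_color': 'first'})
result = df.groupby('group').agg(spec)
print(result)
```

Because the spec is a plain dict, it can be printed and inspected before it is passed to agg.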
Per the OP's comment, we can group by both 'group' and 'group_color' without the risk of there being more than one unique 'group_color' per 'group'.
Consequently:

    df.groupby(['group', 'group_color']).mean().reset_index(level=1)

          group_color      val1      val2
    group
    A           green  3.333333  4.666667
    B            blue  4.500000  6.000000
answered 6 hours ago by piRSquared
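Since this approach relies on each group having exactly one group_color, it may be worth checking that assumption before grouping. A minimal sketch of such a check follows; the check itself and the error message are illustrative additions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
})

# Count the distinct colors in each group; every count should be 1.
colors_per_group = df.groupby('group')['group_color'].nunique()
if not (colors_per_group == 1).all():
    raise ValueError('some groups have more than one group_color')

result = (
    df.groupby(['group', 'group_color'])
      .mean()
      .reset_index(level='group_color')
)
print(result)
```

Without the check, a group with two colors would silently produce two rows for that group instead of one.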