GroupBy operation using an entire dataframe to group values
I have two DataFrames like this:
import numpy as np
import pandas as pd
np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))
I'd like to find the average of the values in a for each of the 4 groups given by the labels in b (the integers 1 to 4).
This works for one group at a time:
a[b==1].sum().sum() / a[b==1].count().sum()
but I was wondering whether anyone could think of a cleaner method that handles all the groups at once.
My expected result is:
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
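As a baseline, the one-group-at-a-time expression can simply be looped over every label that occurs in b, which reproduces this expected result. A minimal sketch (the dictionary comprehension and the name loop_result are illustrative, not from the question); it makes one masked pass over a per group, so it gets slower as the number of groups grows:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))

# Apply the question's masked mean once per distinct label in b.
# a[b == k] keeps values where b equals k and sets the rest to NaN;
# sum() and count() both skip NaN, so their ratio is the group mean.
loop_result = pd.Series(
    {k: a[b == k].sum().sum() / a[b == k].count().sum()
     for k in np.unique(b.to_numpy())}
)
print(loop_result)
```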
Thanks.
python pandas group-by pandas-groupby
edited 8 hours ago by cs95
asked 9 hours ago by MJS
Can you please post some expected results? Right now I assume you need 4 values. – BogdanC, 9 hours ago
2 Answers
You can stack both DataFrames into Series, then groupby one Series by the other:
a.stack().groupby(b.stack()).mean()
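Why this works: stack turns each 2-D frame into a Series indexed by (row, column) pairs. Because a and b share the same shape and labels, the two stacked Series line up element by element, and groupby can use b's values as the keys for a's values. A small sketch on hypothetical toy frames:

```python
import pandas as pd

# Hypothetical toy inputs with the same shape and labels
a = pd.DataFrame([[1.0, 2.0], [3.0, 5.0]])
b = pd.DataFrame([[1, 2], [2, 1]])

values = a.stack()  # MultiIndex (row, column): (0, 0), (0, 1), (1, 0), (1, 1)
labels = b.stack()  # same MultiIndex, so it aligns with values

# Group each value by the label at the same (row, column) position
result = values.groupby(labels).mean()
print(result)
```

Here label 1 covers positions (0, 0) and (1, 1), giving (1.0 + 5.0) / 2 = 3.0, and label 2 covers (0, 1) and (1, 0), giving (2.0 + 3.0) / 2 = 2.5.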
answered 9 hours ago by WeNYoBen
If you want a fast NumPy solution, use np.unique and np.bincount:
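# Flatten both frames into 1-D arrays: c holds the values, d the group labels
c, d = (a_.to_numpy().ravel() for a_ in [a, b])
# u: sorted unique labels, i: position of each label in u, cnt: group sizes
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
# Per-group sums (bincount weighted by c) divided by group sizes = group means
np.bincount(i, c) / cnt
# array([-0.0887145 , -0.34004319, -0.04559595, 0.58213553])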
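To see what np.unique and np.bincount compute, here is the same pipeline on a tiny hand-checkable array (the toy labels and values are made up for illustration): return_inverse maps each label to its position in the sorted unique labels, and np.bincount with weights sums the values that fall into each position.

```python
import numpy as np

labels = np.array([2, 1, 2, 3])          # hypothetical group labels
values = np.array([1.0, 2.0, 3.0, 4.0])  # values to average per label

u, i, cnt = np.unique(labels, return_inverse=True, return_counts=True)
# u   -> [1, 2, 3]     sorted unique labels
# i   -> [1, 0, 1, 2]  position of each label in u
# cnt -> [1, 2, 1]     number of values per label

sums = np.bincount(i, weights=values)  # [2.0, 4.0, 4.0]
means = sums / cnt                     # [2.0, 2.0, 4.0]
print(means)
```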
To construct a Series, use
pd.Series(np.bincount(i, c) / cnt, index=u)
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
For comparison, the stack-based solution returns:
a.stack().groupby(b.stack()).mean()
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
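A quick way to confirm that the two approaches agree on the question's seeded data is a sketch using np.allclose. It assumes, as in the question, that a contains no NaN: the groupby mean skips missing values, whereas np.bincount would propagate a NaN into its group's sum.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))

# pandas route: stack both frames, group a's values by b's labels
via_stack = a.stack().groupby(b.stack()).mean()

# NumPy route from the answer
c, d = (a_.to_numpy().ravel() for a_ in [a, b])
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
via_bincount = pd.Series(np.bincount(i, c) / cnt, index=u)

# Same labels, and means equal up to floating-point rounding
assert (via_stack.index.to_numpy() == via_bincount.index.to_numpy()).all()
assert np.allclose(via_stack.to_numpy(), via_bincount.to_numpy())
```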
%timeit a.stack().groupby(b.stack()).mean()
5.16 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
c, d = (a_.to_numpy().ravel() for a_ in [a, b])
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
np.bincount(i, c) / cnt
113 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
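Outside IPython, a similar comparison can be run with the standard-library timeit module. This is a sketch: absolute numbers depend on the machine and library versions, and the helper names stack_groupby and unique_bincount are hypothetical, chosen only for illustration.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))


def stack_groupby():
    return a.stack().groupby(b.stack()).mean()


def unique_bincount():
    c, d = (a_.to_numpy().ravel() for a_ in [a, b])
    u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
    return np.bincount(i, c) / cnt


for fn in (stack_groupby, unique_bincount):
    # Best of 5 repeats, each timing 100 calls; report per-call time
    best = min(timeit.repeat(fn, number=100, repeat=5)) / 100
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```

Taking the minimum of several repeats, rather than the mean, reduces the influence of background noise on the reported time.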
edited 8 hours ago
answered 9 hours ago by cs95
Great answer, worth noting that this will fail if you don't have every group from 1-n present. I think a fix would be something like f = np.ones(u.max()), and then f[u-1] = c to divide by that instead. – user3483203, 9 hours ago
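To illustrate the concern, here is a sketch contrasting a hypothetical variant that bins the raw label values directly with the np.unique(..., return_inverse=True) version shown in the answer. The toy data below is made up so that only labels 1 and 4 occur:

```python
import numpy as np

labels = np.array([1, 4, 4, 1])  # labels 2 and 3 never occur
values = np.array([1.0, 2.0, 4.0, 3.0])

# Hypothetical variant: bin by the raw label values.
# Bins 0, 2 and 3 are empty, so they compute 0 / 0 = nan.
with np.errstate(invalid="ignore"):
    raw = np.bincount(labels, values) / np.bincount(labels)
print(raw)  # bins 0 to 4; only bins 1 and 4 are finite

# Answer's variant: return_inverse maps labels to dense positions 0..len(u)-1
u, i, cnt = np.unique(labels, return_inverse=True, return_counts=True)
dense = np.bincount(i, values) / cnt
print(u, dense)
```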
@user3483203 That's true. In that case we'd have to call bincount on pd.factorize(b.values.ravel())[0] and proceed as planned! – cs95, 9 hours ago
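The pd.factorize route from this comment can be sketched as follows, using hypothetical toy frames. factorize assigns each distinct label a dense integer code, so np.bincount never sees empty bins. b.to_numpy() stands in for b.values, and sort=True is an assumption not in the comment, used here so that the codes follow label order:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames; labels 2 and 3 are absent
a = pd.DataFrame([[1.0, 2.0], [4.0, 3.0]])
b = pd.DataFrame([[1, 4], [4, 1]])

# codes: dense integer id per element; uniques: the label behind each id
codes, uniques = pd.factorize(b.to_numpy().ravel(), sort=True)

sums = np.bincount(codes, weights=a.to_numpy().ravel())
counts = np.bincount(codes)
result = pd.Series(sums / counts, index=uniques)
print(result)
```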
You can safeguard with return_inverse... u, i, c = np.unique(b.to_numpy(), return_inverse=True, return_counts=True); pd.Series(np.bincount(i, a.to_numpy().ravel()) / c, u) – piRSquared, 8 hours ago
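The one-liner in this comment passes the 2-D b.to_numpy() straight to np.unique. In recent NumPy releases (2.x), the inverse returned for multi-dimensional input may keep the input's shape, which np.bincount rejects because it expects 1-D input. Raveling both frames first sidesteps that. Here is a sketch wrapping this in a hypothetical helper, grouped_mean:

```python
import numpy as np
import pandas as pd


def grouped_mean(a: pd.DataFrame, b: pd.DataFrame) -> pd.Series:
    """Mean of a's values per distinct label in b (hypothetical helper)."""
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    # Ravel both frames so np.unique's inverse is 1-D on any NumPy version
    values = a.to_numpy().ravel()
    labels = b.to_numpy().ravel()
    u, i, cnt = np.unique(labels, return_inverse=True, return_counts=True)
    return pd.Series(np.bincount(i, values) / cnt, index=u)


print(grouped_mean(pd.DataFrame([[1.0, 2.0], [3.0, 5.0]]),
                   pd.DataFrame([[1, 2], [2, 1]])))
```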
Great answer, thanks cs. – MJS, 8 hours ago