
GroupBy operation using an entire dataframe to group values










8















I have two DataFrames like this:



import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))


I'd like to find the average of the values in a for each of the 4 groups defined by b.



This...



a[b==1].sum().sum() / a[b==1].count().sum()


...works for one group at a time, but I was wondering if anyone could think of a cleaner method that handles all the groups at once.
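
For reference, spelling that same masking approach out for every label looks something like this (a minimal sketch; it assumes the labels in b are exactly 1 through 4):

# One mean per group label, using the same mask-and-divide idea as above.
group_means = pd.Series(
    {g: a[b == g].sum().sum() / a[b == g].count().sum() for g in [1, 2, 3, 4]}
)
print(group_means)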



My expected result is



1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64


Thanks.










python pandas group-by pandas-groupby






edited 8 hours ago by cs95
asked 9 hours ago by MJS


  • Can you please post some expected results? Right now I assume you need 4 values
    – BogdanC
    9 hours ago
2 Answers
























9














You can stack both DataFrames and then group one stacked Series by the other:



a.stack().groupby(b.stack()).mean()
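
To see why this works: stack reshapes each 20x3 DataFrame into a Series of 60 values indexed by (row, column), so the two stacked Series line up element for element and the values of b can act as group keys for the values of a. A quick check of the alignment (an illustrative sketch using the seeded setup from the question):

# Both stacked Series share the same (row, column) MultiIndex,
# so b.stack() can serve directly as the grouping key for a.stack().
stacked_a = a.stack()
stacked_b = b.stack()
print(stacked_a.index.equals(stacked_b.index))  # True
print(stacked_a.groupby(stacked_b).mean())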





answered 9 hours ago by WeNYoBen






























5


If you want a fast numpy solution, use np.unique and np.bincount:

c, d = (a_.to_numpy().ravel() for a_ in [a, b])
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)

np.bincount(i, c) / cnt
# array([-0.0887145 , -0.34004319, -0.04559595, 0.58213553])


To construct a Series, use

pd.Series(np.bincount(i, c) / cnt, index=u)

1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64


For comparison, the stack approach returns:

a.stack().groupby(b.stack()).mean()

1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64


%timeit a.stack().groupby(b.stack()).mean()
%%timeit
c, d = (a_.to_numpy().ravel() for a_ in [a, b])
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
np.bincount(i, c) / cnt

5.16 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
113 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)





edited 8 hours ago, answered 9 hours ago by cs95




















    • 3





      Great answer, worth noting that this will fail if you don't have every group from 1-n present. I think a fix would be something like f = np.ones(u.max()), and then f[u-1] = c to divide by that instead

      – user3483203
      9 hours ago







    • 3





      @user3483203 That's true. In that case we'd have to call bincount on pd.factorize(b.values.ravel())[0] and proceed as planned!

      – cs95
      9 hours ago







    • 2





      You can safeguard with return_inverse... u, i, c = np.unique(b.to_numpy(), return_inverse=True, return_counts=True); pd.Series(np.bincount(i, a.to_numpy().ravel()) / c, u)

      – piRSquared
      8 hours ago






    • 1





      great answer. thanks cs.

      – MJS
      8 hours ago
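
Following up on the comments about missing group labels: below is a sketch of the pd.factorize safeguard that cs95 mentions, which maps whatever labels actually occur in b onto 0..k-1 before calling np.bincount, so nothing breaks if some label between 1 and n never appears (the variable names here are illustrative, not from the answer):

# Robust to non-contiguous labels in b (e.g. only 1, 3 and 4 present):
# factorize b's values so the bincount bins line up with the observed labels.
codes, labels = pd.factorize(b.to_numpy().ravel())
vals = a.to_numpy().ravel()
means = np.bincount(codes, vals) / np.bincount(codes)
print(pd.Series(means, index=labels).sort_index())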










