GroupBy operation using an entire dataframe to group values


I have two DataFrames like this:

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    a = pd.DataFrame(np.random.randn(20, 3))
    b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))

I'd like to find the average of the values in a for each of the 4 groups defined by the values in b.



This works for one group at a time:

    a[b==1].sum().sum() / a[b==1].count().sum()

but I was wondering if anyone could think of a cleaner method.
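For reference, the one-group-at-a-time expression can be wrapped in a loop over the distinct values of b. This is only a sketch of the baseline being improved on, using the same a and b as in the setup; the names per_group and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
a = pd.DataFrame(np.random.randn(20, 3))
b = pd.DataFrame(np.random.randint(1, 5, size=(20, 3)))

# a[b == g] keeps a's values where b equals g and sets the rest to NaN;
# sum() and count() both skip NaN, so the ratio is the mean of the kept values.
per_group = {
    g: a[b == g].sum().sum() / a[b == g].count().sum()
    for g in np.unique(b.to_numpy())
}
result = pd.Series(per_group)
print(result)
```

This masks the whole frame once per group, which is the repetition a cleaner method would avoid.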



My expected result is:

    1   -0.088715
    2   -0.340043
    3   -0.045596
    4    0.582136
    dtype: float64

Thanks.

Tags: python, pandas, group-by, pandas-groupby

asked 9 hours ago by MJS
edited 8 hours ago by cs95

  • Can you please post some expected results? Right now I assume you need 4 values.
    – BogdanC, 9 hours ago

2 Answers

You can stack both frames, then group one stacked Series by the other:

    a.stack().groupby(b.stack()).mean()
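To see why this works, here is a small sketch on hand-made data. The frames vals and keys are made up for illustration (they are not the question's a and b), chosen so the result can be checked by hand.

```python
import pandas as pd

# Tiny hypothetical frames with the same shape as each other.
vals = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
keys = pd.DataFrame([[1, 2], [2, 1], [1, 1]])

# stack() turns each frame into a Series indexed by (row, column),
# so the two Series line up element by element.
stacked_vals = vals.stack()
stacked_keys = keys.stack()

# Passing a Series to groupby groups stacked_vals by the aligned labels.
group_means = stacked_vals.groupby(stacked_keys).mean()
print(group_means)
# label 1 -> (1 + 4 + 5 + 6) / 4 = 4.0, label 2 -> (2 + 3) / 2 = 2.5
```

Because both Series carry the same (row, column) index, the grouping does not depend on the frames sharing any particular column names, only on them having matching labels.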






answered 9 hours ago by WeNYoBen

If you want a fast NumPy solution, use np.unique and np.bincount:

    # Flatten both frames to 1-D arrays: c holds the values, d the group labels.
    c, d = (a_.to_numpy().ravel() for a_ in [a, b])
    u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)

    # Per-group sums (bincount with weights) divided by per-group counts.
    np.bincount(i, c) / cnt
    # array([-0.0887145 , -0.34004319, -0.04559595,  0.58213553])

To construct a Series, use:

    pd.Series(np.bincount(i, c) / cnt, index=u)

    1   -0.088715
    2   -0.340043
    3   -0.045596
    4    0.582136
    dtype: float64

For comparison, stack returns:

    a.stack().groupby(b.stack()).mean()

    1   -0.088715
    2   -0.340043
    3   -0.045596
    4    0.582136
    dtype: float64

Timings, with the stack approach first and the NumPy approach second:

    %timeit a.stack().groupby(b.stack()).mean()
    5.16 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %%timeit
    c, d = (a_.to_numpy().ravel() for a_ in [a, b])
    u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
    np.bincount(i, c) / cnt
    113 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
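To make the np.unique / np.bincount step concrete, here is a minimal sketch on a few hand-picked numbers. The arrays values and labels are hypothetical stand-ins for the flattened c and d; the other names are chosen here for readability.

```python
import numpy as np

# Hypothetical small inputs, already flattened to 1-D.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([1, 2, 2, 1, 1, 1])

# uniq: sorted distinct labels; inv: position of each label within uniq;
# counts: how many times each distinct label occurs.
uniq, inv, counts = np.unique(labels, return_inverse=True, return_counts=True)

# With weights, bincount adds up the values that fall into each bin,
# giving per-group sums; dividing by the counts gives per-group means.
sums = np.bincount(inv, weights=values)
means = sums / counts
print(dict(zip(uniq.tolist(), means.tolist())))
# label 1 -> 16 / 4 = 4.0, label 2 -> 5 / 2 = 2.5
```

The whole computation is a handful of vectorised passes over flat arrays, with no pandas index to build, which is presumably where the speed difference in the timings comes from.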






answered 9 hours ago by cs95 (edited 8 hours ago)

  • Great answer. Worth noting that this will fail if you don't have every group from 1 to n present. I think a fix would be something like f = np.ones(u.max()), and then f[u-1] = c to divide by that instead.
    – user3483203, 9 hours ago

  • @user3483203 That's true. In that case we'd have to call bincount on pd.factorize(b.values.ravel())[0] and proceed as planned!
    – cs95, 9 hours ago

  • You can safeguard with return_inverse: u, i, c = np.unique(b.to_numpy(), return_inverse=True, return_counts=True); pd.Series(np.bincount(i, a.to_numpy().ravel()) / c, u)
    – piRSquared, 8 hours ago

  • Great answer. Thanks, cs.
    – MJS, 8 hours ago
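The point raised in these comments, labels that are not a contiguous 1 to n, can be checked on a small made-up case. The sketch below assumes hypothetical arrays values and labels in which label 2 never occurs. It tries both routes mentioned: the inverse indices from np.unique and the codes from pd.factorize. The labels are flattened to 1-D first so that the indices passed to np.bincount are one-dimensional.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the labels are 1 and 5, so they do not form a 1..n range.
values = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([5, 1, 5, 1])

# Route 1: np.unique's inverse indices always run 0..k-1, whatever the labels are.
uniq, inv, counts = np.unique(labels, return_inverse=True, return_counts=True)
means_unique = pd.Series(np.bincount(inv, weights=values) / counts, index=uniq)

# Route 2: pd.factorize also yields 0..k-1 codes, but in order of first appearance,
# so the counts are computed from the same codes to stay aligned.
codes, first_seen = pd.factorize(labels)
means_factorize = pd.Series(
    np.bincount(codes, weights=values) / np.bincount(codes), index=first_seen
)

print(means_unique)
print(means_factorize.sort_index())
```

Both routes give 3.0 for label 1 and 2.0 for label 5 here; the only difference is the order of the result before sorting.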











