How to count the number of occurrences before a particular value in a pandas DataFrame?


I have a dataframe like the one below:

    A  B  C
    1  1  1
    2  0  1
    3  0  0
    4  1  0
    5  0  1
    6  0  1
    7  1  0

I want the number of occurrences of zeroes in df['B'] under the following condition:

    if(df['B']<df['C']):
        #count number of zeroes in df['B'] until it sees 1.

Expected output:

    A  B  C  output
    1  1  1  NaN
    2  0  1  1
    3  0  0  NaN
    4  1  0  NaN
    5  0  1  1
    6  0  1  0
    7  1  0  NaN

I don't know how to formulate the count part. Any help is really appreciated.
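
For anyone who wants to reproduce this, the sample data could be built roughly as follows. This is only a sketch: it assumes C is 1 in the row where A is 6, which is what the expected output shows, and the `expected` column name is a placeholder that is not part of the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question; C is assumed to be 1 where A == 6,
# matching the expected-output table.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})

# Rows where the condition B < C holds; only these rows get a count.
mask = df["B"] < df["C"]

# Expected result copied from the question (hypothetical column name).
expected = pd.Series([np.nan, 1, np.nan, np.nan, 1, 0, np.nan], name="expected")

print(df.assign(expected=expected))
print(mask.tolist())
```

The rows with a number in the expected column are exactly the rows where the mask is True, which is a quick sanity check on any proposed solution.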







python pandas dataframe






edited 8 hours ago by Massifox
asked 9 hours ago by hakuna_code


  • Me too, what does "until it sees 1" mean?

    – Joe
    9 hours ago

  • Until the first occurrence of '1' in B.

    – hakuna_code
    9 hours ago
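
Read that way, the rule can be written as a plain loop, which is slow but handy for checking faster solutions. A minimal sketch, assuming the count covers the zeros strictly after the current row and before the next 1 in B (this matches the expected output). The name `count_zeros_until_one` is made up, and leaving rows with no later 1 as NaN is an assumption, since the question does not cover that case.

```python
import numpy as np

def count_zeros_until_one(b, c):
    """Hypothetical reference implementation; b and c are sequences of 0/1."""
    b = list(b)
    c = list(c)
    out = [np.nan] * len(b)
    for i in range(len(b)):
        if b[i] < c[i]:
            count = 0
            # Walk forward from the next row until a 1 shows up in B.
            for j in range(i + 1, len(b)):
                if b[j] == 1:
                    out[i] = count
                    break
                if b[j] == 0:
                    count += 1
            # No later 1 found: out[i] stays NaN (assumption).
    return out

B = [1, 0, 0, 1, 0, 0, 1]
C = [1, 1, 0, 0, 1, 1, 0]
result = count_zeros_until_one(B, C)
print(result)
```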

















3 Answers
IIUC, one approach would be to use a custom grouper and apply groupby.cumcount:

    c1 = df.B.lt(df.C)         # rows where B < C
    g = df.B.eq(1).cumsum()    # a new group starts at every 1 in B
    # count the rows remaining in each group, then keep only the rows where c1 holds
    df['out'] = c1.groupby(g).cumcount(ascending=False).shift().where(c1).sub(1)

    print(df)

       A  B  C  out
    0  1  1  1  NaN
    1  2  0  1  1.0
    2  3  0  0  NaN
    3  4  1  0  NaN
    4  5  0  1  1.0
    5  6  0  1  0.0
    6  7  1  0  NaN
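
To see why this works, it can help to look at the intermediate Series. The sketch below rebuilds the sample frame (with C assumed to be 1 where A is 6, as in the printed output) and prints each step. The names `remaining` and `steps` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})

c1 = df.B.lt(df.C)            # rows where B < C
g = df.B.eq(1).cumsum()       # a new group starts at every 1 in B

# Within each group, count how many rows follow the current one.
remaining = c1.groupby(g).cumcount(ascending=False)

steps = pd.DataFrame({"c1": c1, "g": g, "remaining": remaining})
print(steps)

# For a zero row inside a group, shift() picks up the previous row's count
# (one higher) and sub(1) brings it back down; where(c1) blanks out the
# rows that do not satisfy the condition.
out = remaining.shift().where(c1).sub(1)
print(out.tolist())
```

Because each group starts at a 1, the rows that follow a zero inside its group are exactly the zeros before the next 1.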






answered 9 hours ago by yatu

You can use some masking and a groupby on the reversed Series. This assumes binary data (only 0 and 1).

    m = df['B'][::-1].eq(0)
    d = m.groupby(m.ne(m.shift()).cumsum()).cumsum().sub(1)
    d[::-1].where(df['B'] < df['C'])

    0    NaN
    1    1.0
    2    NaN
    3    NaN
    4    1.0
    5    0.0
    6    NaN
    Name: B, dtype: float64
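
The grouping key here is the usual run-length trick: comparing a Series with its shifted self marks where a new run starts, and the cumulative sum turns those marks into run ids. A sketch on the B column from the question, with `run_id` and `pos_in_run` as illustrative names. The explicit `astype(int)` is an addition of this sketch to keep the dtype obvious, not something the code above needs.

```python
import pandas as pd

B = pd.Series([1, 0, 0, 1, 0, 0, 1], name="B")

m = B[::-1].eq(0)                      # True where B is 0, read bottom-up
run_id = m.ne(m.shift()).cumsum()      # increases by 1 whenever the value changes
# Running total of True values inside each run, as integers.
pos_in_run = m.astype(int).groupby(run_id).cumsum()

print(pd.DataFrame({"m": m, "run_id": run_id, "pos_in_run": pos_in_run}))

# Subtracting 1 leaves, for each zero row, the number of zeros between it
# and the next 1 below it (in the original order); rows holding a 1 end up at -1.
d = pos_in_run.sub(1)[::-1]
print(d.tolist())
```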


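And a fast numpy-based approach:

    import numpy as np

    def zero_until_one(a, b):
        n = a.shape[0]
        x = np.flatnonzero(a < b)    # positions where the condition holds
        y = np.flatnonzero(a == 1)   # positions of the 1s
        d = np.searchsorted(y, x)    # for each x, index in y of the next 1
        r = y[d] - x - 1             # zeros strictly between x and that 1
        out = np.full(n, np.nan)
        out[x] = r
        return out

    zero_until_one(df['B'], df['C'])

    array([nan, 1., nan, nan, 1., 0., nan])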
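
One caveat with the searchsorted step: if a row satisfying a < b has no later 1 in a, searchsorted returns len(y) and indexing y with it raises an IndexError. A possible guard, as a sketch only. The name `zero_until_one_safe` and the choice to leave such rows as NaN are assumptions, not part of the answer above.

```python
import numpy as np

def zero_until_one_safe(a, b):
    """Hypothetical variant of zero_until_one that tolerates a missing trailing 1."""
    a = np.asarray(a)
    b = np.asarray(b)
    x = np.flatnonzero(a < b)      # positions that satisfy the condition
    y = np.flatnonzero(a == 1)     # positions of the 1s, already sorted
    d = np.searchsorted(y, x)      # index into y of the first 1 after each x
    ok = d < len(y)                # False where no later 1 exists
    out = np.full(a.shape[0], np.nan)
    out[x[ok]] = y[d[ok]] - x[ok] - 1
    return out

# Sample data from the question.
r1 = zero_until_one_safe([1, 0, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1, 0])
print(r1)

# The last row satisfies a < b but no 1 follows it, so it stays NaN.
r2 = zero_until_one_safe([1, 0, 1, 0], [1, 1, 0, 1])
print(r2)
```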


Performance

    df = pd.concat([df]*10_000)

    %timeit chris1(df)
    19.3 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %timeit yatu(df)
    12.8 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %timeit zero_until_one(df['B'], df['C'])
    2.32 ms ± 31.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
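
The %timeit lines are IPython magics. Outside IPython, a comparable measurement could be made with the standard timeit module. The sketch below is a stand-in built on several assumptions: `pandas_groupby` and `numpy_searchsorted` are hypothetical wrapper names for a groupby/cumcount approach and for the searchsorted function above, `ignore_index=True` is added to keep the index unique, and the repeat counts are arbitrary. Absolute numbers will differ by machine.

```python
import timeit

import numpy as np
import pandas as pd

def pandas_groupby(df):
    # groupby/cumcount approach (hypothetical wrapper name)
    c1 = df.B.lt(df.C)
    g = df.B.eq(1).cumsum()
    return c1.groupby(g).cumcount(ascending=False).shift().where(c1).sub(1)

def numpy_searchsorted(a, b):
    # searchsorted approach on plain arrays (hypothetical wrapper name)
    x = np.flatnonzero(a < b)
    y = np.flatnonzero(a == 1)
    out = np.full(a.shape[0], np.nan)
    out[x] = y[np.searchsorted(y, x)] - x - 1
    return out

small = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7],
                      "B": [1, 0, 0, 1, 0, 0, 1],
                      "C": [1, 1, 0, 0, 1, 1, 0]})
big = pd.concat([small] * 10_000, ignore_index=True)
a, b = big["B"].to_numpy(), big["C"].to_numpy()

# Check that both approaches agree before timing them.
same = np.allclose(pandas_groupby(big).to_numpy(), numpy_searchsorted(a, b), equal_nan=True)
print("results agree:", same)

# Best of 3 repeats, 5 calls per repeat, reported per call.
t_pandas = min(timeit.repeat(lambda: pandas_groupby(big), number=5, repeat=3)) / 5
t_numpy = min(timeit.repeat(lambda: numpy_searchsorted(a, b), number=5, repeat=3)) / 5
print(f"pandas: {t_pandas * 1e3:.2f} ms, numpy: {t_numpy * 1e3:.2f} ms")
```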






answered 9 hours ago by user3483203
edited 8 hours ago

  • Great idea for the numpy function. My guess is that numba may be faster.

    – WeNYoBen
    8 hours ago












Let us push it into one line:

    df.groupby(df.B.iloc[::-1].cumsum()).cumcount(ascending=False).shift(-1).where(df.B<df.C)
    Out[80]:
    0    NaN
    1    1.0
    2    NaN
    3    NaN
    4    1.0
    5    0.0
    6    NaN
    dtype: float64
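
Unpacked into steps, the one-liner reads roughly as follows. This is a sketch on the sample data (C assumed to be 1 where A is 6, as in the outputs shown), and `key` and `remaining` are illustrative names. It relies on groupby aligning the reversed key back to the frame by index label, so it presumably needs a unique index.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})

# Cumulative sum of B taken bottom-up: every row shares a key with the
# next 1 at or below it.
key = df.B.iloc[::-1].cumsum()

# Rows left in the same group after the current one (this includes the closing 1).
remaining = df.groupby(key).cumcount(ascending=False)

# Each row takes the value of the row below it, which drops the closing 1
# from the count for zero rows; then the condition is applied.
out = remaining.shift(-1).where(df.B < df.C)

print(pd.DataFrame({"key": key.sort_index(), "remaining": remaining, "out": out}))
```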






answered 8 hours ago by WeNYoBen