How to count the number of occurrences before a particular value in a DataFrame in Python?
I have a dataframe like the one below:

A B C
1 1 1
2 0 1
3 0 0
4 1 0
5 0 1
6 0 1
7 1 0

I want the number of occurrences of zeroes in df['B'] under the following condition:

if df['B'] < df['C']:
    # count the number of zeroes in df['B'] until it sees 1

Expected output:

A B C output
1 1 1 NaN
2 0 1 1
3 0 0 NaN
4 1 0 NaN
5 0 1 1
6 0 1 0
7 1 0 NaN

I don't know how to formulate the count part. Any help is really appreciated.

Tags: python, pandas, dataframe
edited 8 hours ago by Massifox
asked 9 hours ago by hakuna_code
Comments:
Me too, what does "until it sees 1" mean? – Joe, 9 hours ago
Until the first occurrence of '1' in B. – hakuna_code, 9 hours ago
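Read that way, the requested count can be spelled out as a plain loop. The following is a minimal reference sketch, not an optimized solution; the helper name `zeros_until_next_one` is hypothetical, the data is assumed to be binary, and the sample frame uses C=1 for the row with A=6, as in the expected output.

```python
import numpy as np
import pandas as pd

# Sample data, with C=1 in the row where A=6 (as in the expected output).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})


def zeros_until_next_one(b, c):
    """For rows where b < c, count the rows after it and before the next 1 in b.

    Hypothetical helper; assumes b holds only 0 and 1, so every counted row is a zero.
    """
    b = np.asarray(b)
    c = np.asarray(c)
    out = np.full(len(b), np.nan)
    for i in range(len(b)):
        if b[i] < c[i]:
            count = 0
            j = i + 1
            # Walk forward until a 1 is found or the column ends.
            while j < len(b) and b[j] != 1:
                count += 1
                j += 1
            out[i] = count
    return out


df["output"] = zeros_until_next_one(df["B"], df["C"])
print(df)
```

The loop is quadratic in the worst case, so it is mainly useful for checking the vectorized answers on small inputs.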
3 Answers
IIUC, one approach would be to use a custom grouper and aggregate with groupby.cumcount:

c1 = df.B.lt(df.C)          # rows where B < C
g = df.B.eq(1).cumsum()     # a new group starts at every 1 in B
df['out'] = c1.groupby(g).cumcount(ascending=False).shift().where(c1).sub(1)
print(df)

   A  B  C  out
0  1  1  1  NaN
1  2  0  1  1.0
2  3  0  0  NaN
3  4  1  0  NaN
4  5  0  1  1.0
5  6  0  1  0.0
6  7  1  0  NaN
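To see why the grouper works, it can help to print the intermediate values. This is a minimal sketch assuming the sample frame shown in the printed output; the names `remaining` and `steps` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})

c1 = df["B"].lt(df["C"])        # rows where B < C
g = df["B"].eq(1).cumsum()      # group id: 1, 1, 1, 2, 2, 2, 3
# Rows left after this one inside its group: 2, 1, 0, 2, 1, 0, 0
remaining = c1.groupby(g).cumcount(ascending=False)

steps = pd.DataFrame({"B": df["B"], "c1": c1, "g": g, "remaining": remaining})
print(steps)

# The expression from the answer, for comparison.
out = remaining.shift().where(c1).sub(1)
print(out)
```

Each group starts at a 1 and runs up to the row before the next 1, so the rows remaining after a zero in its group are exactly the zeros before the next 1. On this sample, `remaining.where(c1)` gives the same values as the shifted and decremented form.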
answered 9 hours ago by yatu
Using some masking and a groupby on your reversed series. This assumes binary data (only 0 and 1).

m = df['B'][::-1].eq(0)
d = m.groupby(m.ne(m.shift()).cumsum()).cumsum().sub(1)
d[::-1].where(df['B'] < df['C'])

0    NaN
1    1.0
2    NaN
3    NaN
4    1.0
5    0.0
6    NaN
Name: B, dtype: float64

And a fast numpy-based approach:

import numpy as np

def zero_until_one(a, b):
    n = a.shape[0]
    x = np.flatnonzero(a < b)    # positions where B < C
    y = np.flatnonzero(a == 1)   # positions of the 1s in B
    d = np.searchsorted(y, x)    # for each x, index in y of the next 1
    r = y[d] - x - 1             # rows strictly between x and that 1
    out = np.full(n, np.nan)
    out[x] = r
    return out

zero_until_one(df['B'], df['C'])

array([nan,  1., nan, nan,  1.,  0., nan])
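One edge case worth noting: if a row with B < C has no 1 after it, `y[d]` in the function above indexes past the end of `y`. The sketch below is a hypothetical variant (`zero_until_one_safe` is not from the answer) that leaves NaN in those rows; treating "no following 1" as NaN is an assumption about the desired behaviour.

```python
import numpy as np


def zero_until_one_safe(a, b):
    """Variant of zero_until_one that leaves NaN where no 1 follows (hypothetical)."""
    a = np.asarray(a)
    b = np.asarray(b)
    x = np.flatnonzero(a < b)      # positions to fill
    y = np.flatnonzero(a == 1)     # positions of every 1 in a
    d = np.searchsorted(y, x)      # index into y of the first 1 after each x
    has_next = d < y.size          # False where no 1 follows
    out = np.full(a.shape[0], np.nan)
    out[x[has_next]] = y[d[has_next]] - x[has_next] - 1
    return out


B = np.array([1, 0, 0, 1, 0, 0, 1])
C = np.array([1, 1, 0, 0, 1, 1, 0])

print(zero_until_one_safe(B, C))            # same values as the answer's output
print(zero_until_one_safe(B[:-1], C[:-1]))  # trailing zeros with no closing 1 stay NaN
```

Because `y` is sorted and, for binary data, no position in `x` is also in `y`, `np.searchsorted` with its default `side='left'` returns the index of the next 1 strictly after each position.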
Performance

df = pd.concat([df]*10_000)

%timeit chris1(df)
19.3 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit yatu(df)
12.8 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit zero_until_one(df['B'], df['C'])
2.32 ms ± 31.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
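The `%timeit` lines need IPython, and the wrappers `chris1` and `yatu` are not shown. A rough way to repeat the comparison in a plain script is sketched below; the wrapper names are hypothetical, `ignore_index=True` is an added assumption to keep the index unique, and the NumPy variant is given arrays rather than Series, so the numbers are not directly comparable to those above.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})
# Assumption: ignore_index=True keeps the index unique after stacking the copies.
big = pd.concat([base] * 10_000, ignore_index=True)


def groupby_cumcount(df):
    """Hypothetical wrapper around the groupby.cumcount answer."""
    c1 = df["B"].lt(df["C"])
    g = df["B"].eq(1).cumsum()
    return c1.groupby(g).cumcount(ascending=False).shift().where(c1).sub(1)


def reversed_mask(df):
    """Hypothetical wrapper around the reversed-series answer."""
    m = df["B"][::-1].eq(0)
    d = m.groupby(m.ne(m.shift()).cumsum()).cumsum().sub(1)
    return d[::-1].where(df["B"] < df["C"])


def searchsorted_numpy(df):
    """Hypothetical wrapper around the NumPy answer, operating on arrays."""
    a = df["B"].to_numpy()
    b = df["C"].to_numpy()
    x = np.flatnonzero(a < b)
    y = np.flatnonzero(a == 1)
    out = np.full(a.shape[0], np.nan)
    out[x] = y[np.searchsorted(y, x)] - x - 1
    return out


for fn in (groupby_cumcount, reversed_mask, searchsorted_numpy):
    seconds = timeit.timeit(lambda: fn(big), number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1e3:.2f} ms per call")
```

Absolute timings depend on the machine and library versions, so only the relative ordering is worth reading from a run like this.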
edited 8 hours ago, answered 9 hours ago by user3483203
Comments:
Great idea for the numpy function. Just a guess, but numba may be faster. – WeNYoBen, 8 hours ago
Let us push it into one line:

df.groupby(df.B.iloc[::-1].cumsum()).cumcount(ascending=False).shift(-1).where(df.B<df.C)

Out[80]:
0    NaN
1    1.0
2    NaN
3    NaN
4    1.0
5    0.0
6    NaN
dtype: float64
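The one-liner is easier to follow when split into named steps. This is a sketch under the same sample data; the variable names `key`, `remaining` and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1, 0, 0, 1, 0, 0, 1],
    "C": [1, 1, 0, 0, 1, 1, 0],
})

# Cumulative sum of B taken from the bottom up. The Series keeps the original
# index labels, so groupby aligns it with df by label, not by position.
key = df["B"].iloc[::-1].cumsum()
print(key.sort_index().tolist())    # [3, 2, 2, 2, 1, 1, 1]

# Each group is a run of zeros followed by the 1 that closes it.
remaining = df.groupby(key).cumcount(ascending=False)
print(remaining.tolist())           # [0, 2, 1, 0, 2, 1, 0]

# Looking one row ahead drops the current row from the count.
result = remaining.shift(-1).where(df["B"] < df["C"])
print(result)
```

Here the groups end at a 1 instead of starting at one, which is why the shift goes in the opposite direction (`shift(-1)`) compared with the groupby.cumcount answer.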
answered 8 hours ago by WeNYoBen