OiO.lk Community platform!


How can I filter groups by comparing the first value of each group and the last cummax that changes conditionally?

  • Thread starter: AmirX (Guest)

My DataFrame:

Code:
import pandas as pd
df = pd.DataFrame(
    {
        'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e'],
        'num': [1, 2, 3, 1, 12, 12, 13, 2, 4, 2, 5, 6, 10, 20, 30]
    }
)

The expected output is the following three groups from the df above:

Code:
   group  num
0      a    1
1      a    2
2      a    3

   group  num
6      c   13
7      c    2
8      c    4

   group  num
12     e   10
13     e   20
14     e   30

Logic:

I want to compare the first value of each group to the last cummax of the num column. This code explains it better:

Code:
df['last_num'] = df.groupby('group')['num'].tail(1)
df['last_num'] = df.last_num.ffill().cummax()

But I think what I really need is this desired_cummax:

Code:
   group  num  last_num   desired_cummax
0      a    1       NaN    3
1      a    2       NaN    3
2      a    3       3.0    3
3      b    1       3.0    3
4      b   12       3.0    3
5      b   12      12.0    3 
6      c   13      12.0    3
7      c    2      12.0    3
8      c    4      12.0    4
9      d    2      12.0    4
10     d    5      12.0    4
11     d    6      12.0    4
12     e   10      12.0    4
13     e   20      12.0    4
14     e   30      30.0    30

I don't want a new cummax to be set if the first value of num in a group is less than last_num.

For example, for group b the first value of num is 1. Since that is less than its last_num, the value at the end of group b should not become 12; it should still be 3.

Now for group c, since its first value is greater than last_num, a new cummax will be set when it reaches the end of group c.
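
One way to read the rule above is as a running threshold carried from group to group. The sketch below is only an interpretation, not a confirmed solution: it assumes each group's rows are contiguous, that the first group seeds the threshold with its last num on every one of its rows, and that a later group updates the threshold to max(threshold, its last num) on its final row only. The helper name add_desired_cummax is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e'],
        'num': [1, 2, 3, 1, 12, 12, 13, 2, 4, 2, 5, 6, 10, 20, 30],
    }
)


def add_desired_cummax(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a desired_cummax column group by group."""
    pieces = []
    running = None  # threshold carried over from the previous groups
    # sort=False keeps the groups in order of first appearance
    for _, grp in frame.groupby('group', sort=False):
        first = grp['num'].iloc[0]
        last = grp['num'].iloc[-1]
        if running is None:
            # Assumption: the first group seeds the threshold with its last value
            running = last
            values = [running] * len(grp)
        elif first > running:
            # The group starts above the threshold, so its last row sets a new one
            new_running = max(running, last)
            values = [running] * (len(grp) - 1) + [new_running]
            running = new_running
        else:
            # The group starts at or below the threshold, so nothing changes
            values = [running] * len(grp)
        pieces.append(pd.Series(values, index=grp.index))
    out = frame.copy()
    out['desired_cummax'] = pd.concat(pieces)  # aligned on the original index
    return out


print(add_desired_cummax(df))
```

On the sample data this reproduces the desired_cummax column from the table (3 up to row 7, 4 for rows 8 to 13, and 30 on the last row). Whether the update should be max(threshold, last) or simply the group's last value cannot be told from this data, since both give the same result here.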

After that I want to filter the groups, keeping a group if df.num.iloc[0] > df.desired_cummax.iloc[0] (where df is that group's rows).

Note that the first group should be in the expected output no matter what.
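
Since the filter only looks at the first row of each group, it may be simpler to work at group level and skip the row-level column entirely. The following is a minimal sketch under the same reading of the rule: the first group is always kept and seeds the threshold with its last num, and a later group is kept (and may raise the threshold to max(threshold, its last num)) only when its first num is greater than the current threshold. The function name select_groups is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'group': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e'],
        'num': [1, 2, 3, 1, 12, 12, 13, 2, 4, 2, 5, 6, 10, 20, 30],
    }
)


def select_groups(frame: pd.DataFrame) -> list:
    """Hypothetical helper: return the labels of the groups that pass the filter."""
    grouped = frame.groupby('group', sort=False)['num']
    firsts = grouped.first()  # first num per group, in order of appearance
    lasts = grouped.last()    # last num per group
    kept = []
    running = None  # current threshold
    for name in firsts.index:
        if running is None:
            # The first group is kept no matter what and seeds the threshold
            kept.append(name)
            running = lasts[name]
        elif firsts[name] > running:
            # Group starts above the threshold: keep it and update the threshold
            kept.append(name)
            running = max(running, lasts[name])
    return kept


kept = select_groups(df)
result = df[df['group'].isin(kept)]

# One DataFrame per kept group, as in the expected output
parts = [grp for _, grp in result.groupby('group', sort=False)]
for part in parts:
    print(part, end='\n\n')
```

For the sample df this keeps groups a, c and e. The loop runs once per group rather than once per row, so it should stay cheap even for large frames with few groups.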

Maybe there is a better approach to solve this, but this is what I thought might work.

My attempt was creating last_num but I don't know how to continue.