OiO.lk Community platform!

pandas groupby expanding mean does not accept missing values

  • Thread starter: Thanatopseustes (Guest)

I've been trying to compute group-wise expanding means for the following dataset:

Code:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'y': [1, 2, 3, 1, 2, 3]})

and df.groupby('id').expanding().mean().values returns the correct result:

Code:
array([[1. ],
       [1.5],
       [2. ],
       [1. ],
       [1.5],
       [2. ]])
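To make explicit what this computes: within each id, the value at the k-th row of a group is the mean of that group's first k values. The sketch below cross-checks pandas' result against a hand-written running sum divided by a running count; the cumsum/cumcount formulation and the variable names are illustrative choices, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'y': [1, 2, 3, 1, 2, 3]})

# pandas' expanding mean per group, indexed by (id, original row label).
expanding = df.groupby('id')['y'].expanding().mean()

# Same quantity by hand: running sum / running row count within each group.
running_sum = df.groupby('id')['y'].cumsum()
running_count = df.groupby('id').cumcount() + 1  # cumcount starts at 0
manual = running_sum / running_count

# Both come out in the same order here only because df is already sorted by 'id'.
print(expanding.to_numpy())
print(manual.to_numpy())
```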

However, my real data also contains missing values, for example:

Code:
df2 = pd.DataFrame({'id':[1,1,1,2,2,2],'y':[1,pd.NA,3,1,2,3]})
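The root cause is visible before any windowing happens: a list that mixes Python ints with pd.NA is stored as a generic object column, and pandas' window aggregations convert the values to float64 internally, which fails for pd.NA. A minimal sketch to confirm this; the broad except is deliberate, because the exact exception type and message vary between pandas versions.

```python
import pandas as pd

df2 = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'y': [1, pd.NA, 3, 1, 2, 3]})

# Mixing Python ints with pd.NA leaves 'y' as a generic object column.
print(df2['y'].dtype)  # object

# The window code converts values to float64 before aggregating, and pd.NA
# cannot be cast to float, so this fails. Type/message differ across versions.
error = None
try:
    df2.groupby('id')['y'].expanding().mean()
except Exception as exc:
    error = exc

print(type(error).__name__, error)
```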

Applying the same logic, I would expect the missing value to be ignored when computing the mean, so df2.groupby('id').expanding().mean().values should return:

Code:
array([[1. ],
       [1. ],
       [2. ],
       [1. ],
       [1.5],
       [2. ]])
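For comparison, skipping missing values is already the default behaviour of expanding().mean() once the data is float64 with np.nan: with the default min_periods=1, a window only needs one non-missing observation to produce a value. The short sketch below uses the values of group id == 1 as a plain float Series; the min_periods=2 line is only there to show how that threshold changes the output.

```python
import numpy as np
import pandas as pd

# Group id == 1 from the question, stored as float64 with np.nan.
s = pd.Series([1.0, np.nan, 3.0])

# Default min_periods=1: NaN is skipped and the window at position 1 still
# has one valid observation, so its mean is 1.0.
means = s.expanding().mean()
print(means.tolist())  # [1.0, 1.0, 2.0]

# With min_periods=2, windows with fewer than two valid values give NaN.
strict = s.expanding(min_periods=2).mean()
print(strict.tolist())
```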

Instead, pandas raises an error, apparently because it tries to cast the values to float internally. None of my naive attempts (e.g., .expanding().apply(lambda x: np.nansum(x))) solve this. Is there an (ideally equally compact) solution?
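One compact option, sketched here under the assumption that every non-missing entry in y is genuinely numeric: convert the column to float64 with pd.to_numeric(..., errors='coerce'), which maps pd.NA to NaN, and then run the same groupby/expanding chain. Selecting ['y'] is an addition of this sketch so that the result is a 1-D Series rather than a one-column frame.

```python
import pandas as pd

df2 = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 'y': [1, pd.NA, 3, 1, 2, 3]})

# Cast 'y' to float64; errors='coerce' turns pd.NA (and any other
# unparseable entry) into NaN, which expanding().mean() then skips.
out = (
    df2.assign(y=pd.to_numeric(df2['y'], errors='coerce'))
    .groupby('id')['y']
    .expanding()
    .mean()
    .to_numpy()
)

print(out)
```

The values come out grouped by id and, within each group, in original row order; they line up with df2's rows here only because df2 is already sorted by id. The catch with errors='coerce' is that a malformed entry (say, a stray string) would also silently become NaN. Building the frame with np.nan instead of pd.NA gives a float64 column from the start and avoids the conversion entirely. The .expanding().apply(...) attempts fail for the same reason as .mean(), since the values are converted to float64 before the applied function sees any window.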