Strange behaviour (out of memory) of groupby function in python

  • Thread starter: Bruno Oliveira (Guest)
I have a long piece of code that at some point has a very small dataframe with 813 rows and 16 columns. To this dataframe I apply the groupby function:

Code:
fm = fm.groupby(['Tower_ID'
                 ,'Cell_ID'
                 ,'Alarm ID'
                 ,'Severity'
                 ,'Alarm Type'
                 ,'Alarm Text'
                 ,'Supplementary Info'
                 ,'PERIOD_START_TIME'
                 #,'File_Name'
                ]
               ).agg({'Full_Start_Date': 'min'
                      ,'Full_End_Date': 'max'
                      ,'Alarm hold time (sec)': 'sum'
                      ,'End_Date': 'max'
                      ,'Cross_Over': 'max'
                      ,'Cross_Over_diff': 'max'
                     }
                    )

This results in an out-of-memory error:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 11.1 GiB for an array with shape (1485993600,) and data type int64
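For scale, the figure in the error message can be checked with a little arithmetic. The sketch below only restates the numbers from the traceback (the variable names are illustrative): an int64 element takes 8 bytes, so the requested array has about 1.49 billion elements, far more than the 813 rows in the dataframe.

```python
# Numbers taken from the error message above; the names are illustrative.
n_elements = 1_485_993_600   # shape reported in the traceback
bytes_per_int64 = 8          # an int64 element occupies 8 bytes

# Convert bytes to GiB (1 GiB = 2**30 bytes).
size_gib = n_elements * bytes_per_int64 / 2**30
print(f"{size_gib:.1f} GiB")  # prints "11.1 GiB"

# Compare against the actual number of rows in the dataframe.
rows = 813
print(f"{n_elements / rows:,.0f} elements per row of input")
```

So the allocation does not scale with the data itself, which suggests it comes from something else, such as the number of possible group-key combinations.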

Things that I have tried:
1 - Instead of fm = fm.groupby(...), I assigned the result to a different variable, e.g. bananas = fm.groupby(...). Same result.
2 - I tried changing the column dtypes to exactly the types I need (category, int, etc.). Same result.

What worked

Before the groupby, I save the fm dataframe to a file and then read the file back into fm:

Code:
fm.to_excel('C:\\home\\fm_data.xlsx')
fm = pd.read_excel('C:\\home\\fm_data.xlsx')

And this works!!!

Does anyone have an idea why? This is a very poor workaround, and I would like to understand what the problem could be. I appreciate the help.
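One hypothesis that would fit these symptoms, though the post does not confirm it, is that some of the group keys are categorical columns that still carry categories from a larger, earlier dataframe. With observed=False (the default in many pandas versions), groupby sizes its result for every combination of categories, not just the combinations present in the data, and a round trip through Excel drops the categorical dtype. The sketch below uses a toy frame (df, value, and the category labels are made up for illustration) to show how the potential group count can be checked and how observed=True avoids it.

```python
import pandas as pd

# Toy stand-in for fm (hypothetical data): two rows, but each key column
# still carries 1000 categories left over from a larger frame.
df = pd.DataFrame({
    "Tower_ID": pd.Categorical(["T1", "T2"], categories=[f"T{i}" for i in range(1000)]),
    "Cell_ID": pd.Categorical(["C1", "C2"], categories=[f"C{i}" for i in range(1000)]),
    "value": [1, 2],
})
keys = ["Tower_ID", "Cell_ID"]

# Upper bound on the number of groups when unobserved categories are kept:
# the product of the category counts of the key columns.
potential_groups = 1
for col in keys:
    if isinstance(df[col].dtype, pd.CategoricalDtype):
        potential_groups *= len(df[col].cat.categories)
    else:
        potential_groups *= df[col].nunique()
print(potential_groups)  # 1000 * 1000 possible combinations for 2 rows

# observed=True limits the result to key combinations present in the data.
result = df.groupby(keys, observed=True).agg({"value": "sum"})
print(len(result))
```

If the same check on fm gives a potential group count near the 1485993600 in the error, passing observed=True to groupby, or calling .cat.remove_unused_categories() on the key columns first, would be worth trying instead of the Excel round trip.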