OiO.lk Community platform!


How to compress a pandas DataFrame

  • Thread starter: user13744439 (Guest)
Below are a few entries from my DataFrame. Each of my DataFrames has millions of rows.

Code:
import pandas as pd

data = [{'stamp':'12/31/2020 9:35:42 AM', 'value': 21.99, 'trigger': True}, 
        {'stamp':'12/31/2020 10:35:42 AM', 'value': 22.443, 'trigger': False}, 
        {'stamp':'12/31/2020 11:35:42 AM', 'value': 19.00, 'trigger': False}, 
        {'stamp':'12/31/2020 9:45:42 AM', 'value': 45.02, 'trigger': False}, 
        {'stamp':'12/31/2020 9:55:42 AM', 'value': 48, 'trigger': False}, 
        {'stamp':'12/31/2020 11:35:42 AM', 'value': 48.99, 'trigger': False}]
df = pd.DataFrame(data)
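In the frame above, `stamp` holds text, so pandas keeps it as Python strings (object dtype), which is typically the most expensive kind of column both in memory and once written out. A minimal sketch, assuming every timestamp follows the `month/day/year h:mm:ss AM/PM` pattern shown and that single precision is acceptable for `value`, converts the columns to compact dtypes before saving:

```python
import pandas as pd

# The six sample rows from the question.
data = [{'stamp': '12/31/2020 9:35:42 AM', 'value': 21.99, 'trigger': True},
        {'stamp': '12/31/2020 10:35:42 AM', 'value': 22.443, 'trigger': False},
        {'stamp': '12/31/2020 11:35:42 AM', 'value': 19.00, 'trigger': False},
        {'stamp': '12/31/2020 9:45:42 AM', 'value': 45.02, 'trigger': False},
        {'stamp': '12/31/2020 9:55:42 AM', 'value': 48, 'trigger': False},
        {'stamp': '12/31/2020 11:35:42 AM', 'value': 48.99, 'trigger': False}]
df = pd.DataFrame(data)
print(df.memory_usage(deep=True).sum())  # bytes before conversion

# Parse the text timestamps into a datetime64 column. The format string
# assumes month/day/year with a 12-hour clock and an AM/PM marker.
df['stamp'] = pd.to_datetime(df['stamp'], format='%m/%d/%Y %I:%M:%S %p')

# Assumption: single precision (about 7 significant digits) is enough for 'value'.
df['value'] = df['value'].astype('float32')

# 'trigger' is already a bool column, so it is left as-is.
print(df.dtypes)
print(df.memory_usage(deep=True).sum())  # bytes after conversion
```

The converted frame can be passed to `to_parquet` or `to_csv` unchanged. Parquet in particular stores datetime64 and float32 columns as binary numbers rather than as text, so the dtype step matters most for that format.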

Here are the ways I have tried saving it:

Code:
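# to_parquet needs the pyarrow or fastparquet package installed
df.to_parquet('df.parquet', compression='gzip')
# to_csv writes uncompressed text (including the index) by default
df.to_csv('df.csv')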

I don't see much of a size reduction with to_parquet compared with to_csv. I want to minimize the file size on disk. Is there any way to do this?
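Which format wins depends heavily on the data, so one way to decide is to write the same DataFrame several ways and compare the file sizes directly. The sketch below is only a measuring harness: the synthetic data, the file names, and the `zstd` codec for Parquet are assumptions, and the sizes it prints say nothing about the real data until it is run on that data.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical synthetic stand-in for the real data: same three columns,
# 100,000 rows of one-second timestamps, random values, and a rare True flag.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'stamp': pd.date_range('2020-12-31 09:00:00', periods=n, freq='s'),
    'value': rng.normal(30, 10, n).round(3),
    'trigger': rng.random(n) < 0.01,
})


def file_sizes(frame, out_dir):
    """Write `frame` in several formats and return {file name: size in bytes}."""
    writers = {
        # index=False drops the RangeIndex, which to_csv writes by default.
        'plain.csv': lambda p: frame.to_csv(p, index=False),
        'gzip.csv.gz': lambda p: frame.to_csv(p, index=False, compression='gzip'),
        'xz.csv.xz': lambda p: frame.to_csv(p, index=False, compression='xz'),
        'zstd.parquet': lambda p: frame.to_parquet(p, index=False, compression='zstd'),
    }
    sizes = {}
    for name, write in writers.items():
        path = os.path.join(out_dir, name)
        try:
            write(path)
        except ImportError:
            # to_parquet needs pyarrow or fastparquet; skip it if neither is installed.
            continue
        sizes[name] = os.path.getsize(path)
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    sizes = file_sizes(df, tmp)

# Print the formats from smallest to largest file.
for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f'{name:>14}: {size:,} bytes')
```

Swapping the synthetic `df` for one of the real DataFrames gives a like-for-like comparison; `xz` usually trades slower writes for smaller files than `gzip`, so write time may be worth measuring too.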