OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Get minimum through record iterations in pandas dataframe

  • Thread starter Thread starter Giampaolo Levorato
  • Start date Start date
G

Giampaolo Levorato

Guest
I have created the following pandas dataframe:

Code:
import pandas as pd
import numpy as np

ds = { 'trend' : [1,1,1,1,2,2,3,3,3,3,3,3,4,4,4,4,4], 'price' : [23,43,56,21,43,55,54,32,9,12,11,12,23,3,2,1,1]}

df = pd.DataFrame(data=ds)

The dataframe looks as follows:

Code:
display(df)

    trend   price
0      1     23
1      1     43
2      1     56
3      1     21
4      2     43
5      2     55
6      3     54
7      3     32
8      3     9
9      3     12
10     3     11
11     3     12
12     4     23
13     4     3
14     4     2
15     4     1
16     4     1

I have saved the dataframe to a .csv file called df.csv:

Code:
df.to_csv("df.csv", index = False)

I need to create a new field called minimum which:

  1. iterates through each and every record of the dataframe
  2. takes the mimimum between the price observed at each iteration and the last price observed in the previous trend.

For example:

  • I iterate at record 0 and the minimum price is 23 (there is only that one).
  • I iterate at record 1 and take the minimum between 43 and 23: the result is 23.

Fast forward to record 4.

  • I need to calculate the minimum between the price observed at record 4 (price: 43) and the last price observed for the previous trend (price: 21). The result is 21.

Fast forward to record 14.

  • I need to calculate the minimum between the price observed at record 14 (price: 2) and the last price observed for the previous trend (price: 12). The result is 2.

And so on.

I have then written this code:

Code:
minimum = []

for i in range(len(df)):
    ds = pd.read_csv("df.csv", nrows=i+1)
    d = ds.groupby('trend', as_index=False).agg(
                                                    {'price':'last'})

    d['minimum'] = d['price'].min()
    minimum.append(d['minimum'].iloc[-1])


ds['minimum'] = minimum

The resulting dataframe looks as follows:

display(ds)

Code:
        trend   price   minimum
0          1     23         23
1          1     43         43
2          1     56         56
3          1     21         21
4          2     43         21
5          2     55         21
6          3     54         21
7          3     32         21
8          3      9          9
9          3     12         12
10         3     11         11
11         3     12         12
12         4     23         12
13         4      3          3
14         4      2          2
15         4      1          1
16         4      1          1

The resulting dataframe is correct.

The problem is that I have to apply this process to a dataframe which contains about 1 million records and it will take about 48 years to complete.

Does anybody know a quicker way to obtain the same results above?
<p>I have created the following pandas dataframe:</p>
<pre><code>import pandas as pd
import numpy as np

ds = { 'trend' : [1,1,1,1,2,2,3,3,3,3,3,3,4,4,4,4,4], 'price' : [23,43,56,21,43,55,54,32,9,12,11,12,23,3,2,1,1]}

df = pd.DataFrame(data=ds)
</code></pre>
<p>The dataframe looks as follows:</p>
<pre><code>display(df)

trend price
0 1 23
1 1 43
2 1 56
3 1 21
4 2 43
5 2 55
6 3 54
7 3 32
8 3 9
9 3 12
10 3 11
11 3 12
12 4 23
13 4 3
14 4 2
15 4 1
16 4 1
</code></pre>
<p>I have saved the dataframe to a .csv file called <code>df.csv</code>:</p>
<pre><code>df.to_csv("df.csv", index = False)
</code></pre>
<p>I need to create a new field called <code>minimum</code> which:</p>
<ol>
<li>iterates through each and every record of the dataframe</li>
<li>takes the mimimum between the <code>price</code> observed at each iteration and the last <code>price</code> observed in the previous <code>trend</code>.</li>
</ol>
<p>For example:</p>
<ul>
<li>I iterate at record 0 and the minimum price is 23 (there is only that one).</li>
<li>I iterate at record 1 and take the minimum between 43 and 23: the result is 23.</li>
</ul>
<p>Fast forward to record 4.</p>
<ul>
<li>I need to calculate the minimum between the <code>price</code> observed at record 4 (<code>price: 43</code>) and the last <code>price</code> observed for the previous <code>trend</code> (<code>price: 21</code>). The result is 21.</li>
</ul>
<p>Fast forward to record 14.</p>
<ul>
<li>I need to calculate the minimum between the <code>price</code> observed at record 14 (<code>price: 2</code>) and the last <code>price</code> observed for the previous <code>trend</code> (<code>price: 12</code>). The result is 2.</li>
</ul>
<p>And so on.</p>
<p>I have then written this code:</p>
<pre><code>minimum = []

for i in range(len(df)):
ds = pd.read_csv("df.csv", nrows=i+1)
d = ds.groupby('trend', as_index=False).agg(
{'price':'last'})

d['minimum'] = d['price'].min()
minimum.append(d['minimum'].iloc[-1])


ds['minimum'] = minimum
</code></pre>
<p>The resulting dataframe looks as follows:</p>
<p>display(ds)</p>
<pre><code> trend price minimum
0 1 23 23
1 1 43 43
2 1 56 56
3 1 21 21
4 2 43 21
5 2 55 21
6 3 54 21
7 3 32 21
8 3 9 9
9 3 12 12
10 3 11 11
11 3 12 12
12 4 23 12
13 4 3 3
14 4 2 2
15 4 1 1
16 4 1 1
</code></pre>
<p>The resulting dataframe is correct.</p>
<p>The problem is that I have to apply this process to a dataframe which contains about 1 million records and it will take about 48 years to complete.</p>
<p>Does anybody know a quicker way to obtain the same results above?</p>
 

Latest posts

M
Replies
0
Views
1
Meliodas Dragon
M
Top