OiO.lk Community platform!

Oio.lk is an excellent forum for developers, providing a wide range of resources, discussions, and support for those in the developer community. Join oio.lk today to connect with like-minded professionals, share insights, and stay updated on the latest trends and technologies in the development field.
  You need to log in or register to access the solved answers to this problem.
  • You have reached the maximum number of guest views allowed
  • Please register below to remove this limitation

Efficiently look up a column value in a column containing lists with pandas

  • Thread starter Thread starter Cerealz
  • Start date Start date
C

Cerealz

Guest
Assuming the following pandas data frame:

Code:
                                   data  lookup_val
0       [1.3, 4.5, 6.4, 7.3, 8.9, 10.3]           5
1  [2.5, 4.7, 6.4, 6.6, 8.5, 9.3, 17.4]           3
2       [3.3, 4.2, 5.1, 7.8, 9.2, 11.5]           6

I need to look up the value within the list of each 'data' column at the position of the value in the 'lookup_val' column. Expected output would be a new column like this:

Code:
                                   data  lookup_val  output
0       [1.3, 4.5, 6.4, 7.3, 8.9, 10.3]           5     8.9
1  [2.5, 4.7, 6.4, 6.6, 8.5, 9.3, 17.4]           3     6.4
2       [3.3, 4.2, 5.1, 7.8, 9.2, 11.5]           6    11.5

What is the most efficient way to do so, assuming the data frame has millions of rows like this with each list having a different length, but no longer than 50 values?

Iterating over the data frame or using apply with a simple indexing takes literally hours and a more performant structure is needed.

Code to generate the above sample:

Code:
import pandas as pd

df = pd.DataFrame(
    [
        {'data': [1.3, 4.5, 6.4, 7.3, 8.9, 10.3], 'lookup_val': 5},
        {'data': [2.5, 4.7, 6.4, 6.6, 8.5, 9.3, 17.4], 'lookup_val': 3},
        {'data': [3.3, 4.2, 5.1, 7.8, 9.2, 11.5], 'lookup_val': 6},
    ]
)
<p>Assuming the following pandas data frame:</p>
<pre><code> data lookup_val
0 [1.3, 4.5, 6.4, 7.3, 8.9, 10.3] 5
1 [2.5, 4.7, 6.4, 6.6, 8.5, 9.3, 17.4] 3
2 [3.3, 4.2, 5.1, 7.8, 9.2, 11.5] 6
</code></pre>
<p>I need to look up the value within the list of each 'data' column at the position of the value in the 'lookup_val' column.
Expected output would be a new column like this:</p>
<pre><code> data lookup_val output
0 [1.3, 4.5, 6.4, 7.3, 8.9, 10.3] 5 8.9
1 [2.5, 4.7, 6.4, 6.6, 8.5, 9.3, 17.4] 3 6.4
2 [3.3, 4.2, 5.1, 7.8, 9.2, 11.5] 6 11.5
</code></pre>
<p>What is the most efficient way to do so, assuming the data frame has millions of rows like this with each list having a different length, but no longer than 50 values?</p>
<p>Iterating over the data frame or using apply with a simple indexing takes literally hours and a more performant structure is needed.</p>
<p>Code to generate the above sample:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd

df = pd.DataFrame(
[
{'data': [1.3, 4.5, 6.4, 7.3, 8.9, 10.3], 'lookup_val': 5},
{'data': [2.5, 4.7, 6.4, 6.6, 8.5, 9.3, 17.4], 'lookup_val': 3},
{'data': [3.3, 4.2, 5.1, 7.8, 9.2, 11.5], 'lookup_val': 6},
]
)
</code></pre>
 

Latest posts

Online statistics

Members online
0
Guests online
5
Total visitors
5
Top