OiO.lk Community platform!


Pandas: using a list as a cell value - different approaches to modifying values, and NaN issues

  • Thread starter: sebieire (Guest)
This is 50% a question and 50% an observation that baffles me a bit. Maybe someone can enlighten me.

I would also like opinions on using lists as cell values: yes or no, and why?

Here is a trivial example:

Code:
import pandas as pd

data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['banana', 'lemon'], 4]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])

which results in:

Code:
             Fruit  Count
0  [apple, banana]      1
1  [grape, orange]      2
2  [banana, lemon]      4

Given a new list:

Code:
input_list = ['melon', 'kiwi']

Using the 'loc' approach:

(A) Assigning the list directly doesn't work at all.

Code:
df.loc[df['Count'] == 2, 'Fruit'] = [input_list]  # fails with or without the wrapping brackets

(B) Using a Series also doesn't work.

Code:
ser = pd.Series(input_list)  # NOT wrapped: a length-2 Series, which is the wrong length - fair enough
df.loc[df['Count'] == 2, 'Fruit'] = ser

# wrong result --->

             Fruit  Count
0  [apple, banana]      1
1             kiwi      2
2  [banana, lemon]      4

(C) Series, take 2

Code:
ser = pd.Series([input_list])  # WITH wrapping: a length-1 Series --> 0 [melon, kiwi]
df.loc[df['Count'] == 2, 'Fruit'] = ser

# wrong result ---> NaN??? HUH?

             Fruit  Count
0  [apple, banana]      1
1              NaN      2
2  [banana, lemon]      4
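One possible reading of (B) and (C), offered as a sketch rather than a definitive answer: when the right-hand side of a 'loc' assignment is a Series, pandas appears to align it with the target rows by index label instead of by position. The matching row here has label 1. The Series in (B) has labels 0 and 1, so label 1 ('kiwi') is picked up. The Series in (C) only has label 0, so nothing matches and NaN is written. The snippet below imitates that alignment with reindex, then gives the Series the target labels so the lookup succeeds. The names target_index, aligned_b and aligned_c are illustrative only.

```python
import pandas as pd

data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['banana', 'lemon'], 4]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])
input_list = ['melon', 'kiwi']

mask = df['Count'] == 2
target_index = df.index[mask]  # labels of the rows being assigned to (here: 1)

# reindex imitates the label lookup that seems to happen during assignment.
aligned_b = pd.Series(input_list).reindex(target_index)    # label 1 exists in (B)
aligned_c = pd.Series([input_list]).reindex(target_index)  # label 1 is missing in (C)

# Building the Series with the target rows' labels gives the lookup a match.
ser = pd.Series([input_list], index=target_index)
df.loc[mask, 'Fruit'] = ser
```

If the mask matched several rows, the Series would need one list per matching label, for example a list of copies of input_list with index=target_index.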

Using the 'at' approach:

(D)

Code:
mask = df['Count'] == 2
mask_match_idx = df[mask].index.values[0]  # index label (an int here) of the first matching row
df.at[mask_match_idx, 'Fruit'] = input_list

# results in (finally) the correct result

             Fruit  Count
0  [apple, banana]      1
1    [melon, kiwi]      2
2  [banana, lemon]      4
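Approach (D) only updates the first matching row. A minimal sketch of how it might be extended to every matching row, assuming a plain loop over the matching labels is acceptable for the data size (the loop variable label is illustrative):

```python
import pandas as pd

data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['banana', 'lemon'], 4]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])
input_list = ['melon', 'kiwi']

mask = df['Count'] == 2

# 'at' addresses exactly one cell, so the list is stored as a single object.
for label in df.index[mask]:
    # Store a copy so that rows do not end up sharing one mutable list.
    df.at[label, 'Fruit'] = list(input_list)
```

Because 'at' always targets a single cell, there is no ambiguity about whether the list is one value or a sequence of values to spread over rows.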

I understand that (B) fails because the Series has the wrong length. But why do (A) (or a version of it) and (C) give the wrong result, and how could they be made to work? The NaN result is especially confusing. Why does it happen?

Is the conclusion to always use 'at' in these kinds of cases?

And again: what are the views on using lists as cell values, given that this kind of thing can happen? I would love some input here, and suggestions for alternatives if lists are a no-go.
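As one possible alternative to list cells, the same data could be kept in long format with one fruit per row, so that every cell holds a scalar and mask-based 'loc' assignment behaves as usual. This is only a sketch of that idea using DataFrame.explode. The names long_df and regrouped are illustrative, and grouping by 'Count' assumes its values are unique per original row, as they are in this example.

```python
import pandas as pd

data = [[['apple', 'banana'], 1], [['grape', 'orange'], 2], [['banana', 'lemon'], 4]]
df = pd.DataFrame(data, columns=['Fruit', 'Count'])

# One row per fruit; the 'Count' value is repeated for each element of the list.
long_df = df.explode('Fruit').reset_index(drop=True)

# The list view can be rebuilt when needed by grouping the rows again.
regrouped = long_df.groupby('Count')['Fruit'].agg(list)
```

Whether this is preferable depends on how the data is used: the long format works well with filtering and grouping, while list cells keep one row per record.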

Thank you!
 
