Pandas df.equals() returning False on identical dataframes?

  • Thread starter: Mahdi

Let df_1 and df_2 be:

Code:
In [1]: import pandas as pd
   ...: df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
   ...: df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

In [2]: df_1
Out[2]:
   a  b
0  1  4
1  2  5
2  3  6

We add a row r to df_1:

Code:
In [3]: r = pd.DataFrame({'a': ['x'], 'b': ['y']})
   ...: # DataFrame.append was removed in pandas 2.0; pd.concat does the same here
   ...: df_1 = pd.concat([df_1, r], ignore_index=True)

In [4]: df_1
Out[4]:
   a  b
0  1  4
1  2  5
2  3  6
3  x  y
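Appending a row of strings is where the dtypes change: each column now has to hold both the integers and `'x'`/`'y'`, so pandas upcasts it to `object`. A minimal sketch to check this, assuming pandas 2.x (where `DataFrame.append` no longer exists, so `pd.concat(..., ignore_index=True)` stands in for it):

```python
import pandas as pd

df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
r = pd.DataFrame({'a': ['x'], 'b': ['y']})

print(df_1.dtypes)  # both columns start out as integer columns

# Equivalent of the old df_1.append(r, ignore_index=True)
df_1 = pd.concat([df_1, r], ignore_index=True)

# Mixing ints and strings in one column upcasts it to object
print(df_1.dtypes)
```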

We now remove the added row from df_1 and get the original df_1 back again:

Code:
In [5]: df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)

In [6]: df_1
Out[6]:
   a  b
0  1  4
1  2  5
2  3  6

In [7]: df_2
Out[7]:
   a  b
0  1  4
1  2  5
2  3  6
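For the removal step, `r` keeps its own row label 0, so `pd.concat([df_1, r])` has labels 0, 1, 2, 3, 0, and `drop_duplicates(keep=False)` discards both `x`/`y` rows, leaving labels 0, 1, 2. A sketch (again assuming pandas 2.x, with `pd.concat` in place of `append`) suggesting that neither the row labels nor the element values are what differ:

```python
import pandas as pd

df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
r = pd.DataFrame({'a': ['x'], 'b': ['y']})

df_1 = pd.concat([df_1, r], ignore_index=True)  # stands in for the removed append

both = pd.concat([df_1, r])          # r brings its own label 0 along
print(list(both.index))              # labels 0, 1, 2, 3, 0

df_1 = both.drop_duplicates(keep=False)  # keep=False drops every copy of x/y
print(list(df_1.index))              # labels 0, 1, 2

# Row labels compare equal (Index.equals ignores the index type)...
print(df_1.index.equals(df_2.index))
# ...and so do the element values
print((df_1 == df_2).all().all())
```

Both checks pass, which leaves the column dtypes as the remaining difference.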

Although df_1 and df_2 look identical, equals() returns False.

Code:
In [8]: df_1.equals(df_2)
Out[8]: False
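`DataFrame.equals` is stricter than `==`: the pandas documentation states that corresponding columns must have the same dtype, while the index types may differ as long as their values are equal. After the round trip, `df_1`'s columns are still `object` (left over from the `'x'`/`'y'` row) and `df_2`'s are integer, so `equals` returns `False`. A minimal sketch, assuming pandas 2.x, of two ways to compare values only: re-infer the dtypes with `infer_objects()`, or cast to the other frame's dtypes with `astype`:

```python
import pandas as pd

df_1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
r = pd.DataFrame({'a': ['x'], 'b': ['y']})

# Reproduce the add/remove round trip (pd.concat replaces the removed append)
df_1 = pd.concat([df_1, r], ignore_index=True)
df_1 = pd.concat([df_1, r]).drop_duplicates(keep=False)

print(df_1.dtypes.tolist())  # object columns left over from the 'x'/'y' row
print(df_2.dtypes.tolist())  # integer columns
print(df_1.equals(df_2))     # False: the dtypes differ

# Option 1: let pandas soft-convert the object columns back to integers
print(df_1.infer_objects().equals(df_2))

# Option 2: cast explicitly to the other frame's dtypes (a column -> dtype mapping)
print(df_1.astype(df_2.dtypes).equals(df_2))
```

`infer_objects()` only does soft conversion, so an object column that still contains strings (including numeric strings such as `'1'`) is left as `object`.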

I searched SO but could not find a related question. Am I doing something wrong? How can I get the correct result in this case? (df_1==df_2).all().all() returns True, but it is not suitable when df_1 and df_2 have different lengths, because == then raises a ValueError instead of returning False.
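For a single True/False that tolerates leftover `object` dtypes and still handles frames of different lengths (where `df_1 == df_2` raises a `ValueError`), one option is to normalise dtypes on both sides and let `equals` check shape, labels and values. A sketch; `frames_equal_ignoring_object_dtype` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def frames_equal_ignoring_object_dtype(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: compare two frames after re-inferring object dtypes.

    DataFrame.equals already returns False when shapes or labels differ,
    so no separate length check is needed.
    """
    return bool(left.infer_objects().equals(right.infer_objects()))


df_2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_obj = df_2.astype(object)  # same values, object dtype (like df_1 after the round trip)
df_short = df_2.iloc[:2]      # same columns, one row fewer

print(frames_equal_ignoring_object_dtype(df_obj, df_2))
print(frames_equal_ignoring_object_dtype(df_short, df_2))
```

Unlike `(df_1 == df_2).all().all()`, this returns `False` for frames of different lengths rather than raising.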