OiO.lk Community platform!


Remove duplicate rows in DataFrame based on one column containing a substring

  • Thread starter: Andy Paling (Guest)
I have a dataframe like the following:

Code:
ID, Components
1,  "Room 1, ABC"
2,  "Room 2, ABC"
3,  "Room 3, DEF"
4,  "Room 1, DEF"
5,  "Room 3, DEF"

I need to filter the dataframe so that there is only one row per room and the first occurrence of a given room is kept:

Code:
ID, Components
1,  "Room 1, ABC"
2,  "Room 2, ABC"
3,  "Room 3, DEF"

As shown above, the rows with IDs 4 and 5 have been removed because "Room 1" and "Room 3" already appear in the rows with IDs 1 and 3.
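For reference, here is a minimal sketch of one way this filtering could be done in pandas. It assumes every Components value starts with "Room" followed by a number; the variable names `room` and `result` are only illustrative. Rows where the pattern does not match would get a missing room key and be treated as duplicates of each other, so they may need separate handling.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Components": [
        "Room 1, ABC",
        "Room 2, ABC",
        "Room 3, DEF",
        "Room 1, DEF",
        "Room 3, DEF",
    ],
})

# Pull out the "Room n" prefix as a helper Series (assumed format: "Room <digits>").
room = df["Components"].str.extract(r"^(Room \d+)", expand=False)

# duplicated(keep="first") flags every repeat of a room after its first occurrence,
# so negating the mask keeps only the first row per room.
result = df[~room.duplicated(keep="first")]

print(result)
```

With the sample data this keeps the rows with IDs 1, 2 and 3. The helper Series is never added to the dataframe, so the original columns are left unchanged.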

Alternatively, a count of unique rooms would also work. However, the remainder of the Components string can repeat: there can be numerous ABCs and DEFs, but only one Room 1, Room 2, Room 3, and so on.

Therefore, counting unique entries in the Components column will not work. Uniqueness must be determined by the "Room n" part only.
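If only the count is needed, a rough sketch along these lines might work. It assumes the room name is everything before the first comma in Components; `room` and `n_rooms` are illustrative names, not part of the original data.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Components": [
        "Room 1, ABC",
        "Room 2, ABC",
        "Room 3, DEF",
        "Room 1, DEF",
        "Room 3, DEF",
    ],
})

# Split once on the first comma and keep the left-hand part (assumed to be the room).
room = df["Components"].str.split(",", n=1).str[0].str.strip()

# Count distinct rooms, ignoring whatever follows the comma.
n_rooms = room.nunique()

print(n_rooms)  # 3
```

Counting on the extracted room rather than on the full Components string is what avoids treating "Room 1, ABC" and "Room 1, DEF" as two different entries.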
 
