Pandas: Querying for rows that share multiple values in a large dataset

  • Thread starter: d-fws
Context: I am working with sports data, specifically a dataset of over 700k rows. Each row corresponds to an athlete's performance in a league during a particular season (year). I want to find athletes who changed divisions (1) within a season and (2) from one season to the next.

Consider the following example DataFrame, in which Id uniquely identifies an athlete:

Code:
Id  Name       Division    Year
-------------------------------
01 Edgar      D2          2015
01 Edgar      D1          2016
01 Edgar      D1          2017
01 Edgar      D2          2018
02 Charlie    D2          2015
02 Charlie    D2          2016
02 Charlie    D1          2016
02 Charlie    D1          2017
02 Charlie    D1          2018
03 Will       D2          2015
03 Will       D1          2015
03 Will       D1          2016
03 Will       D1          2017
03 Will       D2          2018
04 Frank      D2          2015
04 Frank      D2          2016
04 Frank      D1          2016
04 Frank      D2          2017
04 Frank      D1          2017
04 Frank      D2          2018

The first expected output contains the rows where an athlete appears in more than one division in the same season:

Code:
Id  Name       Division    Year
-------------------------------
02 Charlie    D2          2016
02 Charlie    D1          2016
03 Will       D2          2015
03 Will       D1          2015
04 Frank      D2          2016
04 Frank      D1          2016
04 Frank      D2          2017
04 Frank      D1          2017
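
A minimal pandas sketch for this first output, assuming `Id` is stored as a zero-padded string (the real dataset may use integers): group by athlete and year, count distinct divisions with `transform("nunique")`, and keep rows whose count is greater than one. Because `transform` returns a Series aligned with the original rows, the result can be used directly as a boolean mask, with no merge needed.

```python
import pandas as pd

# Sample data from the question; Id kept as a string (assumption).
rows = [
    ("01", "Edgar", "D2", 2015), ("01", "Edgar", "D1", 2016),
    ("01", "Edgar", "D1", 2017), ("01", "Edgar", "D2", 2018),
    ("02", "Charlie", "D2", 2015), ("02", "Charlie", "D2", 2016),
    ("02", "Charlie", "D1", 2016), ("02", "Charlie", "D1", 2017),
    ("02", "Charlie", "D1", 2018),
    ("03", "Will", "D2", 2015), ("03", "Will", "D1", 2015),
    ("03", "Will", "D1", 2016), ("03", "Will", "D1", 2017),
    ("03", "Will", "D2", 2018),
    ("04", "Frank", "D2", 2015), ("04", "Frank", "D2", 2016),
    ("04", "Frank", "D1", 2016), ("04", "Frank", "D2", 2017),
    ("04", "Frank", "D1", 2017), ("04", "Frank", "D2", 2018),
]
df = pd.DataFrame(rows, columns=["Id", "Name", "Division", "Year"])

# Distinct divisions per (athlete, season), broadcast back onto every row.
# nunique ignores repeated rows in the same division (e.g. two D1 leagues).
n_divisions = df.groupby(["Id", "Year"])["Division"].transform("nunique")

# Keep only the rows of athlete-seasons that span more than one division.
within_season = df[n_divisions > 1]
print(within_season)
```

On the sample data this keeps the eight rows listed above (Charlie 2016, Will 2015, Frank 2016 and 2017). I have not timed it on 700k rows, but it relies on a single grouped, vectorized operation rather than a Python loop over athletes.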

The second query should find athletes who changed divisions from one year to the next. Expected output:

Code:
Id  Name       Division    Year
-------------------------------
01 Edgar      D2          2015
01 Edgar      D1          2016
01 Edgar      D1          2017
01 Edgar      D2          2018
02 Charlie    D2          2015
02 Charlie    D2          2016
02 Charlie    D1          2016
02 Charlie    D1          2017
03 Will       D2          2015
03 Will       D1          2015
03 Will       D1          2016
03 Will       D1          2017
03 Will       D2          2018
04 Frank      D2          2015
04 Frank      D2          2016
04 Frank      D1          2016
04 Frank      D2          2017
04 Frank      D1          2017
04 Frank      D2          2018
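
The expected output above is consistent with the following rule, which is my reading of the sample rather than a definition stated in the question: collapse each athlete-season to the set of divisions played, then keep a season if that set differs from the athlete's previous or next recorded season. This explains why Charlie's 2018 row is dropped (he played only D1 in both 2017 and 2018). A minimal pandas sketch under that assumption, which also treats "adjacent" as the previous or next season present in the data for that athlete, not strictly `Year ± 1`:

```python
import pandas as pd

# Sample data from the question; Id kept as a string (assumption).
rows = [
    ("01", "Edgar", "D2", 2015), ("01", "Edgar", "D1", 2016),
    ("01", "Edgar", "D1", 2017), ("01", "Edgar", "D2", 2018),
    ("02", "Charlie", "D2", 2015), ("02", "Charlie", "D2", 2016),
    ("02", "Charlie", "D1", 2016), ("02", "Charlie", "D1", 2017),
    ("02", "Charlie", "D1", 2018),
    ("03", "Will", "D2", 2015), ("03", "Will", "D1", 2015),
    ("03", "Will", "D1", 2016), ("03", "Will", "D1", 2017),
    ("03", "Will", "D2", 2018),
    ("04", "Frank", "D2", 2015), ("04", "Frank", "D2", 2016),
    ("04", "Frank", "D1", 2016), ("04", "Frank", "D2", 2017),
    ("04", "Frank", "D1", 2017), ("04", "Frank", "D2", 2018),
]
df = pd.DataFrame(rows, columns=["Id", "Name", "Division", "Year"])

# One row per athlete-season; the divisions played become a sorted string key
# such as "D1,D2" so that seasons can be compared with plain equality.
seasons = (
    df.groupby(["Id", "Year"])["Division"]
    .agg(lambda s: ",".join(sorted(set(s))))
    .rename("Divs")
    .reset_index()  # groupby sorts its keys, so rows are ordered by Id, then Year
)

# Previous and next recorded season for the same athlete (NaN at the ends).
by_athlete = seasons.groupby("Id")["Divs"]
prev_divs = by_athlete.shift(1)
next_divs = by_athlete.shift(-1)

# ASSUMED rule inferred from the sample output: flag a season whose division
# set differs from an adjacent season that actually exists for that athlete.
changed = (prev_divs.notna() & (seasons["Divs"] != prev_divs)) | (
    next_divs.notna() & (seasons["Divs"] != next_divs)
)

# Map the flag back onto the original rows; an inner merge keeps df's row order.
flagged = seasons.loc[changed, ["Id", "Year"]]
year_over_year = df.merge(flagged, on=["Id", "Year"])
print(year_over_year)
```

On the sample data this returns 19 rows, everything except Charlie's 2018 row, which matches the expected output. If a different definition is intended (for example, comparing only consecutive calendar years), the `shift` step is the part to change.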