OiO.lk Community platform!


I have a dataframe of songs and am trying to answer questions based on the timestamp

  • Thread starter: sonofjeffgoldblum (Guest)
I have imported, sorted, and combined 7 files, one per day of the week. I have found some duplicates, and I am now looking into the specifics of the program that scraped a daily listing of songs: there may be a commercial break (every 5 minutes) or a gap between files/records longer than 13 hours. On each attempt I get errors saying that .dt needs datetimelike values, and I cannot tell what in my datetime column or its format causes this. I am pretty much lost on how to start a few of the questions below, as I am still a bit of a newbie learning Python.

  1. Find missing records - gap of 13 or more hours
  2. Using the average song length (found to be 3.18 minutes), find the average commercial duration by subtracting that average from the delta to the next song start.
  3. How many songs are played in a 24-hour period? Any difference between day and night? Between weekdays and weekends?
  4. How many hours of commercials (or conversely, music) are played in a 24-hour period? Any difference: day vs. night, weekdays vs. weekends?
  5. Describe the patterns of commercial breaks, e.g., how many breaks, usually at what times, etc.
  6. During the 1-week period, list:
     a. the top 10 songs and their number of air plays
     b. the top 10 artists and their number of air plays
     c. the top 10 artists by number of distinct songs (i.e., for each artist, determine the number of distinct songs)
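
Item 2 in the list above (and the per-day total asked for in item 4) can be sketched roughly as follows. This is only an illustration under stated assumptions: the rows are made up, `BREAK_THRESHOLD_MIN` is a hypothetical cutoff for deciding that a delta contains a break, the column names `delta_min`, `commercial_min`, and `date` are invented here, and only `ts` and the 3.18-minute average come from the post.

```python
import pandas as pd

AVG_SONG_MIN = 3.18        # average song length, taken from the post
BREAK_THRESHOLD_MIN = 5.0  # hypothetical cutoff: longer deltas are assumed to contain a break
MAX_GAP_MIN = 13 * 60      # deltas of 13+ hours are missing records, not commercials

# Made-up song start times, for illustration only.
df = pd.DataFrame({"ts": pd.to_datetime([
    "2024-01-01 08:00:00",
    "2024-01-01 08:03:00",
    "2024-01-01 08:10:00",
    "2024-01-01 08:13:30",
    "2024-01-01 08:22:00",
])})
df = df.sort_values("ts").reset_index(drop=True)

# Minutes from each song start to the next song start (NaN on the last row).
df["delta_min"] = (df["ts"].shift(-1) - df["ts"]).dt.total_seconds() / 60

# Estimated commercial time: delta minus the average song length, never negative.
df["commercial_min"] = (df["delta_min"] - AVG_SONG_MIN).clip(lower=0)

# Keep only deltas that look like breaks; comparisons with NaN are False.
is_break = (df["delta_min"] > BREAK_THRESHOLD_MIN) & (df["delta_min"] < MAX_GAP_MIN)

avg_commercial = df.loc[is_break, "commercial_min"].mean()

# Estimated hours of commercials per calendar day.
df["date"] = df["ts"].dt.normalize()
commercial_hr_per_day = df[is_break].groupby("date")["commercial_min"].sum() / 60

print(avg_commercial)
print(commercial_hr_per_day)
```

Subtracting a single average from every delta is a rough estimate, so the threshold is worth tuning against the real data before trusting the totals.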

Sample of data: https://i.sstatic.net/xFOCFA8i.png

1. Started with:

Code:
    # Whole hours since the previous row (needs `from datetime import timedelta`)
    df['time_interval_hr'] = df['ts'].diff().fillna(timedelta(0)).apply(lambda x: x.total_seconds() // 3600)
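
On the `.dt` error mentioned at the top of the post: the `.dt` accessor only exists on datetime64 or timedelta64 columns. The `apply(...)` step in the line above returns plain floats, so a later `.dt` call on `time_interval_hr` would fail, and the same happens if `ts` itself was read as strings. That may or may not be the cause here. A minimal sketch for item 1, using made-up rows (only the `ts` and `artist` column names come from the post; `gap`, `gap_hr`, and `missing` are hypothetical names), keeps the difference as a timedelta and filters on 13 hours:

```python
import pandas as pd

# Made-up rows for illustration; 'ts' is deliberately stored as strings.
df = pd.DataFrame({
    "ts": ["2024-01-01 08:00:00", "2024-01-01 08:04:00",
           "2024-01-02 00:30:00", "2024-01-02 00:33:00"],
    "artist": ["A", "B", "A", "C"],
})

# Convert strings to datetime64 so that .dt and diff() behave as expected.
df["ts"] = pd.to_datetime(df["ts"])
df = df.sort_values("ts").reset_index(drop=True)

# diff() on a datetime column yields timedeltas; the first row is NaT.
df["gap"] = df["ts"].diff()

# Fractional hours since the previous row, as a plain float column.
df["gap_hr"] = df["gap"].dt.total_seconds() / 3600

# Rows that start after a gap of 13 hours or more (NaT compares as False).
missing = df[df["gap"] >= pd.Timedelta(hours=13)]
print(missing[["ts", "gap_hr"]])
```

Comparing the timedelta column directly against `pd.Timedelta(hours=13)` avoids the conversion to hours altogether; `gap_hr` is only there for display.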

From here, how do I show only the rows with a difference of 13 hours or more? I have tried several versions, but I either get an error that .dt needs datetimelike values or other errors. Otherwise, I get a full datetime instead of hours alone.

Screenshot: https://i.sstatic.net/3nqyR2lD.png

2.

3. Started with:

Code:
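    # For each row, count the distinct artists played in the preceding 24 hours
    # (row-wise apply; needs `from datetime import timedelta`)
    df['24-HourCount'] = df.apply(
        lambda x: len(df[df['ts'].between(x['ts'] - timedelta(days=1),
                                          x['ts'])]['artist'].unique()),
        axis=1)
    df = df.set_index('ts')
    # Largest 24-hour count seen within each 30-minute bin
    df = df[['24-HourCount']].resample('30min').max()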
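
The snippet above counts distinct artists in a rolling window. If the goal for item 3 is simply the number of songs per day, one row per play means counting rows is enough. A rough sketch with made-up timestamps follows; the 06:00 to 17:59 definition of "day" and the column names `part`, `daytype`, and `date` are assumptions, not something stated in the post.

```python
import pandas as pd

# Made-up song start times: three on a Friday, two on a Saturday.
df = pd.DataFrame({"ts": pd.to_datetime([
    "2024-01-05 07:10", "2024-01-05 13:00", "2024-01-05 22:15",
    "2024-01-06 09:00", "2024-01-06 23:30",
])})

# Songs per calendar day: each row is one play, so count rows per daily bin.
per_day = df.set_index("ts").resample("D").size()

# Assumption: "day" is 06:00-17:59 and everything else is "night".
df["part"] = df["ts"].dt.hour.between(6, 17).map({True: "day", False: "night"})

# dayofweek: Monday=0 ... Saturday=5, Sunday=6.
df["daytype"] = (df["ts"].dt.dayofweek >= 5).map({True: "weekend", False: "weekday"})
df["date"] = df["ts"].dt.normalize()

# Average songs per date, split by day/night and by weekday/weekend.
by_part = df.groupby(["date", "part"]).size().groupby(level="part").mean()
by_daytype = df.groupby(["date", "daytype"]).size().groupby(level="daytype").mean()

print(per_day)
print(by_part)
print(by_daytype)
```

Grouping by calendar date first and averaging afterwards gives a per-day figure for each category rather than one total over the whole week.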

How can I count through each file to get an average of how many songs are played?

4. Started with the same as #3 above, but I figure I would need some aggregation to sum commercial time based on the average song length.

5. Not sure how to start, but I would be interested in seeing this figure.

6. Over the week, same as #5: I would be interested in seeing this billboard-type figure.
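
For the billboard-type lists in item 6, a minimal sketch with made-up rows could look like this. The `artist` column appears in the post; `title` is an assumed name for the song column, and the three result names are hypothetical.

```python
import pandas as pd

# Made-up plays for illustration; 'title' is an assumed column name.
df = pd.DataFrame({
    "artist": ["A", "A", "B", "A", "B", "C"],
    "title":  ["x", "x", "y", "z", "y", "w"],
})

# a. Top 10 songs by air plays (a song is identified by artist + title).
top_songs = df.groupby(["artist", "title"]).size().nlargest(10)

# b. Top 10 artists by air plays.
top_artists = df["artist"].value_counts().head(10)

# c. Top 10 artists by number of distinct songs.
top_distinct = df.groupby("artist")["title"].nunique().nlargest(10)

print(top_songs)
print(top_artists)
print(top_distinct)
```

If the duplicate records mentioned at the top of the post are still in the frame, they would inflate the air-play counts, so dropping them first (for example with `drop_duplicates` on the timestamp column) is probably worthwhile.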
 
