The Spotify integration now also collects further metadata on the tracks you listen to, such as the key & mode of the songs and predictions of whether songs are live recordings, instrumental, etc. Explore this new data with this notebook!
This notebook requires you to have data from the Spotify integration in your Open Humans account.
In this notebook we want to look into your listening habits. To get started we import some libraries we need and then access your Spotify data:
Now that we have all of your data, we want to transform the rather complex Spotify JSON format into something that is easier to read, a simple table, also called a dataframe. The lines below do this:
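The notebook itself does this in R; as a rough Python/pandas sketch of the same flattening step, here is how a nested play record could be turned into a table. The record shape and field names below are made-up assumptions and may differ from your actual archive:

```python
import pandas as pd

# A single made-up play record in a nested, Spotify-style shape
plays = [
    {
        "played_at": "2018-09-01T08:15:00Z",
        "track": {
            "name": "Paper Forest",
            "id": "track-id-1",
            "explicit": False,
            "popularity": 44,
            "album": {"name": "Some Album"},
            "artists": [{"name": "Emmy The Great"}],
        },
    },
]

# json_normalize flattens nested dicts into dotted column names
df = pd.json_normalize(plays).rename(columns={
    "track.name": "track",
    "track.album.name": "album",
})
# The artists field stays a list; take the first artist's name
df["artist"] = df["track.artists"].apply(lambda artists: artists[0]["name"])
print(df[["played_at", "track", "album", "artist"]])
```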
We can now look at the dataframe and inspect some example data:
[Example rows from the dataframe: tracks such as "Joxer Goes to Stuttgart", "Black Is the Colour", and "Don't Forget Your Shovel", all from the album On the Road]
We get the time at which we listened to a song, the title of the song and the album on which it was released, along with the artist that made the recording. Furthermore we can see whether the song features explicit lyrics, its popularity, and a unique track ID (which we can use to identify whether we listened to the same song more than once).
We start out by having a look at how long the songs we listen to are. For that we plot the distribution of song lengths found in the archive - along with the average song length:
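As a rough Python sketch of that computation (the durations below are made-up sample values, standing in for the `duration_ms` field Spotify reports):

```python
import numpy as np

# Hypothetical track lengths in milliseconds, standing in for the
# duration_ms column of the real archive
duration_ms = np.array([252000, 198000, 241000, 263000, 221000, 305000])
duration_min = duration_ms / 60000.0

# Average song length plus a simple histogram of the distribution
mean_length = duration_min.mean()
counts, bin_edges = np.histogram(duration_min, bins=5)
print(f"average song length: {mean_length:.2f} minutes")
```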
We see that in my case songs on average clock in at a bit over 4 minutes in length. Though there is an interesting peak in the data below that. We will look into that later on!
Let us have a look at when we listen to music. Unfortunately the data from Spotify isn't timezone-aware and reports the time in UTC by default. Outside daylight saving time (DST) this is the same time zone as London; during DST it is one hour behind London time.
Because of this, the best we can do is approximate the timezone. To do this we can adjust the UTC_OFFSET variable at the top of the next cell. In my case the predominant timezone for the archive is California, so I set it to UTC_OFFSET <- -7. If you are in a different timezone, adjust this accordingly.
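In Python/pandas the same fixed-offset trick might look like this (the timestamps below are made-up examples):

```python
import pandas as pd

# Approximate the local timezone by shifting the UTC timestamps by a
# fixed offset, mirroring the UTC_OFFSET variable in the notebook
UTC_OFFSET = -7  # California during DST

played_at = pd.to_datetime(pd.Series([
    "2018-09-01T08:15:00Z",
    "2018-09-01T23:30:00Z",
]))
local_time = played_at + pd.Timedelta(hours=UTC_OFFSET)
print(local_time.dt.hour.tolist())  # → [1, 16]
```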
In my case the data contains lots of listening done in Europe, which is why there are songs played in the middle of the night from midnight to 5am. But besides this there are two peaks, one from around 9am to 12pm and one from 1pm to 4pm, which is the time I'm usually in the office and listening to music.
We have seen that the Spotify archive contains some details on how popular individual tracks are. Let's have a look at whether the songs I listen to over time are more or less popular:
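A rough Python sketch of such a popularity-over-time view, smoothing made-up daily averages with a short rolling mean:

```python
import pandas as pd

# Hypothetical average track popularity per day
daily_popularity = pd.Series(
    [55, 60, 58, 40, 42],
    index=pd.date_range("2018-09-01", periods=5),
)
# Smooth out day-to-day noise with a 3-day rolling mean
rolling = daily_popularity.rolling(window=3, min_periods=1).mean()
print(rolling)
```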
While there is a small downward trend, this seems to be more an artifact of the last few days, where the average is a bit lower, than a systematic drop in popularity.
But this raises the question: which are the artists I listen to most? Let's have a look at the artists most played in my own Spotify data:
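Counting plays per artist is straightforward once the data is in a dataframe; a minimal Python sketch, assuming an `artist` column like the one above (the sample data is made up):

```python
import pandas as pd

# One row per play; counting rows per artist gives the most-played list
df = pd.DataFrame({
    "artist": ["Emmy The Great", "The Lumineers", "Emmy The Great",
               "Some Other Artist", "Emmy The Great", "The Lumineers"],
})
top_artists = df["artist"].value_counts()
print(top_artists)
```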
That picture looks just as skewed, with the track Paper Forest by Emmy The Great topping the list. And if you look closely you'll see that this song alone accounts for all the plays for this artist as shown in the figure above! Similarly, the song Long Way From Home by The Lumineers accounts for most of the plays of that artist.
Now we can wonder: can the song Paper Forest maybe explain the bump in the song-length distribution further above? Let's have a look:
Yep, that fits the peak at less than 4 minutes in the song-length distribution pretty well!
Let's now investigate whether I listen more or less to music over time by plotting the number of songs played on a given day:
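A minimal Python sketch of counting plays per day from the timestamps (the timestamps below are made-up examples):

```python
import pandas as pd

# Number of songs played per day, derived from the played_at timestamps
played_at = pd.to_datetime(pd.Series([
    "2018-09-01T08:15:00Z", "2018-09-01T09:00:00Z",
    "2018-09-02T10:30:00Z", "2018-09-04T18:45:00Z",
]))
plays_per_day = played_at.dt.date.value_counts().sort_index()
print(plays_per_day)
```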
There are two things going on: first of all, there is not much data for much of September, which makes sense, as I was traveling a lot during that time and didn't have much time to listen to music during conferences etc.
But there is also a pattern at the beginning of September, with days with lots of plays being interrupted by days with very little music consumption. Could this be an effect of weekdays vs. weekends?
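The weekday/weekend split that the notebook computes with lubridate could be sketched in Python like this (the dates are made-up examples):

```python
import pandas as pd

played_at = pd.to_datetime(pd.Series([
    "2018-09-01T10:00:00Z",  # 2018-09-01 was a Saturday
    "2018-09-03T10:00:00Z",  # 2018-09-03 was a Monday
]))
# dayofweek: Monday = 0 ... Sunday = 6, so >= 5 means weekend
is_weekend = played_at.dt.dayofweek >= 5
print(is_weekend.tolist())  # → [True, False]
```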
There seems to be at least a small effect that I play more music during weekdays (FALSE in the plot above) compared to weekends.
Let's try to see whether repeat one can explain why the two songs we saw above have been played so much more often than the others.
For that we use the unique track ID that Spotify gives to each song and check whether the song played just before had the same track ID. If so we store this repetition and can then plot the data later on.
Below we create the repeat table of songs that have been played more than once in a row, along with the times and dates this happened:
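The detection step can be sketched in Python by comparing each play's track ID with the immediately preceding one (the IDs below are made up):

```python
import pandas as pd

# Plays in chronological order; a repeat is a play whose track ID
# matches the ID of the play directly before it
df = pd.DataFrame({
    "track_id": ["a1", "a1", "b2", "c3", "c3", "c3"],
})
df["is_repeat"] = df["track_id"].eq(df["track_id"].shift())
print(df)
```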
Now we can plot this data. On the x-axis we keep the date/time of when the song was played and on the y-axis we have the different songs that were repeated at least once. The plot then shows us when these repeats happened:
And indeed, we can see that both The Lumineers' Long Way From Home and Emmy The Great's Paper Forest have ramped up their counts through being played on Repeat One.
And at least for The Lumineers my calendar offers an easy explanation: I fell asleep on my flight from San Francisco to Frankfurt without ever turning off Spotify. 😂
Spotify not only gives you metadata about how long songs are, but also provides some automatic classifications of those songs. Alongside more traditional musical dimensions, like which key a song is in, whether it is in major or minor mode, and how loud the recording is, there are also some further characteristics: these include the danceability of a song, its energy and valence (essentially a mood score), as well as scores for whether the song is instrumental, a live recording, or an acoustic track.
We can dig into those characteristics. Let's start out by looking at the classical measures: Key, Mode & Volume:
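As a small Python sketch of summarising the key and mode fields: in Spotify's audio features, `key` is a pitch class (0 = C, 11 = B) and `mode` is 1 for major, 0 for minor. The sample values below are made up:

```python
import pandas as pd

features = pd.DataFrame({
    "key": [0, 7, 7, 2, 9],
    "mode": [1, 1, 0, 1, 0],
})
pitch_classes = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]
# Translate numeric codes into readable labels before counting
features["key_name"] = features["key"].map(lambda k: pitch_classes[k])
features["mode_name"] = features["mode"].map({1: "major", 0: "minor"})
print(features["key_name"].value_counts())
print(features["mode_name"].value_counts())
```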
Loudness measures the average volume of a track in decibels (dB). Values typically range between -60 and 0 dB.
The acousticness score ranges from 0 to 1, with 0 meaning the track is most unlikely to be acoustic and 1 meaning it is most likely acoustic.
This tries to predict whether a song is a live recording or not. Higher liveness values represent an increased probability that the track was performed live; a value above 0.8 indicates a strong likelihood that the track is live.
According to Spotify: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
According to Spotify: energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.