Details for spotify-seasonal-mood.ipynb

Published by gedankenstuecke

Description

The Economist and Spotify released a data analysis showing how the global mood of songs played throughout the year changes. I tried to see whether the same effect shows up in my own listening habits.


Tags & Data Sources

music, valence, mood, seasonal effects, Spotify integration

Comments

After @madprime ran into an issue with missing metadata, I made a small modification so the notebook no longer crashes when this happens. Instead, it now skips those songs.



Notebook
Last updated 4 months, 1 week ago

The mood of the songs I listen to across the seasons

The Economist published a data visualization in collaboration with Spotify (unfortunately paywalled), showing that February tends to be the month with the 'gloomiest' songs, while July is the most cheerful one. Thanks to Eric for pointing me to it!

I thought it would be interesting to look into this for my own data to see if I can see the same seasonal effect.

This notebook requires data from the Spotify integration in your Open Humans account if you plan to run it directly on your own data. Otherwise, you'll have to adapt the first few cells to read your data in another way.

To get started, we import the libraries we need and then access your Spotify data:

In [1]:
from ohapi import api
import os
import requests
import json
import pandas as pd
import datetime

member = api.exchange_oauth2_member(os.environ.get('OH_ACCESS_TOKEN'))
for f in member['data']:
    if f['source'] == 'direct-sharing-176' and f['basename'] == 'spotify-listening-archive.json':
        sp_songs = requests.get(f['download_url'])
    if f['source'] == 'direct-sharing-176' and f['basename'] == 'spotify-track-metadata.json':
        sp_meta = requests.get(f['download_url'])
        
sp_data = json.loads(sp_songs.content)
sp_metadata = json.loads(sp_meta.content)

Now that we have all of your data, we want to transform the rather complex Spotify JSON format into something that is easier to work with: a simple table, also called a dataframe. The lines below do this:

In [2]:
track_title = []
artist_name = []
album_name = []
played_at = []
popularity = []
duration_ms = []
explicit = []
track_id = []

danceability = []
energy = [] 
key = [] 
loudness = [] 
mode = []
speechiness = []
acousticness = []
instrumentalness = []
liveness = []
valence = []
tempo = []


key_translation = {
    '0': "C",
    "1": "C#",
    "2": "D",
    "3": "D#",
    "4": "E", 
    "5": "F",
    "6": "F#",
    "7": "G",
    "8": "G#",
    "9": "A",
    "10": "A#",
    "11": "B"
}

mode_translation = {'1':'major', '0': 'minor'}

for sp in sp_data:
    if sp['track']['id'] in sp_metadata.keys():
        track_title.append(sp['track']['name'])
        artist_name.append(sp['track']['artists'][0]['name'])
        album_name.append(sp['track']['album']['name'])
        played_at.append(sp['played_at'])
        popularity.append(sp['track']['popularity'])
        duration_ms.append(sp['track']['duration_ms'])
        explicit.append(sp['track']['explicit'])
        track_id.append(sp['track']['id'])
        danceability.append(sp_metadata[sp['track']['id']]['danceability'])
        energy.append(sp_metadata[sp['track']['id']]['energy'])
        key.append(key_translation[str(sp_metadata[sp['track']['id']]['key'])])
        loudness.append(sp_metadata[sp['track']['id']]['loudness'])
        mode.append(mode_translation[str(sp_metadata[sp['track']['id']]['mode'])])
        speechiness.append(sp_metadata[sp['track']['id']]['speechiness'])
        acousticness.append(sp_metadata[sp['track']['id']]['acousticness'])
        instrumentalness.append(sp_metadata[sp['track']['id']]['instrumentalness'])
        liveness.append(sp_metadata[sp['track']['id']]['liveness'])
        valence.append(sp_metadata[sp['track']['id']]['valence'])
        tempo.append(sp_metadata[sp['track']['id']]['tempo'])
    
def parse_timestamp(lst):
    timestamps = []
    for item in lst:
        try:
            timestamp = datetime.datetime.strptime(
                            item,
                            '%Y-%m-%dT%H:%M:%S.%fZ')
        except ValueError:
            timestamp = datetime.datetime.strptime(
                    item,
                    '%Y-%m-%dT%H:%M:%SZ')
        timestamps.append(timestamp)
    return timestamps
    
played_at = parse_timestamp(played_at)

dataframe = pd.DataFrame(data={
    'track_id': track_id,
    'track': track_title,
    'artist': artist_name,
    'album': album_name,
    'popularity': popularity,
    'duration_ms': duration_ms,
    'explicit': explicit,
    'played_at': played_at,
    'danceability': danceability,
    'energy': energy,
    'key': key,
    'loudness': loudness,
    'mode': mode,
    'speechiness': speechiness,
    'acousticness': acousticness,
    'instrumentalness': instrumentalness,
    'liveness': liveness,
    'valence': valence,
    'tempo': tempo

})
dataframe = dataframe.set_index(dataframe['played_at'])

Valence: how to measure happiness/sadness in songs?

The best measure we have for this is what Spotify calls valence, which according to Spotify is "a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)."

For their analysis, the Economist rescaled the data to run from 0 to 100 instead of 0.0 to 1.0, so we will do the same to make comparison easier!
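As a quick illustrative sketch of that rescaling (using a made-up toy dataframe here, not the one built above), it is a single multiplication:

```python
import pandas as pd

# Toy valence scores on Spotify's original 0.0-1.0 scale (made-up values)
toy = pd.DataFrame({'valence': [0.25, 0.5, 0.75]})

# Rescale to 0-100, matching the Economist's presentation
toy['valence_100'] = toy['valence'] * 100
print(toy['valence_100'].tolist())  # [25.0, 50.0, 75.0]
```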

Overall valence distribution

Let's check how the valence of all the songs I've listened to is distributed, to get an idea of what to expect. The black bars show the distribution, the red line the average valence:

In [3]:
%load_ext rpy2.ipython
In [4]:
%%R -i dataframe -w 4 -h 2 --units in -r 200
library(ggplot2)
ggplot(dataframe,aes(valence*100)) + 
    geom_histogram(binwidth=10) + 
    scale_x_continuous('valence') + 
    theme_minimal() + 
    geom_vline(xintercept=mean(dataframe$valence*100),color='red') + ggtitle('red line is average')

The analysis in the Economist found a global average of around 50. My own average comes in below that, at around 37, which I guess means I'm more into gloomy music on average? According to the Economist's data, this makes me stand out as a negative outlier even for a German!

Also notice the skew of the distribution: there are very few songs I listen to that come out at the top end (virtually no songs with a score above 75).

Average valence ratings per month

Now let's see whether I can personally reproduce the findings of Spotify and the Economist. Here's the per-month distribution as a boxplot:

In [5]:
%%R -i dataframe -w 4 -h 3 --units in -r 200

library(ggplot2)
library(lubridate)
dataframe$month <- month(dataframe$played_at)
ggplot(dataframe,aes(x=month,y=valence*100,group=month,fill=as.character(month))) + 
    geom_boxplot(notch=TRUE) +
    scale_x_continuous('month',breaks=c(1,2,3,4,5,6,7,8,9,10,11,12)) + 
    scale_y_continuous('valence') + 
    theme_minimal() + guides(fill=FALSE)

There is some variation between the months, but the medians don't differ hugely. There also seems to be no clear pattern: some months like July and August actually end up with a comparatively low median valence relative to virtually all other months.
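If you'd rather stay in pandas than switch to R for this grouping step, the same per-month summary can be sketched like this (illustrated on a tiny made-up listening history, not my actual data):

```python
import pandas as pd

# Toy listening history: one row per play, indexed by the play timestamp
plays = pd.DataFrame(
    {'valence': [0.2, 0.6, 0.4, 0.5, 0.3, 0.7]},
    index=pd.to_datetime([
        '2019-02-03', '2019-02-14', '2019-07-01',
        '2019-07-20', '2019-11-05', '2019-11-30',
    ]),
)

# Median valence (rescaled to 0-100) per calendar month
monthly = (plays['valence'] * 100).groupby(plays.index.month).median()
print(monthly.to_dict())  # {2: 40.0, 7: 45.0, 11: 50.0}
```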

Comparing February to July

As the overall pattern is hard to see, let's limit ourselves to the main finding of Spotify and the Economist and compare the valence of February and July:

In [6]:
%%R -i dataframe -w 4 -h 3 --units in -r 200

library(ggplot2)
library(lubridate)
dataframe$month <- month(dataframe$played_at)
ggplot(subset(dataframe, dataframe$month %in% c(2,7)) ,aes(x=as.character(month),y=valence*100,group=month,fill=as.character(month))) + 
    geom_boxplot(notch=TRUE) +
    scale_x_discrete('month') + 
    scale_y_continuous('valence') + 
    theme_minimal() + guides(fill=FALSE)

That looks very much unlike the results in the Economist! For me, the valence of the music I listen to in February seems to be significantly higher than that of the music I listen to in July! Let's quickly run a statistical test to see whether that actually holds up.

Given that the distribution doesn't seem to be normal, we'll go with a Mann-Whitney-Wilcoxon test. That might not be fully justified, but let's be a bit lazy here:

In [7]:
%%R -i dataframe -w 4 -h 3 --units in -r 200

library(lubridate)
dataframe$month <- month(dataframe$played_at)

df_sub <- subset(dataframe, dataframe$month %in% c(2,7))
df_sub$month <- as.character(df_sub$month)

wilcox.test(valence ~ month, data=df_sub)
	Wilcoxon rank sum test with continuity correction

data:  valence by month
W = 180170, p-value = 0.0001056
alternative hypothesis: true location shift is not equal to 0
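For anyone working in Python rather than R, the same kind of comparison can be sketched with scipy's `mannwhitneyu` (shown here on made-up samples, not my listening history):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Two made-up valence samples on the 0-1 scale, one shifted higher than the other
feb = rng.beta(4, 6, size=200)  # skewed toward higher valence
jul = rng.beta(2, 8, size=200)  # skewed toward lower valence

# Two-sided Mann-Whitney U test, analogous to R's wilcox.test(valence ~ month)
stat, p = mannwhitneyu(feb, jul, alternative='two-sided')
print(stat, p)
```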

And with a p-value well below 0.05, we can confidently state that the valence of my listening really does differ between those two months!

Have you already connected your Spotify account to Open Humans? Then you can run this notebook right away by clicking on Open in Personal Data Notebooks above!

If you haven't connected your account yet, you can start collecting this kind of data by using the Spotify connection! Unfortunately, Spotify only gives us access to the last 50 songs you played. But if you make the connection now, we will start compiling this data over time, and in a few months you can start playing around with your own data!