Details for spotify-seasonal-mood.ipynb

Published by gedankenstuecke

Description

The Economist and Spotify released a data analysis showing how the global mood of songs played throughout the year changes. I tried to see whether the same effect shows up in my own listening habits.


Tags & Data Sources

music, valence, mood, seasonal effects, Spotify integration

Comments

After @madprime ran into an issue with missing metadata, I made a small modification so the notebook no longer crashes when this happens. Instead, it now skips those songs.



Notebook
Last updated 4 months, 1 week ago

The mood of the songs I listen to across the seasons

The Economist published a data visualization in collaboration with Spotify (unfortunately paywalled), showing that February tends to be the month with the 'gloomiest' songs, while July is the most cheerful one. Thanks to Eric for pointing me to it!

I thought it would be interesting to look into this for my own data to see if I can see the same seasonal effect.

This notebook requires data from the Spotify integration in your Open Humans account if you plan to run it directly on your own data. Otherwise, you'll have to adapt the first few cells to read your data in another way.

To get started, we import the libraries we need and then access your Spotify data:

In [1]:
from ohapi import api
import os
import requests
import json
import pandas as pd
import datetime

member = api.exchange_oauth2_member(os.environ.get('OH_ACCESS_TOKEN'))
for f in member['data']:
    if f['source'] == 'direct-sharing-176' and f['basename'] == 'spotify-listening-archive.json':
        sp_songs = requests.get(f['download_url'])
    if f['source'] == 'direct-sharing-176' and f['basename'] == 'spotify-track-metadata.json':
        sp_meta = requests.get(f['download_url'])
        
sp_data = json.loads(sp_songs.content)
sp_metadata = json.loads(sp_meta.content)

Now that we have all of your data, we want to transform the rather complex Spotify JSON format into something that is easier to work with: a simple table, also called a dataframe. The lines below do this:

In [2]:
track_title = []
artist_name = []
album_name = []
played_at = []
popularity = []
duration_ms = []
explicit = []
track_id = []

danceability = []
energy = [] 
key = [] 
loudness = [] 
mode = []
speechiness = []
acousticness = []
instrumentalness = []
liveness = []
valence = []
tempo = []


key_translation = {
    '0': "C",
    "1": "C#",
    "2": "D",
    "3": "D#",
    "4": "E", 
    "5": "F",
    "6": "F#",
    "7": "G",
    "8": "G#",
    "9": "A",
    "10": "A#",
    "11": "B"
}

mode_translation = {'1':'major', '0': 'minor'}

for sp in sp_data:
    if sp['track']['id'] in sp_metadata.keys():
        track_title.append(sp['track']['name'])
        artist_name.append(sp['track']['artists'][0]['name'])
        album_name.append(sp['track']['album']['name'])
        played_at.append(sp['played_at'])
        popularity.append(sp['track']['popularity'])
        duration_ms.append(sp['track']['duration_ms'])
        explicit.append(sp['track']['explicit'])
        track_id.append(sp['track']['id'])
        danceability.append(sp_metadata[sp['track']['id']]['danceability'])
        energy.append(sp_metadata[sp['track']['id']]['energy'])
        key.append(key_translation[str(sp_metadata[sp['track']['id']]['key'])])
        loudness.append(sp_metadata[sp['track']['id']]['loudness'])
        mode.append(mode_translation[str(sp_metadata[sp['track']['id']]['mode'])])
        speechiness.append(sp_metadata[sp['track']['id']]['speechiness'])
        acousticness.append(sp_metadata[sp['track']['id']]['acousticness'])
        instrumentalness.append(sp_metadata[sp['track']['id']]['instrumentalness'])
        liveness.append(sp_metadata[sp['track']['id']]['liveness'])
        valence.append(sp_metadata[sp['track']['id']]['valence'])
        tempo.append(sp_metadata[sp['track']['id']]['tempo'])
    
def parse_timestamp(lst):
    timestamps = []
    for item in lst:
        try:
            timestamp = datetime.datetime.strptime(
                            item,
                            '%Y-%m-%dT%H:%M:%S.%fZ')
        except ValueError:
            timestamp = datetime.datetime.strptime(
                    item,
                    '%Y-%m-%dT%H:%M:%SZ')
        timestamps.append(timestamp)
    return timestamps
    
played_at = parse_timestamp(played_at)

dataframe = pd.DataFrame(data={
    'track_id': track_id,
    'track': track_title,
    'artist': artist_name,
    'album': album_name,
    'popularity': popularity,
    'duration_ms': duration_ms,
    'explicit': explicit,
    'played_at': played_at,
    'danceability': danceability,
    'energy': energy,
    'key': key,
    'loudness': loudness,
    'mode': mode,
    'speechiness': speechiness,
    'acousticness': acousticness,
    'instrumentalness': instrumentalness,
    'liveness': liveness,
    'valence': valence,
    'tempo': tempo

})
dataframe = dataframe.set_index(dataframe['played_at'])

Valence: how to measure happiness/sadness in songs?

The best measure we have for this is what Spotify calls valence, which according to Spotify is "a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry)."

For their analysis, the Economist rescaled the data to run from 0 to 100 instead of 0.0 to 1.0, so we will do the same to make comparison easier!
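As a quick illustrative sketch of that rescaling (using a made-up toy dataframe here, not the one built above), it is a single multiplication:

```python
import pandas as pd

# Toy valence scores on Spotify's original 0.0-1.0 scale (made-up values)
toy = pd.DataFrame({'valence': [0.25, 0.5, 0.75]})

# Rescale to 0-100, matching the Economist's presentation
toy['valence_100'] = toy['valence'] * 100
print(toy['valence_100'].tolist())  # [25.0, 50.0, 75.0]
```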

Overall valence distribution

Let's check how the valence of all the songs I've listened to is distributed, to get an idea of what to expect. The black bars show the distribution, the red line the average valence:

In [3]:
%load_ext rpy2.ipython
In [4]:
%%R -i dataframe -w 4 -h 2 --units in -r 200
library(ggplot2)
ggplot(dataframe,aes(valence*100)) + 
    geom_histogram(binwidth=10) + 
    scale_x_continuous('valence') + 
    theme_minimal() + 
    geom_vline(xintercept=mean(dataframe$valence*100),color='red') + ggtitle('red line is average')

The analysis in the Economist found a global average of around 50. My own average comes in below that, at around 37, which I guess means I'm more into gloomy music on average? According to the Economist's data, this makes me stand out as a negative outlier even for a German!

Also notice the skew of the distribution: there are very few songs I listen to that come out at the top end (virtually no songs with a score above 75).

Average valence ratings per month

Now let's see whether I can personally reproduce the findings of Spotify and the Economist. Here's the per-month distribution as a boxplot:

In [5]:
%%R -i dataframe -w 4 -h 3 --units in -r 200

library(ggplot2)
library(lubridate)
dataframe$month <- month(dataframe$played_at)
ggplot(dataframe,aes(x=month,y=valence*100,group=month,fill=as.character(month))) + 
    geom_boxplot(notch=TRUE) +
    scale_x_continuous('month',breaks=c(1,2,3,4,5,6,7,8,9,10,11,12)) + 
    scale_y_continuous('valence') + 
    theme_minimal() + guides(fill=FALSE)

There is some variation between the months, but the medians don't differ hugely. There also seems to be no clear pattern: some months like July and August actually end up with a comparatively low median valence relative to virtually all other months.
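If you'd rather stay in pandas than switch to R for this grouping step, the same per-month summary can be sketched like this (illustrated on a tiny made-up listening history, not my actual data):

```python
import pandas as pd

# Toy listening history: one row per play, indexed by the play timestamp
plays = pd.DataFrame(
    {'valence': [0.2, 0.6, 0.4, 0.5, 0.3, 0.7]},
    index=pd.to_datetime([
        '2019-02-03', '2019-02-14', '2019-07-01',
        '2019-07-20', '2019-11-05', '2019-11-30',
    ]),
)

# Median valence (rescaled to 0-100) per calendar month
monthly = (plays['valence'] * 100).groupby(plays.index.month).median()
print(monthly.to_dict())  # {2: 40.0, 7: 45.0, 11: 50.0}
```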

Comparing February to July

As the overall pattern is hard to see, let's limit ourselves to the main finding of Spotify and the Economist and compare the valence of February and July:

In [6]:
%%R -i dataframe -w 4 -h 3 --units in -r 200

library(ggplot2)
library(lubridate)
dataframe$month <- month(dataframe$played_at)
ggplot(subset(dataframe, dataframe$month %in% c(2,7)) ,aes(x=as.character(month),y=valence*100,group=month,fill=as.character(month))) + 
    geom_boxplot(notch=TRUE) +
    scale_x_discrete('month') + 
    scale_y_continuous('valence') + 
    theme_minimal() + guides(fill=FALSE)

That looks very much unlike the results in the Economist! For me, the valence of the music I listen to in February seems to be significantly higher than that of the music I listen to in July! Let's quickly run a statistical test to see whether that actually holds up.

Given that the distribution doesn't seem to be normal, we'll go with a Mann-Whitney-Wilcoxon test. That might not be fully justified, but let's be a bit lazy here:

In [7]:
%%R -i dataframe -w 4 -h 3 --units in -r 200

library(lubridate)
dataframe$month <- month(dataframe$played_at)

df_sub <- subset(dataframe, dataframe$month %in% c(2,7))
df_sub$month <- as.character(df_sub$month)

wilcox.test(valence ~ month, data=df_sub)
	Wilcoxon rank sum test with continuity correction

data:  valence by month
W = 180170, p-value = 0.0001056
alternative hypothesis: true location shift is not equal to 0
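For anyone working in Python rather than R, the same kind of comparison can be sketched with scipy's `mannwhitneyu` (shown here on made-up samples, not my listening history):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Two made-up valence samples on the 0-1 scale, one shifted higher than the other
feb = rng.beta(4, 6, size=200)  # skewed toward higher valence
jul = rng.beta(2, 8, size=200)  # skewed toward lower valence

# Two-sided Mann-Whitney U test, analogous to R's wilcox.test(valence ~ month)
stat, p = mannwhitneyu(feb, jul, alternative='two-sided')
print(stat, p)
```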

And with a p-value well below 0.05, we can confidently state that the valence of my listening really does differ between those two months!

Have you already connected your Spotify account to Open Humans? Then you can run this notebook right away by clicking on Open in Personal Data Notebooks above!

If you haven't connected your account yet, you can start collecting this kind of data by using the Spotify connection! Unfortunately, Spotify only gives us access to the last 50 songs you played. But if you make the connection now, we will start compiling this data over time, and in a few months you can start playing around with your own data!