Details for correlate-oura-fitbit-sleep.ipynb

Published by gedankenstuecke

Description

How similar are the sleep measurements made by a Fitbit and the Oura ring? And how can things go wrong when different wearables use different models for storing their data?

Tags & Data Sources

sleep tracking Fitbit Connection Oura Connect

Notebook
Last updated 2 months ago

Correlating the sleep durations of Fitbit & the Oura ring

I was wondering how similar the sleep durations reported by my Fitbit Versa and my Oura ring are, as there are plenty of nights on which I wore both devices. To find out, I wanted to run a quick correlation with the data stored in Open Humans.

To run this notebook you need to have data from both the Fitbit Connection and the Oura Connect Open Humans projects in your account.

Loading the data

Let's load the data from both data sources in a first step:

In [1]:
from ohapi import api
import os
import requests
import json 
from collections import defaultdict
import datetime
import matplotlib.pyplot as plt
from numpy.polynomial.polynomial import polyfit

user_details = api.exchange_oauth2_member(os.environ.get('OH_ACCESS_TOKEN'))
for i in user_details['data']:
    if i['source'] == 'direct-sharing-184' and i['basename'] == 'oura-data.json':
        oura = json.loads(requests.get(i['download_url']).content)
    if i['basename'] == 'fitbit-data.json' and i['source'] == 'direct-sharing-102':
        fitbit_data = requests.get(i['download_url']).json()

Now we can throw the data into one joined data structure for plotting. We limit the analysis to data from 2018 and 2019, as these are the only years in which I have data for both devices from the same nights.

In [2]:
sleep_data = defaultdict(dict)

for sleep in oura['sleep']:
    if (sleep["summary_date"][:4] == "2019") or sleep["summary_date"][:4] == "2018":
        date = sleep['summary_date']
        sleep_data[date]['oura'] = sleep['total']/60

        
for sleep in fitbit_data['sleep-minutes']['2019']['sleep-minutesAsleep']:
    sleep_data[sleep['dateTime']]['fitbit'] = sleep['value']

for sleep in fitbit_data['sleep-minutes']['2018']['sleep-minutesAsleep']:
    sleep_data[sleep['dateTime']]['fitbit'] = sleep['value']
    
date = []
oura_values = []
fitbit = []

for k,v in sleep_data.items():
    date.append(k)
    if 'oura' in v.keys():
        oura_values.append(v['oura'])
    else:
        oura_values.append(None)
    if 'fitbit' in v.keys():
        fitbit.append(v['fitbit'])
    else:
        fitbit.append(None)
    
import pandas as pd
dataframe = pd.DataFrame(
    data = {
        'date': date,
        'oura': oura_values,
        'fitbit': fitbit
    }
)

dataframe.head()
Out[2]:
date oura fitbit
0 2018-11-06 484.5 431
1 2018-11-07 397.0 443
2 2018-11-08 448.5 330
3 2018-11-09 424.5 404
4 2018-11-10 422.5 407
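
The same joined table can also be built with a pandas outer merge instead of looping over a dictionary by hand. The following is only a minimal sketch on made-up records: `oura_records` and `fitbit_records` are hypothetical stand-ins for `oura['sleep']` and the Fitbit `sleep-minutesAsleep` lists, and the Fitbit values are assumed to be strings because the notebook later converts them with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical stand-ins for oura['sleep'] and the Fitbit 'sleep-minutesAsleep' lists
oura_records = [
    {"summary_date": "2018-11-06", "total": 29070},  # divided by 60 below, as in the notebook
    {"summary_date": "2018-11-07", "total": 23820},
]
fitbit_records = [
    {"dateTime": "2018-11-07", "value": "443"},  # assumed to arrive as strings
    {"dateTime": "2018-11-08", "value": "330"},
]

oura_df = pd.DataFrame(oura_records).rename(columns={"summary_date": "date"})
oura_df["oura"] = oura_df["total"] / 60
fitbit_df = pd.DataFrame(fitbit_records).rename(columns={"dateTime": "date"})
fitbit_df["fitbit"] = pd.to_numeric(fitbit_df["value"])

# An outer merge keeps nights that only one device recorded (filled with NaN)
merged = (
    oura_df[["date", "oura"]]
    .merge(fitbit_df[["date", "fitbit"]], on="date", how="outer")
    .sort_values("date")
    .reset_index(drop=True)
)
print(merged)
```

Nights that only one device recorded end up as `NaN` in the other column, which mirrors the `None` entries created by the loop above.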

Naively correlating Fitbit & Oura sleep values

As you can see we have a table with three columns: the date and the minutes of sleep reported by Oura and by Fitbit. With this we can now plot the data:

In [3]:
dataframe['fitbit'] = pd.to_numeric(dataframe['fitbit'])

# Keep only nights with a plausible Fitbit value and an existing Oura measurement
dataframe_sub = dataframe[dataframe['fitbit'] > 100]
dataframe_sub = dataframe_sub[dataframe_sub['oura'] > 0]

# polyfit returns the coefficients lowest degree first: intercept b, then slope m
b, m = polyfit(dataframe_sub['oura'], dataframe_sub['fitbit'], 1)

plt.scatter(dataframe_sub['oura'], dataframe_sub['fitbit'], alpha=0.3)
plt.plot(dataframe_sub['oura'], b + m * dataframe_sub['oura'], '-')
plt.title('Oura vs Fitbit sleep duration')
plt.xlabel('Oura (minutes of sleep)')
plt.ylabel('Fitbit (minutes of sleep)')
plt.show()

The plot above is already somewhat cleaned up, as it removes all nights for which Fitbit measured 100 minutes of sleep or less, since such low values are most likely mistakes. But interestingly, even with this cleanup there's no strong correlation between the two! So what's going on here? Let's plot both series over time, as well as the difference between the two, to see if we can spot any issues.

In [4]:
dataframe_sub['oura_minus_fitbit'] = dataframe_sub['oura'] - dataframe_sub['fitbit']
dataframe_sub['date'] = pd.to_datetime(dataframe_sub['date'])
dataframe_sub.plot(x='date',y=['oura', 'fitbit', 'oura_minus_fitbit'])
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f799e786400>
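
To put a number on how well the two devices agree, one option is the Pearson correlation coefficient, which pandas provides as `Series.corr`. The sketch below uses a small made-up table (illustrative values, not real measurements) and applies the same filtering as the scatter plot above.

```python
import pandas as pd

# Made-up values for illustration only; None marks a night without data
example = pd.DataFrame({
    "oura":   [480.0, 400.0, 450.0, None, 420.0],
    "fitbit": [431.0, 443.0, 30.0, 404.0, 407.0],
})

# Same cleaning as above: drop implausibly short Fitbit nights and missing Oura nights
mask = (example["fitbit"] > 100) & (example["oura"] > 0)
cleaned = example[mask]

# Pearson's r ranges from -1 to 1; values near 0 mean little linear relationship
r = cleaned["oura"].corr(cleaned["fitbit"])
print(f"n = {len(cleaned)}, Pearson r = {r:.2f}")
```

Running the same two lines on `dataframe_sub` would give a single value to compare before and after any realignment of the dates.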

Are we off-by-one?

Oh, there's indeed something fishy going on! Looking at the two curves, the Fitbit and Oura data seem to be shifted against each other a tiny bit, probably by a single day. So can we fix this by just shifting the Oura data one day to the right?

In [5]:
sleep_data = defaultdict(dict)

for sleep in oura['sleep']:
    if (sleep["summary_date"][:4] == "2019") or sleep["summary_date"][:4] == "2018":
        date = sleep['summary_date']
        date = date.split("-")
        date = [int(i) for i in date]
        date = datetime.date(year=date[0],month=date[1],day=date[2]) + datetime.timedelta(days=1)
        sleep_data[date.strftime(format="%Y-%m-%d")]['oura'] = sleep['total']/60

        
for sleep in fitbit_data['sleep-minutes']['2019']['sleep-minutesAsleep']:
    sleep_data[sleep['dateTime']]['fitbit'] = sleep['value']

for sleep in fitbit_data['sleep-minutes']['2018']['sleep-minutesAsleep']:
    sleep_data[sleep['dateTime']]['fitbit'] = sleep['value']
    
date = []
oura_values = []
fitbit = []

for k,v in sleep_data.items():
    date.append(k)
    if 'oura' in v.keys():
        oura_values.append(v['oura'])
    else:
        oura_values.append(None)
    if 'fitbit' in v.keys():
        fitbit.append(v['fitbit'])
    else:
        fitbit.append(None)
    
import pandas as pd
dataframe = pd.DataFrame(
    data = {
        'date': date,
        'oura': oura_values,
        'fitbit': fitbit
    }
)

dataframe['fitbit'] = pd.to_numeric(dataframe['fitbit'])

# Keep only nights with a plausible Fitbit value and an existing Oura measurement
dataframe_sub = dataframe[dataframe['fitbit'] > 100]
dataframe_sub = dataframe_sub[dataframe_sub['oura'] > 0]

# polyfit returns the coefficients lowest degree first: intercept b, then slope m
b, m = polyfit(dataframe_sub['oura'], dataframe_sub['fitbit'], 1)

plt.scatter(dataframe_sub['oura'], dataframe_sub['fitbit'], alpha=0.3)
plt.plot(dataframe_sub['oura'], b + m * dataframe_sub['oura'], '-')
plt.title('Oura vs Fitbit sleep duration')
plt.xlabel('Oura (minutes of sleep)')
plt.ylabel('Fitbit (minutes of sleep)')
plt.show()

And look at that: all of a sudden the correlation looks much better! If we plot the data over time we can see a similar effect, with the two series aligning much better:

In [6]:
dataframe_sub['oura_minus_fitbit'] = dataframe_sub['oura'] - dataframe_sub['fitbit']
dataframe_sub['date'] = pd.to_datetime(dataframe_sub['date'])
dataframe_sub.plot(x='date',y=['oura', 'fitbit', 'oura_minus_fitbit'])
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f799d144cc0>
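
Instead of eyeballing the shift, one could also compute the correlation for a few candidate shifts and see which one fits best. This is a sketch on synthetic data in which the Oura series is deliberately filed one day early; `best_shift` is a hypothetical helper written for this illustration, not part of any library.

```python
import numpy as np
import pandas as pd

def best_shift(oura, fitbit, shifts=range(-2, 3)):
    """Return (shift, r) with the highest Pearson r after moving `oura` by `shift` rows.

    Both inputs are Series indexed by consecutive daily dates, so one row is one day.
    """
    scores = {s: oura.shift(s).corr(fitbit) for s in shifts}
    winner = max(scores, key=scores.get)
    return winner, scores[winner]

# Synthetic example: Fitbit sees the "true" nightly sleep, Oura files it one day earlier
rng = np.random.default_rng(0)
days = pd.date_range("2019-01-01", periods=60, freq="D")
true_sleep = pd.Series(rng.normal(420, 45, size=60), index=days)
fitbit = true_sleep + rng.normal(0, 10, size=60)
oura = true_sleep.shift(-1) + rng.normal(0, 10, size=60)

shift, r = best_shift(oura, fitbit)
print(f"best shift: {shift:+d} day(s), r = {r:.2f}")
```

A positive shift here means moving the Oura values to later dates, which is the same direction as the one-day correction applied above. The helper assumes there are no gaps in the dates; with missing days the series would need to be reindexed to a full daily range first.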

What the timezone?

So what's going on? Why does Oura store the data one day earlier than Fitbit? When creating our original data set we aligned the data using the sleep['summary_date'] dates for Oura and the sleep['dateTime'] values for Fitbit, so in theory at least they should already be aligned correctly. But apparently that's not the case, so let's explore the metadata of the Oura records more closely, starting with the last data point we have for the Oura ring:

In [7]:
print("Oura's recorded start of bedtime: {}".format(oura['sleep'][-1]['bedtime_start']))
print("Oura's recorded end of bedtime: {}".format(oura['sleep'][-1]['bedtime_end']))
print('----')
print("Oura's recorded 'summary date' for this sleep: {}".format(oura['sleep'][-1]['summary_date']))
Oura's recorded start of bedtime: 2019-08-12T00:12:12+02:00
Oura's recorded end of bedtime: 2019-08-12T08:49:12+02:00
----
Oura's recorded 'summary date' for this sleep: 2019-08-11

Huh, that's strange. The start and end of my bedtime are clearly on the 12th of August in my own data set. Nevertheless the summary date is set to the 11th! What's going on here?! My first guess when things like this happen: Somewhere the handling of time zones is getting screwed up.

We can see that the start and end values do encode my current time zone (Central European Summer Time, UTC+2) as the +02:00 at the end. If we shift the start of the bedtime back by two hours to express it in UTC, it would fall on the 11th of August instead of the 12th.

Might this be the reason for the off-by-one summary_date? Luckily we can check that easily by comparing it to earlier sleep data I collected while living in California, which is at UTC-7 during Daylight Saving Time. This means that if I go to sleep at 11pm in California on a given date X, it is already date X+1 in UTC.

To check for this we'll pick a random date in April 2019:

In [8]:
print("Oura's recorded start of bedtime: {}".format(oura['sleep'][150]['bedtime_start']))
print("Oura's recorded end of bedtime: {}".format(oura['sleep'][150]['bedtime_end']))
print('----')
print("Oura's recorded 'summary date' for this sleep: {}".format(oura['sleep'][150]['summary_date']))
Oura's recorded start of bedtime: 2019-04-07T23:30:47-07:00
Oura's recorded end of bedtime: 2019-04-08T07:18:47-07:00
----
Oura's recorded 'summary date' for this sleep: 2019-04-07

This is where things get really confusing! The bedtime start is recorded as 2019-04-07T23:30:47-07:00, so the 7th of April at 11:30pm California time. In UTC that is already 6:30am on the 8th of April, so if Oura used UTC dates for the summary_date we'd expect it to be the 8th of April. But nope, it's marked down as the 7th!

So might Oura calculate the summary date based on the time zone in California instead of UTC time? Let's grab another example to see what's going on:

In [9]:
print("Oura's recorded start of bedtime: {}".format(oura['sleep'][0]['bedtime_start']))
print("Oura's recorded end of bedtime: {}".format(oura['sleep'][0]['bedtime_end']))
print('----')
print("Oura's recorded 'summary date' for this sleep: {}".format(oura['sleep'][0]['summary_date']))
Oura's recorded start of bedtime: 2018-11-06T22:51:27-08:00
Oura's recorded end of bedtime: 2018-11-07T08:04:27-08:00
----
Oura's recorded 'summary date' for this sleep: 2018-11-06

This seems to support our theory regarding using California's time zone: The start time of the sleep is recorded on the 6th of November in California time and the summary date matches. But are we just lucky here? Let's grab a third entry as an example:

In [10]:
print("Oura's recorded start of bedtime: {}".format(oura['sleep'][1]['bedtime_start']))
print("Oura's recorded end of bedtime: {}".format(oura['sleep'][1]['bedtime_end']))
print('----')
print("Oura's recorded 'summary date' for this sleep: {}".format(oura['sleep'][1]['summary_date']))
Oura's recorded start of bedtime: 2018-11-08T00:25:00-08:00
Oura's recorded end of bedtime: 2018-11-08T07:39:00-08:00
----
Oura's recorded 'summary date' for this sleep: 2018-11-07

And here our theory of using California time breaks down. The sleep start is already on the 8th of November, but the summary_date is still recorded as the 7th?! I'm at a loss here, and it seems that time zone magic isn't to blame.

My best guess at this point is that Oura uses the summary_date to attach a night's sleep to the day that preceded it, while Fitbit attaches the sleep to the new day on which you woke up. To make things even more confusing, Oura's own app displays the sleep stored under a summary date of X as belonging to date X+1.
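
The three candidate explanations can be checked side by side against the four records printed above. The rules below are guesses formulated for this comparison, not Oura's documented logic, and `datetime.fromisoformat` needs Python 3.7 or newer to parse these timestamps.

```python
from datetime import datetime, timedelta, timezone

# The four records printed above: (bedtime_start, bedtime_end, summary_date)
records = [
    ("2019-08-12T00:12:12+02:00", "2019-08-12T08:49:12+02:00", "2019-08-11"),
    ("2019-04-07T23:30:47-07:00", "2019-04-08T07:18:47-07:00", "2019-04-07"),
    ("2018-11-06T22:51:27-08:00", "2018-11-07T08:04:27-08:00", "2018-11-06"),
    ("2018-11-08T00:25:00-08:00", "2018-11-08T07:39:00-08:00", "2018-11-07"),
]

# Candidate rules for deriving summary_date (guesses, not Oura's documented logic)
rules = {
    "UTC date of bedtime start": lambda start, end: start.astimezone(timezone.utc).date(),
    "local date of bedtime start": lambda start, end: start.date(),
    "local wake-up date minus one day": lambda start, end: end.date() - timedelta(days=1),
}

matches = {}
for name, rule in rules.items():
    hits = 0
    for start_s, end_s, summary in records:
        start = datetime.fromisoformat(start_s)  # needs Python 3.7+
        end = datetime.fromisoformat(end_s)
        hits += rule(start, end).isoformat() == summary
    matches[name] = hits
    print(f"{name}: {hits}/{len(records)} records match")
```

On these four records only the last rule matches every time, which is consistent with the guess above. Four records are of course no proof of how Oura actually computes the field.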

tl;dr (too long, didn't read)

Oura & Fitbit have different approaches to which date they associate their sleep data:

  • Oura stores it (roughly) associated with the day on which you went to sleep (or should have gone to sleep, as it will assign the sleep to the previous day even if you went to bed after midnight).
  • Fitbit stores it (roughly) associated with the day on which you woke up.

Be cautious when trying to merge data from both sources.