app-related-stress-rescuetime-apple-health.ipynb
Does using different applications or programs have an influence on my stress levels?
As there are tons of activities in this time frame – too many to plot them all – we only highlight those activities that were used for more than 30,000 seconds in total (a bit over 8 hours), which should give us a manageably sized list. Similarly, for the categories we only use those with at least 40,000 seconds of total usage (over 11 hours).
And here's our joint plot of heart rate in relation to the different applications and application types!
Our resting HR can vary over time – in my case I know it has been dropping quite a bit over the last one and a half years. So let's see if we can normalize the data by looking at the "excess" HR: the actual HR recordings minus that day's resting HR.
If you want to run this notebook and run into problems or have questions: Reach out to Bastian on Twitter or Slack
I was wondering whether I could combine my computer usage data from RescueTime (which records which programs and apps I use, and when I use them) with the heart rate recordings my Apple Watch makes at semi-regular intervals. The idea being that higher heart rates might be a sign of more stress while using those applications.
Combining the data isn't fully trivial, for a number of reasons:
1. For each 5-minute window, RescueTime can report several applications, so it isn't obvious which application a given heart rate record should be assigned to.
2. Heart rate recordings are made at arbitrary moments, so they need to be matched to RescueTime's reporting windows.
3. The two data sources need to store their timestamps in compatible time zones for the records to line up.
Luckily, RescueTime saves data using the local system time, and Apple HealthKit also saves in local time (plus the UTC offset), which means those values are easily alignable (point 3: check!). Point 2 isn't too bad either: each HR recording falls within one of the 5-minute windows that RescueTime reports, so we can fit those together nicely too.
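As a toy illustration of that alignment (the timestamps here are made up), an Apple Health record stored as local time plus UTC offset can be stripped down to local time and floored into the 5-minute bin RescueTime would report:

```python
from datetime import datetime

# A made-up Apple Health timestamp: local time plus UTC offset
apple_ts = "2019-07-10 14:47:12 +0200"

# Drop the offset and keep the local time (first 19 characters)
local = datetime.strptime(apple_ts[:19], "%Y-%m-%d %H:%M:%S")

# Floor to the start of the 5-minute bin RescueTime would report
bin_start = local.replace(minute=(local.minute // 5) * 5, second=0)
print(bin_start)  # 2019-07-10 14:45:00
```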
Point 1 is a bit trickier: for each 5-minute bin, RescueTime reports how many seconds you spent on a given app/program within that window. So there's a judgement call to be made about how many seconds inside a given window are enough to justify assigning the HR record to an app. For now I decided to go with "at least 101 seconds", which comes out at just over a third of the time available in the interval. This means that at most two different applications can be assigned the same heart rate record. But I'd be happy to hear arguments or ideas for different ways of addressing this!
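A quick sketch of why that threshold caps the assignments at two apps per record (the example window contents are made up):

```python
# A RescueTime window is 300 seconds; 101 is just over a third of that,
# so at most floor(300 / 101) = 2 apps can each reach the threshold.
WINDOW_SECONDS = 300
THRESHOLD = 101

max_assignable = WINDOW_SECONDS // THRESHOLD
print(max_assignable)  # 2

# Made-up example: seconds spent per app within one 5-minute bin
window = {"Firefox": 140, "Slack": 110, "Terminal": 50}
qualifying = [app for app, secs in window.items() if secs >= THRESHOLD]
print(qualifying)  # ['Firefox', 'Slack']
```

On the merged table built below, this rule would correspond to keeping only rows with `time_spent_seconds >= 101`.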
This notebook makes use of data from two sources. If you want to run this analysis for your own data, you need the following data sources connected to your Open Humans account:
- RescueTime
- Apple Health (heart rate recordings)
We get started by importing the raw data for RescueTime & Apple Health into our notebook:
from ohapi import api
import os
import requests
import json
import pandas as pd
from datetime import datetime

# fetch our Open Humans file listing and load the two data sources
member = api.exchange_oauth2_member(os.environ.get('OH_ACCESS_TOKEN'))
for f in member['data']:
    if f['source'] == "direct-sharing-149":  # RescueTime data (JSON)
        rescuetime_data = json.loads(requests.get(f['download_url']).content)
    if f['source'] == 'direct-sharing-453':  # Apple Health heart rate data (CSV)
        hr_df = pd.read_csv(f['download_url'], names=['heart_rate', 'time', 'type'])
The code below takes the raw data from RescueTime and converts it into a dataframe/table.
For my own data, RescueTime only provided a one-hour resolution before the 9th of July 2019; since then the data has a 5-minute resolution. As the hour-long values are hard to interpret and align with heart rate recordings in a meaningful way, we limit our analysis to data points recorded during the 5-minute-resolution range. (You can adjust this by editing the STARTING_DATE variable below.)
STARTING_DATE = '2019-07-09'

# unpack the RescueTime rows into individual columns
date = []
time_spent_seconds = []
activity = []
category = []
productivity = []
for element in rescuetime_data['rows']:
    date.append(element[0])
    time_spent_seconds.append(element[1])
    activity.append(element[3])
    category.append(element[4])
    productivity.append(element[5])
date = [datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S") for dt in date]
rt_df = pd.DataFrame(data={
    'date': date,
    'time_spent_seconds': time_spent_seconds,
    'activity': activity,
    'category': category,
    'productivity': productivity
})

# keep only the 5-minute-resolution data; .copy() avoids a SettingWithCopyWarning
rt_df_filtered = rt_df[rt_df['date'] > datetime.fromisoformat(STARTING_DATE)].copy()
rt_df_filtered['time'] = rt_df_filtered['date']
rt_df_filtered = rt_df_filtered.sort_values(by='time')
Now we import the heart rate data from Apple Health. We also remove heart rate recordings from before the STARTING_DATE, just to keep the size of the data more manageable, and we drop the timezone information from the HR recordings, as the local time is what we're after.
# parse the timestamps, dropping the trailing UTC offset to keep local time
hr_df['time'] = pd.to_datetime(hr_df['time'].str[:-6], utc=True)

# collect the daily resting heart rates (type 'R') for normalization later
resting_hrs = {}
for index, row in hr_df[hr_df['type'] == 'R'].iterrows():
    day = str(row['time'])[:10]
    resting_hrs[day] = row['heart_rate']

# keep only the regular heart rate recordings (type 'H'), sorted by time
hr_df = hr_df[hr_df['type'] == 'H']
hr_df = hr_df.sort_values(by='time')
hr_df = hr_df.set_index(hr_df['time'])
hr_df = hr_df.tz_convert(None)
hr_df = hr_df['heart_rate']
hr_df = hr_df.reset_index()
hr_df = hr_df[hr_df['time'] > datetime.fromisoformat(STARTING_DATE)]

def baseline_hr(row):
    """Return the 'excess' HR: the recording minus that day's resting HR."""
    day = str(row['time'])[:10]
    if day in resting_hrs:
        return row['heart_rate'] - resting_hrs[day]
    else:
        return None
rt_df_filtered.head()
hr_df['hr_normalized'] = hr_df.apply(lambda row: baseline_hr(row), axis=1)
Now for the important step: merging the two tables, the RescueTime usage and the heart rate recordings. We do this by matching heart rate recordings that fall within a 3-minute window of the RescueTime timestamps.
merged_df = pd.merge_asof(rt_df_filtered,hr_df, on='time', tolerance=pd.Timedelta('3min'), allow_exact_matches=False)
merged_df = merged_df[merged_df['heart_rate'].notna()]
Now we have our merged dataframe/table. We removed all records for which we didn't have heart rate information, making the data set even more manageable. Below are two records showing what this new joint table looks like: we have the time spent in a given activity, the activity itself, its category, and how productive that application is (scored from -2 to +2). And last but not least, we have the heart rate:
merged_df.head(2)
We're using R and ggplot2 (with the ggridges extension) to make some nice visualizations of our data. First, we load the R environment and install/load the packages we need:
%load_ext rpy2.ipython
%%R -i merged_df -w 10 -h 10 --units in
library(ggplot2)
install.packages('ggridges', repos='http://cran.us.r-project.org')
install.packages('cowplot', repos='http://cran.us.r-project.org')
library(ggridges)
library(cowplot)
library(tidyverse)
library(lubridate)
%%R
# convert the timestamps into fractional hours of the day for plotting
merged_df$hour <- hour(merged_df$time)
merged_df$minute <- minute(merged_df$time)
merged_df$time <- merged_df$hour + merged_df$minute/60
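For reference, the same fractional-hour conversion can be sketched on the pandas side (the notebook does this step in R; the timestamps below are made up):

```python
import pandas as pd

# Two made-up recording times
times = pd.to_datetime(pd.Series(["2019-07-10 14:45:00", "2019-07-10 09:15:00"]))

# hour + minute/60 gives a fractional hour-of-day, e.g. 14:45 -> 14.75
time_of_day = times.dt.hour + times.dt.minute / 60
print(list(time_of_day))  # [14.75, 9.25]
```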