Details for mapping-noise-overland-apple-watch.ipynb

Published by gedankenstuecke


This notebook combines environmental noise data recorded by an Apple Watch with GPS data from Overland, and maps out where in your environment you encounter noise.


Tags & Data Sources

apple watch environmental noise mapping noise Overland connection



Last updated 6 months ago

Where do I encounter environmental noise?

I saw that the latest Apple Watch hardware/software passively keeps track of environmental noise you encounter. I thought it would be interesting to see where around the city (in my case in Paris) I encounter environmental noise.

Prerequisites for this notebook

This notebook makes use of two data sources:

  1. The Overland connection for Open Humans. It passively tracks your GPS data and stores the data in Open Humans.

  2. The environmental noise data collected by your Apple Watch. Right now there is no easy way to get this data into Open Humans. Instead, you have to export the data from your phone manually, process it locally on your computer and then upload a correctly formatted file. Without that file, this notebook cannot run.

A description of how to get your environmental noise data from your Apple Watch can be found further down in this notebook. The notebook itself is written in R and performs both the analysis & the visualization of the data.

Getting started

For a start let's load our required packages. This can take a bit of time, as two packages need to be installed.

In [1]:
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

Attaching package: ‘purrr’

The following object is masked from ‘package:jsonlite’:


Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:


Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Google's Terms of Service:
Please cite ggmap if you use it! See citation("ggmap") for details.
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

Attaching package: ‘data.table’

The following objects are masked from ‘package:lubridate’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

The following object is masked from ‘package:purrr’:


With this out of the way, we can load our Overland data from Open Humans. As the GPS records can grow pretty large, each year-month gets its own file. You can select 3 months of data by editing the year-month values in the cell below, so that you grab the data you are interested in. In my case I'm getting the data from October to December 2019.

In [2]:
month <- '2019-10'
month2 <- '2019-11'
month3 <- '2019-12'

Now we can start downloading the data. In the end, it will be stored in the variable loc.

In [3]:
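library(httr)  # GET() and content()

access_token <- Sys.getenv("OH_ACCESS_TOKEN")
# The first argument is the Open Humans member-data API URL (elided in this copy)
member_url <- paste("", access_token, sep="")
resp <- GET(member_url)
user <- content(resp, "parsed")

# File-name prefixes of the three monthly Overland files
month_prefixes <- paste('overland-data-', c(month, month2, month3), sep='')

for (data_source in user$data) {
    if (grepl(month_prefixes[1], data_source$basename)) {
        loc <- read.csv(url(data_source$download_url))
    }
    if (grepl(month_prefixes[2], data_source$basename)) {
        loc2 <- read.csv(url(data_source$download_url))
    }
    if (grepl(month_prefixes[3], data_source$basename)) {
        loc3 <- read.csv(url(data_source$download_url))
    }
}

# Stack the three months and add the column names used by later cells
loc <- rbind(loc, loc2, loc3)
loc$velocity <- loc$speed
loc$date <- loc$timestamp
loc$lon <- loc$longitude
loc$lat <- loc$latitude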

2.369964 48.88470 other 48 0.31 charging 0 100 65 stationary False 0 -1 2019-09-30T23:58:25Z 10 Bbox-31F92D2C -1 2019-09-30T23:58:25Z 2.369964 48.88470
2.369953 48.88472 other 48 0.31 charging 0 100 65 stationary False 0 -1 2019-09-30T23:58:31Z 10 Bbox-31F92D2C -1 2019-09-30T23:58:31Z 2.369953 48.88472
2.369967 48.88470 other 52 0.31 charging 0 100 9 stationary False 0 0 2019-09-30T23:58:35Z 9 Bbox-31F92D2C 0 2019-09-30T23:58:35Z 2.369967 48.88470
2.369960 48.88469 other 48 0.31 charging 0 100 65 stationary False 0 -1 2019-09-30T23:58:54Z 10 Bbox-31F92D2C -1 2019-09-30T23:58:54Z 2.369960 48.88469
2.369982 48.88469 other 48 0.31 charging 0 100 65 stationary False 0 -1 2019-09-30T23:58:59Z 10 Bbox-31F92D2C -1 2019-09-30T23:58:59Z 2.369982 48.88469
2.369866 48.88475 other 47 0.34 charging 0 100 65 stationary False 0 -1 2019-10-01T00:00:39Z 10 Bbox-31F92D2C -1 2019-10-01T00:00:39Z 2.369866 48.88475
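The three hard-coded `if` blocks in the download cell handle exactly three months. A minimal sketch of a more general selection step follows. It assumes, as the `grepl` calls imply, that each monthly basename contains `overland-data-YYYY-MM`; the helpers `month_range` and `select_overland_files` are hypothetical, and the commented download lines assume the same `user$data` entries (`basename`, `download_url`) used above.

```r
# Hypothetical helper: n consecutive year-month strings starting at first_month
month_range <- function(first_month, n) {
  format(seq(as.Date(paste0(first_month, "-01")), by = "month", length.out = n), "%Y-%m")
}

# Hypothetical helper: keep the basenames that match any requested year-month.
# Assumes monthly Overland files are named like "overland-data-YYYY-MM...".
select_overland_files <- function(basenames, months) {
  patterns <- paste0("overland-data-", months)
  keep <- Reduce(`|`, lapply(patterns, grepl, x = basenames, fixed = TRUE))
  basenames[keep]
}

# Possible use with the Open Humans listing from the download cell:
# months <- month_range("2019-10", 3)
# basenames <- vapply(user$data, function(d) d$basename, character(1))
# urls <- vapply(user$data, function(d) d$download_url, character(1))
# wanted <- basenames %in% select_overland_files(basenames, months)
# loc <- do.call(rbind, lapply(urls[wanted], function(u) read.csv(url(u))))
```

Building the month list with `seq(..., by = "month")` also handles year boundaries, e.g. December followed by January.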

We now have columns for latitude & longitude, along with data on whether and how we were moving and at what speed. For further analyses, two more things are of interest:

  1. Was the given data collected on a weekend or weekday?
  2. At which hour was the data collected?

That way, our maps can show when we were at a given place, which helps us interpret the noise measured at that time. The cell below performs this processing:

In [4]:
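# Overland timestamps end in "Z", i.e. they are in UTC
loc$timestamp <- as.POSIXct(loc$timestamp, format="%Y-%m-%dT%H:%M:%SZ", tz="UTC")
# Weekday and hour are taken in local time; "Europe/Paris" is the author's city, change it to yours
local_time <- with_tz(loc$timestamp, tzone="Europe/Paris")
loc$weekday <- weekdays(local_time)
# wday(): 1 = Sunday, 7 = Saturday (independent of the locale, unlike weekday names)
loc$weekend <- ifelse(wday(local_time) %in% c(1, 7), 'weekend', 'weekday')
loc$hour <- hour(local_time)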
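Overland timestamps end in `Z`, meaning UTC, so the weekday and hour computed above depend on which time zone the timestamps are interpreted in. The base-R sketch below illustrates this; the helper `time_features` is hypothetical and `Europe/Paris` is only an example zone. A point recorded late on a Friday evening in UTC is an early Saturday morning point in Paris.

```r
# Hypothetical helper: weekend flag and hour of day for Overland timestamps,
# evaluated in a chosen time zone rather than in UTC.
time_features <- function(timestamps, tz) {
  utc <- as.POSIXct(timestamps, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  local <- as.POSIXlt(utc, tz = tz)   # same instants, wall-clock fields in tz
  data.frame(
    weekend = ifelse(local$wday %in% c(0, 6), "weekend", "weekday"),  # 0 = Sunday, 6 = Saturday
    hour = local$hour,
    stringsAsFactors = FALSE
  )
}

stamps <- c("2019-10-04T23:30:00Z", "2019-10-07T10:00:00Z")
time_features(stamps, tz = "UTC")           # Friday 23:xx, Monday 10:xx
time_features(stamps, tz = "Europe/Paris")  # Saturday 01:xx, Monday 12:xx
```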

Now comes one of the trickiest parts of using this notebook: for the visualization to work properly, you need to define the boundaries of the map by setting the boundary_ values below. These latitude & longitude values define which part of the map we will see and how large it is.

There is no easy way at this point to find 'good' boundaries, and it will take some fiddling with those numbers to get the map you are actually interested in. The default values give a good view of central Paris, but you are likely interested in a different place.
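One way to get a first guess for the boundaries is to derive them from your own GPS points. In the sketch below, the helper `gps_bbox`, the 5%/95% quantiles and the padding of 0.01 degrees are all assumptions rather than part of the original notebook. Trimming with quantiles drops outliers such as trips out of town, and the result is ordered left, bottom, right, top, which is the order `get_stamenmap()` uses for its `bbox` argument.

```r
# Hypothetical helper: bounding box around the central part of the GPS points.
# probs trims outlying points; pad adds a margin in degrees on every side.
gps_bbox <- function(lon, lat, probs = c(0.05, 0.95), pad = 0.01) {
  lon_q <- unname(quantile(lon, probs, na.rm = TRUE))
  lat_q <- unname(quantile(lat, probs, na.rm = TRUE))
  c(left = lon_q[1] - pad, bottom = lat_q[1] - pad,
    right = lon_q[2] + pad, top = lat_q[2] + pad)
}

# e.g. my_bbox <- gps_bbox(loc$lon, loc$lat)
# then use my_bbox["left"], my_bbox["bottom"], ... as starting values for the boundaries
```

The quantile range is a starting point only; you will probably still want to widen or narrow it by hand.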

In [5]:

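# boundary_west, boundary_south, boundary_east, boundary_north (names assumed from the
# truncated original) hold the longitude/latitude limits; bbox order is left, bottom, right, top
my_map <- get_stamenmap(bbox=c(left=boundary_west, bottom=boundary_south,
                               right=boundary_east, top=boundary_north))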
42 tiles needed, this may take a while (try a smaller zoom).

The cell above downloads the map for your boundaries. Run the cell below to see the map and check whether it covers the area you are interested in. If it doesn't, adjust the boundaries above, re-run the cell above and plot again. Rinse & repeat until you are happy with the map.

In [6]:
ggmap(my_map)

Loading the noise data

Okay, you are happy with the map. Now it's time to load the noise data from your Apple Watch. To export the data from your iPhone, open the Health app and tap your profile picture; from there you get an option to export the data. More detailed instructions on where to find it can be found here.

Creating this export takes a while, depending on how much data is on your phone; in my case it took between 5 and 10 minutes. Once the file is created, you get the regular iOS sharing options. Beware: the export is a Zip file that can be big! My own Zip archive with all health data was 117 MB (and grew to over 2 GB after unzipping)!

The easiest way forward is to AirDrop the file to a Mac, if you have one handy. Once that is done, open your terminal and process the data inside the export:

unzip export.zip
cd apple_health_export
cat export.xml|grep dBASPL|grep -v Headphone|grep Record |sed "s/.*startDate=\"//"|sed "s/\" endDate=\"/,/"|sed "s/\" value=\"/,/"|sed "s/\"\/>//" > environmental_noise.csv

This will unzip the whole Apple Health archive, change into the folder it creates and then process the large XML dump that contains all of the data.

It finds all data points for environmental noise (excluding headphone audio exposure) and stores those records as a simple CSV file without a header row, with 3 columns:

  1. Start date/time of recording
  2. End date/time of recording
  3. Noise level in dB
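If you don't have a Mac or a terminal handy, the same filtering can be sketched in base R. This assumes each `<Record>` element sits on its own line of `export.xml` and carries `startDate`, `endDate` and `value` attributes, which is what the `sed` commands above rely on; the function name `parse_noise_records` is hypothetical. Like the shell version, it keeps lines mentioning `dBASPL` and drops headphone exposure records.

```r
# Hypothetical helper mirroring the grep/sed pipeline: extract start, end and dB
# value from the environmental-noise <Record> lines of the Apple Health export.
parse_noise_records <- function(lines) {
  keep <- grepl("<Record", lines, fixed = TRUE) &
    grepl("dBASPL", lines, fixed = TRUE) &
    !grepl("Headphone", lines, fixed = TRUE)
  rec <- lines[keep]
  # Pull the quoted value of one XML attribute out of every kept line
  attr_value <- function(name) sub(paste0('.*', name, '="([^"]*)".*'), "\\1", rec)
  data.frame(start = attr_value("startDate"),
             end = attr_value("endDate"),
             noise_level = as.numeric(attr_value("value")),
             stringsAsFactors = FALSE)
}

# e.g. noise <- parse_noise_records(readLines("apple_health_export/export.xml"))
```

Reading a multi-gigabyte `export.xml` with `readLines()` needs a lot of memory; reading it in chunks from a connection (`readLines(con, n = 100000)` in a loop) is one way around that.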

Upload this file to your notebook server, then run the code below to read it:

In [7]:
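# The shell pipeline writes no header row, so name the three columns here
noise <- read.csv(file='environmental_noise.csv', header=FALSE,
                  col.names=c('start', 'end', 'noise_level'))
# Apple Health dates carry a UTC offset (e.g. "+0200"); %z applies it, so the times
# line up with the UTC timestamps of the GPS data
noise$start <- as.POSIXct(noise$start, format="%Y-%m-%d %H:%M:%S %z")
noise$end <- as.POSIXct(noise$end, format="%Y-%m-%d %H:%M:%S %z")
noise$diff <- noise$end - noise$start
# Treat each recording as if it happened at the middle of its interval
noise$halfway <- noise$start + noise$diff / 2
noise$datetime <- noise$halfway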

In addition to loading the data, this also computes the halfway date/time point of each recording (individual recordings can last around 30 minutes). By using the halfway point, we simply pretend that the dB value was recorded in the middle of the interval.
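The halfway point is simply start + (end - start) / 2. The sketch below (the helper name `recording_midpoint` is hypothetical) computes it for the first recording in the table that follows. It assumes the raw export strings end in a UTC offset such as `+0200` and parses that offset with `%z`, so the result is an unambiguous instant.

```r
# Hypothetical helper: midpoint of each recording interval.
recording_midpoint <- function(start, end) {
  fmt <- "%Y-%m-%d %H:%M:%S %z"                  # e.g. "2019-09-26 11:39:48 +0200"
  s <- as.POSIXct(start, format = fmt, tz = "UTC")
  e <- as.POSIXct(end, format = fmt, tz = "UTC")
  half <- as.numeric(difftime(e, s, units = "secs")) / 2
  s + half                                       # POSIXct plus a number of seconds
}

mid <- recording_midpoint("2019-09-26 11:39:48 +0200", "2019-09-26 12:09:46 +0200")
format(mid, "%Y-%m-%d %H:%M:%S", tz = "Europe/Paris")   # "2019-09-26 11:54:47"
```

The 1798-second recording is split into two halves of 899 seconds, which matches the halfway value shown in the first row below.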

Now we can look at our noise data:

In [8]:
start                end                  noise_level  diff       halfway              datetime
2019-09-26 11:39:48  2019-09-26 12:09:46  80.5925      1798 secs  2019-09-26 11:54:47  2019-09-26 11:54:47
2019-09-26 12:09:46  2019-09-26 12:39:46  65.5902      1800 secs  2019-09-26 12:24:46  2019-09-26 12:24:46
2019-09-26 12:39:46  2019-09-26 13:09:46  60.8837      1800 secs  2019-09-26 12:54:46  2019-09-26 12:54:46
2019-09-26 13:09:46  2019-09-26 13:39:46  55.3789      1800 secs  2019-09-26 13:24:46  2019-09-26 13:24:46
2019-09-26 13:39:46  2019-09-26 14:09:41  63.3795      1795 secs  2019-09-26 13:54:43  2019-09-26 13:54:43
2019-09-26 14:09:41  2019-09-26 14:39:36  66.8587      1795 secs  2019-09-26 14:24:38  2019-09-26 14:24:38

Merging the GPS & Noise data

We're close to making our first map. The only thing left is to join the two data sets. We do this by matching each GPS entry to the noise recording whose (halfway) time is closest to the time of that GPS point. As the noise data is most likely much coarser-grained than the GPS data, we will end up assigning the same noise recording to many GPS points, but that's the best we can do.

In [9]:
loc$datetime <- loc$timestamp
loc$noise_level <- setDT(noise)[loc, noise_level, roll = "nearest", on = "datetime"]
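`roll = "nearest"` assigns a noise level to every GPS point, even one recorded hours or days away from any noise measurement (for example, when the watch was not being worn). Below is a base-R sketch of the same nearest-neighbour matching with an added maximum gap. `nearest_noise` is a hypothetical name, and the 1800-second default is an assumption chosen to match the roughly 30-minute recording length mentioned above.

```r
# Hypothetical helper: nearest noise level for each GPS time, or NA when the
# closest noise recording is more than max_gap_secs away.
nearest_noise <- function(gps_time, noise_time, noise_level, max_gap_secs = 1800) {
  stopifnot(length(noise_time) > 0, length(noise_time) == length(noise_level))
  ord <- order(noise_time)
  times <- as.numeric(noise_time)[ord]      # seconds since epoch, sorted
  lvl <- noise_level[ord]
  g <- as.numeric(gps_time)
  lo <- findInterval(g, times)              # last recording at or before g (0 if none)
  hi <- pmin(lo + 1L, length(times))        # first recording after g (clamped to the end)
  lo <- pmax(lo, 1L)                        # clamp points before the first recording
  d_lo <- abs(g - times[lo])
  d_hi <- abs(g - times[hi])
  best <- ifelse(d_hi < d_lo, hi, lo)       # closer of the two neighbours
  out <- lvl[best]
  gap <- pmin(d_lo, d_hi)
  out[!is.na(gap) & gap > max_gap_secs] <- NA
  out
}

# e.g. loc$noise_level <- nearest_noise(loc$datetime, noise$datetime, noise$noise_level)
```

Points that end up as NA are dropped by `cut()` into an NA noise group, so they no longer inflate any of the noise categories.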

Time to map!

For a start, let's look at the noise levels across town in rough categories. To do this, we bin the individual dB values into four groups:

  1. 0-40 dB (really quiet)
  2. 40-70 dB (conversational levels)
  3. 70-85 dB (close to the boundary of being too loud)
  4. 85+ dB (definitely too loud for longer periods of time)
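With `right = FALSE`, as used in the plotting cell below, each interval includes its lower bound but not its upper one, so exactly 40 dB counts as conversational and exactly 85 dB as too loud. A small check on made-up values follows. Anything at or above the last break (120 dB) falls outside all bins and becomes NA; using `Inf` as the last break is an optional tweak (not what the notebook does) that would avoid this.

```r
noisebreaks <- c(0, 40, 70, 85, 120)
noiselabels <- c("0-40 dB", "40-70 dB", "70-85 dB", "85+ dB")

# Made-up dB values on or near the bin edges
levels_db <- c(39.9, 40, 84.9, 85, 119, 120)
groups <- cut(levels_db, breaks = noisebreaks, right = FALSE, labels = noiselabels)
as.character(groups)
# "0-40 dB" "40-70 dB" "70-85 dB" "85+ dB" "85+ dB" NA

# Optional variant with an open-ended top bin
as.character(cut(120, breaks = c(0, 40, 70, 85, Inf), right = FALSE, labels = noiselabels))
# "85+ dB"
```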

For each of those categories we create one map, showing where most of those recordings were made:

In [10]:
noisebreaks <- c(0,40,70,85,120)
noiselabels <- c("0-40 dB",'40-70 dB',"70-85 dB","85+ dB")

# Assign each GPS point to a noise category (intervals are closed on the left)
setDT(loc)[ , noisegroups := cut(noise_level,
                                 breaks = noisebreaks,
                                 right = FALSE,
                                 labels = noiselabels)]

options(repr.plot.width=30, repr.plot.height=20)
# One panel per noise category; fill is each bin's share of that panel's points
ggmap(my_map) +
    geom_bin2d(data = subset(loc, motion %in% c('stationary', 'walking', 'cycling')),
               aes(x = lon,
                   y = lat,
                   fill = (..count..)/tapply(..count..,..PANEL..,sum)[..PANEL..]),
               bins = 60, alpha = 0.7) +
    theme_minimal(base_size=25) +
    # theme() tweaks go after the complete theme, otherwise theme_minimal() resets them
    theme(legend.position = "right") +
    facet_grid(. ~ noisegroups) +
    scale_fill_continuous('Frequency')
Warning message:
“Removed 10501 rows containing non-finite values (stat_bin2d).”
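The `fill` expression in the plotting cell divides each bin's count by the total count of its facet panel, so the fill values within every panel add up to 1. This keeps the four noise categories comparable even though they contain very different numbers of GPS points. The same arithmetic on made-up counts, in base R:

```r
# Made-up bin counts from two facet panels
count <- c(2, 6, 2, 1, 3)
panel <- factor(c(1, 1, 1, 2, 2))

# Same arithmetic as the fill expression: each count divided by its panel's total
panel_totals <- tapply(count, panel, sum)          # panel 1: 10, panel 2: 4
share <- as.numeric(count / panel_totals[panel])   # a factor index selects by level position
share
# shares: 0.2, 0.6, 0.2 (panel 1) and 0.25, 0.75 (panel 2)
```

Without this normalization, the panel with the most GPS points (likely the conversational 40-70 dB range) would dominate the shared colour scale.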