Details for iMoodJournal_visualization.ipynb

Published by Wenqiu999

Description

Visualize the data exported from iMoodJournal.

Tags & Data Sources

visualization mood change mood variance iMoodJournal

Notebook
Last updated 4 weeks ago

This notebook analyzes how mood changes over time, using a data file exported from iMoodJournal. You can upload your own mood data and reuse the code for your own analysis. The notebook is free to reuse and adapt, and is distributed under an MIT license: https://opensource.org/licenses/MIT

Let's import packages first.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import time

Load the data.

In [2]:
Mood = pd.read_csv('mood-Jul 8, 2019.csv', index_col=False)

To organize and access the data by date, we combine all time-related columns into a single datetime column called DateTime and use it as the index (a pandas DatetimeIndex).

In [3]:
date_strings = ['%d. %b %Y', '%b %d, %Y']
date_format = None
datetime_format = None

Mood['Time'] = Mood['Hour'].map(str).str.cat(Mood['Minute'].map(str), sep = ':')
Mood['Date'] = Mood['Date'].map(str)
Mood['DateTime'] = Mood['Date'].str.cat(Mood['Time'], sep=' ')


while date_strings:
    date_format_test = date_strings.pop()
    datetime_string = '{} %H:%M'.format(date_format_test)
    try:
        Mood['DateTime']=pd.to_datetime(Mood['DateTime'], format=datetime_string)
        date_format = date_format_test
        datetime_format = datetime_string
        break
    except ValueError:
        continue

if not datetime_format:
    raise Exception('Failed to parse datetime - maybe we need another datetime_string?')


mood = Mood.set_index('DateTime', drop= False)
DateTime = mood.pop('DateTime')
mood.insert(0, 'DateTime', DateTime)
Time = mood.pop('Time')
mood.insert(1, 'Time', Time)
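
The format-fallback idea used in the cell above can be tried on a single string with the standard library alone. This is a minimal sketch: the helper name `parse_with_formats` and the sample string are made up for illustration and are not part of the notebook or of iMoodJournal.

```python
from datetime import datetime


def parse_with_formats(text, formats):
    """Return (parsed datetime, matching format); raise ValueError if none fit."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt), fmt
        except ValueError:
            continue  # this layout did not match, try the next one
    raise ValueError('no format matched {!r}'.format(text))


# Hypothetical sample string; the two layouts mirror date_strings plus ' %H:%M'.
candidates = ['%d. %b %Y %H:%M', '%b %d, %Y %H:%M']
parsed, used = parse_with_formats('Jul 8, 2019 9:05', candidates)
print(parsed, used)
```

If your export uses yet another date layout, adding its format string to the list should be enough, assuming month names are in English.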

First, let's start with a line plot of the full time period to show the changes over time.

In [4]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(11, 4)})

mood['Level'].plot(linewidth=1);
plt.show()

Now, let's take a look at how our mood changes from day to day by calculating the average mood level for each day.

In [5]:
moodlevel_daily= mood['Level'].resample('D')
moodlevel_daily_mean = moodlevel_daily.mean()

moodlevel_daily_mean.plot(linewidth=1);
plt.show()
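
To see what the daily resampling does, here is a small sketch on invented data (three entries, not taken from the real export). Days without any entry still appear in the result, with NaN as their mean.

```python
import pandas as pd

# Invented sample: two entries on July 1, none on July 2, one on July 3.
levels = pd.Series(
    [4, 6, 8],
    index=pd.to_datetime(['2019-07-01 09:00', '2019-07-01 21:00', '2019-07-03 10:00']),
    name='Level',
)

# One row per calendar day; the value is the mean of that day's entries.
daily_mean = levels.resample('D').mean()
print(daily_mean)
```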

Then, compare the daily average mood with the mood log.

In [6]:
fig, ax = plt.subplots(figsize=(30, 14))
ax.plot(mood['Level'],
        marker='.', linestyle='-', linewidth=0.5, label='Mood Level')
ax.plot(moodlevel_daily_mean,
        marker='o', markersize=8, linestyle='-', label='Daily Mean')
plt.show()

To explore how your mood changes within each day and to compare days across the whole period, let's use a heatmap. First, we need to turn the incomplete list of logged times into a complete hourly time list. The mood level for an hour is recorded as NaN if there is no data point at that time.

In [7]:
mood.loc[mood.Minute>30,'Hour']= mood['Hour'] + 1        
mood.head(10)
hourly_mood = mood[['Date','Day of week','Hour','Level']]
In [8]:
def get_date_list(begin_date,end_date):
    date_list = [x.strftime(datetime_format) for x in list(pd.date_range(start=begin_date, end=end_date, freq='H'))]
    return date_list

begin_date = Mood.iloc[[0]]['Date'].apply(lambda x: datetime.strptime(x,date_format))
end_date = Mood.iloc[[-1]]['Date'].apply(lambda x: datetime.strptime(x,date_format))
Time_list = pd.DataFrame({'DateTime':get_date_list(begin_date[0],end_date[len(Mood)-1])})
Time_list['DateTime'] = pd.to_datetime(Time_list['DateTime'], format=datetime_format)
Time_list['Time'] = Time_list['DateTime'].apply(lambda x: x.strftime(date_format)) + ' ' + Time_list['DateTime'].apply(lambda x: x.strftime('%H'))
Time_list['Date']= Time_list['DateTime'].apply(lambda x: x.strftime(date_format) )

# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning.
hourly_mood = hourly_mood.copy()
hourly_mood['Date'] = pd.to_datetime(hourly_mood['Date'], format=date_format)
# Zero-pad the hour so it matches the '%H' format used for Time_list['Time'].
hourly_mood['Time'] = hourly_mood['Date'].apply(lambda x: x.strftime(date_format)) + ' ' + hourly_mood['Hour'].apply(lambda x: '{:02d}'.format(int(x)))
hourly_mood = hourly_mood.drop(['Date'],axis=1)
In [9]:
Hourly_mood = pd.merge(Time_list, hourly_mood, how='left', on='Time' )
Hourly_mood_heatmap = Hourly_mood.drop(['Time','Date','Day of week','Hour'], axis=1)
Hourly_mood_heatmap = Hourly_mood_heatmap.set_index('DateTime')
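
The left merge is what fills the gaps: every hour of the complete grid is kept, and hours without a log entry get NaN. A minimal sketch with invented values (the names `grid`, `log`, and `complete` are illustrative only, and it merges on a datetime column rather than on the string key used above):

```python
import pandas as pd

# Complete hourly grid: four consecutive hours.
grid = pd.DataFrame({
    'DateTime': pd.date_range('2019-07-01 08:00', periods=4, freq=pd.Timedelta(hours=1)),
})

# Sparse log: entries exist only for the first and third hour of the grid.
log = grid.iloc[[0, 2]].assign(Level=[5, 7])

# how='left' keeps all rows of grid; unmatched hours get NaN in 'Level'.
complete = grid.merge(log, how='left', on='DateTime')
print(complete)
```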

Now we have our complete hourly mood log. Let's reshape it into a matrix in which each column holds one day and each row holds one hour of the day.

In [10]:
groups = Hourly_mood_heatmap.groupby(pd.Grouper(freq='D'))
Hourly_mood_heatmap = pd.concat([pd.DataFrame(x[1].values) for x in groups], axis=1)
Hourly_mood_heatmap= pd.DataFrame(Hourly_mood_heatmap)
Hourly_mood_heatmap.columns = Hourly_mood['Date'].drop_duplicates(keep='first', inplace=False)
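
As an alternative to the groupby-and-concat approach above, the same hour-by-day layout can be sketched with `DataFrame.pivot`. This is not the notebook's method, just another way to get there; the data and the column names `date`, `hour`, and `Level` below are invented for the example.

```python
import numpy as np
import pandas as pd

# Two full days of invented hourly values between 1 and 8.
stamps = pd.date_range('2019-07-01', periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({
    'date': stamps.date,
    'hour': stamps.hour,
    'Level': np.arange(48) % 8 + 1,
})

# Rows become hours of the day (0-23), columns become calendar days.
matrix = hourly.pivot(index='hour', columns='date', values='Level')
print(matrix.shape)
```

A pivot needs at most one value per hour and day, so it assumes the log has already been reduced to one entry per hour.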

Let's use a heatmap to show the data in a more intuitive, left-to-right layout, with each row representing an hour of the day and each column representing a day. Red stands for good mood (the redder, the better), blue stands for bad mood, and green means your mood was so-so at that time.

In [11]:
plt.matshow(Hourly_mood_heatmap, interpolation=None, cmap='jet', vmin=1, vmax=8)
plt.xlabel('Date',fontsize=14)
plt.ylabel('Time of a Day',fontsize=14)
plt.xticks(np.arange(Hourly_mood_heatmap.shape[1]),Hourly_mood_heatmap.columns, rotation=90)
plt.show()

We can also explore your mood changes within a single day. For example, we can see that there are lots of changes on 2019-06-19. Let's get your mood log for that day first.

In [12]:
Daily_mood = mood.loc['2019-6-19']

Let's try to plot your mood changes within that day.

In [13]:
Daily_mood['Level'].plot(linewidth=1)
plt.show()

To further explore the variation in mood during this period, we will use a Pareto chart to highlight the most common mood levels over the whole period. First, we need to calculate the frequency of each mood level.

In [14]:
Levels = mood.groupby('LevelText', as_index=False)[['Date']].count()
#Levels['LevelText'] = Levels['LevelText'].apply(str)
Levels
Out[14]:
LevelText Date
0 Bad 20
1 Good 127
2 Great 3
3 Meh 21
4 Okay 130
5 So-so 58
6 Very bad 1
7 Very good 30

Then let's plot the frequencies in a bar chart.

In [15]:
Levels.plot(kind='bar', x='LevelText', y='Date', legend=None, title='Frequency of Mood levels')
plt.show()

Now, we can use a Pareto chart to represent both the frequency and the cumulative percentage of the mood levels.

In [16]:
def create_pareto_plot(df, x=None, y=None, title=None, show_pct_y=False, pct_format='{0:.0%}'):
    xlabel = x
    ylabel = y
    tmp = df.sort_values(y, ascending=False)
    x = tmp[x].values
    y = tmp[y].values
    weights = y.cumsum() / y.sum()
    
    
    fig, ax1 = plt.subplots()
    ax1.bar(x, y)
    ax1.set_xlabel(xlabel)
    ax1.set_ylabel(ylabel)

    ax2 = ax1.twinx()
    ax2.plot(x, weights, '-ro', alpha=0.5)
    ax2.set_ylabel('', color='r')
    ax2.tick_params('y', colors='r')
    
    vals = ax2.get_yticks()
    ax2.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
    
    formatted_weights = [pct_format.format(x) for x in weights]
    for i, txt in enumerate(formatted_weights):
        ax2.annotate(txt, (x[i], weights[i]), fontweight='heavy')    
 
    if not show_pct_y:
        ax2.set_yticks([])
        
    if title:
        plt.title(title)
    
    plt.tight_layout()
    plt.show()
In [17]:
create_pareto_plot(Levels, x='LevelText', y='Date', title='Pareto Chart of Mood Level Frequency')
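
The numbers behind the red cumulative line can be checked without plotting. The sketch below reuses the counts shown in Out[14] and computes the running share after sorting from most to least frequent; the variable names are illustrative only.

```python
from itertools import accumulate

# Counts copied from Out[14] above.
counts = {
    'Bad': 20, 'Good': 127, 'Great': 3, 'Meh': 21,
    'Okay': 130, 'So-so': 58, 'Very bad': 1, 'Very good': 30,
}

# Sort levels from most to least frequent, as a Pareto chart does.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
total = sum(counts.values())

# Running totals give the cumulative share plotted on the second axis.
running = list(accumulate(count for _, count in ranked))
for (label, count), cum in zip(ranked, running):
    print('{:<10}{:>4}  {:.0%}'.format(label, count, cum / total))
```

With these counts, the two most frequent levels (Okay and Good) already cover about two thirds of all entries.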

Events in your daily life can influence your mood. Let's highlight a period and add annotations to help you understand your own mood. Please enter the time period and the event that happened during it below.

In [18]:
time_periods = [
    {
        "start": "2019-07-01",
        "end": "2019-07-04",
        "label": "something happened",
    },
]

Now, we can grab this period and the event.

In [19]:
period_selected = mood[time_periods[0]["start"]:time_periods[0]["end"]]
period_selected_event = time_periods[0]["label"]

Let's take a look at the peak and nadir of your mood during this period.

In [20]:
mood_period_max = period_selected['Level'].max()
mood_period_max_idx = period_selected['Level'].idxmax(axis=0, skipna=True)
print('The moment that you felt best during the period:',mood_period_max_idx)
#mood_period_max_event = input('Please enter the event that happened when you felt best during the period:')

mood_period_min = period_selected['Level'].min()
mood_period_min_idx= period_selected['Level'].idxmin(axis=0, skipna=True)
print('The moment that you felt worst during the period:',mood_period_min_idx)
#mood_period_min_event = input('Please enter the event that happened when you felt worst during the period:')
The moment that you felt best during the period: 2019-07-02 12:20:00
The moment that you felt worst during the period: 2019-07-04 20:09:00
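
The two steps above (slicing a period by date strings, then locating its extremes) can be sketched on a tiny invented series. The values and names below are made up; only the pandas calls mirror the notebook.

```python
import pandas as pd

# Invented sample: four entries, the first one before the period of interest.
levels = pd.Series(
    [5, 8, 6, 2],
    index=pd.to_datetime([
        '2019-06-30 10:00', '2019-07-02 12:20',
        '2019-07-03 18:00', '2019-07-04 20:09',
    ]),
)

# Date-string slicing on a DatetimeIndex includes the whole end day.
period = levels.loc['2019-07-01':'2019-07-04']

# idxmax/idxmin return the timestamps of the highest and lowest values.
best_time = period.idxmax()
worst_time = period.idxmin()
print(best_time, worst_time)
```

If several entries share the maximum or minimum, only the first one is returned.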

Based on the time points above, you can enter the events that happened at those moments below.

In [21]:
events = [
    {
        "event_mood_max": "thing A",
        "event_mood_min": "thing B"
    },
]

Now, we can highlight the period you selected and add your notes to the plot.

In [22]:
fig, ax = plt.subplots(figsize=(30, 14))
ax.plot(mood['Level'],marker='.', linestyle='-', linewidth=0.5, label='Mood Level')
ax.plot(moodlevel_daily_mean,marker='o', markersize=8, linestyle='-', label='Daily Mean')
ax.axvspan(time_periods[0]["start"], time_periods[0]["end"], color=sns.xkcd_rgb['grey'], alpha=0.5)
ax.set_title('Mood Changes Over Time')

ax.set_ylabel('Mood')
ax.set_xlabel('Date')

ax.legend(loc='upper left', fontsize=11, frameon=True).get_frame().set_edgecolor('blue')  

bbox_props0 = dict(boxstyle='square, pad=0.6', fc='mediumvioletred', ec='r', alpha=.4, lw=.5)

ax.text(time_periods[0]["start"], 9, 'Event happened during this period:\n{}'.format(period_selected_event) , size=12,ha='left',
        family = 'serif', color='yellow', style = 'italic', weight = 'bold', bbox = bbox_props0)

bbox_props1 = dict(boxstyle='round4, pad=0.6', fc='cyan', ec='b', lw=.5)

ax.annotate('Mood Max = {}\nEvent = {}\nDate = {}'
                 .format(mood_period_max, events[0]["event_mood_max"], mood_period_max_idx.strftime('%a, %Y-%m-%d')),
            fontsize=12,
            fontweight='demi',
            xy=(mood_period_max_idx, mood_period_max),  
            xycoords='data',
            xytext=(-150, -30),      
            textcoords='offset points',
            arrowprops=dict(arrowstyle="->"), bbox=bbox_props1)    

ax.annotate('Mood Min = {}\nEvent = {}\nDate = {}'
                 .format(mood_period_min, events[0]["event_mood_min"], mood_period_min_idx.strftime('%a, %Y-%m-%d')),
            fontsize=12,
            fontweight='demi',
            xy=(mood_period_min_idx, mood_period_min),  
            xycoords='data',
            xytext=(-150, 30),      
            textcoords='offset points',
            arrowprops=dict(arrowstyle="->"), bbox=bbox_props1) 

plt.tight_layout()

Let's take a look at the tags you added now.

In [23]:
tags = list(mood.columns.values)[10:]
In [24]:
tag_sum = pd.DataFrame(mood[tags].apply(lambda x: x.sum()))
tag_sum['tags'] = tag_sum.index.values 
tag_sum.columns = ['frequency','tags']

First, let's plot a bar chart to explore how often you used each tag.

In [25]:
tag_sum.plot(kind='bar',x='tags',y='frequency', legend=None, title='Frequencies of Mood Tags')
plt.show()

Let's split the entries into good-mood entries (level 6 or above) and bad-mood entries (below 6) and plot the tag frequencies for each group. This shows whether any tag is strongly associated with good or bad moods.

In [26]:
mood_good = mood[mood["Level"]>=6]
mood_bad = mood[mood["Level"]<6]
In [27]:
goodmood_tag_sum = pd.DataFrame(mood_good[tags].apply(lambda x: x.sum()))
goodmood_tag_sum['tags'] = goodmood_tag_sum.index.values 
goodmood_tag_sum.columns = ['frequency','tags']
badmood_tag_sum = pd.DataFrame(mood_bad[tags].apply(lambda x: x.sum()))
badmood_tag_sum['tags'] = badmood_tag_sum.index.values 
badmood_tag_sum.columns = ['frequency','tags']
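
The good/bad split can be followed on a toy table. The sketch assumes, as the sums above suggest, that each tag column holds 1 when the tag was used and 0 otherwise; the tag names `work` and `friends` and all values are invented.

```python
import pandas as pd

# Invented entries with two hypothetical 0/1 tag columns.
entries = pd.DataFrame({
    'Level': [7, 3, 6, 5],
    'work': [0, 1, 0, 1],
    'friends': [1, 0, 1, 0],
})
tag_columns = ['work', 'friends']

# Same threshold as in the notebook: level 6 or above counts as good mood.
good = entries[entries['Level'] >= 6][tag_columns].sum()
bad = entries[entries['Level'] < 6][tag_columns].sum()

# Side-by-side counts make tags that lean one way easy to spot.
summary = pd.DataFrame({'good': good, 'bad': bad})
print(summary)
```
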
In [28]:
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, sharex=True, sharey=True)
goodmood_tag_sum.plot(kind='bar',x='tags',y='frequency', legend=None, ax=ax1)
ax1.set_title('Frequencies of Good Mood Tags')
badmood_tag_sum.plot(kind='bar',x='tags',y='frequency', legend=None, ax=ax2) 
ax2.set_title('Frequencies of Bad Mood Tags')
plt.show()

To present the tag frequencies in another way, we can use a word cloud. The bigger a tag's font, the more often you used that tag.

In [29]:
!pip install wordcloud
In [30]:
from PIL import Image, ImageSequence
from wordcloud import WordCloud

def DrawWordcloud(df):
    wc = WordCloud(background_color = 'White',width=1000, height=860, margin=2)
    name = list(df.tags)
    value = df.frequency
    for i in range(len(name)):
        name[i] = str(name[i])
    dic = dict(zip(name, value))
    wc.generate_from_frequencies(dic)
    plt.imshow(wc)
    plt.axis("off")
    plt.show()
    wc.to_file('Wordcloud.png')

DrawWordcloud(tag_sum)
In [31]:
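# mood_perminute is another name for mood (not a copy), so the columns below are added to mood too.
mood_perminute = mood
mood_perminute['Day of week'] = mood_perminute.index.day_name()
mood_perminute['Date'] = mood_perminute.index.date
mood_perminute['Time'] = mood_perminute.index.time
# Undo the rounding-up of the hour applied in cell 7 for entries logged after minute 30.
mood_perminute.loc[mood_perminute.Minute>30,'Hour']= mood_perminute['Hour'] - 1
# Time of day as a fractional hour, e.g. 12:30 -> 12.5.
mood_perminute['TimeStamp'] = mood_perminute['Hour'] + mood_perminute['Minute']/60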
In [32]:
weekday = ['Monday','Tuesday','Wednesday','Thursday','Friday']
weekend = ['Saturday','Sunday']
weekday_mood = mood_perminute.loc[mood_perminute['Day of week'].isin(weekday)]
weekend_mood = mood_perminute.loc[mood_perminute['Day of week'].isin(weekend)]

weekday_mood_hourly_mean = weekday_mood.groupby('Hour')['Level'].mean()
weekend_mood_hourly_mean = weekend_mood.groupby('Hour')['Level'].mean()

weekday_rolling = weekday_mood_hourly_mean.rolling(3, center=True).mean()
weekend_rolling = weekend_mood_hourly_mean.rolling(3, center=True).mean()
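
The centered rolling mean used for smoothing can be checked on four invented hourly averages. With a window of 3 and `center=True`, each value is averaged with its two neighbours, so the first and last positions have no full window and come out as NaN.

```python
import pandas as pd

# Invented hourly averages for hours 8 to 11.
hourly_mean = pd.Series([2.0, 4.0, 6.0, 8.0], index=[8, 9, 10, 11])

# Window of 3 centered on each hour: previous, current, and next value.
smoothed = hourly_mean.rolling(3, center=True).mean()
print(smoothed)
```
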
In [33]:
fig, ax = plt.subplots(figsize=(30, 10))
for Date, selection in weekday_mood.groupby("Date"):
    selection.plot(x='TimeStamp', y='Level', ax=ax, marker='o', markersize=8, linestyle='None', color ='salmon', legend=False)
    
for Date, selection in weekend_mood.groupby("Date"):
    selection.plot(x='TimeStamp', y='Level', ax=ax,marker='o', markersize=8, linestyle='None', color= 'lightskyblue', legend=False)

ax.plot(weekday_mood_hourly_mean,color='r', linewidth=2, label='Weekday Hourly Average Mood')
ax.plot(weekend_mood_hourly_mean,color='b', linewidth=2, label='Weekend Hourly Average Mood')
ax.plot(weekday_rolling, color='r', linewidth=20, alpha = 0.2, label='Rolling Mean')
ax.plot(weekend_rolling, color='b', linewidth=20, alpha = 0.2, label='Rolling Mean')
ax.set_xlabel('TimeStamp')
ax.set_ylabel('Mood')
plt.show()
In [34]:
fig, ax = plt.subplots(figsize=(30, 15))
dayofweek = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
sns.boxplot(data = mood, x = 'Day of week', y = 'Level')
dayofweek_average_mood = mood.groupby('Day of week')['Level'].apply(lambda x: x.mean())
ax.plot(dayofweek_average_mood,color='r', linewidth=5, label='Average Mood' )
ax.set_ylabel('Mood')
ax.set_xlabel('Day of Week')
plt.xticks(np.arange(len(dayofweek)),