Details for iMoodJournal_visualization.ipynb

Published by Wenqiu999

Description

Visualize the data exported from iMoodJournal.

Tags & Data Sources

visualization mood change mood variance iMoodJournal

Notebook
Last updated 2 days, 1 hour ago

This notebook analyzes how mood changes over time, using a data file exported from iMoodJournal. Let's import the packages first.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

Load the data.

In [2]:
Mood = pd.read_csv('mood-Jul 8, 2019.csv', index_col=False)

To organize and access the data by date, we will combine all time-related columns into a single datetime column called DateTime and use it as the index (a pandas DatetimeIndex).

In [3]:
date_strings = ['%d. %b %Y', '%b %d, %Y']
date_format = None
datetime_format = None

Mood['Time'] = Mood['Hour'].map(str).str.cat(Mood['Minute'].map(str), sep = ':')
Mood['Date'] = Mood['Date'].map(str)
Mood['DateTime'] = Mood['Date'].str.cat(Mood['Time'], sep=' ')



while date_strings:
    date_format_test = date_strings.pop()
    datetime_string = '{} %H:%M'.format(date_format_test)
    try:
        Mood['DateTime']=pd.to_datetime(Mood['DateTime'], format=datetime_string)
        date_format = date_format_test
        datetime_format = datetime_string
        break
    except ValueError:
        continue

if not datetime_format:
    raise Exception('Failed to parse datetime - maybe we need another datetime_string?')

    
mood = Mood.set_index('DateTime')
mood = mood.drop(['Time'], axis=1)
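
The fallback parsing above can be isolated into a small helper. This is a minimal sketch, not the notebook's own code: the function name parse_with_fallback and the sample strings are made up for illustration, and only the two date formats come from the cell above.

```python
import pandas as pd

def parse_with_fallback(values, formats):
    """Return (parsed, fmt) for the first format that parses every value."""
    for fmt in formats:
        try:
            return pd.to_datetime(values, format=fmt), fmt
        except ValueError:
            # This format does not fit the data; try the next candidate
            continue
    raise ValueError('None of the candidate formats matched: {}'.format(formats))

# Hypothetical sample strings in the 'Mon D, YYYY H:M' style
sample = pd.Series(['Jul 8, 2019 9:05', 'Jul 7, 2019 21:40'])
candidates = ['%d. %b %Y %H:%M', '%b %d, %Y %H:%M']

parsed, used_format = parse_with_fallback(sample, candidates)
print(used_format)
print(parsed)
```

Returning the matching format together with the parsed values keeps it available for later steps, much like date_format and datetime_format are reused further down in this notebook.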

First, let's start with a line plot of the full time period to show the changes over time.

In [4]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(11, 4)})

mood['Level'].plot(linewidth=1);
plt.show()

Now, let's take a look at how mood changes from day to day by calculating the average mood level for each day.

In [5]:
moodlevel_daily= mood['Level'].resample('D')
moodlevel_daily_mean = moodlevel_daily.mean()

moodlevel_daily_mean.plot(linewidth=1);
plt.show()
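
To see what the daily resampling does, here is a small sketch on made-up values (the timestamps and levels are assumptions, not taken from the exported file). Days without any entry still appear in the result, with NaN as their mean.

```python
import pandas as pd

# Three hypothetical log entries: two on July 1, none on July 2, one on July 3
idx = pd.to_datetime(['2019-07-01 09:00', '2019-07-01 21:00', '2019-07-03 12:00'])
levels = pd.Series([4, 6, 7], index=idx, name='Level')

# One bin per calendar day; each bin holds the mean of that day's entries
daily_mean = levels.resample('D').mean()
print(daily_mean)
```

In a line plot, the NaN days show up as gaps rather than as zeros.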

Then, compare the daily average mood with the mood log.

In [6]:
fig, ax = plt.subplots(figsize=(30, 14))
ax.plot(mood['Level'],
        marker='.', linestyle='-', linewidth=0.5, label='Mood Level')
ax.plot(moodlevel_daily_mean,
        marker='o', markersize=8, linestyle='-', label='Daily Mean')
plt.show()

To explore how mood changes within each day and to compare days across the whole period, let's plot a heat map. First, we need to convert the incomplete list of logged times into a complete hourly time list. An hour with no data point will have a mood level of NaN.

In [7]:
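# Round each entry to the nearest hour: minutes above 30 move to the next hour.
# Caveat: an entry logged after 23:30 gets Hour 24, which matches no hourly slot below.
mood.loc[mood.Minute>30,'Hour']= mood['Hour'] + 1        
mood.head(10)
# Take an explicit copy so the column assignments below do not raise SettingWithCopyWarning
hourly_mood = mood[['Date','Day of week','Hour','Level']].copy()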
In [8]:
def get_date_list(begin_date,end_date):
    date_list = [x.strftime(datetime_format) for x in list(pd.date_range(start=begin_date, end=end_date, freq='H'))]
    return date_list

# MPB note: can the dates be inferred?
Time_list = pd.DataFrame({'DateTime':get_date_list('2019-04-24','2019-07-08')})
Time_list['DateTime'] = pd.to_datetime(Time_list['DateTime'], format=datetime_format)
Time_list['Time'] = Time_list['DateTime'].apply(lambda x: x.strftime(date_format)) + ' ' + Time_list['DateTime'].apply(lambda x: x.strftime('%H'))
Time_list['Date']= Time_list['DateTime'].apply(lambda x: x.strftime(date_format) )

hourly_mood['Date'] = pd.to_datetime(hourly_mood['Date'], format=date_format)
# Zero-pad the hour so it matches the '%H' strings in Time_list (e.g. '09', not '9')
hourly_mood['Time'] = hourly_mood['Date'].apply(lambda x: x.strftime(date_format)) + ' ' + hourly_mood['Hour'].apply(lambda x: '{:02d}'.format(int(x)))
hourly_mood = hourly_mood.drop(['Date'],axis=1)
In [9]:
Hourly_mood = pd.merge(Time_list, hourly_mood, how='left', on='Time' )
Hourly_mood_heatmap = Hourly_mood.drop(['Time','Date','Day of week','Hour'], axis=1)
Hourly_mood_heatmap = Hourly_mood_heatmap.set_index('DateTime')
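
As an alternative to matching on formatted strings, the complete hourly log could also be built directly on timestamps. The sketch below is a different technique from the merge above, shown on made-up entries: it applies the same "minutes above 30 go to the next hour" rule, infers the date range from the data instead of hard-coding it, and reindexes onto a full hourly grid. Two things are assumptions of this sketch: entries that fall into the same hour are averaged, and the grid covers the whole last day.

```python
import pandas as pd

# Hypothetical log entries; the last one is late in the evening
idx = pd.to_datetime(['2019-07-01 09:10', '2019-07-01 09:20', '2019-07-01 23:45'])
log = pd.Series([4.0, 6.0, 7.0], index=idx, name='Level')

# Minutes above 30 move to the next hour; 23:45 correctly rolls over to the next day
ts = log.index
hour_slot = ts.floor('h') + pd.to_timedelta((ts.minute > 30).astype(int), unit='h')

# Average the entries that share an hour slot (an assumption of this sketch)
hourly = log.groupby(hour_slot).mean()

# Infer the covered days from the data and build one slot per hour
start = hourly.index.min().normalize()
end = hourly.index.max().normalize() + pd.Timedelta(hours=23)
full_hours = pd.date_range(start, end, freq='h')

# Hours without any entry become NaN
complete = hourly.reindex(full_hours)
print(complete.dropna())
```

Working on timestamps avoids any dependence on how dates and hours are formatted as text.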

Now we have our complete hourly mood log. Let's reshape it into a matrix with one column per day and one row per hour.

In [10]:
groups = Hourly_mood_heatmap.groupby(pd.Grouper(freq='D'))
Hourly_mood_heatmap = pd.concat([pd.DataFrame(x[1].values) for x in groups], axis=1)
Hourly_mood_heatmap= pd.DataFrame(Hourly_mood_heatmap)
Hourly_mood_heatmap.head(10)
Hourly_mood_heatmap.columns = Hourly_mood['Date'].drop_duplicates(keep='first', inplace=False)
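
The same hour-by-day layout can be sketched with a pivot, which labels rows and columns explicitly instead of relying on every daily group having the same length. The data below is made up, and the column names day and hour are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical complete hourly log covering two days, mostly empty
hours = pd.date_range('2019-07-01', periods=48, freq='h')
levels = pd.Series(np.nan, index=hours, name='Level')
levels.loc[pd.Timestamp('2019-07-01 09:00')] = 5.0
levels.loc[pd.Timestamp('2019-07-02 00:00')] = 7.0

frame = levels.to_frame()
frame['day'] = frame.index.strftime('%Y-%m-%d')   # column label: calendar day
frame['hour'] = frame.index.hour                  # row label: hour of day (0-23)

# Rows are hours, columns are days; empty hours stay NaN
matrix = frame.pivot(index='hour', columns='day', values='Level')
print(matrix.shape)
```

A matrix in this shape can be passed to plt.matshow in the same way as Hourly_mood_heatmap.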

Let's use a heat map with each row representing an hour of the day and each column representing a day, so the period reads left to right. With the 'jet' colormap, red stands for good mood (the redder the better), blue stands for bad mood, and green means the mood was so-so at that time.

In [11]:
plt.matshow(Hourly_mood_heatmap, interpolation=None, cmap='jet', vmin=1, vmax=8)
plt.xlabel('Date',fontsize=14)
plt.ylabel('Time of a Day',fontsize=14)
plt.xticks(np.arange(Hourly_mood_heatmap.shape[1]),Hourly_mood_heatmap.columns, rotation=90)
plt.show()

We can also explore mood changes within a single day. For example, we can see that there are lots of changes on 2019-06-19. Let's get the mood log for that day first.

In [12]:
Daily_mood = mood.loc['2019-6-19']

Let's try to plot your mood changes within that day.

In [13]:
Daily_mood['Level'].plot(linewidth=1)
plt.show()

To further explore the variation in mood during this period, we will use a Pareto chart to highlight the most common mood levels over the whole period. First, we need to calculate the frequency of each mood level.

In [14]:
Levels = mood.groupby('LevelText', as_index=False)[['Date']].count()
#Levels['LevelText'] = Levels['LevelText'].apply(str)
Levels
Out[14]:
LevelText Date
0 Bad 20
1 Good 127
2 Great 3
3 Meh 21
4 Okay 130
5 So-so 58
6 Very bad 1
7 Very good 30

Then let's plot the frequencies in a bar chart.

In [15]:
Levels.plot(kind='bar', x='LevelText', y='Date', legend=None, title='Frequency of Mood levels')
plt.show()

Now, we can use a Pareto chart to represent both the frequency and the cumulative percentage of the mood levels.

In [16]:
def create_pareto_plot(df, x=None, y=None, title=None, show_pct_y=False, pct_format='{0:.0%}'):
    xlabel = x
    ylabel = y
    tmp = df.sort_values(y, ascending=False)
    x = tmp[x].values
    y = tmp[y].values
    weights = y.cumsum() / y.sum()
    
    
    fig, ax1 = plt.subplots()
    ax1.bar(x, y)
    ax1.set_xlabel(xlabel)
    ax1.set_ylabel(ylabel)

    ax2 = ax1.twinx()
    ax2.plot(x, weights, '-ro', alpha=0.5)
    ax2.set_ylabel('', color='r')
    ax2.tick_params('y', colors='r')
    
    vals = ax2.get_yticks()
    ax2.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
    
    formatted_weights = [pct_format.format(x) for x in weights]
    for i, txt in enumerate(formatted_weights):
        ax2.annotate(txt, (x[i], weights[i]), fontweight='heavy')    
 
    if not show_pct_y:
        ax2.set_yticks([])
        
    if title:
        plt.title(title)
    
    plt.tight_layout()
    plt.show()
In [17]:
create_pareto_plot(Levels, x='LevelText', y='Date', title='Pareto Chart of Mood Level Frequency')
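
The cumulative percentages drawn by the Pareto chart can be checked by hand. The sketch below repeats the calculation from create_pareto_plot on the counts shown in Out[14], without any plotting.

```python
import pandas as pd

# Counts per mood level, copied from Out[14]
counts = pd.Series({
    'Bad': 20, 'Good': 127, 'Great': 3, 'Meh': 21,
    'Okay': 130, 'So-so': 58, 'Very bad': 1, 'Very good': 30,
})

# Sort from most to least frequent, then accumulate the share of all entries
ordered = counts.sort_values(ascending=False)
cumulative_share = ordered.cumsum() / ordered.sum()
print(cumulative_share.round(3))
```

With these counts there are 390 entries in total. Okay and Good together account for about 66% of them, and adding So-so brings the cumulative share to about 81%.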

Events in someone's daily life can influence their mood. Let's highlight a period and add annotations to help you understand your own mood. Please enter the time period and the event that happened during it below.

In [18]:
time_periods = [
    {
        "start": "2019-07-01",
        "end": "2019-07-04",
        "label": "something happened",
    },
]

Now, we can grab this period and the event.

In [19]:
period_selected = mood[time_periods[0]["start"]:time_periods[0]["end"]]
period_selected_event = time_periods[0]["label"]
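
Slicing a datetime-indexed frame with date strings includes the whole end day, not just its midnight. A minimal sketch on made-up rows, using .loc and a sorted index (both are choices of this sketch, not requirements stated in the notebook):

```python
import pandas as pd

# Hypothetical entries just before, inside, and just after the period
idx = pd.to_datetime(['2019-06-30 22:00', '2019-07-01 08:00',
                      '2019-07-04 20:09', '2019-07-05 07:30'])
demo = pd.DataFrame({'Level': [5, 6, 3, 7]}, index=idx).sort_index()

# Both bounds are inclusive; '2019-07-04' covers that day up to 23:59:59
selected = demo.loc['2019-07-01':'2019-07-04']
print(selected)
```

Here the entry at 20:09 on the end date is kept, while the entries on June 30 and July 5 are left out.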

Let's take a look at the peak and nadir of your mood during this period.

In [20]:
mood_period_max = period_selected['Level'].max()
mood_period_max_idx = period_selected['Level'].idxmax(axis=0, skipna=True)
print('The moment that you felt best during the period:',mood_period_max_idx)
#mood_period_max_event = input('please input the event happened when you felt best during the period:')

mood_period_min = period_selected['Level'].min()
mood_period_min_idx= period_selected['Level'].idxmin(axis=0, skipna=True)
print('The moment that you felt worst during the period:',mood_period_min_idx)
#mood_period_min_event = input('please input the event happened when you felt worst during the period:')
The moment that you felt best during the period: 2019-07-02 12:20:00
The moment that you felt worst during the period: 2019-07-04 20:09:00
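
One detail worth keeping in mind: when several entries share the highest (or lowest) level, idxmax and idxmin return only the first of them. A small sketch with made-up values shows this and one way to list every tied moment:

```python
import pandas as pd

# Hypothetical levels with a tie for the maximum
idx = pd.to_datetime(['2019-07-02 10:00', '2019-07-03 09:00', '2019-07-04 18:00'])
level = pd.Series([8, 8, 2], index=idx)

# idxmax returns the label of the first occurrence of the maximum
best_first = level.idxmax()

# Boolean filtering keeps every timestamp that reaches the maximum
all_best = level[level == level.max()].index
print(best_first)
print(list(all_best))
```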

Based on the time points above, you can enter the events that happened at those moments below.

In [21]:
events = [
    {
        "event_mood_max": "thing A",
        "event_mood_min": "thing B"
    },
]

Now, we can highlight the period you selected and add your notes to the plot.

In [22]:
fig, ax = plt.subplots(figsize=(30, 14))
ax.plot(mood['Level'],marker='.', linestyle='-', linewidth=0.5, label='Mood Level')
ax.plot(moodlevel_daily_mean,marker='o', markersize=8, linestyle='-', label='Daily Mean')
ax.axvspan(time_periods[0]["start"], time_periods[0]["end"], color=sns.xkcd_rgb['grey'], alpha=0.5)
ax.set_title('Mood Changes Over Time')

ax.set_ylabel('Mood')
ax.set_xlabel('Date')

ax.legend(loc='upper left', fontsize=11, frameon=True).get_frame().set_edgecolor('blue')  

bbox_props0 = dict(boxstyle='square, pad=0.6', fc='mediumvioletred', ec='r', alpha=.4, lw=.5)

ax.text(time_periods[0]["start"], 9, 'Event happened during this period:\n{}'.format(period_selected_event) , size=12,ha='left',
        family = 'serif', color='yellow', style = 'italic', weight = 'bold', bbox = bbox_props0)

bbox_props1 = dict(boxstyle='round4, pad=0.6', fc='cyan', ec='b', lw=.5)

ax.annotate('Mood Max = {}\nEvent = {}\nDate = {}'
                 .format(mood_period_max, events[0]["event_mood_max"], mood_period_max_idx.strftime('%a, %Y-%m-%d')),
            fontsize=12,
            fontweight='demi',
            xy=(mood_period_max_idx, mood_period_max),  
            xycoords='data',
            xytext=(-150, -30),      
            textcoords='offset points',
            arrowprops=dict(arrowstyle="->"), bbox=bbox_props1)    

ax.annotate('Mood Min = {}\nEvent = {}\nDate = {}'
                 .format(mood_period_min, events[0]["event_mood_min"], mood_period_min_idx.strftime('%a, %Y-%m-%d')),
            fontsize=12,
            fontweight='demi',
            xy=(mood_period_min_idx, mood_period_min),  
            xycoords='data',
            xytext=(-150, 30),      
            textcoords='offset points',
            arrowprops=dict(arrowstyle="->"), bbox=bbox_props1) 

plt.tight_layout()

Now let's take a look at the tags you added.

In [23]:
tags = list(mood.columns.values)[8:]
In [63]:
tag_sum = pd.DataFrame(mood[tags].apply(lambda x: x.sum()))
tag_sum['tags'] = tag_sum.index.values 
tag_sum.columns = ['frequency','tags']
In [64]:
tag_sum.plot(kind='bar',x='tags',y='frequency', legend=None, title='Frequencies of Mood Tags')
plt.show()
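
The tag frequency table can also be built in one chain. This sketch assumes that each tag column holds 1 when the tag was attached to an entry and 0 otherwise; the tag names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical mood log with three indicator (0/1) tag columns
demo = pd.DataFrame({
    'Level': [5, 6, 3],
    'work': [1, 0, 1],
    'family': [0, 1, 1],
    'sport': [1, 1, 1],
})

# Select tag columns by name rather than by position
tag_columns = [c for c in demo.columns if c != 'Level']

# Column sums give how often each tag was used; sort the most frequent first
tag_sum = (demo[tag_columns].sum()
           .rename('frequency')
           .rename_axis('tags')
           .reset_index()
           .sort_values('frequency', ascending=False))
print(tag_sum)
```

Selecting tag columns by name keeps the result stable if the export gains or loses a leading column, which a fixed position such as [8:] does not.
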
In [69]:
!pip install wordcloud
In [72]:
from PIL import Image, ImageSequence
from wordcloud import WordCloud

def DrawWordcloud(df):
    wc = WordCloud(background_color = 'White',width=1000, height=860, margin=2)
    name = list(df.tags)
    value = df.frequency
    for i in range(len(name)):
        name[i] = str(name[i])
    dic = dict(zip(name, value))
    wc.generate_from_frequencies(dic)
    plt.imshow(wc)
    plt.axis("off")
    plt.show()
    wc.to_file('Wordcloud.png')

DrawWordcloud(tag_sum)