What can you learn from your Google searches? Explore your Google search history!
In this notebook you can analyze your Google search history data. First, you have to request your data from Google, following the instructions that you can find here.
The goal is to get your data into a more usable format than what Google provides when you request it, and to be able to do some quantitative analyses yourself. And of course to get some nice plots!
All of the plots that are generated from this notebook will be saved in your Open Humans home folder, so check that out once you're done.
Unless you know what you're doing I'd recommend not changing this part:
downloading file reading index
Here we're simply retrieving the search queries that you made to Google, and the dates when you made them.
Here we'll be removing stop words (common words such as "the" or "they"). The goal is to keep only the most informative search terms, in the form of a list ('filtered_words').
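To make the idea concrete, here is a minimal sketch of stop-word filtering. It uses a tiny hand-written stop-word set and made-up example queries (both are assumptions for illustration) so that it runs without downloading anything; the notebook itself relies on the NLTK stop-word list, which is much larger.

```python
# Tiny illustrative stop-word set (an assumption for this sketch);
# the NLTK English stop-word list is far more complete.
STOP_WORDS = {"the", "they", "a", "an", "and", "in", "of", "to", "how", "is"}


def remove_stop_words(queries):
    """Return the lower-cased words of all queries, minus stop words."""
    filtered_words = []
    for query in queries:
        for word in query.lower().split():
            if word not in STOP_WORDS:
                filtered_words.append(word)
    return filtered_words


# Hypothetical example queries, not real search data.
example_queries = ["how to plot in python", "the mne epochs object"]
filtered_words = remove_stop_words(example_queries)
print(filtered_words)
```

Lower-casing before the lookup means "The" and "the" are treated as the same stop word.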
Here we are extracting information about the date and time when you did your searches. We're also breaking down each search into its unique search terms ('single_words').
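As a rough sketch of this step, the snippet below parses one timestamp and splits one query into unique words. The timestamp format, the helper name 'parse_search' and the example values are assumptions for illustration; the format in your own export may well differ.

```python
from datetime import datetime

# Assumed timestamp format; adjust it to whatever your export contains.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_search(query, timestamp):
    """Split one search into its date/time parts and its unique words."""
    when = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    # dict.fromkeys drops repeated words while keeping their original order.
    single_words = list(dict.fromkeys(query.lower().split()))
    return {
        "year": when.year,
        "month": when.month,
        "weekday": when.weekday(),  # Monday is 0, Sunday is 6
        "hour": when.hour,
        "single_words": single_words,
    }


# Hypothetical example search, not real data.
record = parse_search("python list python sort", "2018-01-15 21:30:00")
print(record)
```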
Now we're ready to combine everything into a dataframe and do our very first plot! We'll be plotting the frequency of searches for the top 10 most searched terms. You can change the number of terms by modifying 'items2plot'.
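The counting behind such a plot can be sketched with the standard library alone. This is not the notebook's dataframe code: it swaps in 'collections.Counter', and the helper 'top_terms' and the word list are made up for illustration.

```python
from collections import Counter

items2plot = 10  # how many of the most frequent terms to keep


def top_terms(words, n=items2plot):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(words).most_common(n)


# Hypothetical list of filtered search words, not real data.
words = ["python", "mne", "python", "psychopy", "python", "mne"]
top = top_terms(words, n=2)
print(top)
```

The resulting (word, count) pairs are already sorted by frequency, so they can be handed directly to a bar plot.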
Not very surprisingly, my efforts to learn Python clearly show here! Also, it wouldn't be too hard to guess that I'm into neuroscience, programming experiments with PsychoPy and mainly working with electroencephalography data and MNE.
Now it's time to use all the date & time information that we've been extracting. We're still keeping the top 10 search terms that we extracted in step 4, but now we're looking at how each of them changes over time.
In my case I unfortunately only had Google search data starting from November 2017. My Python queries peak in January 2018, and after that they keep a rather low but steady pace. Maybe I did learn something after all!
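One way to picture this step is as a count per term and per month. The sketch below assumes each search has already been reduced to a (month label, list of words) pair; that record layout, the helper 'monthly_counts' and the example values are assumptions, not the notebook's actual data structure.

```python
from collections import Counter, defaultdict


def monthly_counts(records, terms):
    """Count how often each term of interest was searched in each month."""
    counts = defaultdict(Counter)
    for month, words in records:
        for word in words:
            if word in terms:
                counts[word][month] += 1
    return counts


# Hypothetical records: (month label, words in the search).
records = [
    ("2017-11", ["python", "plot"]),
    ("2018-01", ["python", "mne"]),
    ("2018-01", ["python"]),
]
counts = monthly_counts(records, {"python", "mne"})
for term, per_month in counts.items():
    print(term, dict(per_month))
```

Each term then has its own time course, which can be drawn as one line per term.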
Going back to the time information that we extracted, you can now group the top search terms by the hour of the day when you searched for them.
I'd say that I'm a Python-in-the-evening kind of person...
Here you can choose specific terms from your searches and see how they evolve over the course of the day. You can add your own search terms in the first line of this cell ('my_favourite_searches'). To make terms comparable, we normalize each time course by the total number of searches for that specific term.
In my case, after figuring out that I'm doing most of my Python searches in the evening, I wanted to see whether my R or MATLAB schedules are any different. It looks like I'm mostly searching for help with R earlier in the day compared to Python or MATLAB...
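The normalization can be sketched as follows: for each term, count its searches per hour and divide by that term's total, so every time course sums to 1. The (hour, words) record layout, the helper name and the example values are assumptions for illustration.

```python
from collections import Counter

my_favourite_searches = ["python", "r"]


def normalised_hourly_profile(records, term):
    """Fraction of a term's searches that fall in each hour of the day (0-23)."""
    hours = Counter(hour for hour, words in records if term in words)
    total = sum(hours.values())
    if total == 0:
        # The term never appears, so there is nothing to normalize.
        return [0.0] * 24
    return [hours[h] / total for h in range(24)]


# Hypothetical records: (hour of day, words in the search).
records = [
    (9, ["r", "ggplot"]),
    (21, ["python"]),
    (22, ["python", "mne"]),
    (21, ["python"]),
]
profiles = {
    term: normalised_hourly_profile(records, term)
    for term in my_favourite_searches
}
print(profiles["python"][21], profiles["r"][9])
```

Without this division, a term you search for very often would dominate the plot and hide the daily pattern of rarer terms.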
Now you can visualize your searches as a network. Here you can see whether some of your search terms tend to co-occur with others, and how often. The first step for this analysis is to create a graph object containing all search terms.
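To show what such a graph contains, here is a minimal sketch that stores it as two plain counters instead of a dedicated graph object: one for how often each term appears (nodes) and one for how often two terms share a search (weighted edges). The helper 'build_cooccurrence' and the example searches are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(searches):
    """Return (node_counts, edge_weights) for a list of searches.

    Each search is a list of words. An edge is an alphabetically sorted
    pair of words that appeared together in the same search.
    """
    node_counts = Counter()
    edge_weights = Counter()
    for words in searches:
        unique = sorted(set(words))
        node_counts.update(unique)
        # Every unordered pair of words in one search is one co-occurrence.
        edge_weights.update(combinations(unique, 2))
    return node_counts, edge_weights


# Hypothetical searches, not real data.
searches = [
    ["python", "mne", "epochs"],
    ["python", "mne"],
    ["psychopy", "python"],
]
nodes, edges = build_cooccurrence(searches)
print(nodes)
print(edges)
```

Sorting the words before pairing them ensures that ("mne", "python") and ("python", "mne") are counted as the same edge.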
Time to visualize the connections among the top search terms! For display we're only keeping search terms that appear more than 40 times and are connected to at least 5 other search terms.
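The filtering itself might look roughly like the sketch below. It assumes the graph is given as a dict of term counts and a dict of weighted term pairs, and that connections are counted in the full graph before any term is dropped; the helper 'filter_graph', that ordering and the example values are all assumptions.

```python
def filter_graph(node_counts, edge_weights, min_count=40, min_degree=5):
    """Keep terms seen more than min_count times with at least min_degree neighbours."""
    # Collect each term's neighbours in the full, unfiltered graph.
    neighbours = {}
    for a, b in edge_weights:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    keep = {
        term
        for term, count in node_counts.items()
        if count > min_count and len(neighbours.get(term, ())) >= min_degree
    }
    # Only edges between two surviving terms are displayed.
    kept_edges = {
        pair: weight
        for pair, weight in edge_weights.items()
        if pair[0] in keep and pair[1] in keep
    }
    return keep, kept_edges


# Hypothetical counts and edges; min_degree is lowered to suit this tiny example.
node_counts = {"python": 50, "mne": 45, "plot": 3}
edge_weights = {("mne", "python"): 20, ("plot", "python"): 2}
keep, kept_edges = filter_graph(node_counts, edge_weights, min_degree=1)
print(keep, kept_edges)
```

Raising or lowering the two thresholds is an easy way to trade a cleaner picture against a more complete one.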