NJORD

PREDICT THE MARKET WITH GOOGLE TRENDS
Njord attempts to predict future stock prices based on Google Trends data, using machine learning.

Stock market prediction is the attempt to determine the future value of a company stock or other financial instrument traded on an exchange. Successfully predicting a stock's future price could yield significant profit. With the advent of the digital computer, stock market prediction moved into the technological realm. The most prominent technique involves artificial neural networks (ANNs), which can be thought of as mathematical function approximators.

In 2011, researchers experimented with using Google Trends data to predict the stock market and found a correlation between the search data of multiple search terms and stock market prices. This is the research on which Njord is based. It identified 98 financial words whose search data showed some correlation with the price of the Dow Jones Industrial Average. Njord takes the search data for these words and feeds it into a machine learning algorithm. Based on this data, Njord attempts to determine whether the Dow Jones Industrial Average will go up (1) or down (0).
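The up (1) or down (0) target can be illustrated with a minimal sketch. The function name `direction_labels` and the sample prices are hypothetical, and treating an unchanged price as 0 is an assumption; the source does not say how Njord handles that case or which price (open, close) it compares.

```python
def direction_labels(closes):
    """Label each day 1 if the next price is higher than the current one, else 0.

    Assumption: an unchanged price counts as "down" (0).
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


# Hypothetical closing prices, not real Dow Jones data.
closes = [100.0, 101.5, 101.0, 101.0, 103.2]
print(direction_labels(closes))  # [1, 0, 0, 1]
```

The result has one label fewer than there are prices, because the last day has no following day to compare against.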

by Cristian Perez Jensen

Needed Adjustments

Unadjusted Data

Before the data could be fed to the machine learning algorithm, it first had to be adjusted. This was needed for two reasons: all data is relative within its timeframe, and daily data is only available in 6-month increments.

For example, this is the unadjusted daily data for the search term "stock market".

As you can see, the data resets at the start of every 6-month increment. This is because daily data is only available within that timeframe, and each increment is scaled on its own.
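A small sketch shows why window-relative data appears to reset. It assumes each window is rescaled so that its own peak equals 100; `scale_window` and the raw numbers are hypothetical, and the exact rounding Google Trends applies is not covered by the source.

```python
def scale_window(raw):
    """Rescale one window so that its own maximum becomes 100 (assumed behaviour)."""
    peak = max(raw)
    return [round(100 * value / peak) for value in raw]


# Hypothetical absolute interest that rises steadily over two windows.
raw_interest = [20, 40, 80, 160]

first_window = scale_window(raw_interest[:2])
second_window = scale_window(raw_interest[2:])
print(first_window)   # [50, 100]
print(second_window)  # [50, 100]
```

Both windows look identical after scaling, even though interest in the second is four times higher. This is why the windows cannot simply be concatenated.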

Adjusted Data

These restrictions meant the data had to be adjusted. Eliminating them took three steps.

Step 1: Compute the percentage changes between all unadjusted daily datapoints.

Step 2: Insert weekly datapoints into the adjusted data at their corresponding place, and compute the data in between.

Step 3: Divide the adjusted data by the maximum adjusted datapoint.
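The three steps can be sketched roughly as follows. This is one plausible reading, not Njord's actual code: the source does not say how the data between weekly datapoints is computed, so the sketch assumes the daily percentage changes from Step 1 are chained forward from each weekly value. The names `percentage_changes`, `adjust` and `weekly_anchors` are hypothetical, as is the sample data.

```python
def percentage_changes(window):
    """Step 1: relative change from each day to the next inside one window."""
    changes = [None]  # the first day of a window has no predecessor
    for prev, cur in zip(window, window[1:]):
        # A zero predecessor makes the change undefined, so mark it as missing.
        changes.append(cur / prev - 1 if prev else None)
    return changes


def adjust(daily_windows, weekly_anchors):
    """Chain daily changes between weekly anchors, then rescale the peak to 1.

    daily_windows: list of lists, each window scaled independently.
    weekly_anchors: {day index in the concatenated series: weekly value},
                    assumed to be on one consistent scale.
    """
    changes = [c for window in daily_windows for c in percentage_changes(window)]
    adjusted, level = [], None
    for day, change in enumerate(changes):
        if day in weekly_anchors:
            level = weekly_anchors[day]      # Step 2: insert the weekly datapoint
        elif level is not None and change is not None:
            level = level * (1 + change)     # Step 2: compute the data in between
        # Otherwise the previous level is carried forward (assumption).
        adjusted.append(level)
    peak = max(value for value in adjusted if value is not None)
    # Step 3: divide by the maximum so the series peaks at 1.
    return [None if value is None else value / peak for value in adjusted]


# Two hypothetical 3-day windows and two hypothetical weekly values.
daily_windows = [[50, 100, 50], [100, 50, 100]]
weekly_anchors = {0: 10, 3: 40}
print(adjust(daily_windows, weekly_anchors))
# [0.25, 0.5, 0.25, 1.0, 0.5, 1.0]
```

Because only ratios between neighbouring days are used, the per-window scaling cancels out, and the weekly values supply the level that the daily windows lack.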

For example, this is the adjusted daily data for the search term "stock market".

Monthly Data

To validate that the adjusted daily data is correct, we can compare it to the monthly data. The monthly data is suitable for validation because it is available for the entire timeframe of Google Trends without any adjustments.

As you can see, the adjusted daily data correlates with the monthly data. However, some days shoot up because of an outlier. On such a day the spike is higher than in the monthly data, but only for that one day. Afterwards the value settles, so the average for the month is lower and no spike appears in the monthly data.
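One way to make this comparison concrete is to average the adjusted daily data per month and compute a Pearson correlation against the monthly series. The sketch below is an illustration under assumptions: the helper names and all numbers are hypothetical, and the source does not state which correlation measure, if any, was used.

```python
from math import sqrt
from statistics import mean


def monthly_means(daily, month_of_day):
    """Average the daily values that fall in each month, in month order."""
    groups = {}
    for value, month in zip(daily, month_of_day):
        groups.setdefault(month, []).append(value)
    return [mean(groups[month]) for month in sorted(groups)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return covariance / spread


# Hypothetical adjusted daily data: a one-day spike to 1.0 in the first month.
daily = [0.25, 0.25, 1.0, 0.25, 0.5, 0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 0.75]
months = ["2008-08"] * 4 + ["2008-09"] * 4 + ["2008-10"] * 4
monthly_reference = [44, 50, 75]  # hypothetical monthly values

averaged = monthly_means(daily, months)
print(averaged)  # [0.4375, 0.5, 0.75]
print(pearson(averaged, monthly_reference))  # close to 1
```

The first month contains the highest single day, yet its monthly average is the lowest of the three, which mirrors the outlier effect described above.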

Explore

Play around with 96 financial search terms.



In conclusion, Google Trends can predict major dips in the market (using general financial words), but it struggles to predict small changes. For example, the Google Trends method was able to predict the housing market crisis of 2008 (you can see this by filtering by date from 2008 to 2010 and searching for the search term "stock market").

In a future project, I would like to explore the ability of Google Trends to predict smaller changes in the market, for example by taking the hourly data for the search term TSLA and the stock price data for that ticker, and investigating whether this would be a viable trading strategy.