Apple Stock Predictor
March 2024
Outside collaborators: Sara Antonijevic, Maura Flanagan, Jackson Polston
Over the past decade, the technology sector has grown significantly, driven by a constant influx of innovative ideas, products, and insights. Apple has emerged as a major player in the tech industry and as one of the largest companies across all market sectors globally. The industry’s prominence, coupled with Apple’s substantial presence, has drawn interest in analyzing its stock market performance and its evolution over time. This research also explores the feasibility of forecasting future stock market trends to gain insight into Apple’s trajectory and to support informed decisions about entering or exiting the market.
This research has two main objectives. First, it seeks to thoroughly analyze historical data on Apple Inc.’s stock prices in the last decade and to apply machine learning techniques to construct a predictive time-series model capable of forecasting future stock price movements with a high degree of accuracy. The techniques in focus are Long Short-Term Memory (LSTM) and the AutoRegressive Integrated Moving Average (ARIMA). Second, it aims to evaluate the predictive model’s accuracy through a variety of methods to determine its reliability, efficiency, and applicability in guiding financial decisions. By achieving these goals, the research can provide insights for investors, financial analysts, and other professionals, enhancing their ability to make informed investment choices based on data-driven predictions.
Our study uses Apple (AAPL) stock data from Yahoo Finance and Kaggle. The data was loaded as an ‘xts’ object and converted into a data frame for further analysis. It contains 4330 daily observations of Apple Inc.’s stock market prices from January 3, 2007, until March 15, 2024, and is updated daily as new values are collected by Yahoo. Each daily record holds the date and 6 measured variables:
Date recorded.
High: highest price at which the stock traded during the day.
Low: lowest price at which the stock traded during the day.
Close: final price the stock was traded for at the end of the day.
Open: price the stock trades for right when the stock market opens for the day.
Volume: number of shares traded that day.
Adjusted Closing Price: closing price adjusted to account for corporate actions such as dividends and stock splits.
The data cleaning process was performed in R, where the team checked for any duplicate or missing values. Due to the consistent daily collection of the data by the sources, there were no missing or duplicate data points, so the research could proceed with further exploration of the dataset.
2.2 Adjusted Price Distribution
For a better understanding of how the adjusted price is spread across the time series, the team plotted its distribution.
The distribution depicts a heavy right skew, which indicates that the majority of the stock data is in the range of $0 to $35. This skew is expected because so many observations come from the years 1980 until 2000, when the company was still in its early era of growth. The higher prices that form the right tail began in the late 2010s; between 2017 and 2019, the stock price almost doubled.
Due to the long period covered by our data, this research focused primarily on stock activity in the most recent two years. Emphasis on more current events provides a better understanding of how the stock trends shift today.
This time period provides an interesting shift in the adjusted closing price. Whilst throughout 2021-2022 there was a steady increase, there is an opposite effect taking place in the 2022-2023 time span.
Due to the interesting trend presented when looking deeper into a yearly time series, further analysis was performed on the seasonal trends of the stock price.
When examining the seasonal trends, there seems to be a general decrease in the stock price around the summertime. However, the price increases as the winter season approaches. A plausible reason for this could be Black Friday, Christmas, and other seasonal holidays.
To further visualize the correlation between the variables in the dataset, a pair plot of the data was produced. This plot allows a clearer look into the pairwise relationships of the variables. As depicted by the graph, the Open, High, Low, Close, and Adjusted Close variables are all strongly positively correlated and show a linear relationship with each other. This was expected, as all of these variables depend directly on each other. Meanwhile, the Volume variable does not show a strong correlation with the others.
A correlation matrix was also computed on the data.
Similar to the pairwise plot, there is a strong linear relationship between all of the variables except for volume.
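As a rough illustration of what each cell of such a matrix holds, the sketch below computes Pearson correlation coefficients in plain Python. The column values are made up for the example and are not taken from the AAPL dataset; the function names `pearson` and `correlation_matrix` are hypothetical and do not reflect the team's actual code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict {name: {name: r}} covering every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up illustrative values, NOT taken from the AAPL dataset.
sample = {
    "Open":   [10.0, 10.5, 11.2, 10.9, 11.8, 12.4],
    "Close":  [10.4, 10.9, 11.0, 11.5, 12.1, 12.6],
    "Volume": [900, 450, 700, 820, 400, 650],
}
matrix = correlation_matrix(sample)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this toy data the price columns move together while Volume does not, which mirrors the pattern described above.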
To build an accurate system for predicting Apple stock, the team wanted to use a neural-network-based time series model. The problem with a standard recurrent neural network is the exploding/vanishing gradient, where gradients grow or shrink so much across time steps that Gradient Descent stops working. The solution comes in the form of Long Short-Term Memory, or LSTM, where each step of the model carries both a long-term and a short-term memory, with their own weights and biases, to produce a well-educated guess. We implemented this in Python using a few public libraries. First, the team plotted the Partial Autocorrelation Function (PACF) to establish how strongly the y variable is correlated with past values of itself, along with how far back that correlation extends.
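To make the PACF idea concrete, here is a minimal sketch that computes sample partial autocorrelations with the Durbin-Levinson recursion. It is not the team's code (the report does not say which library produced the plot); the helper names and the simulated AR(1) series are assumptions made for illustration only.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous order
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

# Simulated AR(1) series with coefficient 0.7 (synthetic, not AAPL data).
random.seed(0)
series = [0.0]
for _ in range(2999):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))

partial = pacf(series, 5)
print([round(p, 3) for p in partial])
```

For a series like this one, only the lag-1 partial autocorrelation should stand out, which is how a PACF plot indicates how far back the direct dependence reaches.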
Then a default LSTM model was created as a baseline with 5 tested observations. This model performed poorly at first and needed significant improvement.
Next, the team tuned settings such as training intervals and optimization methods to build a model that could reasonably predict the future value of the stock price. A Multiple Linear Regression (MLR) model was also generated and plotted so that the LSTM model could be evaluated against another common approach.
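The report does not show how the price series was fed to the LSTM. One common framing, sketched here as an assumption rather than the team's method, is to slide a fixed-length window over the series so that each window of past values is paired with the next value as its target. The function name `make_windows` and the `lookback` parameter are hypothetical.

```python
def make_windows(series, lookback):
    """Pair each run of `lookback` past values with the value that follows it."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])
        targets.append(series[end])
    return inputs, targets

# Made-up closing prices for illustration only.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
X, y = make_windows(closes, lookback=2)
print(X)  # [[10.0, 11.0], [11.0, 12.0], [12.0, 13.0]]
print(y)  # [12.0, 13.0, 14.0]
```

The window length is one of the values that would typically be tuned alongside training intervals and the optimizer.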
We also used an autoregressive integrated moving average (ARIMA) model to analyze the Apple stock because of its effectiveness in capturing time series patterns, trends, and seasonality. Our first time series plot, shown in Figure 9, has an obvious upward trend, so the data was not stationary. Additionally, the autocorrelation function (ACF) plot decreases slowly as the lag increases, which is strong evidence that we are dealing with non-stationary data.
After taking the first-order difference of the log-transformed dataset, it’s clear that we have stabilized our data and that it’s ready for model selection. The stabilized data produced better ACF and PACF plots, suggesting that there is no need to transform the data further. Our ACF plot no longer decays slowly, and all lags seem to stay within the blue bounds. Because it is harder to see visually which ARIMA model to choose from these ACF and PACF plots, we used the auto.arima() function to help us pick a suitable model for our transformed dataset.
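The transformation itself is simple enough to sketch. The snippet below is a minimal plain-Python illustration with made-up prices, not the team's R code: it takes the log of each price, differences consecutive values, and shows that the step can be undone to get back to price levels. The function names are hypothetical.

```python
from math import exp, log

def log_diff(prices):
    """First-order difference of the log-transformed series."""
    logs = [log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]

def undo_log_diff(first_price, diffs):
    """Rebuild price levels from a starting price and the log differences."""
    prices = [first_price]
    for d in diffs:
        prices.append(prices[-1] * exp(d))
    return prices

# Made-up prices growing 10% per step; not AAPL data.
prices = [100.0, 110.0, 121.0]
diffs = log_diff(prices)
print([round(d, 4) for d in diffs])
print([round(p, 2) for p in undo_log_diff(prices[0], diffs)])
```

A steadily growing price becomes a roughly constant differenced series, which is why this step removes the upward trend.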
The auto.arima() function ran through multiple ARIMA models and returned the one that minimized the AIC and BIC values. In our case, it picked ARIMA (0,1,1)(1,0,0)[12]. Interestingly, after fitting this model with the sarima() function, the p-values for the Ljung-Box statistic were relatively low at all lags, indicating that the model fit the data poorly. Figure 11 depicts the low Ljung-Box p-values that led us to keep looking for a better model.
Model Fit Statistics:
AIC = -4.990462
AICc = -4.990461
BIC = -4.984601
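For readers unfamiliar with the Ljung-Box check mentioned above, the following sketch computes the Q statistic directly from a list of residuals. It is an illustration in plain Python, not the diagnostic code used in the study, and the toy residuals are made up. Turning Q into a p-value requires a chi-square distribution (with degrees of freedom reduced by the number of fitted ARMA coefficients when testing model residuals), which is left to a statistics library.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Toy residuals that flip sign every step, so they are strongly autocorrelated.
toy = [1.0, -1.0] * 5
q_toy = ljung_box_q(toy, 1)
print(round(q_toy, 2))  # about 10.8
```

A large Q corresponds to a small p-value, meaning the residuals still carry autocorrelation that the model has not captured.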
After sifting through some other model choices, we landed on the ARIMA (0,1,1)(1,1,0)[12] model because the Ljung-Box p-values for the first 5 lags indicated a good fit and our coefficients were still significant. Since this model predicts better, we see that the original data should be differenced at both the non-seasonal and seasonal levels to achieve stationarity. Because our goal is to predict future stock prices, a reliable model is crucial for forecasting, and Figure 12 shows that this model is the reliable one we were looking for.
Model Fit Statistics:
AIC = -4.603393
AICc = -4.603392
BIC = -4.598987
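Written out, the ARIMA (0,1,1)(1,1,0)[12] structure corresponds to the equation below. This is a sketch under stated assumptions: x_t stands for the series passed to the model (taken here to be the log-transformed adjusted close, which the report implies but does not state outright), B is the backshift operator, w_t is white noise, and the plus sign on the moving-average term follows the convention used by R's ARIMA routines. The symbols \theta_1 and \Phi_1 are placeholders; the fitted values are not given in the report.

```latex
(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, x_t = (1 + \theta_1 B)\, w_t
```

The factor (1 - B) is the non-seasonal difference, (1 - B^{12}) is the seasonal difference at period 12, \Phi_1 is the seasonal autoregressive coefficient, and \theta_1 is the non-seasonal moving-average coefficient.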
Looking at Figure 12, it’s easy to see that the Ljung-Box p-values for the first 5 lags sit above the blue significance line, unlike in Figure 11. This is important because, in the forecasting phase, we will only go out to a maximum of 5 lags, as this is the farthest lag at which our model predicts relatively well. Our model is now ready to be used to forecast future stock market prices. We use the sarima.for() function with our ARIMA (0,1,1)(1,1,0)[12] model to obtain the stock prices for the next 5 days.
To evaluate the performance of each model for predicting Apple Inc.’s future stock market prices, we will split the dataset into training and testing data. The training data will encompass all available data up to June 12, 2022, while the testing data will cover the period from June 13, 2022, to June 17, 2022. We will employ the same training and testing dataset for the MLR, LSTM, and ARIMA models to determine whether the ARIMA model outperforms LSTM or MLR in forecasting. The prediction output for the LSTM and MLR models is shown in Figure 13. To further understand how accurate the models are, they were tested for error and accuracy using MAPE and RMSE.
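The two error measures can be sketched in a few lines. The snippet below is a plain-Python illustration with made-up numbers, not the study's results: MAPE is the mean absolute percentage error, and RMSE is the root mean squared error.

```python
from math import sqrt

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up actual and predicted prices; NOT results from the study.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(round(mape(actual, predicted), 2))  # 10.0
print(round(rmse(actual, predicted), 2))  # 15.81
```

MAPE is scale-free, so it is convenient for comparing models across price levels, while RMSE penalizes large misses more heavily.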
The results show that the LSTM model was superior to the MLR model, although it took far more tweaking and time to optimize. The team’s takeaway was that it is not always best to jump to a potentially overcomplicated model, but with the right customization and methods it can become a useful tool. The market records for Apple are highly variable and generally unpredictable, much like the stock market as a whole, but this LSTM model seems to do a decent job of wrangling the data in a useful way. The prediction output for the ARIMA model is shown in Figure 14, and the accuracy scores are provided in the table below.
Comparing the accuracy results from the ARIMA model to those of the LSTM and MLR models, we find that the LSTM is superior to both the MLR and ARIMA models. The team concludes that more testing is needed before this LSTM model is used in the real world, but it is a reasonable starting point.
Since the stock market is a highly volatile market, there is bound to be some error between the prediction values and actual values. Furthermore, if the stock market could be so accurately predicted by people, then becoming rich from the stock market would be a much easier task than it is now. The LSTM model may be a good resource for one analyzing the stock market, but it should not be the only research done before buying or selling a stock.
High inflation is an important environmental factor that erodes consumer purchasing power, curbing spending on non-essential goods like stocks, electronics, and other luxuries. A decrease in consumer demand can lower Apple’s sales, revenues, and profits. Additionally, to curb inflation, central banks often raise interest rates, which dampens economic growth and consumer spending even further. The resulting higher borrowing costs for companies and consumers lead investors to prefer less risky assets like bonds over stocks, negatively impacting Apple’s share price.
Figure 17 illustrates the United States interest rates from 2000 until 2024. In the close-up of the graph, an evident spike occurs around 2022. Comparing this timeline to the adjusted close prices from 2021-2022, we can see a corresponding fall in the stock price in the middle of 2022.
Unemployment rates are closely tied to consumer spending. Figure 17 displays unemployment rates in the United States since the year 2000. As unemployment rises, fewer people have disposable income, leading to a reduction in consumer spending on non-essential items like smartphones, tablets, and laptops. For Apple, a company that relies heavily on consumer spending on its products and services, an economy suffering from higher unemployment can translate into lower sales. This affects its revenue and profit margins, so investors may anticipate that rising unemployment will hurt Apple’s financial performance, and Apple’s stock price might experience downward pressure.
Similar to the inflation rates, the unemployment rates followed a timeline slightly correlated with the adjusted closing prices of Apple stock. The evident increase in unemployment rates in 2020 and 2021 coincided with lower Apple stock prices during this time.
Figure 18 displays a visual timeline of Apple events since 2000. This helps create a better picture of how stock prices react after Apple events and launches. Looking into patterns between these events and stock prices allows a better prediction of when to invest in the future. Similarly, this may reflect how willing consumers are to invest in Apple and whether a product will be successful in the market. It can create a good presumption of when an investment in Apple stocks is the better choice.
For a better look at how the stock market changed with launches, Figure 19 depicts the stock prices 15 days before and 15 days after the launch.
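A before-and-after comparison of this kind can be sketched as follows. This is an illustrative assumption rather than the team's method: the window is counted in calendar days (the report does not say whether trading days were used), the prices are synthetic, and the names `event_window` and `mean` are hypothetical.

```python
from datetime import date, timedelta

def event_window(prices_by_date, event_day, days=15):
    """Return the prices from the `days` before and the `days` after an event."""
    before = [p for d, p in prices_by_date.items() if 0 < (event_day - d).days <= days]
    after = [p for d, p in prices_by_date.items() if 0 < (d - event_day).days <= days]
    return before, after

def mean(values):
    return sum(values) / len(values)

# Synthetic prices rising by 1 per calendar day; not AAPL data.
start = date(2020, 1, 1)
prices = {start + timedelta(days=i): 100.0 + i for i in range(60)}
launch = start + timedelta(days=30)  # hypothetical launch date

before, after = event_window(prices, launch)
change_pct = 100.0 * (mean(after) - mean(before)) / mean(before)
print(len(before), len(after), round(change_pct, 2))
```

Comparing the average price on each side of the event gives a single number per launch, which makes it easier to compare reactions across years.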
It is interesting to see that while in the early 2000s stock prices tended to rise immediately after Apple launches, in the 2020s they generally fall (particularly in the graphs depicting 2020 and 2021). A plausible reason for this is that the company’s new releases now tend to be less innovative.
This research implemented the LSTM and ARIMA machine learning algorithms on Apple stock to compare which one generated a higher degree of prediction accuracy. While the ARIMA had a good performance, LSTM generated lower error rates.
To enhance further research, more studies could be done testing comparisons between stock prices and their external influences. Additionally, more external influences can be presented, including social media analysis, advertisement and media spending by Apple on its products, and interest rates.
Nagadia. (2022). Apple Stock Price from 1980-2021, Version 3. Retrieved February 14th, 2024 from https://www.kaggle.com/datasets/meetnagadia/apple-stock-price-from-19802021.
Yahoo Finance. (n.d.). Apple Inc. (AAPL) - Stock Price, Quote and News - Yahoo Finance. Retrieved from https://finance.yahoo.com/quote/AAPL/
Keith, Michael. “Exploring the LSTM Neural Network Model for Time Series.” Medium, Towards Data Science, 20 Sept. 2023, towardsdatascience.com/exploring-the-lstm-neural-network-model-for-time-series-8b7685aa8cf.