Capstone: Time-series & ML

ML
Time-Series
Python
Webscraping
APIs
Forecasting time-series trends with classical models and deep learning.

Problem

Built a forecasting pipeline to predict stock price changes (returns) over horizons from one day to one year.

Data

Combined macroeconomic, microeconomic, historical stock price, and sentiment data from multiple sources. The data was gathered through web scraping and APIs.

Approach

  • Cleaned and merged time-series
  • Explored patterns and seasonality
  • Trained Random Forest, Temporal Fusion Transformer (TFT), and LSTM models
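
As a rough illustration of the merge step, here is a minimal sketch of an as-of join: each daily price date picks up the most recent macro release published on or before it, so no future information leaks into a row. The function name `asof_join` and all dates and values are hypothetical and are not taken from the project.

```python
from bisect import bisect_right
from datetime import date


def asof_join(daily_dates, macro_dates, macro_values):
    """For each daily date, return the latest macro value released on or before it.

    macro_dates must be sorted ascending. Dates before the first release get None.
    """
    aligned = []
    for d in daily_dates:
        # Number of macro releases dated on or before d.
        i = bisect_right(macro_dates, d)
        aligned.append(macro_values[i - 1] if i > 0 else None)
    return aligned


# Hypothetical example data (not project data).
daily = [date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 1), date(2024, 2, 2)]
macro_dates = [date(2024, 1, 31), date(2024, 2, 2)]
macro_values = [3.1, 3.4]

aligned = asof_join(daily, macro_dates, macro_values)
print(aligned)
```

Carrying the last known value forward, rather than interpolating between releases, keeps each row limited to what was known on that day.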

Results

The Random Forest model was the most accurate, but it was still not accurate enough for yearly predictions.

Top Correlations by Horizon

Top 10 features correlated with returns at the 1‑day, 1‑week, 1‑month, and 1‑year horizons. Short horizons are dominated by sentiment features (news, Reddit) along with PPI, while longer horizons shift to macro fundamentals (housing starts, retail sales, PCE, GDP).
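
A ranking like this could be produced along the following lines. This is a minimal sketch, assuming Pearson correlation between each feature at time t and the return from t to t + horizon; the correlation measure actually used in the project is not stated. The feature names and numbers are made up for illustration.

```python
from math import sqrt


def forward_returns(prices, horizon):
    """Simple return from day i to day i + horizon, for every day where both exist."""
    return [prices[i + horizon] / prices[i] - 1.0 for i in range(len(prices) - horizon)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def rank_features(prices, features, horizon, top_n=10):
    """Rank features by absolute correlation with the forward return at one horizon."""
    target = forward_returns(prices, horizon)
    # Truncate each feature so the value at day i lines up with the return starting at day i.
    scores = {name: pearson(vals[: len(target)], target) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]


# Hypothetical toy data (not project data).
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0, 108.0, 110.0]
features = {
    "news_sentiment": [0.2, 0.5, 0.1, 0.6, 0.4, 0.3, 0.7, 0.5],
    "housing_starts": [1.40, 1.41, 1.39, 1.42, 1.43, 1.45, 1.44, 1.46],
}

ranked = rank_features(prices, features, horizon=1)
for name, corr in ranked:
    print(f"{name}: {corr:+.3f}")
```

Repeating the call with horizons that correspond to one week, one month, and one year of trading days would give one ranking per horizon.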

30‑Day MA: Price vs Positive News Sentiment

MSFT closing price plotted against positive‑news sentiment, both as 30‑day moving averages. The broad co‑movement suggests that optimism tends to align with price uptrends.
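
The 30-day moving average named in the chart title can be sketched as a trailing window mean. This is an assumption about how the smoothing was done (a simple, unweighted, trailing window); the helper name `moving_average` is hypothetical.

```python
def moving_average(values, window=30):
    """Trailing simple moving average; positions without a full window are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


# Small window so the behaviour is easy to check by hand.
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
print(smoothed)
```

A trailing window only uses past values, so the smoothed series lags turning points in the raw data by design.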

High Positive Sentiment Events on Price

Red markers, overlaid on the price series, flag dates with unusually high positive sentiment.
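
The page does not say how "unusually high" was defined. One plausible rule, sketched here purely as an assumption, is a z-score cutoff: flag any date whose positive sentiment sits more than two standard deviations above the mean. The function name, the threshold of 2.0, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def flag_high_sentiment(dates, sentiment, z_threshold=2.0):
    """Return the dates whose sentiment z-score exceeds z_threshold."""
    mu = mean(sentiment)
    sigma = pstdev(sentiment)
    if sigma == 0:
        # A constant series has no outliers.
        return []
    return [d for d, s in zip(dates, sentiment) if (s - mu) / sigma > z_threshold]


# Hypothetical sample: nine quiet days and one spike.
dates = [f"2024-03-{day:02d}" for day in range(1, 11)]
sentiment = [0.1] * 9 + [0.9]

flagged = flag_high_sentiment(dates, sentiment)
print(flagged)
```

Because the mean and standard deviation are computed over the full sample, this rule suits a retrospective chart; using it as a model feature would need a rolling window to avoid lookahead.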

1‑Day Ahead: Actual vs Predicted (Random Forest)

RF predictions (orange) versus actual 1‑day returns. Captures direction reasonably but smooths away extreme spikes.
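
The two observations in this caption, reasonable direction but muted extremes, can be put into numbers. The sketch below is one way to do that, not the project's evaluation code: `directional_accuracy` is the share of days where the predicted sign matches the actual sign, and `amplitude_ratio` is the predicted standard deviation divided by the actual one, where a value well below 1 indicates over-smoothing. Both names and the sample returns are hypothetical.

```python
from statistics import pstdev


def directional_accuracy(actual, predicted):
    """Fraction of observations where prediction and actual agree on being positive.

    Zero is treated as non-positive on both sides.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if (a > 0) == (p > 0))
    return hits / len(actual)


def amplitude_ratio(actual, predicted):
    """Spread of predictions relative to spread of actuals (actuals must vary)."""
    return pstdev(predicted) / pstdev(actual)


# Hypothetical 1-day returns (not project results).
actual = [0.02, -0.01, 0.03, -0.04]
predicted = [0.005, -0.002, -0.001, -0.01]

print(directional_accuracy(actual, predicted))
print(amplitude_ratio(actual, predicted))
```

Reporting both metrics side by side separates a model that gets the sign right from one that also captures the size of the move.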

1‑Day Ahead: Actual vs Predicted (LightGBM)

LGBM 1‑day forecasts. Smoother than actuals; tracks trend changes with modest lag and underestimates large moves.

1‑Day Ahead: Actual vs Predicted (XGBoost)

XGB 1‑day forecasts. Similar bias toward conservative magnitudes, with stable tracking of mean‑reverting behavior.

1‑Week Ahead: Actual vs Predicted (Random Forest)

RF 1‑week horizon. Directional fit remains, but the larger swings at this horizon widen the misses on extremes.

1‑Week Ahead: Actual vs Predicted (LightGBM)

LGBM 1‑week forecasts. Predictions stay close to zero and change slowly: good on trend, weak on amplitude.

1‑Week Ahead: Actual vs Predicted (XGBoost)

XGB 1‑week horizon. Smoothest of the three models, but still underestimates the sharp drops and spikes typical of weekly returns.

Repo / Live