Capstone · Machine Learning

Biodiesel Price Analysis & Prediction

A weekly biodiesel price forecasting framework built on 17 years of commodity market data. The project compares Ridge Regression, XGBoost, Elastic Net, and SARIMAX to identify the dominant structural drivers of biodiesel pricing, and its best model outperforms a naïve persistence benchmark by 14.65% on test RMSE.

Project type: Capstone · Forecasting
Completed: February 2026
Data range: Weekly, 2007–2024
Stack: Python · Scikit-Learn · XGBoost · Statsmodels · Tableau
Overview

The problem

Biodiesel prices are highly volatile, driven by feedstock costs (especially soybean oil), Brent crude oil, chemical inputs like methanol and ethanol, carbon pricing mechanisms, and shipping activity. The interdependence across these commodity markets makes short-term forecasting genuinely difficult. The goal: identify the key structural drivers of weekly biodiesel price movements and build a model that meaningfully outperforms a naïve persistence baseline — where next week's price simply equals this week's.
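Every model is judged against the persistence baseline, so it helps to see that baseline in code. The sketch below makes several assumptions: weekly prices sit in a 1-D NumPy array in date order, the test set is the final 20% of weeks (the project's actual split is not specified here), and the series is a synthetic random walk rather than the project's data. RMSE and R² are computed in the standard way; the function names are illustrative.

```python
import numpy as np


def persistence_forecast(prices: np.ndarray) -> np.ndarray:
    """Naïve forecast: next week's price equals this week's price.

    forecast[i] is the prediction for prices[i + 1], so the result is
    aligned with prices[1:].
    """
    return prices[:-1]


def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def r_squared(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical weekly price series (a random walk), for illustration only.
rng = np.random.default_rng(0)
prices = 4.0 + np.cumsum(rng.normal(0.0, 0.05, size=900))

actual = prices[1:]
forecast = persistence_forecast(prices)

# Chronological hold-out (no shuffling): the last 20% of weeks form the test set.
test_start = int(len(actual) * 0.8)
test_actual, test_forecast = actual[test_start:], forecast[test_start:]

print(f"Persistence RMSE: {rmse(test_actual, test_forecast):.3f}")
print(f"Persistence R²:   {r_squared(test_actual, test_forecast):.3f}")
```

Any candidate model would then be scored with the same `rmse` and `r_squared` calls on the same test weeks, so the comparison with the baseline is like for like.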

Key findings

Ridge Regression emerged as the winning model. Biodiesel price formation is best characterised as a highly persistent, input-cost-driven linear process — not a nonlinear one.

14.65% · RMSE improvement
Ridge Regression reduced test RMSE from 0.461 (baseline) to 0.402, a meaningful gain in commodity forecasting.

0.876 · R² (Ridge model)
The Ridge model explains about 87.6% of the variance in weekly biodiesel prices, up from 83.7% for the naïve baseline.

3 · Dominant drivers identified
The previous week's biodiesel price, the soybean oil price, and Brent crude, pointing to a linear, cost-driven structure.

−91.75% · XGBoost vs baseline
The tree-based model performed worse than the naïve benchmark, consistent with biodiesel prices following a predominantly linear pattern.
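The "persistent, input-cost-driven linear process" described in the key findings can be sketched as a Ridge regression of next week's price on the three dominant drivers: this week's biodiesel price, soybean oil, and Brent crude. This is a minimal illustration, not the project's pipeline: the data is synthetic, the penalty `alpha=1.0` is a hypothetical choice, and ridge is solved in closed form with NumPy so the penalty is explicit. Features are standardised with training-set statistics, and the intercept is left unpenalised.

```python
import numpy as np


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with standardised features.

    Returns (coef, intercept, mean, std); mean/std are training statistics
    that must be reused at prediction time to avoid look-ahead leakage.
    """
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std
    y_mean = y.mean()
    # With centred X and y, the intercept drops out of the penalised problem,
    # so only the slope coefficients are shrunk by alpha.
    k = Xs.shape[1]
    coef = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(k), Xs.T @ (y - y_mean))
    return coef, y_mean, mean, std


def predict_ridge(X, coef, intercept, mean, std):
    """Apply the fitted model to new rows using the training scaling."""
    return ((X - mean) / std) @ coef + intercept


# Hypothetical weekly series (synthetic, for illustration only).
rng = np.random.default_rng(42)
n = 900
soy = 40.0 + np.cumsum(rng.normal(0.0, 0.3, n))     # soybean oil price
brent = 80.0 + np.cumsum(rng.normal(0.0, 0.5, n))   # Brent crude price
biodiesel = np.empty(n)
biodiesel[0] = 4.0
for t in range(1, n):
    # Persistent, cost-driven process: mostly last week's price plus inputs.
    biodiesel[t] = (0.9 * biodiesel[t - 1] + 0.008 * soy[t - 1]
                    + 0.002 * brent[t - 1] + rng.normal(0.0, 0.05))

# Features observed in week t predict the biodiesel price in week t + 1.
X = np.column_stack([biodiesel[:-1], soy[:-1], brent[:-1]])
y = biodiesel[1:]

# Chronological split: fit on the first 80% of weeks, test on the rest.
split = int(len(y) * 0.8)
coef, intercept, mean, std = fit_ridge(X[:split], y[:split], alpha=1.0)
pred = predict_ridge(X[split:], coef, intercept, mean, std)

for name, c in zip(["biodiesel_lag1", "soybean_oil", "brent"], coef):
    print(f"{name:>15}: {c:+.4f} (per 1 std of the feature)")
```

In scikit-learn the same model would typically be written as `make_pipeline(StandardScaler(), Ridge(alpha=1.0))`, with `alpha` tuned on time-ordered folds (for example with `TimeSeriesSplit`) rather than fixed by hand.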
Presentation

Capstone slides

The full 26-slide deck covers the audience persona, methodology, EDA, modelling approach, and final recommendations.

Source Code

Notebook & modelling code

Full Jupyter notebooks covering data cleaning, feature engineering, multicollinearity analysis, model training, and residual diagnostics — hosted on GitHub.
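Multicollinearity matters here because the input commodity markets are interdependent, so their prices can be strongly correlated; that inflates the variance of ordinary least squares coefficients and is one reason a penalised model such as Ridge can help. A common diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing feature j on all the other features. The sketch below computes it with NumPy on hypothetical data. It is not taken from the repository, and cut-offs such as VIF > 10 are a rule of thumb rather than a fixed threshold.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return one VIF per column of X (rows = observations)."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus every other column (OLS).
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ beta
        r2 = 1.0 - residuals @ residuals / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Hypothetical inputs: "brent" is built to track "soy"; "methanol" is independent.
rng = np.random.default_rng(7)
soy = rng.normal(50.0, 5.0, size=500)
brent = 1.6 * soy + rng.normal(0.0, 2.0, size=500)
methanol = rng.normal(300.0, 20.0, size=500)
X = np.column_stack([soy, brent, methanol])

for name, vif in zip(["soy", "brent", "methanol"], variance_inflation_factors(X)):
    print(f"{name:>9}: VIF = {vif:.1f}")
```

Statsmodels offers the same diagnostic as `statsmodels.stats.outliers_influence.variance_inflation_factor`, which expects the constant column to already be part of the design matrix.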

View on GitHub

Browse the complete codebase including Price Prediction Modeling notebooks, data preprocessing scripts, and Tableau workbook exports.

.ipynb · Pandas · Scikit-Learn · XGBoost · Statsmodels
Full Report

Report preview

The first 5 pages of the 32-page full report preview the executive summary and problem framing. Download the complete report for methodology details, modelling results, and recommendations.

Biodiesel_Full_Report.pdf


Downloads
Full Report · PDF · 32 pages · 2.3 MB
Capstone Slides · PDF · 26 slides · 5.0 MB