Capstone · Machine Learning

Biodiesel Price Analysis & Prediction

A weekly biodiesel price forecasting framework built on 17 years of commodity market data. It compares Ridge Regression, XGBoost, Elastic Net, and SARIMAX to identify the dominant structural drivers of biodiesel pricing, and the best model outperforms a naïve persistence benchmark by 14.65% on test RMSE.

Project Type: Capstone · Forecasting

Completed: February 2026

Data Range: Weekly, 2007–2024

Stack: Python · Scikit-Learn · XGBoost · Statsmodels · Tableau

Overview

The problem

Biodiesel prices are highly volatile, driven by feedstock costs (especially soybean oil), Brent crude oil, chemical inputs like methanol and ethanol, carbon pricing mechanisms, and shipping activity. The interdependence across these commodity markets makes short-term forecasting difficult. The goal is to identify the key structural drivers of weekly biodiesel price movements and to build a model that outperforms a naïve persistence baseline, in which next week's price is predicted to equal this week's.
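The persistence baseline described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names and the price values are hypothetical, and the series is invented purely to show the mechanics.

```python
import math

def persistence_forecast(prices):
    """Return one-step-ahead forecasts: week t+1 is predicted as week t's price."""
    return list(prices[:-1])

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical weekly prices, for illustration only (not project data).
weekly_prices = [4.10, 4.25, 4.20, 4.40, 4.35, 4.50]

predicted = persistence_forecast(weekly_prices)  # forecasts for weeks 2..6
actual = weekly_prices[1:]                       # observed prices for weeks 2..6
baseline_rmse = rmse(actual, predicted)
```

Any candidate model is then scored on the same weeks, so its RMSE can be compared directly with `baseline_rmse`.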

Results

Key findings

Ridge Regression was the best-performing model. The results suggest that biodiesel price formation is a highly persistent, input-cost-driven linear process rather than a nonlinear one.

14.65%

RMSE improvement

Ridge Regression reduced test RMSE from 0.461 (baseline) to 0.402, a meaningful gain in commodity forecasting.
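For reference, a relative RMSE improvement is usually measured against the baseline error. The sketch below applies that convention to the rounded values quoted here; the function name is hypothetical. With these rounded inputs the reduction relative to the baseline is roughly 12.8%, while the same gap divided by the Ridge RMSE is roughly 14.7%. The 14.65% headline is therefore closer to the second convention, though the exact definition and the unrounded values used in the report are not stated here.

```python
def relative_rmse_reduction(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE, measured against the baseline RMSE."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

baseline_rmse = 0.461  # naive persistence, as quoted above (rounded)
ridge_rmse = 0.402     # Ridge Regression, as quoted above (rounded)

# Convention 1: gap divided by the baseline RMSE.
reduction_vs_baseline = relative_rmse_reduction(baseline_rmse, ridge_rmse)

# Convention 2: the same gap divided by the model's RMSE instead.
gap_vs_ridge = 100.0 * (baseline_rmse - ridge_rmse) / ridge_rmse
```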

0.876

R² (Ridge model)

The Ridge model explains about 87.6% of the variation in weekly biodiesel prices, compared with 83.7% for the naïve baseline.

Dominant drivers identified

The previous week's biodiesel price, the cost of soybean oil, and Brent crude, consistent with a linear, cost-driven structure.

−91.75%

XGBoost vs baseline

The tree-based model performed worse than the naïve benchmark, which is consistent with biodiesel prices following a predominantly linear pattern.
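To make the modelling setup concrete, here is a minimal sketch of a Ridge model on lagged drivers, evaluated against persistence with a chronological train/test split. It is not the project's pipeline. The data are synthetic, the coefficients of the generating process are invented, and the 80/20 split, `alpha=1.0`, and the use of `StandardScaler` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_weeks = 300

# Synthetic stand-ins for the named drivers (not project data).
soybean_oil = 50 + np.cumsum(rng.normal(0, 1.0, n_weeks))
brent = 70 + np.cumsum(rng.normal(0, 1.5, n_weeks))
biodiesel = np.empty(n_weeks)
biodiesel[0] = 4.0
for t in range(1, n_weeks):
    # Persistent, linear, cost-driven process by construction (an assumption).
    biodiesel[t] = (0.8 * biodiesel[t - 1] + 0.01 * soybean_oil[t - 1]
                    + 0.004 * brent[t - 1] + rng.normal(0, 0.05))

# Features are values known at week t-1; the target is the price at week t.
X = np.column_stack([biodiesel[:-1], soybean_oil[:-1], brent[:-1]])
y = biodiesel[1:]

# Chronological split: no shuffling, so the test weeks follow the training weeks.
split = int(0.8 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# The scaler is fitted on training data only, inside the pipeline.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

ridge_rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
# Persistence forecast: the first feature column is last week's price.
naive_rmse = float(np.sqrt(mean_squared_error(y_test, X_test[:, 0])))
```

Keeping the split in time order matters here: shuffling weekly observations would let the model train on weeks that come after those it is tested on.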

Presentation

Capstone slides

The full 26-slide deck covers audience persona, methodology, EDA, modelling approach, and final recommendations.

Source Code

Notebook & modelling code

Full Jupyter notebooks, hosted on GitHub, covering data cleaning, feature engineering, multicollinearity analysis, model training, and residual diagnostics.
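As an illustration of what a multicollinearity check can look like, the sketch below computes variance inflation factors (VIF) with NumPy. VIF is one common diagnostic; whether the notebooks use it is an assumption, and the function name and synthetic columns are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(target)), others])  # add intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs.append(float(1.0 / (1.0 - r_squared)))
    return vifs

rng = np.random.default_rng(42)
n = 500

# Synthetic columns, for illustration only (not project data).
feature_a = rng.normal(size=n)
feature_b = rng.normal(size=n)                    # independent of the others
feature_c = feature_a + 0.1 * rng.normal(size=n)  # nearly a copy of feature_a

vifs = variance_inflation_factors(np.column_stack([feature_a, feature_b, feature_c]))
```

Columns that nearly duplicate one another (`feature_a` and `feature_c` here) receive large VIFs, while an unrelated column stays close to 1. Statsmodels offers an equivalent helper, `variance_inflation_factor`, in `statsmodels.stats.outliers_influence`.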

Browse the complete codebase including Price Prediction Modeling notebooks, data preprocessing scripts, and Tableau workbook exports.

.ipynb · Pandas · Scikit-Learn · XGBoost · Statsmodels

Full Report

Report preview

The first 5 pages of the 32-page full report cover the executive summary and problem framing; the complete report adds methodology details, modelling results, and recommendations.

Biodiesel_Full_Report.pdf
