A weekly biodiesel price forecasting framework built on 17 years of commodity market data. The project compares Ridge Regression, XGBoost, Elastic Net, and SARIMAX to identify the dominant structural drivers of biodiesel pricing, and it outperforms a naïve persistence benchmark by 14.65%.
Biodiesel prices are highly volatile, driven by feedstock costs (especially soybean oil), Brent crude oil, chemical inputs like methanol and ethanol, carbon pricing mechanisms, and shipping activity. The interdependence across these commodity markets makes short-term forecasting genuinely difficult. The goal: identify the key structural drivers of weekly biodiesel price movements and build a model that meaningfully outperforms a naïve persistence baseline — where next week's price simply equals this week's.
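To make the persistence baseline concrete, here is a minimal sketch of how such a benchmark and a percentage improvement over it could be computed. The error metric (RMSE), the function names, and the price values are illustrative assumptions; the project's actual metric and data are not stated here.

```python
from math import sqrt


def persistence_forecast(prices):
    """Naive persistence: the forecast for week t+1 is the price observed in week t."""
    return prices[:-1]


def rmse(actual, predicted):
    """Root mean squared error (assumed metric; the project may use another)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def improvement_over_baseline(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Made-up weekly prices for illustration only, not project data.
weekly_prices = [4.10, 4.25, 4.18, 4.30, 4.42, 4.39]

actual = weekly_prices[1:]                    # weeks 2..n
naive = persistence_forecast(weekly_prices)   # weeks 1..n-1, shifted forward
baseline_rmse = rmse(actual, naive)
print(f"Persistence RMSE: {baseline_rmse:.4f}")
```

Any candidate model would then be scored on the same weeks, with `improvement_over_baseline(baseline_rmse, model_rmse)` giving the kind of percentage figure quoted for the benchmark comparison.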
Ridge Regression emerged as the best-performing model. A regularised linear model beating the nonlinear alternative suggests that biodiesel price formation is best characterised as a highly persistent, input-cost-driven linear process rather than a nonlinear one.
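The idea of a persistent, input-cost-driven linear process can be sketched as a ridge regression of next week's price on this week's price and one lagged input cost. Everything below is an assumption made for illustration: the synthetic data, the single `feedstock` feature, the penalty `alpha`, and the closed-form fit via NumPy stand in for the project's actual features, tuning, and tooling.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge: centre the data, then solve (X'X + alpha*I) w = X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ w  # intercept is left unpenalised
    return w, intercept


def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Synthetic weekly series (hypothetical, not project data): the price depends
# on its own lag and on a lagged feedstock cost, plus noise.
rng = np.random.default_rng(0)
n_weeks = 200
feedstock = 50 + np.cumsum(rng.normal(0, 1, n_weeks))
price = np.empty(n_weeks)
price[0] = 4.0
for t in range(1, n_weeks):
    price[t] = 0.9 * price[t - 1] + 0.01 * feedstock[t - 1] + rng.normal(0, 0.05)

# Features observed in week t, target is the price in week t+1.
X = np.column_stack([price[:-1], feedstock[:-1]])
y = price[1:]

# Chronological split: no shuffling, so the test weeks follow the training weeks.
split = 150
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

w, b = fit_ridge(X_train, y_train, alpha=1.0)
ridge_rmse = rmse(y_test, X_test @ w + b)
persistence_rmse = rmse(y_test, X_test[:, 0])  # forecast = this week's price

print(f"Ridge RMSE:       {ridge_rmse:.4f}")
print(f"Persistence RMSE: {persistence_rmse:.4f}")
```

In practice the features would usually be standardised before fitting so that the penalty treats them evenly, and `alpha` would be chosen by time-series cross-validation rather than fixed.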
The full 26-slide deck covers audience persona, methodology, EDA, modelling approach, and final recommendations.
The full Jupyter notebooks, covering data cleaning, feature engineering, multicollinearity analysis, model training, and residual diagnostics, are hosted on GitHub.
Browse the complete codebase including Price Prediction Modeling notebooks, data preprocessing scripts, and Tableau workbook exports.
The preview shows the first 5 pages of the 32-page full report, covering the executive summary and problem framing. The complete report adds methodology details, modelling results, and recommendations.