Geographia Technica, Vol 20, Issue 2, 2025, pp. 150-178
PREDICTING FOREST FIRE HOTSPOTS IN KALIMANTAN USING BEST SUBSET VARIABLE SELECTION, REGULARIZED REGRESSION MODEL, AND BAYESIAN MODEL AVERAGING
Sri NURDIATI, I Wayan MANGKU, Isnayni Feby HAWARI, Mohamad Khoirun NAJIB
ABSTRACT: The forest area in Kalimantan continues to decrease due to forest and land fires. One way to help prevent this in Kalimantan is to predict the number of hotspots from climate indicators. Many modeling approaches, including statistical and machine learning models, can be used for this purpose. This study uses best subset selection to build regularized regression models and a Bayesian Model Averaging (BMA) model. The predictors of the number of hotspots include precipitation, precipitation anomalies, dry spells, the El Niño Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) indices, and seasonality. The best model is selected based on its RMSE and R² values. Best subset selection yields a six-term model made up of polynomial and interaction terms of precipitation anomalies, dry spells, and the IOD index. Dry spells appear in almost every term of the equation, which indicates that they play a significant role as a predictor of hotspots. The BMA model outperforms the regularized regression models, with a test-data RMSE of 664 hotspots and an R² of 88.58%. Although Ridge, LASSO, and Elastic Net perform similarly to the BMA model during training, their reliance on a single model can limit their ability to generalize to new data. In contrast, BMA offers a more robust and accurate approach by aggregating predictions from multiple models and accounting for uncertainty. This ensemble approach improves BMA's predictive performance on the test data, making it a valuable tool for accurate forecasting in complex scenarios.
Keywords: Bayesian model averaging, best subset, machine learning, regression, regularization
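The abstract compares best subset selection, regularized regression, and BMA by their test-set RMSE and R². The sketch below illustrates that kind of comparison under several assumptions that are not taken from the paper. The candidate term library (the hypothetical `design` function, with linear, squared, and pairwise-interaction terms of precipitation anomaly, dry spell, and IOD index) is an assumption. BIC is assumed both for picking the best subset and for weighting models, with weights proportional to exp(−BIC/2), a common approximation to posterior model probabilities under equal model priors. Closed-form ridge with a hypothetical penalty `lam=1.0` stands in for the regularized family; LASSO and Elastic Net need iterative solvers and are omitted. Finally, synthetic data replace the hotspot records. This is not the authors' pipeline, and its numbers say nothing about Kalimantan.

```python
import itertools
import numpy as np

def design(anom, dry, iod):
    """Hypothetical candidate terms: linear, squared, and pairwise interactions
    of precipitation anomaly, dry spell, and IOD index (9 columns)."""
    base = {"anom": anom, "dry": dry, "iod": iod}
    cols = dict(base)
    for a in base:
        cols[f"{a}^2"] = base[a] ** 2
    for a, b in itertools.combinations(base, 2):
        cols[f"{a}*{b}"] = base[a] * base[b]
    names = list(cols)
    return np.column_stack([cols[c] for c in names]), names

def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns [intercept, coefs...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def r2(y, yhat):
    return float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2))

def all_subset_fits(X_tr, y_tr, X_te):
    """Fit OLS on every subset of candidate columns (2^p models).
    Returns (subset, training-data BIC, test-set prediction) for each model."""
    n, p = X_tr.shape
    fits = []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            beta = fit_ols(X_tr[:, cols], y_tr)
            rss = np.sum((y_tr - predict(beta, X_tr[:, cols])) ** 2)
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            fits.append((subset, bic, predict(beta, X_te[:, cols])))
    return fits

def ridge_predict(X_tr, y_tr, X_te, lam=1.0):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    mx, my = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mx
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y_tr - my))
    return my + (X_te - mx) @ beta

# Synthetic stand-in data (NOT the Kalimantan hotspot data).
rng = np.random.default_rng(0)
n = 200
anom, dry, iod = rng.normal(size=(3, n))
y = 3 * dry + 1.5 * dry * iod - 2 * anom * dry + rng.normal(scale=0.5, size=n)
X, names = design(anom, dry, iod)
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

fits = all_subset_fits(X_tr, y_tr, X_te)
bics = np.array([f[1] for f in fits])

# Best subset (assumed criterion): the single model with the lowest BIC.
best = fits[int(np.argmin(bics))]
best_terms = [names[j] for j in best[0]]

# BMA approximation: weight of model k proportional to exp(-BIC_k / 2), equal priors.
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()
bma_pred = w @ np.array([f[2] for f in fits])

ridge_pred = ridge_predict(X_tr, y_tr, X_te, lam=1.0)

results = {
    "best subset": (rmse(y_te, best[2]), r2(y_te, best[2])),
    "ridge": (rmse(y_te, ridge_pred), r2(y_te, ridge_pred)),
    "BMA": (rmse(y_te, bma_pred), r2(y_te, bma_pred)),
}
print("best subset terms:", best_terms)
for name, (e, r) in results.items():
    print(f"{name:12s} RMSE={e:.3f}  R2={r:.3f}")
```

Enumerating all 2^p subsets is only practical for a small term library (here 2^9 = 512 models). Larger libraries usually call for a stochastic search over model space or for pruning low-weight models before averaging.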