A Ship Fuel Consumption Prediction Model Based on Clustering Analysis and Feature Weight Factor Optimization
-
Graphical Abstract
-
Abstract
Objective: With international emission regulations becoming increasingly stringent, ship fuel consumption prediction has become key to energy conservation and emission reduction in the shipping industry. To address the biased feature matching caused by inconsistent sampling frequencies across multi-source heterogeneous data, this paper proposes a black-box fuel consumption prediction model that integrates Automatic Identification System (AIS) data, ship noon report (NR) data, and European Centre for Medium-Range Weather Forecasts (ECMWF) meteorological data.
Method: To overcome the shortcomings of the mean-value downsampling used in existing studies, the model adopts a frequency matching method based on the integral principle, tailored to the characteristics of the data. First, feature weight factors are introduced and the weighted Euclidean distance is used to measure distances in the feature space; the K-means algorithm then clusters the high-frequency AIS data and computes weighted cluster centers. Second, borrowing the distance weighting idea from KNN, weighted inverse-distance weights to adjacent cluster centers are constructed, establishing a mapping between feature data points and cluster centers. Third, based on numerical integration, a regression equation between AIS data and NR data is established, and its parameters are solved under constraints with the L-BFGS-B algorithm. Finally, the NSGA-II multi-objective algorithm is used to globally optimize the feature weight factors and improve the model fit. The robustness and generalization ability of the model are verified by five-fold cross-validation.
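The distance, mapping, and integration steps of the method can be illustrated with a minimal Python sketch. It assumes the weighted Euclidean distance takes the common form d(x, c) = sqrt(sum_j w_j (x_j - c_j)^2), that each AIS point is mapped to its k nearest cluster centers with normalized inverse-distance weights, and that fuel over a reporting period is approximated by a trapezoidal integral of a predicted fuel rate. The function names, the per-center rates, the value of k, and the toy numbers are all hypothetical and are not taken from the paper's implementation.

```python
import math

def weighted_distance(x, c, w):
    """Weighted Euclidean distance (assumed form: sqrt(sum_j w_j * (x_j - c_j)^2))."""
    return math.sqrt(sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w)))

def inverse_distance_weights(x, centers, w, k=2, eps=1e-12):
    """Map a point to its k nearest centers with normalized inverse-distance weights.

    Returns a dict {center_index: weight}; the weights sum to 1.
    eps (an assumption) guards against division by zero when x sits on a center.
    """
    dists = sorted((weighted_distance(x, c, w), i) for i, c in enumerate(centers))
    inv = {i: 1.0 / (d + eps) for d, i in dists[:k]}
    total = sum(inv.values())
    return {i: v / total for i, v in inv.items()}

def integrate_fuel(times_h, memberships, rates):
    """Trapezoidal integral of the predicted fuel rate over the AIS timestamps.

    times_h: sample times in hours; memberships: one {center: weight} dict per
    sample; rates: hypothetical per-center fuel rates in t/h. Returns tonnes.
    """
    pred = [sum(wt * rates[i] for i, wt in m.items()) for m in memberships]
    return sum(0.5 * (pred[n] + pred[n + 1]) * (times_h[n + 1] - times_h[n])
               for n in range(len(times_h) - 1))

# Toy example with two features (e.g. speed and draft); all values are made up.
centers = [(10.0, 8.0), (12.0, 8.0), (14.0, 9.0)]
feature_weights = (1.0, 0.5)
membership = inverse_distance_weights((11.0, 8.0), centers, feature_weights, k=2)

# Three AIS samples one hour apart, moving from center 0 toward center 1.
fuel = integrate_fuel([0.0, 1.0, 2.0],
                      [{0: 1.0}, {0: 0.5, 1: 0.5}, {1: 1.0}],
                      rates=[1.0, 2.0])
```

In a setup like this, the per-center rates would be the unknowns of the regression: they could be fitted so that the integrated fuel matches the NR totals, for instance with `scipy.optimize.minimize(..., method="L-BFGS-B", bounds=...)` to keep the rates within bounds such as non-negativity, while the feature weights are tuned in an outer loop.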
Result: With 160 cluster centers, the model achieves average MAPE, MAE, and R² values of 5.3721%, 5.6049 t/day, and 0.9767, respectively, on the test set, all better than those obtained with traditional mean-value downsampling.
Conclusion: Increasing the number of cluster centers improves the model fit; where accuracy requirements are already met, fewer cluster centers can be selected to speed up computation. The model gains robustness to interference at the cost of some extrapolation ability, and has practical engineering value.
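For reference, the three reported metrics can be computed as in the short Python sketch below, using their standard definitions. The daily fuel values are made-up toy numbers, not data from the study.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error, in the units of y (here t/day)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy daily fuel consumption in t/day (hypothetical values).
observed = [50.0, 60.0, 40.0]
predicted = [48.0, 63.0, 40.0]
scores = (mape(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Under five-fold cross-validation, these scores would be computed on each held-out fold and then averaged.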
-
-