A Ship Fuel Consumption Prediction Model Based on Clustering Analysis and Feature Weight Factor Optimization

Abstract:

[Objective] As international emission regulations tighten, ship fuel consumption prediction has become a key part of energy conservation and emission reduction in the shipping industry. To address the feature-matching bias caused by inconsistent sampling frequencies across multi-source heterogeneous data, this paper proposes a black-box fuel consumption prediction model that fuses Automatic Identification System (AIS) data, ship noon report (NR) data, and meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF).

[Method] To overcome the shortcomings of the mean-value downsampling used in existing studies, the model adopts a frequency-matching method based on the integral principle, tailored to the characteristics of the data. First, feature weight factors are introduced so that distances in the feature space are computed with a weighted Euclidean distance, and the K-means algorithm is applied to the high-frequency AIS data to obtain weighted cluster centers. Second, following the distance-weighting idea of KNN, inverse-distance weights over the neighboring cluster centers are constructed, which maps each feature data point to the cluster centers. Third, a regression equation linking the AIS data to the NR data is built by numerical integration, and its parameters are solved under constraints with the L-BFGS-B algorithm. Finally, the NSGA-II multi-objective algorithm globally optimizes the feature weight factors to improve the fit. The robustness and generalization ability of the model are checked with five-fold cross-validation.

[Result] With 160 cluster centers, the test-set MAPE, MAE, and R² average 5.3721%, 5.6049 t/day, and 0.9767, respectively, all better than the results obtained with traditional mean-value downsampling.

[Conclusion] Increasing the number of cluster centers improves the fit, while fewer cluster centers can be chosen to speed up computation when the accuracy requirement is still met. The model gives up some extrapolation capability in exchange for better robustness to disturbances, and it has practical engineering value.
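
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the regression form, so the sketch assumes that the instantaneous fuel rate at an AIS point is the inverse-distance-weighted mix of one non-negative rate per cluster center, and that the daily NR total is the rectangle-rule integral of that rate. The function names, the synthetic speed/draft data, the fixed feature weights `feat_w`, and the choice of three neighboring centers are all hypothetical. The NSGA-II search over feature weights and the cross-validation are omitted.

```python
import numpy as np
from scipy.optimize import minimize


def weighted_kmeans(X, k, feat_w, n_iter=50, seed=0):
    """Lloyd's K-means under a weighted Euclidean distance.

    sqrt(sum_f w_f (x_f - y_f)^2) equals the plain Euclidean distance after
    scaling feature f by sqrt(w_f), so clustering runs in the scaled space.
    """
    rng = np.random.default_rng(seed)
    Xs = X * np.sqrt(feat_w)
    centers = Xs[rng.choice(len(Xs), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Xs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Xs[labels == j].mean(axis=0)
    return centers  # expressed in the scaled feature space


def idw_membership(X, centers, feat_w, n_near=3, eps=1e-9):
    """Row-normalised inverse-distance weights to the n_near closest centers."""
    Xs = X * np.sqrt(feat_w)
    d = np.sqrt(((Xs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    W = np.zeros_like(d)
    near = np.argsort(d, axis=1)[:, :n_near]
    rows = np.arange(len(X))[:, None]
    W[rows, near] = 1.0 / (d[rows, near] + eps)
    return W / W.sum(axis=1, keepdims=True)


def fit_center_rates(W, dt, day_idx, daily_fuel):
    """Fit one non-negative fuel rate per center from daily totals.

    Assumed model: F_d ~ sum_{i in day d} dt_i * sum_j W_ij * c_j = (A @ c)_d.
    """
    A = np.zeros((len(daily_fuel), W.shape[1]))
    np.add.at(A, day_idx, W * dt[:, None])  # integrate memberships per day

    def loss_and_grad(c):
        r = A @ c - daily_fuel
        return 0.5 * r @ r, A.T @ r

    c0 = np.full(W.shape[1], daily_fuel.sum() / dt.sum())
    res = minimize(loss_and_grad, c0, jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * W.shape[1])
    return res.x, A, c0


# Synthetic stand-in data: 40 days, one AIS sample every half hour.
rng = np.random.default_rng(0)
n_days, per_day = 40, 48
X = rng.uniform([8.0, 6.0], [16.0, 10.0], size=(n_days * per_day, 2))  # speed [kn], draft [m]
dt = np.full(len(X), 1.0 / per_day)                 # sample duration [day]
day_idx = np.repeat(np.arange(n_days), per_day)
true_rate = 0.01 * X[:, 0] ** 3 + 0.5 * X[:, 1]     # made-up fuel law [t/day]
daily_fuel = np.bincount(day_idx, weights=true_rate * dt)  # "noon report" totals

feat_w = np.array([1.0, 0.2])                       # fixed here; tuned by NSGA-II in the paper
centers = weighted_kmeans(X, k=8, feat_w=feat_w)
W = idw_membership(X, centers, feat_w)
rates, A, c0 = fit_center_rates(W, dt, day_idx, daily_fuel)
mape = np.mean(np.abs(A @ rates - daily_fuel) / daily_fuel) * 100
print(f"training MAPE on synthetic data: {mape:.2f}%")
```

Because the daily total is linear in the per-center rates under this assumption, the bounded fit reduces to a box-constrained least-squares problem, which is the kind of problem L-BFGS-B handles directly. An outer multi-objective search would rerun these three steps for each candidate `feat_w`.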

     
