Ship autonomous collision avoidance behavior decision based on an improved Rainbow algorithm

Abstract: [Objectives] To reduce maritime accidents caused by human error during navigation, an autonomous collision avoidance decision-making method for ships based on an improved Rainbow algorithm is proposed. [Methods] First, a long short-term memory (LSTM) network is introduced into the network structure to improve the convergence speed and generalization ability of the algorithm. Second, adaptive prioritized experience replay (APER) is adopted to improve sample efficiency, further enhancing training stability and policy performance. In addition, the ship motion mathematical model, the ship domain, the collision avoidance responsibility determination model, and the International Regulations for Preventing Collisions at Sea (COLREGs) are integrated into the algorithm framework to construct a set of reward functions that balance safety, compliance, and economy. Finally, numerical simulations and simulation experiments in a real-world environment are conducted to verify the effectiveness of the method. [Results] The simulation results show that, during training, the improved Rainbow algorithm achieves a 37.5% improvement in convergence speed over the traditional Rainbow algorithm, its convergence curve is smoother and more stable, and the average reward per episode after convergence is considerably higher. The trained model accurately identifies encounter situations and takes appropriate collision avoidance measures in accordance with COLREGs. [Conclusions] The model trained with the improved Rainbow algorithm enables ships to make autonomous collision avoidance decisions while complying with COLREGs.
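The abstract names adaptive prioritized experience replay (APER) but does not describe its adaptation rule. As a point of reference only, the sketch below shows the standard proportional form of prioritized replay, in which a transition is sampled with probability proportional to its priority raised to a power `alpha`, and importance weights controlled by `beta` correct the resulting bias. The class name, the default values of `alpha`, `beta`, and `eps`, and the choice to normalize weights by the batch maximum are all assumptions made for illustration; none of them are taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Standard proportional prioritized replay (illustrative, not the paper's APER)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # which makes them likely to be replayed before being re-scored.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance weights (N * P(i))^(-beta), normalized by the batch maximum.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update(self, indices, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, `sample` would supply the batch and its weights for the loss, and `update` would be called afterwards with the freshly computed TD errors for the sampled indices. An adaptive variant would presumably adjust quantities such as `alpha` or `beta` during training, but how the paper does so is not stated in the abstract.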

     
