Carrier-Based Aircraft Arrested Landing Control Method Based on Improved Deep Deterministic Policy Gradient Algorithm

Wang Ke; Zhao Shixiang; Wang Zhenhe; Li Zhe; Wu Qiankun; He Shuo; Li Yafei; Xu Mingliang

doi:10.19693/j.issn.1673-3185.04901

Abstract

[Objectives] To address automatic guidance for the arrested landing of carrier-based aircraft under multi-factor coupled disturbances, a phased guidance method based on deep reinforcement learning is proposed. [Methods] First, the landing guidance process is modeled as a phased Markov decision process under multi-factor coupled disturbances. Second, taking into account key factors such as aircraft attitude, dynamics, and deck motion, a reward function is designed that dynamically combines multi-dimensional information across the phases. To address the sparse-reward problem in the landing guidance process, a Deep Deterministic Policy Gradient (DDPG) algorithm incorporating a potential-function-based reward shaping mechanism is proposed. Finally, a simulation environment for carrier-based aircraft landing is constructed that integrates aircraft dynamics modeling, environmental disturbance modeling, heterogeneous visualization, and real-time interactive control, and the overall performance of the proposed algorithm is verified through simulation experiments. [Results] The experimental results show that the landing guidance method based on the improved DDPG algorithm outperforms the baseline algorithms in convergence speed, landing success rate, landing stability, and disturbance rejection. [Conclusions] The findings can serve as a reference for deep-reinforcement-learning-based guidance technologies for the arrested landing of carrier-based aircraft.
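The potential-function-based reward shaping named in the abstract can be illustrated with a minimal sketch. It uses the standard potential-based form, r' = r + γΦ(s') − Φ(s), which may differ from the exact mechanism in the paper. The potential used here (negative absolute glide-slope error) and all function and parameter names are hypothetical and chosen only for illustration.

```python
def potential(glide_slope_error_m: float) -> float:
    """Hypothetical potential: larger (closer to 0) when nearer the glide slope."""
    return -abs(glide_slope_error_m)


def shaped_reward(env_reward: float, phi_s: float, phi_next: float,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Standard potential-based shaping: r + gamma * Phi(s') - Phi(s).

    The potential of a terminal state is treated as zero (an assumption
    commonly made so that the shaping terms cancel over an episode).
    """
    next_term = 0.0 if done else gamma * phi_next
    return env_reward + next_term - phi_s


# A sparse environment reward of 0 still yields a positive learning signal
# when the aircraft moves from a 10 m error to a 4 m error.
r = shaped_reward(0.0, potential(10.0), potential(4.0))
```

In a DDPG training loop, a shaped reward of this kind would replace the raw environment reward stored in the replay buffer; the rest of the algorithm stays unchanged.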
