YANG Y P, SONG L F, MAO J Q, et al. Unmanned surface vehicle escape strategy based on hybrid sampling deep Q-network[J]. Chinese Journal of Ship Research, 2024, 19(1): 256–263 (in Chinese). doi: 10.19693/j.issn.1673-3185.03105

Unmanned surface vehicle escape strategy based on hybrid sampling deep Q-network

    Abstract:
    Objective To address encirclement tactics used by enemy ships, this study examines how to plan an escape strategy for an unmanned surface vehicle (USV) that has been surrounded by enemy ships.
    Methods A hybrid sampling deep Q-network (HS-DQN) reinforcement learning algorithm is proposed. It gradually increases the replay frequency of important samples while retaining a degree of exploration, which prevents the algorithm from becoming trapped in a local optimum. The state space, action space and reward function are designed, the optimal USV escape strategy is obtained through training, and the result is compared with the deep Q-network (DQN) algorithm in terms of reward and escape success rate.
    Results The simulation results show that training with the HS-DQN algorithm increases the escape success rate by 2% and the convergence speed by 20%.
    Conclusions The HS-DQN algorithm reduces the number of ineffective explorations by the USV and speeds up convergence. The simulation experiments verify the effectiveness of the USV escape strategy.
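
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of one way "hybrid sampling" could be realized: each mini-batch mixes priority-weighted draws with uniform draws, and the prioritized share grows during training. The class `HybridReplayBuffer`, the function `priority_fraction_schedule`, the use of absolute TD error as the importance measure, the linear schedule, and the `max_fraction` value are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import deque


class HybridReplayBuffer:
    """Replay buffer mixing uniform and priority-weighted sampling (illustrative)."""

    def __init__(self, capacity=10_000, seed=None):
        self.transitions = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        # Assumption: a sample's importance is its absolute TD error.
        # The small constant keeps every stored sample drawable.
        self.transitions.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)

    def sample(self, batch_size, priority_fraction):
        """Draw a batch; priority_fraction in [0, 1] sets the prioritized share."""
        if not self.transitions:
            raise ValueError("cannot sample from an empty buffer")
        n_priority = round(batch_size * priority_fraction)
        n_uniform = batch_size - n_priority
        indices = list(range(len(self.transitions)))
        # Prioritized part: drawn with replacement, weighted by priority.
        chosen = self.rng.choices(indices, weights=list(self.priorities), k=n_priority)
        # Uniform part: keeps some exploration of low-priority samples.
        chosen += self.rng.choices(indices, k=n_uniform)
        return [self.transitions[i] for i in chosen]


def priority_fraction_schedule(step, total_steps, max_fraction=0.8):
    """Linearly raise the prioritized share from 0 to max_fraction (assumed schedule)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_fraction * progress


if __name__ == "__main__":
    buffer = HybridReplayBuffer(capacity=100, seed=0)
    for i in range(50):
        # Hypothetical transitions; a real agent would store (s, a, r, s', done).
        buffer.add(("state", i), td_error=float(i))
    fraction = priority_fraction_schedule(step=500, total_steps=1000)
    batch = buffer.sample(batch_size=32, priority_fraction=fraction)
    print(len(batch), fraction)
```

Capping the prioritized share below 1 means part of every batch is always drawn uniformly, which is one simple way to keep the exploration the abstract mentions while still replaying important samples more often as training proceeds.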

     
