Probabilistic perception-based TS-DQN decision-making for autonomous USV submarine search

  • Abstract:
    Objective To address the problem of submarine search by unmanned surface vehicles (USVs), an intelligent USV search algorithm based on deep reinforcement learning is proposed.
    Method The study is set in the context of submarines infiltrating key maritime areas. First, a search environment and a kinematic model are constructed, a sonar detection probability model combining the effects of distance and angle is established, and the criteria for a successful search are defined explicitly. The problem is then formalized as a Markov decision process (MDP) in which the USV acts as the reinforcement learning agent. Following the requirements of the search task, the state space is designed to include the detection probability, and a multi-component reward function couples detection probability, distance, and angle. Finally, to solve this MDP, a temporal difference Q-network (TS-DQN) algorithm is proposed that combines a double dueling network architecture with prioritized experience replay. A detection probability-aware ε-greedy strategy is introduced so that the agent automatically adjusts its tendency to explore according to the current detection state of the environment, which improves policy learning efficiency.
    Results Simulation experiments show that the proposed method achieves a detection success rate of 38.85%, 18 times higher than that of the second-ranked algorithm, Dueling DQN. The average path length is 334.36 steps, more than 9.5% shorter than that of the other algorithms.
    Conclusion The proposed algorithm shows combined advantages in detection efficiency and effectiveness, and offers a new solution for the development of autonomous USV search technology.
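
    The detection probability-aware ε-greedy strategy mentioned in the Method can be illustrated with a minimal sketch. The abstract does not give the sonar model or the ε schedule, so everything below is an assumption made for illustration only: the function names, the linear range falloff and cosine angle falloff, the parameters `max_range` and `beam_half_width`, and the choice to explore more when the detection probability is low.

```python
import math
import random


def detection_probability(distance, bearing,
                          max_range=1000.0, beam_half_width=math.pi / 3):
    """Hypothetical sonar model (not the paper's): the probability falls
    off linearly with range and with the cosine of the off-axis bearing."""
    if distance >= max_range or abs(bearing) >= beam_half_width:
        return 0.0
    range_term = 1.0 - distance / max_range
    angle_term = math.cos(0.5 * math.pi * bearing / beam_half_width)
    return range_term * angle_term


def adaptive_epsilon(p_detect, eps_min=0.05, eps_max=1.0):
    """Assumed schedule: epsilon shrinks linearly from eps_max to eps_min
    as the detection probability rises from 0 to 1."""
    p = min(max(p_detect, 0.0), 1.0)
    return eps_max - (eps_max - eps_min) * p


def select_action(q_values, p_detect, eps_min=0.05, eps_max=1.0, rng=random):
    """Epsilon-greedy action choice with a detection-dependent epsilon."""
    eps = adaptive_epsilon(p_detect, eps_min, eps_max)
    if rng.random() < eps:
        # Explore: pick a uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    p = detection_probability(distance=400.0, bearing=0.2)
    action = select_action([0.1, 0.9, 0.3], p)
    print(f"p_detect={p:.3f}, epsilon={adaptive_epsilon(p):.3f}, action={action}")
```

    Under these assumptions the agent explores almost uniformly while the target is far outside sonar coverage and shifts toward greedy actions as the detection probability approaches 1. The opposite schedule would be equally easy to express by changing `adaptive_epsilon`.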

     
