Local perception-based path planning for unmanned surface vehicles using deep reinforcement learning

会鑫; 王宁; 王帅

doi:10.19693/j.issn.1673-3185.04390

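Summary:

Objective To address the low path planning efficiency and poor robustness caused by the limited perception range of unmanned surface vehicles (USVs) in maritime search and rescue missions, a USV path planning method based on local state perception is proposed.

Method First, the Soft Actor-Critic algorithm is adopted and a reward function based on local perception is designed. This is combined with a feature-enhanced training method that extracts key environmental features and trains in randomized feature environments, improving the sampling efficiency and robustness of path planning under limited perception. Then, an iterative waypoint planning method based on the local perception domain is proposed to coordinate local and global objectives, ultimately achieving efficient path planning in maritime search and rescue missions.

Results Simulation results show that the proposed method achieves a path planning success rate above 98% in the feature environments and a completion rate above 93% in maritime search and rescue missions, demonstrating good robustness and adaptability to uncertain environments.

Conclusion The proposed USV path planning method based on local state perception resolves the problem of applying deep reinforcement learning to maritime search and rescue missions and can provide technical support for applying reinforcement learning algorithms in practical engineering.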

Abstract:

Objective Maritime rescue missions require efficient and reliable path planning for unmanned surface vehicles (USVs). However, these missions are challenged by the limited sensing capabilities of USVs operating in vast and uncertain environments with randomly distributed obstacles. This study addresses the issues of low path planning efficiency and poor robustness resulting from restricted perception range. To tackle these challenges, a novel local observation-based path planning approach is proposed for USVs in maritime rescue missions.

Method The proposed approach integrates three key methodological innovations. First, the soft actor-critic (SAC) algorithm is employed with a reward function tailored to local observation, which rewards efficient goal-reaching and penalizes obstacle collisions. This design helps balance exploration and exploitation in uncertain environments. Second, a feature-enhanced soft actor-critic (FESAC) algorithm is introduced to improve training efficiency and model robustness. It extracts key environmental features and employs a randomized training environment with strategically placed obstacles to enhance sampling efficiency. During training, obstacle positions, USV starting points, and goals are randomly reset across episodes, encouraging the model to learn generalizable navigation strategies instead of memorizing specific scenarios. Third, an adaptive waypoint planning algorithm is developed based on local perception domains to effectively coordinate local obstacle avoidance with global goal-reaching behavior. Waypoints are dynamically selected within the USV's perception radius using a weighted objective function that balances proximity to the goal and distance from obstacles. This decomposes the complex global path planning task into a series of manageable local planning problems.
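The waypoint selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact objective, so the linear score `-w1 * distance_to_goal + w2 * clearance`, the sampling of candidates on the perception circle, and all names (`select_waypoint`, `n_candidates`) are assumptions made for the example.

```python
import math


def select_waypoint(pos, goal, obstacles, radius, w1=1.0, w2=1.0, n_candidates=36):
    """Pick the next waypoint on the perception circle around `pos`.

    Assumed score (hypothetical, not taken from the paper):
        score = -w1 * dist(candidate, goal) + w2 * clearance
    where clearance is the distance from the candidate to the nearest
    perceived obstacle, capped at the perception radius.
    """
    # If the goal is already inside the perception domain, head straight to it.
    if math.dist(pos, goal) <= radius:
        return goal

    # Only obstacles inside the perception radius are known to the USV.
    visible = [o for o in obstacles if math.dist(pos, o) <= radius]

    best, best_score = None, -math.inf
    for k in range(n_candidates):
        theta = 2.0 * math.pi * k / n_candidates
        cand = (pos[0] + radius * math.cos(theta),
                pos[1] + radius * math.sin(theta))
        # With no visible obstacle, every candidate gets the full clearance.
        clearance = min((math.dist(cand, o) for o in visible), default=radius)
        clearance = min(clearance, radius)
        score = -w1 * math.dist(cand, goal) + w2 * clearance
        if score > best_score:
            best, best_score = cand, score
    return best
```

Calling `select_waypoint` repeatedly from each reached waypoint yields the sequence of local planning problems. In this sketch, raising `w2` relative to `w1` pushes the chosen waypoint away from perceived obstacles, while raising `w1` favors the most direct heading toward the goal.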

Results Simulation experiments validate the effectiveness of the proposed approach. In the feature environments with randomly distributed obstacles, the method achieves a success rate exceeding 98%, significantly outperforming traditional methods. In simulated maritime rescue missions over 1,000 m × 1,000 m areas with 20 to 50 randomly placed obstacles, the method maintains a task completion rate exceeding 93% under appropriate parameter configurations. The simulation results also reveal a trade-off between path safety and efficiency: increasing the obstacle avoidance weight w_2 yields safer but longer paths, whereas increasing the goal-reaching weight w_1 yields shorter paths at the cost of higher collision risk. The two weights can therefore be tuned to match the requirements of a given task. Comparative analysis shows that the FESAC algorithm converges significantly faster than standard SAC in complex environments, demonstrating improved learning efficiency.

Conclusion The proposed local observation-based path planning method effectively addresses the challenges posed by limited perception in maritime rescue scenarios, exhibiting strong robustness and adaptability to uncertain environments. By decomposing complex global planning tasks into manageable local subtasks and enhancing feature extraction, the method offers a practical solution for real-world USV operations where complete environmental information is unavailable. This work provides technical insights for applying reinforcement learning algorithms in real engineering settings.

Local perception-based path planning for unmanned surface vehicles using deep reinforcement learning