Abstract:
Objectives: In unmanned vessel cluster missions, target recognition, situational awareness, and autonomous decision-making control often generalize poorly and respond slowly in scenarios with few samples, limited resources, and complex dynamics. This paper aims to construct an end-to-end multi-agent architecture that combines the reasoning strengths of large language models (LLMs) with real-time control, in order to improve the collaborative combat and autonomous control capabilities of unmanned vessel clusters in complex environments. Methods: This study proposes the Maritime Commander Agent, a three-tier collaborative “perception-understanding-decision-making” architecture that integrates the Qwen2.5-72B large language model into the entire decision-making process of unmanned vessel clusters. By combining prompt engineering with PID control, it tightly couples high-level strategic planning with precise low-level control. The system comprises: a target perception agent, which uses YOLOv8l to detect and localize multiple types of maritime targets with high precision; a situational understanding agent, which uses structured prompt templates to convert perception results into high-level natural-language situational descriptions; and a decision-making agent, which combines LLM inference with external computational tools to generate control commands and uses PID regulation to improve response speed. The architecture requires no additional model fine-tuning and offers low latency and strong adaptability.
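To illustrate what a structured prompt template for the situational understanding agent might look like, the following is a minimal Python sketch. It assumes each YOLOv8l detection has already been reduced to a class label, a confidence score, a bearing, and a range; the `Detection` fields, the template wording, the `describe` function, and the 0.5 confidence threshold are all hypothetical, since the abstract does not give the actual template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One perception result; field names are hypothetical, not the paper's schema."""
    label: str          # detector class name, e.g. "cargo_ship"
    confidence: float   # detector score in [0, 1]
    bearing_deg: float  # bearing relative to own bow, degrees
    range_m: float      # estimated distance, metres


# Hypothetical template: fixed slots keep the LLM input format stable across calls.
TEMPLATE = (
    "Own vessel: {own_id}. {n} target(s) detected.\n"
    "{lines}\n"
    "Task: {task}. Summarize the situation and the main threat."
)


def describe(own_id: str, task: str, dets: List[Detection], min_conf: float = 0.5) -> str:
    """Fill the template with detections above a confidence threshold, nearest first."""
    kept = sorted((d for d in dets if d.confidence >= min_conf), key=lambda d: d.range_m)
    lines = "\n".join(
        f"- {d.label} at bearing {d.bearing_deg:.0f} deg, range {d.range_m:.0f} m "
        f"(confidence {d.confidence:.2f})"
        for d in kept
    )
    return TEMPLATE.format(own_id=own_id, n=len(kept), lines=lines or "- none", task=task)
```

Filtering low-confidence detections and sorting by range before prompting is one plausible design choice: it keeps the description short and puts the most urgent targets first, so the language model does not have to reason over raw detector output.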
Results: Experiments show that the system achieves high object-detection accuracy on the public ABOships dataset and in degraded conditions (heavy fog and heavy rain), and that the situational understanding agent reaches 93.5% correctness in semantic conversion. In simulation, the success rate of the 4v1 encirclement task rose from 20% with traditional rule-based methods to 80%, and that of the 10v10 adversarial task rose from 25% to 75%, supporting the system's robustness and cross-domain generalization in complex maritime environments. Conclusions: The proposed Maritime Commander Agent architecture retains the high-level cognitive reasoning of large language models while using PID control to improve real-time response and execution accuracy, substantially improving the collaborative decision-making of unmanned vessel clusters in dynamic tasks. This research provides a feasible technical path and an engineering reference for intelligent maritime cluster systems.
This paper constructs an end-to-end multi-agent system based on large language models to address target identification, situational understanding, and autonomous decision-making control in unmanned vessel cluster tasks. Existing unmanned vessel decision-making systems rely primarily on rule-based methods, which often prove inadequate in complex scenarios with limited data and resources. Large language models, by contrast, offer strong reasoning and generalization but struggle to meet the demands of high-frequency real-time control. To bridge this gap, this paper proposes the Maritime Commander Agent, a three-tier collaborative “perception-understanding-decision” architecture that exploits large models for intelligent decision-making and uses PID control to speed up system response.
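The 4v1 encirclement task and the decision-making agent's use of external computational tools suggest the kind of deterministic helper a language model could call instead of doing geometry itself. The sketch below is hypothetical: the abstract does not name the tools used, and `encirclement_points`, `assign_slots`, the circular formation, and the brute-force assignment are illustrative assumptions rather than the paper's method.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def encirclement_points(target: Point, radius: float, n: int,
                        phase_rad: float = 0.0) -> List[Point]:
    """Evenly spaced slots on a circle of `radius` metres around `target` (hypothetical tool)."""
    if n < 1 or radius <= 0:
        raise ValueError("need n >= 1 and radius > 0")
    tx, ty = target
    return [
        (tx + radius * math.cos(phase_rad + 2 * math.pi * k / n),
         ty + radius * math.sin(phase_rad + 2 * math.pi * k / n))
        for k in range(n)
    ]


def assign_slots(boats: Sequence[Point], slots: Sequence[Point]) -> List[int]:
    """Return one slot index per boat, minimizing total straight-line travel (brute force)."""
    if len(boats) != len(slots):
        raise ValueError("one slot per boat required")
    best = min(
        itertools.permutations(range(len(slots))),
        key=lambda perm: sum(math.dist(b, slots[j]) for b, j in zip(boats, perm)),
    )
    return list(best)
```

For four boats around a target at the origin with a 100 m radius, `encirclement_points` places slots at 0, 90, 180 and 270 degrees, and `assign_slots` picks the boat-to-slot permutation with the smallest total travel distance. Brute force is only practical for small groups such as the 4v1 case; larger groups (for example 10v10) would call for the Hungarian algorithm or a similar assignment method.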
The system comprises three types of agents: target perception agents, responsible for multi-target detection and spatial localization; a situational understanding agent, which converts perception results into high-level natural-language situational descriptions; and a decision-making agent, which generates control commands in real time, tightly coupling high-level planning with low-level execution. In simulation tasks, the method outperforms traditional rule-based decision-making on key metrics such as task completion rate and adversarial success rate, indicating its potential as a new paradigm for collaborative intelligent maritime swarm systems.
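The division of labour between high-level planning and low-level execution can be illustrated with a discrete PID loop that tracks a heading chosen by the decision-making agent. This is a minimal sketch under stated assumptions: the abstract does not give the controlled variable, the gains, or the vessel model, so the heading setpoint, the gains, the 35 degree rudder limit, the conditional-integration anti-windup, and the toy plant (turn rate proportional to rudder angle) in `simulate` are all hypothetical.

```python
from dataclasses import dataclass


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


@dataclass
class HeadingPID:
    """Discrete PID on heading error; gains and limits are illustrative, not the paper's."""
    kp: float
    ki: float
    kd: float
    out_limit: float            # actuator saturation, e.g. max rudder angle (deg)
    integral: float = 0.0
    prev_error: float = 0.0
    has_prev: bool = False

    def step(self, setpoint_deg: float, heading_deg: float, dt: float) -> float:
        error = wrap_deg(setpoint_deg - heading_deg)   # shortest turn direction
        derivative = wrap_deg(error - self.prev_error) / dt if self.has_prev else 0.0
        self.prev_error, self.has_prev = error, True
        candidate = self.integral + error * dt
        u = self.kp * error + self.ki * candidate + self.kd * derivative
        # Conditional integration (anti-windup): keep the new integral only when unsaturated.
        if abs(u) <= self.out_limit:
            self.integral = candidate
        return max(-self.out_limit, min(self.out_limit, u))


def simulate(setpoint_deg: float, seconds: float, dt: float = 0.1, b: float = 0.5) -> float:
    """Toy plant (hypothetical): turn rate = b * rudder, in deg/s per deg. Returns final heading."""
    pid = HeadingPID(kp=0.8, ki=0.05, kd=0.2, out_limit=35.0)
    heading = 0.0
    for _ in range(int(seconds / dt)):
        rudder = pid.step(setpoint_deg, heading, dt)
        heading = wrap_deg(heading + b * rudder * dt)
    return heading
```

With these illustrative gains, `simulate(90.0, 120.0)` settles close to 90 degrees. Because the error is wrapped to [-180, 180), the controller takes the shorter turn, so a setpoint of 200 degrees is reached by turning to -160. In this split, the language model only needs to update the setpoint occasionally, while the PID loop runs at a much higher rate.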