Abstract:
Objective An integrated sensing and communications (ISAC) system operating in a dynamic heterogeneous maritime network environment faces multiple challenges, including frequent node mobility, severe time-varying channel interference, and cross-network eavesdropping threats. Conventional optimization methods for designing sensing and communication schemes incur high computational complexity and cannot adapt in real time. To address these limitations, this paper proposes an intelligent beamforming optimization framework based on deep reinforcement learning (DRL).
Method First, the proposed framework formulates the secrecy energy efficiency (SEE) maximization problem as a Markov decision process. The reward function combines the SEE objective with penalty terms for violations of the power constraint and of the quality-of-service (QoS) and sensing constraints, plus a small bonus for feasible solutions, which enables the agent to learn near-optimal policies under multiple constraints. Second, rate-splitting multiple access (RSMA) is introduced to manage cross-network interference between the ISAC network and a multicast communication network. By splitting user messages into common and private components, RSMA enables flexible interference mitigation at low complexity. Third, the concept of "inherent green interference" is proposed, in which sensing signals are exploited as effective jamming sources against eavesdroppers. The proximal policy optimization (PPO) algorithm is employed to handle the high-dimensional continuous action space. To accelerate training and improve adaptability, a hybrid training mechanism is adopted that combines supervised pre-training on offline data generated by conventional optimization methods with online fine-tuning.
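The reward design described above can be illustrated as a shaped reward: the SEE term minus weighted constraint-violation penalties, plus a small bonus when every constraint holds. This is a minimal sketch only; the linear penalty form, the weights, the bonus value, and all names (`shaped_reward`, `w_power`, `feasible_bonus`, etc.) are hypothetical assumptions, since the abstract does not specify them.

```python
def shaped_reward(see, total_power, p_max, qos_rates, qos_min,
                  sensing_snr, sensing_min,
                  w_power=1.0, w_qos=1.0, w_sense=1.0, feasible_bonus=0.1):
    """Hypothetical shaped reward: SEE minus constraint-violation penalties,
    plus a small bonus for feasible actions. Units are assumed consistent."""
    # Amount by which the transmit power exceeds the budget (0 if within budget).
    power_violation = max(0.0, total_power - p_max)
    # Sum of per-user rate shortfalls below the QoS threshold.
    qos_violation = sum(max(0.0, qos_min - r) for r in qos_rates)
    # Shortfall of the sensing SNR below its requirement.
    sensing_violation = max(0.0, sensing_min - sensing_snr)

    reward = (see
              - w_power * power_violation
              - w_qos * qos_violation
              - w_sense * sensing_violation)
    # Small incentive when all constraints are satisfied.
    if power_violation == 0.0 and qos_violation == 0.0 and sensing_violation == 0.0:
        reward += feasible_bonus
    return reward


# Feasible action: SEE plus the bonus.
print(shaped_reward(2.0, 3.0, 10.0, [1.5, 1.2], 1.0, 12.0, 10.0))
# Infeasible action: penalties for excess power, a QoS shortfall, and a sensing shortfall.
print(shaped_reward(2.0, 12.0, 10.0, [0.5, 1.2], 1.0, 8.0, 10.0))
```

Linear (hinge-style) penalties are one common choice because they are zero inside the feasible set and grow with the size of the violation; quadratic penalties or Lagrangian multipliers are alternatives.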
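To make the common/private message split concrete, the sketch below computes standard one-layer RSMA rates for K single-antenna users in a multi-antenna downlink: every user first decodes the common stream (treating private streams as noise), removes it by successive interference cancellation (SIC), then decodes its own private stream. It assumes perfect CSI, unit-power Gaussian streams, and equal noise power; the function name `rsma_rates` and the array shapes are illustrative, and the paper's full model (sensing beam, multicast network, eavesdroppers) is richer than this.

```python
import numpy as np


def rsma_rates(H, p_common, P_private, noise_power, common_split):
    """Achievable rates (bit/s/Hz) of one-layer RSMA for K single-antenna users.

    H            : (K, M) complex array; row k is the channel vector h_k.
    p_common     : (M,) precoder of the common stream.
    P_private    : (M, K) precoders; column k carries user k's private stream.
    noise_power  : receiver noise power sigma^2 (assumed equal for all users).
    common_split : (K,) nonnegative fractions of the common rate, summing to 1.
    """
    common_split = np.asarray(common_split, dtype=float)
    if np.any(common_split < 0) or not np.isclose(common_split.sum(), 1.0):
        raise ValueError("common_split must be nonnegative and sum to 1")

    # Received power of the common stream at user k: |h_k^H p_c|^2.
    g_common = np.abs(H.conj() @ p_common) ** 2
    # gains[k, j] = |h_k^H p_j|^2, power of private stream j at user k.
    gains = np.abs(H.conj() @ P_private) ** 2
    total_private = gains.sum(axis=1)

    # Step 1: decode the common stream, treating all private streams as noise.
    sinr_common = g_common / (total_private + noise_power)
    # All users must decode it, so the weakest user limits the common rate.
    rate_common = float(np.min(np.log2(1.0 + sinr_common)))

    # Step 2: after SIC, decode the own private stream; other private streams are noise.
    own = np.diag(gains)
    sinr_private = own / (total_private - own + noise_power)
    rate_private = np.log2(1.0 + sinr_private)

    # User k's rate = its share of the common rate + its private rate.
    user_rates = common_split * rate_common + rate_private
    return user_rates, rate_common, rate_private


rng = np.random.default_rng(0)
K, M = 3, 4
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
p_c = rng.standard_normal(M) + 1j * rng.standard_normal(M)
P = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
rates, rc, rp = rsma_rates(H, 0.5 * p_c, 0.2 * P, noise_power=1.0,
                           common_split=[0.5, 0.3, 0.2])
print(rates, rc, rp)
```

Setting `p_common` to zero collapses the scheme to SDMA-style private-only transmission, while routing all power to the common stream approaches multicast; RSMA's flexibility comes from tuning between these extremes.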
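The core of PPO is its clipped surrogate objective, which bounds how far a single update can move the policy. The sketch below implements the standard PPO-Clip surrogate together with the log-density of a diagonal Gaussian policy, a common choice for continuous actions such as beamforming weights. The abstract gives no network architecture or hyperparameters, so `clip_eps=0.2`, the Gaussian parameterization, and the helper names are assumptions.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-Clip surrogate objective (to be maximized), averaged over a batch.

    logp_new / logp_old: log-probabilities of the taken actions under the current
    policy and under the policy that collected the data.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping keeps r within [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) of the two terms, then average.
    return float(np.mean(np.minimum(unclipped, clipped)))


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian policy, summed over action dimensions."""
    action, mean, log_std = (np.asarray(x, dtype=float) for x in (action, mean, log_std))
    var = np.exp(2.0 * log_std)
    return float(np.sum(-0.5 * ((action - mean) ** 2 / var
                                + 2.0 * log_std + np.log(2.0 * np.pi))))


# Unchanged policy: ratio = 1, so the objective is the mean advantage.
print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))
# A large ratio with positive advantage is clipped at 1 + eps.
print(ppo_clip_objective([0.5], [0.0], [1.0]))
```

Taking the minimum means clipping only removes the incentive to push the ratio further in the direction the advantage favors; with negative advantages the unclipped (more pessimistic) term is kept.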
Results Simulations are conducted under typical offshore parameters: a carrier frequency of 18 GHz, a transmit power of 35 dBm (below the maximum power limit of 40 dBm), and N=3 multicast users. A parametric sensitivity analysis is also performed for N=2 to 12. The proposed DRL-RSMA scheme achieves a median SEE of 2.45 bit/J, a 22.5% improvement over the conventional RSMA-based alternating optimization scheme (2.00 bit/J). Its online inference latency is only 0.85 ms (standard deviation 0.05 ms), meeting the sub-millisecond latency requirement of 5G-Advanced (5G-A)/6G ultra-reliable low-latency communications (URLLC). The hybrid training mechanism accelerates convergence by approximately 58.3% compared with conventional DRL approaches. Under periodic topology changes at time steps 50, 100, and 150, DRL-RSMA maintains an average SEE of 2.41 bit/J, a 14.1% improvement over RSMA-based alternating optimization. With channel state information (CSI) estimation errors bounded by up to 0.3, DRL-RSMA retains 91% of its optimal SEE and is more robust than RSMA (83%), NOMA (79%), SDMA (75%), and OMA (69%). The parameter analysis further shows that SEE first increases and then decreases with transmit power, peaking at 3.10 bit/J at 35 dBm. As the number of multicast users increases from 2 to 8, SEE rises to 2.75 bit/J, and it gradually declines once N exceeds 8. Notably, DRL-RSMA still maintains 94% of its peak performance at N=12, indicating strong scalability.
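The headline percentages follow from simple ratios. The sketch below recomputes the 22.5% gain from the reported medians and, as a derived quantity that the abstract does not state, the baseline average SEE implied by the 14.1% gain under topology changes (subject to rounding in the reported figures). `relative_gain` is a hypothetical helper.

```python
def relative_gain(proposed, baseline):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# Median SEE: DRL-RSMA (2.45 bit/J) vs. RSMA-based alternating optimization (2.00 bit/J).
print(round(relative_gain(2.45, 2.00), 1))  # 22.5

# Baseline average SEE implied by a 14.1% gain at 2.41 bit/J (derived, not reported).
print(round(2.41 / 1.141, 2))  # 2.11
```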
Conclusion The proposed DRL-RSMA scheme jointly improves three key performance metrics in complex maritime environments, namely SEE, real-time responsiveness, and robustness. It provides a novel solution for intelligent resource management in ISAC systems and shows strong potential for practical deployment in dynamic maritime scenarios characterized by imperfect CSI and topology variations.