基于深度强化学习的安全绿色近海通信感知一体化波束赋形优化

李旭东; 赵晓楠; 荣寒潇; 崔杨; 姚如贵

doi:10.19693/j.issn.1673-3185.04890

Secure and green beamforming optimization for maritime integrated sensing and communications based on deep reinforcement learning

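Abstract (translated from the Chinese):
Objective: Offshore integrated sensing and communications (ISAC) systems must deliver reliable communication and sensing in dynamic heterogeneous networks, where they face frequent node mobility, time-varying channels with strong interference, and cross-network eavesdropping threats. Against this background, conventional optimization methods for designing communication and sensing schemes suffer from high computational complexity and cannot respond in real time.
Method: An intelligent beamforming optimization framework based on deep reinforcement learning (DRL) is proposed. First, the security energy efficiency (SEE) maximization problem is modeled as a Markov decision process, and a composite reward function is designed to guide policy optimization. Second, rate-splitting multiple access (RSMA) is introduced to manage cross-network interference at a fine granularity, and the sensing signal is used as "inherent green interference" against eavesdroppers, improving physical-layer security without additional power consumption. Finally, proximal policy optimization is adopted together with a hybrid training mechanism that combines supervised pre-training and online fine-tuning, achieving fast convergence and dynamic adaptation.
Results: Simulations with typical offshore parameters (carrier frequency 18 GHz, transmit power 35 dBm, 3 users) show that the proposed scheme improves SEE by 22.5% over the conventional RSMA alternating optimization scheme, improves convergence speed by 58.3%, and is more robust under channel estimation errors and abrupt topology changes.
Conclusion: The proposed DRL-RSMA intelligent beamforming method jointly improves three aspects of performance in complex marine environments (SEE, real-time response, and robustness) and offers a new approach to intelligent resource management for ISAC.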

Abstract:
Objective: An integrated sensing and communications (ISAC) system operating in a dynamic heterogeneous maritime network faces multiple challenges, including frequent node mobility, time-varying channels with strong interference, and cross-network eavesdropping threats. Conventional optimization methods for sensing and communication scheme design suffer from high computational complexity and cannot adapt in real time. To address these limitations, this paper proposes an intelligent beamforming optimization framework based on deep reinforcement learning (DRL).
Method: The proposed framework first formulates the security energy efficiency (SEE) maximization problem as a Markov decision process. The reward function combines a core SEE term with penalty terms for power constraint violations and for quality-of-service and sensing constraint violations, plus a small incentive for feasible solutions, so that the agent can learn near-optimal policies under multiple constraints. Second, rate-splitting multiple access (RSMA) is introduced to manage cross-network interference between the ISAC network and a multicast communication network. By splitting user messages into common and private components, RSMA enables flexible interference mitigation with low complexity. Third, the concept of "inherent green interference" is proposed, in which sensing signals are exploited as effective jamming sources against eavesdroppers. The proximal policy optimization (PPO) algorithm is employed to handle the high-dimensional continuous action space. To accelerate training and improve adaptability, a hybrid training mechanism is adopted that combines supervised pre-training (on offline data generated by conventional optimization methods) with online fine-tuning.
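The composite reward described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the functional form nor any weights, so the SEE definition (secrecy rate divided by transmit plus circuit power), the normalized hinge penalties, the `RewardWeights` values, and all function and parameter names below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


def dbm_to_watt(power_dbm: float) -> float:
    """Convert dBm to watts: P[W] = 10 ** ((P[dBm] - 30) / 10)."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical penalty weights; the abstract gives no numeric values."""
    power: float = 10.0
    qos: float = 5.0
    sensing: float = 5.0
    feasible_bonus: float = 0.1


def composite_reward(
    secrecy_rate: float,          # assumed numerator of SEE
    tx_power_w: float,            # transmit power chosen by the agent, in W
    circuit_power_w: float,       # assumed static power consumption, in W
    max_tx_power_w: float,        # power budget, in W
    user_rates: Sequence[float],  # achieved per-user rates
    min_rate: float,              # assumed QoS threshold
    sensing_metric: float,        # assumed sensing quality (higher is better)
    min_sensing_metric: float,    # assumed sensing threshold
    w: RewardWeights = RewardWeights(),
) -> float:
    """SEE term minus weighted constraint violations, plus a feasibility bonus."""
    see = max(secrecy_rate, 0.0) / (tx_power_w + circuit_power_w)
    # Each violation is a normalized hinge: zero when the constraint holds.
    power_viol = max(tx_power_w - max_tx_power_w, 0.0) / max_tx_power_w
    qos_viol = sum(max(min_rate - r, 0.0) / min_rate for r in user_rates)
    sensing_viol = max(min_sensing_metric - sensing_metric, 0.0) / min_sensing_metric
    penalty = w.power * power_viol + w.qos * qos_viol + w.sensing * sensing_viol
    bonus = w.feasible_bonus if penalty == 0.0 else 0.0
    return see - penalty + bonus


# Illustrative numbers only: 35 dBm transmit power under a 40 dBm limit.
p_max = dbm_to_watt(40.0)
reward_ok = composite_reward(10.0, dbm_to_watt(35.0), 1.0, p_max,
                             [1.0, 1.2, 0.9], 0.5, 1.0, 0.8)
# Same setting, but the agent exceeds the power budget by a factor of two.
reward_bad = composite_reward(10.0, 2.0 * p_max, 1.0, p_max,
                              [1.0, 1.2, 0.9], 0.5, 1.0, 0.8)
```

With these assumed weights a feasible action earns its SEE plus the small bonus, while an action that doubles the power budget is penalized heavily enough to make the reward negative, which is the kind of shaping the abstract describes.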
Results: Simulations are conducted under typical offshore parameters: a carrier frequency of 18 GHz, a typical transmit power of 35 dBm (within the maximum power limit of 40 dBm), and N=3 multicast users. A parametric sensitivity analysis is also performed for N=2 to 12. The proposed DRL-RSMA scheme achieves a median SEE of 2.45 bit/J, a 22.5% improvement over the conventional RSMA-based alternating optimization scheme (2.00 bit/J). The online inference latency of DRL-RSMA is 0.85 ms with a standard deviation of 0.05 ms, which satisfies the sub-millisecond latency requirement of 5G-Advanced/6G ultra-reliable low-latency communications. The hybrid training mechanism accelerates convergence by approximately 58.3% compared with conventional DRL approaches. Under periodic topology mutations at time steps 50, 100, and 150, DRL-RSMA maintains an average SEE of 2.41 bit/J, a 14.1% improvement over RSMA-based alternating optimization. With channel state information (CSI) estimation errors bounded by up to 0.3, DRL-RSMA retains 91% of its optimal SEE, showing better robustness than RSMA (83%), NOMA (79%), SDMA (75%), and OMA (69%). Parameter analysis further shows that SEE first increases and then decreases with transmit power, reaching a maximum of 3.10 bit/J at 35 dBm. As the number of multicast users increases from 2 to 8, SEE improves to 2.75 bit/J; performance then declines gradually when the number exceeds 8. Notably, DRL-RSMA still maintains 94% of its peak performance at N=12, indicating strong scalability.
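The headline percentages can be cross-checked with a few lines of arithmetic. The sketch below recomputes the 22.5% gain from the two median SEE values quoted above. It also derives the baseline SEE implied by the 14.1% gain under topology mutations; that baseline is not stated in the abstract and is an inference that assumes the gain is measured relative to the baseline. The function and variable names are illustrative.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# Median SEE values quoted in the abstract (bit/J).
gain_median = relative_gain(2.45, 2.00)

# Baseline implied by "2.41 bit/J, a 14.1% improvement" (derived, not quoted).
implied_baseline = 2.41 / (1.0 + 0.141)

print(f"median SEE gain: {gain_median:.1%}")
print(f"implied baseline under topology mutations: {implied_baseline:.2f} bit/J")
```

The first figure reproduces the reported 22.5%. The second suggests a baseline of roughly 2.11 bit/J in the topology-mutation scenario, a different value from the 2.00 bit/J median, so the two improvements refer to different operating conditions.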
Conclusion: The proposed DRL-RSMA scheme jointly improves three key performance metrics in complex maritime environments: SEE, real-time response, and robustness. It provides a novel solution for intelligent resource management in ISAC systems and shows strong potential for practical deployment in dynamic maritime scenarios characterized by imperfect CSI and topology variations.
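For reference, the PPO algorithm named in the Method paragraph is usually trained with the standard clipped surrogate objective below. The abstract does not say which PPO variant or clipping range the authors use, so this is the textbook form rather than their exact loss. Here $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clipping range.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[
  \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
\right]
```

In this setting the state $s_t$ would carry channel and topology information and the action $a_t$ the continuous beamforming and rate-splitting variables, although the abstract does not spell out either space.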
