WANG G J, SHI W Z, LUO W, et al. 3D semantic-enhanced diffusion policy for intelligent shipborne robots[J]. Chinese Journal of Ship Research, 2026, 21(X): 1–14 (in Chinese). DOI: 10.19693/j.issn.1673-3185.04806

3D semantic-enhanced diffusion policy for intelligent shipborne robots

Objective: With the advancement of artificial intelligence and sensor fusion technologies, shipborne robots capable of target recognition and autonomous task execution are emerging as a key development direction for future combat systems. However, during autonomous operation, such robots are constrained by limited onboard computing resources and deployment conditions, which hinders high-level cognition and decision-making. To address these challenges, this study proposes a lightweight 3D semantic-enhanced framework centered on a diffusion-based policy, referred to as SGDP. The framework is designed to maintain recognition accuracy, response speed, and decision-making stability under resource constraints and in the presence of dynamic disturbances.
Methods: First, a semantic projection mechanism based on 3D Gaussian splatting is introduced to construct dense semantic point clouds centered on the target objects. Multi-view semantic fusion is then used to obtain object-level 3D semantic representations, which provide more effective semantic priors for complex manipulation tasks. Next, the semantic field is updated in real time using 6D object poses estimated by FoundationPose, which removes the need to repeatedly extract multi-view semantic features and thus improves inference efficiency while preserving semantic consistency in dynamic scenes. Finally, a multimodal diffusion policy that integrates semantic, geometric, and joint-state information is designed to strengthen semantic perception while remaining lightweight enough for practical deployment.
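The following NumPy sketch illustrates two ideas from the Methods paragraph under simplifying assumptions: fusing multi-view 2D semantic features onto an object point cloud by projecting each point into every calibrated view and averaging the features it lands on, and then reusing those cached features by rigidly moving the object-frame cloud with an estimated 6D pose. It is not the authors' implementation. Plain pinhole point projection stands in for 3D Gaussian splatting, occlusion is ignored, the 2D feature maps are assumed to come from some vision encoder, and the object pose `T_wo` is assumed to be supplied by a pose estimator such as FoundationPose rather than computed here. All function names are hypothetical.

```python
import numpy as np


def project_points(points_w, K, T_cw):
    """Project world-frame points (N, 3) into a pinhole camera.

    K: (3, 3) intrinsics; T_cw: (4, 4) world-to-camera transform.
    Returns pixel coordinates (N, 2) and camera-frame depth (N,).
    """
    homog = np.hstack([points_w, np.ones((len(points_w), 1))])
    p_cam = (T_cw @ homog.T).T[:, :3]
    depth = p_cam[:, 2]
    uv = (K @ p_cam.T).T
    # Perspective divide; clipping avoids division by zero for points behind the camera.
    uv = uv[:, :2] / np.clip(depth[:, None], 1e-6, None)
    return uv, depth


def fuse_multiview_features(points_w, feature_maps, Ks, T_cws):
    """Average per-view 2D semantic features onto 3D points (hypothetical helper).

    feature_maps: list of (H, W, C) arrays, one per view (assumed encoder output).
    A view contributes to a point only if the point lies in front of the camera
    and projects inside the image. Occlusion is ignored in this sketch.
    Returns fused features (N, C) and a mask of points seen by at least one view.
    """
    n, c = len(points_w), feature_maps[0].shape[-1]
    acc = np.zeros((n, c))
    count = np.zeros(n)
    for fmap, K, T_cw in zip(feature_maps, Ks, T_cws):
        h, w, _ = fmap.shape
        uv, depth = project_points(points_w, K, T_cw)
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        visible = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        acc[visible] += fmap[v[visible], u[visible]]
        count[visible] += 1
    fused = acc / np.maximum(count, 1)[:, None]
    return fused, count > 0


def update_semantic_cloud(points_obj, features, T_wo):
    """Move a cached object-frame semantic cloud to the current object pose.

    T_wo: (4, 4) object-to-world pose, assumed to come from a 6D pose estimator.
    Features are reused unchanged, so no multi-view extraction is repeated.
    """
    R, t = T_wo[:3, :3], T_wo[:3, 3]
    return points_obj @ R.T + t, features
```

In this sketch the per-frame update is a single rigid transform whose cost grows linearly with the number of points, which is one plausible way to read why reusing pose-tracked features is cheaper than re-running multi-view feature extraction every frame.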
Results: The proposed SGDP algorithm was evaluated on a shipborne robot test platform on three complex tasks: knife placement, marker pen grasping, and pouring water from a bottle. Using only a single depth camera for environmental perception, SGDP achieved a semantic field update rate of 39.71 Hz and an action inference rate of 32.16 Hz, a fivefold improvement over the GenDP baseline that enables real-time closed-loop control on computationally constrained platforms. GPU memory usage was also 20.25% lower than that of GenDP, indicating a substantially lower computational cost alongside improved task performance. SGDP achieved an average success rate of 81.67% on tasks involving known objects and maintained a success rate of 78.33% on tasks involving unknown objects. These results demonstrate strong zero-shot generalization and reduced reliance on extensive task-specific data collection.
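As a rough reading of the reported rates (simple arithmetic on the stated figures, not an additional measurement), each rate $f$ corresponds to a per-cycle period $T = 1/f$:

```latex
T_{\text{update}} = \frac{1}{39.71\,\text{Hz}} \approx 25.2\,\text{ms},
\qquad
T_{\text{infer}} = \frac{1}{32.16\,\text{Hz}} \approx 31.1\,\text{ms}
```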
Conclusion: The results indicate that the proposed framework is an efficient and viable integrated perception-decision solution that enables effective synergy between environmental cognition and autonomous decision-making under resource constraints. It thus provides an effective technical pathway for easing the inherent conflict between lightweight deployment requirements and high-level autonomous decision-making in single-unit unmanned systems such as shipborne robots.