Abstract:
Objective: This study proposes a Visual Attribute Prompt Learning (VAPT) mechanism to address three challenges in deep learning-based ship image recognition: strong interference, limited data, and inadequate modeling of semantic features.
Method: The framework constructs a large-scale pre-trained visual attribute codebook and uses a Multi-head Cross-Attention Mechanism (MCA) to match and select attributes from it. The selected attributes are aligned with deep visual models to improve their recognition of critical ship features.
Results: On a manually annotated custom ship image dataset, the proposed method improves Top-1 accuracy by 3.79% over the baseline Vision Transformer (ViT). On the public FGSCR-42 dataset, it achieves Top-1 accuracy gains of 2.20% and 0.10% over the state-of-the-art B-CNN and RA-CNN methods, respectively.
Conclusion: The research provides a novel technical solution for feature decoupling and knowledge transfer in target recognition under complex marine conditions, with implications for intelligent maritime monitoring systems.
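
The attribute matching step described under Method can be illustrated with a minimal sketch of generic multi-head cross-attention between image tokens and a codebook. This is not the authors' implementation. Several things are assumptions made only for illustration: the function name `cross_attend_codebook`, the choice of image tokens as queries and codebook entries as keys and values, the omission of an output projection, the random (untrained) weights, and all tensor sizes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend_codebook(visual_tokens, codebook, w_q, w_k, w_v, num_heads):
    """Hypothetical helper: attend from image tokens to codebook entries.

    visual_tokens: (n, d) image features, used as queries (assumption)
    codebook:      (m, d) attribute embeddings, used as keys and values (assumption)
    w_q, w_k, w_v: (d, d) projection matrices
    Returns attribute prompts of shape (n, d) and weights of shape (heads, n, m).
    """
    n, d = visual_tokens.shape
    m = codebook.shape[0]
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    dh = d // num_heads

    # Project, then split the feature axis into heads: (heads, length, dh).
    q = (visual_tokens @ w_q).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (codebook @ w_k).reshape(m, num_heads, dh).transpose(1, 0, 2)
    v = (codebook @ w_v).reshape(m, num_heads, dh).transpose(1, 0, 2)

    # Scaled dot-product scores between every token and every attribute.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, n, m)
    weights = softmax(scores, axis=-1)                # soft attribute selection

    # Weighted sum of attribute values, then merge the heads back to (n, d).
    attended = weights @ v                            # (heads, n, dh)
    prompts = attended.transpose(1, 0, 2).reshape(n, d)
    return prompts, weights

# Toy sizes, chosen arbitrarily for the illustration.
rng = np.random.default_rng(0)
n_tokens, n_attrs, dim, heads = 4, 16, 8, 2
tokens = rng.normal(size=(n_tokens, dim))
codebook = rng.normal(size=(n_attrs, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * dim ** -0.5 for _ in range(3))

prompts, weights = cross_attend_codebook(tokens, codebook, w_q, w_k, w_v, heads)
# Index of the most strongly attended attribute per token, averaged over heads.
top_attrs = weights.mean(axis=0).argmax(axis=-1)
```

In this sketch the attention weights act as a soft selection over codebook entries, and `prompts` is the resulting per-token attribute summary that a downstream visual model could consume. How the actual framework selects attributes and injects them into the ViT is not specified in the abstract.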