Abstract:
Objective With technological advancements and the increasing demand for water resource exploration, water surface target detection plays a crucial role in applications such as ship navigation and maritime safety. However, conventional detection methods encounter several challenges, and existing deep-learning-based algorithms remain limited in this field by scarce datasets and by detection speed that is still insufficient even after improvement. This study aims to develop an improved object-detection algorithm based on Deformable DETR for automatic recognition of water surface targets. The algorithm is designed to significantly increase the model's training and inference speed while improving detection accuracy, thus achieving more efficient and robust detection of water surface targets.
Methods First, a new water surface target dataset was constructed. Second, the original feature-extraction network of Deformable DETR was replaced with MobileNetV3, a lightweight network that is available in multiple versions and combines high recognition accuracy with a small parameter count. The MobileNetV3-Small version was chosen as the feature-extraction backbone. It is built from depthwise separable convolutions, SE modules, and the Hard-swish activation function. To further reduce the model size and enhance detection ability, three output feature maps from specific modules of MobileNetV3-Small were used directly for multi-scale feature extraction. Third, the CBAM attention mechanism module was introduced. CBAM is a lightweight yet versatile module that integrates both channel attention and spatial attention, allowing it to be seamlessly incorporated into the network. By replacing the SE module in MobileNetV3 with CBAM, the model's feature-extraction capability was further improved. CBAM's channel attention module applies both average pooling and max pooling to the input feature map, followed by a shared neural network and a Sigmoid function to generate channel-attention features. The spatial attention module first applies pooling operations along the channel dimension of the feature map refined by the channel attention module, followed by convolution and Sigmoid activation to obtain spatial-attention features. Finally, the improved Deformable DETR network was obtained by integrating MobileNetV3 and the CBAM attention mechanism module. The input image is processed by the MobileNetV3-Small network embedded with CBAM, from which three multi-scale feature maps are extracted. These feature maps are refined and then fed into the Transformer structure of Deformable DETR for further processing.
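The channel and spatial attention steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the reduction ratio of 2, the 3 x 3 spatial kernel, and the random weights are all assumptions chosen to keep the example small. The Hard-swish activation is included in the form commonly given for MobileNetV3, x * ReLU6(x + 3) / 6.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_swish(x):
    # Hard-swish in its common MobileNetV3 form: x * ReLU6(x + 3) / 6
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def shared_mlp(vec, w1, w2):
    # Two-layer perceptron (ReLU in between) shared by both pooled descriptors.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    # x is a C x H x W nested list; returns one weight in (0, 1) per channel.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(p + q) for p, q in zip(a, m)]


def spatial_attention(x, kernel):
    # Pool along the channel dimension, then apply a zero-padded k x k
    # convolution over the two pooled maps, followed by a Sigmoid.
    C, H, W = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(x[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    maps = [avg, mx]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:
                            s += kernel[m][di][dj] * maps[m][ii][jj]
            out[i][j] = sigmoid(s)
    return out


def cbam(x, w1, w2, kernel):
    # Channel attention first, then spatial attention on the refined map.
    ca = channel_attention(x, w1, w2)
    x = [[[v * ca[c] for v in row] for row in ch] for c, ch in enumerate(x)]
    sa = spatial_attention(x, kernel)
    return [[[v * sa[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in x]


# Hypothetical toy sizes and random weights, for illustration only.
random.seed(0)
C, H, W, R = 4, 5, 5, 2
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
     for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
kernel = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
          for _ in range(2)]
y = cbam(x, w1, w2, kernel)
```

The output keeps the input's C x H x W shape, and because both attention maps lie in (0, 1), every output value is a scaled-down copy of the corresponding input value. A practical implementation would use a tensor library and learned weights rather than nested lists.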
Results Ablation experiments were carried out on the self-constructed dataset and the ABOships dataset. On the self-constructed dataset, compared with the original Deformable DETR model, the improved algorithm reduced the model's parameter count and size to about one-third of the original. The mAP0.5:0.95 increased by 2.4%, and training time was reduced to 41.7% of that required by the original algorithm. On the ABOships dataset, the mAP0.5:0.95 increased by 7.5%, and training time was reduced to 51.9% of that required by the original algorithm. During training, the model's loss function converged faster and more stably. In comparison tests against other common algorithms (YOLOv3, Faster R-CNN, Mask R-CNN) on the ABOships dataset, the improved algorithm demonstrated superior performance. Its mAP0.5 reached 50.0%, higher than that of the other algorithms, and its mAP0.5:0.95 was 21.7%, also the highest, indicating an advantage in fine-grained detection. The model's parameter count was only 12.9 M, much lower than that of the other models, indicating high parameter efficiency. Although the frame rate was slightly lower than that of YOLOv3 and Faster R-CNN, it was significantly higher than that of Mask R-CNN, so the model maintains a reasonable processing speed while ensuring high detection accuracy.
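To make the reported figures easier to interpret, the sketch below shows two small calculations. It assumes that mAP0.5:0.95 follows the COCO convention of averaging AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05, which the abstract does not state explicitly. The boxes and helper names are hypothetical, and the speedup is simply the reciprocal of the reported training-time fraction.

```python
def iou(a, b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def thresholds_passed(pred, gt):
    # Thresholds at which this prediction would count as a match.
    score = iou(pred, gt)
    return [t for t in THRESHOLDS if score >= t]


def speedup(fraction_of_original_time):
    # Training time reduced to a fraction f of the original => 1 / f speedup.
    return 1.0 / fraction_of_original_time


# Hypothetical boxes: the prediction is shifted 2 units to the right.
pred, gt = (2, 0, 12, 10), (0, 0, 10, 10)
passed = thresholds_passed(pred, gt)
self_built_speedup = speedup(0.417)   # self-constructed dataset
aboships_speedup = speedup(0.519)     # ABOships dataset
```

In this toy case the IoU is 80 / 120, about 0.667, so the prediction counts as correct at mAP0.5 but only at four of the ten thresholds used by mAP0.5:0.95, which is why the latter is the stricter metric. The reported training-time fractions correspond to speedups of roughly 2.4 times and 1.9 times.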
Conclusions The improved Deformable DETR algorithm proposed in this paper effectively improves the performance of water surface target detection. It substantially reduces the model's parameter count and storage footprint, accelerates training and inference, and improves recognition accuracy. The experimental results on different datasets verify the effectiveness of the algorithm. This study explores a novel approach to applying DETR-class algorithms to water surface target detection and indicates their potential in this field.