Water surface target detection based on improved Deformable DETR

王鹏九; 龚俊斌; 罗威; 黄骁; 郭俊杰

doi:10.19693/j.issn.1673-3185.03645

Detection of water surface targets based on improved Deformable DETR

Abstract (Chinese version):
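Objective: To propose an object detection algorithm based on an improved Deformable DETR for intelligent recognition of water surface targets, one that substantially increases model inference and training speed while also improving detection accuracy, so as to achieve more efficient and robust water surface target detection.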
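Methods: A new water surface target dataset was constructed, and Deformable DETR was improved by replacing its original feature extraction network with the lightweight MobileNetV3 and introducing the CBAM attention module. Ablation experiments and comparative experiments against other algorithms were conducted on the self-constructed water surface target dataset and the public ABOships dataset to verify the effectiveness of the improved algorithm.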
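MobileNetV3 owes much of its light weight to depthwise separable convolutions, and its blocks use the hard-swish activation. The plain-Python sketch below compares the weight count of a standard k × k convolution with that of a depthwise separable one and defines hard-swish as x · ReLU6(x + 3) / 6. The 64-channel, 3 × 3 layer is a hypothetical example chosen for illustration, not a layer taken from the paper's backbone, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k standard convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """k x k depthwise conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # spatial filtering, channel by channel
    pointwise = c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


def relu6(x: float) -> float:
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x: float) -> float:
    """h-swish(x) = x * ReLU6(x + 3) / 6, a piecewise-linear stand-in for swish."""
    return x * relu6(x + 3.0) / 6.0


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 64 channels with a 3 x 3 kernel
    std = standard_conv_params(64, 64, 3)        # 36864 weights
    sep = depthwise_separable_params(64, 64, 3)  # 4672 weights
    print(std, sep, round(sep / std, 3))
```

For this example layer the separable version needs roughly one-eighth of the weights, which is the kind of saving that makes a MobileNet-style backbone much smaller than a ResNet-style one. Hard-swish is used because it avoids the exponential in the sigmoid-based swish while keeping a similar shape.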
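Results: Ablation experiments on the self-constructed dataset and ABOships show that, compared with the original model, the improved model reduces the parameter count and model size to one-third, increases mAP_0.5:0.95 by 2.4% and 7.5%, respectively, and requires 41.7% and 51.9% of the original training time, respectively. Comparative tests of different algorithms on the ABOships dataset further show that the proposed algorithm is superior in overall performance, combining inference speed and detection accuracy.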
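Assuming the usual COCO-style definition, mAP_0.5 is computed at a single IoU threshold of 0.50, while mAP_0.5:0.95 averages the values obtained at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so it rewards tighter boxes. The sketch below shows the IoU computation and that final averaging step only. It assumes boxes in (x1, y1, x2, y2) corner format; the per-threshold precision-recall computation is omitted (tools such as pycocotools perform it), and the abstract does not say which evaluation tool the authors used. The names `iou`, `IOU_THRESHOLDS`, and `map_50_95` are hypothetical.

```python
from typing import Dict, Tuple

# Axis-aligned box in (x1, y1, x2, y2) corner format (an assumption of this sketch)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (ten values)
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def map_50_95(map_at_threshold: Dict[float, float]) -> float:
    """Average mAP values already computed at each of the ten IoU thresholds."""
    missing = [t for t in IOU_THRESHOLDS if t not in map_at_threshold]
    if missing:
        raise ValueError(f"missing thresholds: {missing}")
    return sum(map_at_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

For example, a prediction overlapping its ground-truth box with IoU 0.6 counts as a true positive at the 0.50, 0.55, and 0.60 thresholds but as a false positive at the seven stricter ones, which is why mAP_0.5:0.95 is typically much lower than mAP_0.5.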
Conclusions: DETR-class algorithms show application potential in water surface target detection.

Abstract:
Objective: With technological advancements and the increasing demand for water resource exploration, water surface target detection plays a crucial role in applications such as ship navigation and maritime safety. However, conventional detection methods face several challenges, and existing deep-learning-based algorithms have limitations in this field, including limited datasets and detection speeds that remain insufficient even in improved versions. This study aims to develop an improved object detection algorithm based on Deformable DETR for automatic recognition of water surface targets. The algorithm is designed to significantly increase the inference and training speed of the model while improving detection accuracy, thus achieving more efficient and robust detection of water surface targets.
Methods: First, a new water surface target dataset was constructed. Next, the original feature extraction network of Deformable DETR was replaced with the lightweight MobileNetV3. MobileNetV3 is available in multiple versions and combines high recognition accuracy with a small number of parameters; the MobileNetV3-Small version was chosen as the feature extraction backbone. It incorporates depthwise separable convolutions, SE (squeeze-and-excitation) modules, and the hard-swish activation function. To further reduce model size and enhance detection ability, three feature maps output by specific modules of MobileNetV3-Small were used directly for multi-scale feature extraction. Then, the CBAM attention module was introduced. CBAM is a lightweight, general-purpose module that combines channel attention and spatial attention and can be inserted into an existing network with little modification. Replacing the SE module in MobileNetV3 with CBAM further improved the model's feature extraction capability. CBAM's channel attention module applies both average pooling and max pooling to the input feature map, passes the two pooled descriptors through a shared neural network, and applies a sigmoid function to generate the channel attention weights. The spatial attention module applies pooling operations along the channel dimension of the feature map refined by the channel attention module, followed by a convolution and sigmoid activation, to obtain the spatial attention weights. Finally, the improved Deformable DETR network was obtained by integrating MobileNetV3 and the CBAM module: the input image is processed by the CBAM-embedded MobileNetV3-Small network, from which three multi-scale feature maps are extracted, and these feature maps are further refined and then fed into the Transformer structure of Deformable DETR.
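The CBAM steps described above can be sketched in framework-free Python for a single C × H × W feature map, following the original CBAM design: the channel attention weights are sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), and the spatial attention mask is a sigmoid over a convolution of the channel-wise average and max maps. This is a minimal illustration under stated assumptions, not the authors' implementation: the shared MLP weights `w1`/`w2`, the convolution `kernel`, and all function names are hypothetical inputs supplied by the caller, biases are omitted, and the reduction ratio, kernel size, and exact insertion points inside MobileNetV3-Small are not given in the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]   # H x W (or a weight matrix)
Tensor = List[Matrix]        # C x H x W, single image, no batch dimension


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: Matrix, v: List[float]) -> List[float]:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP (C -> C/r -> C) with ReLU; the same weights serve both pooled inputs."""
    hidden = [max(0.0, h) for h in matvec(w1, v)]
    return matvec(w2, hidden)


def channel_attention(f: Tensor, w1: Matrix, w2: Matrix) -> Tensor:
    """Mc = sigmoid(MLP(avg) + MLP(max)) per channel, then F' = Mc * F."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(map(max, ch)) for ch in f]
    logits = [a + b for a, b in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    weights = [sigmoid(z) for z in logits]
    return [[[x * wc for x in row] for row in ch] for ch, wc in zip(f, weights)]


def spatial_attention(f: Tensor, kernel: List[Matrix]) -> Tensor:
    """Ms = sigmoid(conv([avg_c(F'); max_c(F')])) per pixel, then F'' = Ms * F'."""
    c, h, w = len(f), len(f[0]), len(f[0][0])
    avg_map = [[sum(f[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(f[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    pooled = [avg_map, max_map]          # 2 x H x W descriptor
    ks = len(kernel[0])
    pad = ks // 2                        # zero padding keeps the H x W size
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in range(2):           # cross-correlation over both pooled maps
                for di in range(ks):
                    for dj in range(ks):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[m][di][dj] * pooled[m][y][x]
            mask[i][j] = sigmoid(acc)
    return [[[f[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)] for k in range(c)]


def cbam(f: Tensor, w1: Matrix, w2: Matrix, kernel: List[Matrix]) -> Tensor:
    """Sequential CBAM: channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(f, w1, w2), kernel)
```

Because both attention maps pass through a sigmoid, every output element is the input scaled by a factor in (0, 1): CBAM reweights features rather than creating new ones, which is why it can replace the SE module without changing tensor shapes. In a real model the same logic would be written as a module in a deep learning framework; plain lists are used here only to keep the sketch self-contained.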
Results: Ablation experiments were carried out on the self-constructed dataset and the ABOships dataset. On the self-constructed dataset, compared with the original Deformable DETR model, the improved algorithm reduced the parameter count and model size to about one-third, increased mAP_0.5:0.95 by 2.4%, and reduced training time to 41.7% of that of the original algorithm. On the ABOships dataset, mAP_0.5:0.95 increased by 7.5%, and training time was reduced to 51.9% of that of the original. During training, the loss function of the improved model converged faster and more stably. In comparison tests against other common algorithms (YOLOv3, Faster R-CNN, and Mask R-CNN) on the ABOships dataset, the improved algorithm demonstrated superior performance. Its mAP_0.5 reached 50.0%, higher than that of the other algorithms, and its mAP_0.5:0.95 of 21.7% was also the highest, showing an advantage under the stricter IoU thresholds. Its parameter count was only 12.9M, far lower than that of the other models, indicating high parameter efficiency. Although its frame rate was slightly lower than those of YOLOv3 and Faster R-CNN, it was significantly higher than that of Mask R-CNN, so the model maintains a reasonable processing speed while achieving high detection accuracy.
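Expressed as speed-ups, the reported training-time ratios work out as follows. This is plain arithmetic on the percentages above; the symbols $T_{\text{orig}}$ and $T_{\text{impr}}$ (training time of the original and improved models) and $N$ (parameter count) are introduced here for illustration. The last line is only a rough estimate that treats the 12.9M figure as exactly one-third of the baseline, which the abstract does not claim, since "about one-third" is itself rounded.

```latex
\frac{T_{\text{orig}}}{T_{\text{impr}}} = \frac{1}{0.417} \approx 2.40 \quad (\text{self-constructed dataset}),
\qquad
\frac{T_{\text{orig}}}{T_{\text{impr}}} = \frac{1}{0.519} \approx 1.93 \quad (\text{ABOships})

N_{\text{impr}} \approx \tfrac{1}{3} N_{\text{orig}}
\;\Rightarrow\;
N_{\text{orig}} \approx 3 \times 12.9\,\text{M} \approx 39\,\text{M} \quad (\text{rough estimate})
```

In other words, training ran roughly 2.4 times faster on the self-constructed dataset and roughly 1.9 times faster on ABOships, and about two-thirds of the original parameters were removed.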
Conclusions: The improved Deformable DETR algorithm proposed in this paper effectively improves water surface target detection performance. It substantially reduces the model's parameter count and storage footprint, accelerates training and inference, and improves recognition accuracy. Experimental results on different datasets verify the effectiveness of the algorithm. This study explores a new approach to applying DETR-class algorithms to water surface target detection and indicates their potential in this field.
