Fish feeding behavior recognition based on sonar images and dual-stream spatio-temporal attention

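  • (1 School of Electronic and Information Engineering, Tongji University, Shanghai 201804, China;
    2 Fishery Machinery and Instrument Research Institute, Chinese Academy of Fishery Sciences, Shanghai 200092, China)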
Wang Zhijun (b. 1990), male, master's student; research interests: deep learning and signal processing. E-mail: wang_zhijun@tongji.edu.cn

Published online: 2025-12-26

Funding

National Key Research and Development Program of China (2023YFD2401304)

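Cite this article

Wang Zhijun (1, 2), Zhao Xia (1). Fish feeding behavior recognition based on sonar images and dual-stream spatio-temporal attention[J]. Fishery Modernization, 2025, 52(6): 115-122. DOI: 10.26958/j.cnki.1007-9580.2025.06.014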

Abstract

To address strong noise interference in sonar images and insufficient representation capability under small-sample conditions in fish feeding behavior recognition, this study proposes a dual-stream spatio-temporal attention network that fuses domain knowledge with deep features. First, an improved wavelet filtering algorithm removes bubble noise from the sonar images. Next, a dual-stream feature fusion architecture is designed: the statistical feature stream comprises six features, such as target count and the standard deviation of target spacing, while the deep feature stream uses a residual network (ResNet18) to extract high-level semantic features from the sonar images. A long short-term memory (LSTM) network captures the temporal dependencies of behavior sequences, and a spatio-temporal cross-attention mechanism adaptively focuses on key frames and target regions. Experiments on a self-built dataset show that the network achieves a classification accuracy of 77.0%, with wavelet denoising, dual-stream fusion, and spatio-temporal attention contributing accuracy gains of 1.8%, 5.9%, and 2.8%, respectively, which verifies the effectiveness of each component. This study provides a new method for underwater target behavior recognition based on imaging sonar.
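As a rough illustration of the statistical feature stream, the sketch below computes the two features the abstract names (target count and the standard deviation of target spacing) for a single frame. It is not the authors' implementation: the function name `frame_statistics`, the centroid input format, and the definition of "spacing" as the pairwise Euclidean distance between target centroids are all assumptions made for this example, and the remaining four features are not specified in the abstract, so they are omitted.

```python
import math
from itertools import combinations
from statistics import pstdev


def frame_statistics(centroids):
    """Return (target_count, spacing_std) for one sonar frame.

    centroids: list of (x, y) target centres in pixels. This input format is
    hypothetical; it could come from any detector run on the denoised frame.
    """
    count = len(centroids)
    # Assumption: "spacing" means the Euclidean distance between each pair
    # of detected targets.
    distances = [math.dist(a, b) for a, b in combinations(centroids, 2)]
    # With fewer than two targets there are no pairs, so report 0.0.
    spacing_std = pstdev(distances) if distances else 0.0
    return count, spacing_std


def sequence_statistics(frames):
    """Apply frame_statistics to every frame of a behavior sequence."""
    return [frame_statistics(centroids) for centroids in frames]


if __name__ == "__main__":
    # Two toy frames: three collinear targets, then a single target.
    frames = [[(0, 0), (3, 0), (6, 0)], [(5, 5)]]
    for count, spacing_std in sequence_statistics(frames):
        print(count, round(spacing_std, 3))
```

In a dual-stream design of the kind the abstract describes, per-frame vectors like these would be combined with the deep features of the same frame before temporal modelling; how the fusion is done is not stated in the abstract.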
