Fish feeding behavior recognition based on sonar images and dual-stream spatio-temporal attention


Fishery Modernization, 2025, Vol. 52, Issue (6): 115-122. doi: 10.26958/j.cnki.1007-9580.2025.06.014


WANG Zhijun1,2, ZHAO Xia1

(1 School of Electronic and Information Engineering, Tongji University, Shanghai 201804, China;
2 Fishery Machinery and Instrument Research Institute, Chinese Academy of Fishery Sciences, Shanghai 200092, China)

Online: 2025-12-20 Published: 2025-12-26


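Corresponding author: ZHAO Xia (b. 1974), female, Ph.D., associate professor; research interests: control algorithms, deep learning. E-mail: zhaoxia@tongji.edu.cn
About the author: WANG Zhijun (b. 1990), male, master's student; research interests: deep learning, signal processing. E-mail: wang_zhijun@tongji.edu.cn
Supported by:
National Key Research and Development Program of China (2023YFD2401304)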

Abstract

Abstract: Fish feeding behavior recognition from sonar images suffers from strong noise interference and weak feature representation under small-sample conditions. To address these problems, this paper proposes a dual-stream spatio-temporal attention network that fuses domain knowledge with deep features. First, an improved wavelet filtering algorithm removes bubble noise from the sonar images. Second, a dual-stream feature fusion architecture is designed: the statistical feature stream comprises six features, including target count and the standard deviation of target spacing, while the deep feature stream uses a residual network (ResNet18) to extract high-order semantic features from the sonar images. A Long Short-Term Memory (LSTM) network captures the temporal dependencies of behavior sequences, and a spatio-temporal cross-attention mechanism adaptively focuses on key frames and target regions. Experiments on a self-built dataset show that the network reaches a classification accuracy of 77.0%, with wavelet denoising, dual-stream fusion, and spatio-temporal attention contributing accuracy improvements of 1.8%, 5.9%, and 2.8%, respectively, which verifies the effectiveness of each component. This study provides a new method for underwater target behavior recognition based on imaging sonar.

Key words: sonar image, wavelet denoising, feature fusion, LSTM, spatio-temporal cross-attention
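
The statistical feature stream described in the abstract can be illustrated with a minimal sketch covering only the two features the abstract names, target count and spacing standard deviation. The remaining four features are not specified here and are not guessed. The function name `statistical_features`, the centroid input format, and the reading of "spacing" as pairwise Euclidean distance between target centroids are all assumptions made for illustration, not the authors' implementation.

```python
import math
from itertools import combinations
from statistics import pstdev


def statistical_features(centroids):
    """Return (target_count, spacing_std) for one sonar frame.

    centroids: list of (x, y) target positions in pixels. This input
    format is hypothetical, e.g. the output of a blob detector run on
    a denoised frame.
    Assumption: "spacing" means pairwise Euclidean distance between
    target centroids; the paper may define it differently.
    """
    count = len(centroids)
    # All pairwise distances between detected targets.
    distances = [math.dist(a, b) for a, b in combinations(centroids, 2)]
    # With fewer than two distances there is no spread to measure.
    spacing_std = pstdev(distances) if len(distances) >= 2 else 0.0
    return count, spacing_std


# Three targets on a line, with pairwise distances 5, 10, and 5.
frame = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
count, spacing_std = statistical_features(frame)
print(count, round(spacing_std, 3))
```

Under this reading, a school that packs tightly and evenly around the feed would yield a high count with a low spacing spread, which is the kind of domain knowledge a hand-built feature stream can supply alongside the ResNet18 features when training data is scarce.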


WANG Zhijun, ZHAO Xia. Fish feeding behavior recognition based on sonar images and dual-stream spatio-temporal attention[J]. Fishery Modernization, 2025, 52(6): 115-122.



URL: https://fm.fmiri.ac.cn/EN/10.26958/j.cnki.1007-9580.2025.06.014

https://fm.fmiri.ac.cn/EN/Y2025/V52/I6/115

