ZHANG Xin, YU Hong, WU Zijian, CHENG Zhiao, GAO Chencheng, YANG Zongyi, WANG Yue
In factory farming environments, where computing resources are relatively limited, increasing identification speed and reducing model size while preserving high accuracy has become an urgent technical challenge. Traditional object detection algorithms such as those in the YOLO family perform well across many application scenarios, but they typically depend on powerful computing platforms to support their large parameter counts and complex computation. These computational demands are a significant bottleneck in factory farming settings, where on-site equipment is often ill-equipped to run large deep learning models, yet real-time monitoring and rapid response are essential for effective management of fish growth and health. Existing research on lightweight models offers potential solutions, but these approaches frequently sacrifice some recognition accuracy in pursuit of smaller model sizes and lower computational cost. In factory farming, such accuracy losses are unacceptable, because precise fish monitoring directly affects both farming effectiveness and economic viability. There is therefore a pressing need for methods that substantially reduce model complexity without materially degrading recognition performance. To this end, we propose FasterYOLOv9-Slim, a lightweight fish school identification model that achieves efficient operation and a compact design while maintaining high precision through a series of targeted improvements.
Specifically, FasterYOLOv9-Slim builds on YOLOv9 and FasterNet, and resolves the trade-off between recognition accuracy and speed through a series of network architecture optimizations. First, FasterNet is introduced as a lightweight backbone network to replace the complex backbone structure of the original YOLOv9 model. This change exploits FasterNet's more efficient network structure to cut the large parameter count and computational load imposed by the traditional convolution operations of the YOLOv9 backbone. The result is a significant improvement in detection speed and a reduced model burden, while good target recognition ability is retained, ensuring the model's practicality and effectiveness in real applications.
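FasterNet's efficiency comes largely from partial convolution (PConv), which applies a regular convolution to only a fraction of the input channels and passes the remaining channels through untouched, reducing both FLOPs and memory access compared with a full convolution. The following PyTorch sketch illustrates the idea; the channel split ratio and layer sizes are illustrative choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve a slice of the channels, pass the rest
    through unchanged. The 1/4 split ratio is an illustrative default."""
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.conv_ch = max(1, int(channels * ratio))  # channels actually convolved
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

x = torch.randn(1, 64, 32, 32)
y = PConv(64)(x)  # output keeps the input's shape
```

With a 1/4 split, the convolution touches only 16 of 64 channels, so its parameter count and per-pixel computation are roughly 1/16 of a full 3x3 convolution over all channels, which is the kind of saving that makes a FasterNet-style backbone attractive on resource-limited farming hardware.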
Second, to further optimize overall performance, high-dimensional detector head pruning (HDPrune) removes a pair of high-dimensional detection heads from the head network of the YOLOv9 model. Reducing the network depth in this way lowers the amount of computation, which both speeds up inference and curbs the accumulation of interfering information in the network, stabilizing the model against complex backgrounds. Finally, the original feature fusion module RepNCSPELAN4 is improved with partial convolution (PConv) to obtain a lighter and more efficient FasterRepNCSPELAN4, and, combined with the advanced downsampling modules ADown and DownSimper, the neck network is redesigned into a more efficient feature fusion framework (DFA-Neck). This redesign reduces computation while strengthening the network's ability to express features at different scales, and it coordinates feature extraction, fusion, information transfer, and detection head output efficiently.
To verify the effectiveness of these improvements, ablation and comparison experiments were designed. The ablation study tested the effect of the three key improvements, FasterNet, HDPrune, and DFA-Neck, on model performance. The results show that FasterNet effectively reduces the model's parameter count, confirming its advantage in lowering computational resource consumption; HDPrune plays an important role in weakening the accumulation of interfering information in the network; and DFA-Neck coordinates the functions of FasterNet and HDPrune within the overall network, keeping feature extraction and information transfer efficient. In the comparison experiments, the model was evaluated against similarly sized state-of-the-art detectors from the YOLOv7, YOLOv8, and YOLOv10 series, achieving significant reductions of 34.14%, 64.02%, and 22.22%, respectively, and it demonstrates excellent overall performance in model size, inference speed, and recognition accuracy against state-of-the-art lightweight networks such as ShuffleNet, MobileNet, and RepViT. The study verifies that FasterYOLOv9-Slim balances accuracy and speed in the fish identification task under factory farming conditions with limited computational resources, and it provides useful experience and guidance for model design and optimization in similar application scenarios.
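The effect of pruning high-dimensional detection heads can be illustrated with a toy multi-scale head: dropping the deepest branch removes its parameters and output, shrinking the head network. The branch widths, output size, and the choice to drop a single deepest branch here are illustrative assumptions, not the paper's exact HDPrune configuration:

```python
import torch
import torch.nn as nn

class MultiScaleHead(nn.Module):
    """Toy multi-scale detection head: one 1x1 prediction conv per feature
    scale. prune_deepest=True discards the highest-dimensional branch,
    mimicking the spirit of HDPrune (configuration is illustrative)."""
    def __init__(self, in_channels=(128, 256, 512), num_outputs=85,
                 prune_deepest=False):
        super().__init__()
        if prune_deepest:
            in_channels = in_channels[:-1]  # drop the deepest, widest scale
        self.branches = nn.ModuleList(
            nn.Conv2d(c, num_outputs, kernel_size=1) for c in in_channels)

    def forward(self, feats):
        # feats: one feature map per remaining scale, channel counts matching
        return [b(f) for b, f in zip(self.branches, feats)]

full = MultiScaleHead()
slim = MultiScaleHead(prune_deepest=True)
count = lambda m: sum(p.numel() for p in m.parameters())
```

Because prediction branches scale with their input width, removing the widest branch eliminates the single largest share of the head's parameters, which is why this kind of pruning pays off even though only one scale is dropped per pruned pair.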