In aquaculture, accurate fish image segmentation is essential for growth management. However, the complex underwater environment, with its blurred, low-quality images, poses significant challenges for existing segmentation methods, often reducing accuracy and limiting generalization. To address these issues, we propose an underwater fish image segmentation approach based on an improved Segformer model, designated FT-Segformer (SegFT for short). The encoder extracts multi-scale features, ranging from fine-grained high-resolution to coarse-grained low-resolution maps, through a four-stage transformer block structure. In the decoder, a feature pyramid fusion mechanism integrates these features to strengthen contextual understanding, and transposed convolutions then restore the spatial resolution of the feature maps while enhancing feature learning. To evaluate the model, we constructed the UAGF (Underwater Aquaculture Goldfish Fishes) dataset, collected in a real underwater aquaculture environment with ornamental goldfish, and conducted extensive validation experiments on it. The experimental results show that SegFT outperforms existing methods on mIoU, mPA, and mRecall, with improvements of 1.76%, 0.39%, and 0.19%, respectively. Notably, SegFT's mIoU exceeds that of U-Net, PSPNet, HRNet, and Deeplabv3+ by 1.92%, 3.73%, 3.07%, and 3.58%, respectively. These results demonstrate the effectiveness and robustness of the proposed method in complex underwater settings, where it surpasses existing supervised image segmentation techniques in segmentation performance.
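The decoder design described above (pyramid-style fusion of four multi-scale encoder features followed by transposed-convolution upsampling) can be sketched roughly as follows. This is an illustrative PyTorch approximation under stated assumptions, not the authors' implementation: the channel widths (Segformer-B0-like), the top-down fusion order, and the 4x transposed-convolution stride are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FPNFusionDecoder(nn.Module):
    """Hypothetical sketch of a SegFT-style decoder: fuse four multi-scale
    encoder features top-down (feature-pyramid style), then restore spatial
    resolution with a transposed convolution. All sizes are illustrative."""

    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=128, num_classes=2):
        super().__init__()
        # 1x1 convs project each encoder stage to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, embed_dim, 1) for c in in_channels)
        # 3x3 conv smooths each fused (summed) pyramid level
        self.fuse = nn.Conv2d(embed_dim, embed_dim, 3, padding=1)
        # transposed convolution upsamples 4x back toward input resolution
        self.up = nn.ConvTranspose2d(embed_dim, embed_dim, kernel_size=4, stride=4)
        self.classifier = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, feats):
        # feats: four maps, highest resolution first, each half the previous size
        laterals = [proj(f) for proj, f in zip(self.lateral, feats)]
        x = laterals[-1]  # start from the coarsest, most semantic level
        for lat in reversed(laterals[:-1]):
            # upsample and merge with the next finer level (pyramid fusion)
            x = F.interpolate(x, size=lat.shape[-2:], mode="bilinear", align_corners=False)
            x = self.fuse(x + lat)
        return self.classifier(self.up(x))


# smoke test with Segformer-like stride-4/8/16/32 features for a 256x256 input
feats = [torch.randn(1, c, 256 // s, 256 // s)
         for c, s in zip((32, 64, 160, 256), (4, 8, 16, 32))]
out = FPNFusionDecoder()(feats)
print(out.shape)  # full-resolution mask logits: (1, 2, 256, 256)
```

The transposed convolution here plays the role the abstract assigns to it: recovering the spatial dimensions lost during hierarchical encoding, with learnable upsampling weights rather than fixed interpolation.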