Abstract: Accurate and efficient detection of red fruits in complex orchard environments is crucial for the autonomous operation of agricultural harvesting robots. However, existing methods still suffer from high false negative rates, poor localization accuracy, and difficult edge deployment in real-world scenes with occlusion, strong light reflection, and drastic scale changes. To address these issues, this paper proposes EdgeFormer-YOLO, a lightweight multi-attention detection framework. While retaining the efficiency of the YOLO series’ single-stage detection architecture, it introduces a multi-head self-attention (MHSA) mechanism to strengthen global modeling of occluded fruits and employs a hierarchical feature fusion strategy to improve multi-scale detection robustness. To better meet the quantized deployment requirements of edge devices, the model adopts the arsinh activation function, which improves numerical stability and convergence speed while keeping the gradient non-zero. On the red fruit dataset, EdgeFormer-YOLO achieves 95.7% mAP@0.5, an improvement of 2.2 percentage points over the YOLOv8n baseline, with 90.0% precision and 92.5% recall. On an edge GPU, the model reaches an inference speed of 148.78 FPS with a model size of 6.35 MB, 3.21 M parameters, and a computational cost of 4.18 GFLOPs, outperforming some existing mainstream lightweight YOLO variants in both speed and mAP@0.5. Experimental results demonstrate that EdgeFormer-YOLO offers combined advantages in real-time performance, robustness, and deployment feasibility in complex orchard environments, providing a viable technical path for agricultural robot vision systems.
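The non-zero gradient property claimed for the arsinh activation can be checked with a minimal sketch. It covers only the mathematical function, arsinh(x) = ln(x + sqrt(x^2 + 1)), and its derivative 1 / sqrt(1 + x^2). The abstract does not say where the activation is placed in the network or how it is implemented, so the helper names `arsinh`, `arsinh_grad`, and `relu_grad` are illustrative assumptions and not the authors' code. The ReLU comparison is included only as a familiar reference point.

```python
import math


def arsinh(x: float) -> float:
    """Inverse hyperbolic sine, equal to ln(x + sqrt(x^2 + 1))."""
    return math.asinh(x)


def arsinh_grad(x: float) -> float:
    """Derivative of arsinh: 1 / sqrt(1 + x^2), strictly positive for every real x."""
    return 1.0 / math.sqrt(1.0 + x * x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU (taken as 0 at x = 0), shown only for comparison."""
    return 1.0 if x > 0 else 0.0


# Illustrative inputs (not from the paper): compare outputs and gradients.
for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(f"x={x:8.1f}  arsinh={arsinh(x):8.4f}  "
          f"d/dx arsinh={arsinh_grad(x):.6f}  d/dx relu={relu_grad(x):.1f}")
```

Two properties are visible here. The gradient shrinks for large |x| but never reaches zero, unlike ReLU on negative inputs. The output also grows only logarithmically, which keeps activation ranges compact. Whether that compactness is what helps quantized deployment in EdgeFormer-YOLO is an assumption on our part; the abstract states only the stability and convergence benefits.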