YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models

Year
2026
Hosting
Full text hosted (CC-BY-4.0)

Abstract & full text
arxiv.org/abs/2605.24831 (CC-BY-4.0)

Abstract

This paper presents a rigorous empirical evaluation of Ultralytics YOLO26 against the YOLOv8 baseline, offering an independent real-world stress test of NMS-free architectures on non-COCO distributions. Engineered for edge deployment, YOLO26 introduces native end-to-end one-to-one label assignment, removes Distribution Focal Loss (DFL), and adopts a spectral-constrained CSP-Muon backbone. We conduct a cross-scale comparative analysis across five model capacities on two datasets: Pascal VOC for general object detection and VisDrone for dense aerial small-object detection. Models are evaluated on accuracy (mAP_50 and mAP_50:95), model complexity, and hardware-specific CPU/GPU latency. Our findings show that YOLO26 achieves a lower computational footprint and higher accuracy on Pascal VOC, with YOLO26-x reaching 0.635 mAP_50:95, but that this advantage narrows in dense aerial environments. On VisDrone, where over 75% of objects are under 2,000 pixels, both architectures struggle and the performance gap is minimal (0.214 mAP_50:95 for YOLOv8-x vs. 0.224 mAP_50:95 for YOLO26-x). Crucially, hardware benchmarking shows that YOLOv8 is consistently faster in GPU inference at identical scales (e.g., 6.92 ms for YOLOv8-s vs. 8.38 ms for YOLO26-s), so an NMS-free design does not by itself guarantee an advantage in every deployment setting. This work maps the operational boundaries of NMS-free frameworks to guide architecture selection based on dataset density, object scale, and hardware constraints.
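
To put the quoted figures in proportion, the short sketch below computes the absolute and relative gaps for the two head-to-head comparisons given in the abstract. It uses only the numbers stated there; the helper name `compare` is hypothetical and is not taken from the paper or from the Ultralytics codebase.

```python
def compare(baseline: float, candidate: float) -> tuple[float, float]:
    """Return (absolute difference, relative difference in percent)
    of `candidate` measured against `baseline`."""
    diff = candidate - baseline
    return diff, 100.0 * diff / baseline


# Figures quoted in the abstract; YOLOv8 is treated as the baseline.
map_gap, map_rel = compare(baseline=0.214, candidate=0.224)  # VisDrone mAP_50:95, x scale
lat_gap, lat_rel = compare(baseline=6.92, candidate=8.38)    # GPU latency in ms, s scale

print(f"VisDrone mAP_50:95 gap: {map_gap:+.3f} ({map_rel:+.1f}%)")
print(f"GPU latency gap:        {lat_gap:+.2f} ms ({lat_rel:+.1f}%)")
```

Under these inputs, YOLO26-x leads YOLOv8-x on VisDrone by 0.010 mAP_50:95 (about 4.7% relative), while YOLO26-s is 1.46 ms (about 21.1%) slower than YOLOv8-s on GPU.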