Small object detection in low-resolution remote sensing images is limited both by missing visual detail and by ambiguity about how that detail serves detection. Existing super-resolution-assisted detectors generally follow a restoration-first paradigm that explicitly enhances inputs before detection, implicitly assuming that visual fidelity benefits recognition. Yet super-resolution favors dense texture and edge recovery, whereas object detection relies on sparse instance-level semantics, so restoration amplifies visually plausible but semantically irrelevant background textures. To address this, we propose CoLR-Det, a Collaborative Latent-Restoration-Assisted Small Object Detection framework that treats super-resolution supervision as detection-oriented latent regularization rather than explicit image-level enhancement. Instead of reconstructing high-resolution images at inference, CoLR-Det uses a training-only restoration branch to impose auxiliary reconstruction constraints on shared multiscale representations, while the inference pathway remains purely detection-driven. We further design a saliency-guided, object-preserving token routing mechanism that prioritizes high-saliency tokens for attention-based refinement while retaining the information in bypassed tokens. In addition, we develop a detection-prioritized two-stage optimization strategy: it first builds stable object-level semantics before introducing restoration supervision, and it assigns a smaller learning rate to the SR decoder to keep its updates conservative and reduce perturbations during collaborative training. With this design, CoLR-Det turns restoration from an explicit visual enhancement operator into an implicit semantic regularizer. Experiments on resolution-degraded NWPU VHR-10-Split, DOTAv1.5-Split, and HRSSD-Split show that CoLR-Det outperforms state-of-the-art methods. Code is available at https://github.com/qiruo-ya/CoLR-Det.
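The abstract names the token routing and two-stage optimization but gives no implementation details, so the following is only a minimal sketch of how those two ideas might look, not the authors' code. Every name and value here (`route_tokens`, `refine_with_routing`, `training_plan`, `keep_ratio`, `warmup_epochs`, `sr_lr_scale`, `sr_weight`) is a hypothetical placeholder. The sketch assumes that routing is a top-k selection on saliency, that bypassed tokens are retained by passing through unchanged, and that stage one simply switches the restoration loss off.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Token = List[float]


def route_tokens(saliency: Sequence[float], keep_ratio: float) -> Tuple[List[int], List[int]]:
    """Split token indices into (routed, bypassed) by descending saliency."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(len(saliency) * keep_ratio))
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(order[:n_keep]), sorted(order[n_keep:])


def refine_with_routing(
    tokens: Sequence[Token],
    saliency: Sequence[float],
    keep_ratio: float,
    refine_fn: Callable[[List[Token]], List[Token]],
) -> List[Token]:
    """Refine only high-saliency tokens; refine_fn stands in for attention."""
    routed, _ = route_tokens(saliency, keep_ratio)
    refined = refine_fn([tokens[i] for i in routed])
    # Assumption: bypassed tokens are kept as-is at their original positions.
    out = [list(t) for t in tokens]
    for idx, tok in zip(routed, refined):
        out[idx] = tok
    return out


def training_plan(
    epoch: int,
    warmup_epochs: int = 10,   # hypothetical length of the detection-only stage
    base_lr: float = 1e-3,     # hypothetical detector learning rate
    sr_lr_scale: float = 0.1,  # hypothetical factor: SR decoder learns more slowly
    sr_weight: float = 0.5,    # hypothetical restoration loss weight
) -> Dict[str, float]:
    """Stage one: detection only. Stage two: add restoration supervision."""
    sr_active = epoch >= warmup_epochs
    return {
        "detector_lr": base_lr,
        "sr_decoder_lr": base_lr * sr_lr_scale if sr_active else 0.0,
        "sr_loss_weight": sr_weight if sr_active else 0.0,
    }


def total_loss(det_loss: float, sr_loss: float, plan: Dict[str, float]) -> float:
    """Detection loss plus the (possibly disabled) restoration term."""
    return det_loss + plan["sr_loss_weight"] * sr_loss


if __name__ == "__main__":
    tokens = [[1.0], [2.0], [3.0], [4.0]]
    saliency = [0.1, 0.9, 0.2, 0.8]
    double = lambda batch: [[2.0 * v for v in t] for t in batch]
    print(refine_with_routing(tokens, saliency, 0.5, double))
    print(training_plan(0), training_plan(10))
```

In a real detector the returned learning rates would typically be mapped onto separate optimizer parameter groups, and because the restoration branch is training-only, none of the restoration terms would appear on the inference path.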
CoLR-Det: Collaborative Latent Restoration for Small Object Detection in Low-Resolution Remote Sensing Images
- Year
- 2026
Attribution
- Abstract & full text
- arxiv.org/abs/2601.12507