Winner of CVPR2026 NTIRE Challenge on Image Shadow Removal: Semantic and Geometric Guidance for Shadow Removal via Cascaded Refinement


Published: arXiv: 2604.16177v1

Authors

Lorenzo Beltrame, Jules Salzinger, Filip Svoboda, Jasmin Lampert, Phillipp Fanta-Jende, Radu Timofte, Marco Koerner

Abstract

We present a three-stage progressive shadow-removal pipeline for the CVPR2026 NTIRE WSRD+ challenge. Built on OmniSR, our method treats deshadowing as iterative direct refinement, where later stages correct residual artefacts left by earlier predictions. The model combines RGB appearance with frozen DINOv2 semantic guidance and geometric cues from monocular depth and surface normals, reused across all stages. To stabilise multi-stage optimisation, we introduce a contraction-constrained objective that encourages non-increasing reconstruction error across the cascade. A staged training pipeline transfers from earlier WSRD pretraining to WSRD+ supervision and final WSRD+ 2026 adaptation with cosine-annealed checkpoint ensembling. On the official WSRD+ 2026 hidden test set, our final ensemble achieved 26.680 PSNR, 0.8740 SSIM, 0.0578 LPIPS, and 26.135 FID, ranked first overall, and won the NTIRE 2026 Image Shadow Removal Challenge. The strong performance of the proposed model is further validated on the ISTD+ and UAV-SC+ datasets.

Paper Summary

Problem

Image shadow removal is a challenging low-level vision problem that affects various applications, such as video analysis, traffic monitoring, and remote sensing. Shadows can corrupt local illumination, suppress visible texture, and reduce the reliability of downstream perception systems. In natural images, deshadowing is particularly difficult because the model must separate illumination changes from intrinsic scene appearance while preserving texture, geometry, and object boundaries.

Key Innovation

The proposed method is a three-stage progressive shadow-removal pipeline, built on OmniSR, that treats deshadowing as iterative direct refinement: each later stage corrects the residual artifacts left by the previous prediction. Every stage combines RGB appearance with frozen DINOv2 semantic guidance and geometric cues from monocular depth and surface normals, and these cues are reused across all stages. A contraction-constrained objective stabilizes multi-stage optimization by encouraging the reconstruction error not to increase from one stage to the next. Training is staged as well: the model is pretrained on the earlier, imperfectly aligned WSRD data, moved to aligned WSRD+ supervision, and finally adapted to the WSRD+ 2026 distribution with cosine-annealed checkpoint ensembling.
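To make the idea of a contraction constraint concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the L1 error, the hinge-style penalty, the `weight` parameter, and all function names are hypothetical choices, and images are stood in for by flat lists of floats. The only properties taken from the summary are that each stage refines the previous prediction, that the same guidance is passed to every stage, and that the objective discourages the error from growing along the cascade.

```python
from typing import Callable, List, Sequence

Image = List[float]  # toy stand-in for an image tensor (assumption)
Stage = Callable[[Image, Image], Image]


def l1_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between a prediction and the shadow-free target."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def run_cascade(shadow: Image, guidance: Image, stages: Sequence[Stage]) -> List[Image]:
    """Run the stages in order; each one sees the previous output and the same guidance."""
    outputs: List[Image] = []
    current = shadow
    for stage in stages:
        current = stage(current, guidance)  # direct prediction, refined stage by stage
        outputs.append(current)
    return outputs


def cascade_loss(outputs: Sequence[Image], target: Image, weight: float = 1.0) -> float:
    """Sum of per-stage errors plus a penalty whenever a later stage is worse."""
    errors = [l1_error(out, target) for out in outputs]
    reconstruction = sum(errors)
    # Hinge penalty (hypothetical form): zero when the error is non-increasing.
    contraction = sum(max(0.0, later - earlier)
                      for earlier, later in zip(errors, errors[1:]))
    return reconstruction + weight * contraction


def halfway(image: Image, guidance: Image) -> Image:
    """Hypothetical toy stage: move each pixel halfway towards the guidance value."""
    return [0.5 * (x + g) for x, g in zip(image, guidance)]


if __name__ == "__main__":
    target = [1.0, 1.0]
    outputs = run_cascade([0.0, 0.0], target, [halfway, halfway, halfway])
    print(cascade_loss(outputs, target))
```

In this toy run the per-stage errors shrink at every step, so the penalty term is zero and only the reconstruction term contributes. If a later stage made the prediction worse, the penalty would add the size of that regression, scaled by `weight`.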

Practical Impact

The proposed method has two practical implications. First, it ranked first in the NTIRE 2026 Image Shadow Removal Challenge, reaching 26.680 PSNR, 0.8740 SSIM, 0.0578 LPIPS, and 26.135 FID on the hidden WSRD+ 2026 test set. This supports progressive direct refinement as an effective approach to high-quality shadow removal under challenging illumination and distribution shift. Second, the method generalizes beyond the challenge benchmark, achieving the best PSNR among the compared mask-free methods on ISTD+ and the best PSNR overall on UAV-SC+. This suggests that the model learns a transferable restoration prior rather than a dataset-specific solution.
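For readers unfamiliar with the headline metric, PSNR is defined as 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error against the reference image; higher is better. The sketch below computes it for flat lists of floats in the range [0, 1]. The function name and the `max_value` default are illustrative assumptions, and the challenge's evaluation script may differ in details such as bit depth, colour space, or how scores are averaged over images.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in decibels for two equally sized images."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 0.5 on a [0, 1] scale gives MSE = 0.25.
    print(psnr([0.5, 0.5], [1.0, 1.0]))
```

Because the scale is logarithmic, halving the root-mean-square error raises PSNR by about 6 dB, so even sub-decibel differences between challenge entries correspond to measurable changes in pixel error.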

Analogy / Intuitive Explanation

Imagine trying to remove a shadow from a photograph in a single pass. It is like painting over a stain in one stroke: most of it disappears, but the edges and small color mismatches remain. The proposed method instead makes three passes, each starting from the previous result and correcting what is still wrong. Every pass works from the same reference material: the RGB image itself, frozen DINOv2 features that indicate what is in the scene, and monocular depth and surface normals that describe its shape. The training objective also encourages each pass to be no worse than the one before it, so the result improves step by step rather than drifting.

Paper Information

Categories: cs.CV

Published Date:

arXiv ID: 2604.16177v1
