SANR: Scene-Aware Neural Representation for Light Field Image Compression with Rate-Distortion Optimization

Computer Vision & Multimodal AI
Published: arXiv 2510.15775v1
Authors

Gai Zhang, Xinfeng Zhang, Lv Tang, Hongyu An, Li Zhang, Qingming Huang

Abstract

Light field images capture multi-view scene information and play a crucial role in 3D scene reconstruction. However, their high-dimensional nature results in enormous data volumes, posing a significant challenge for efficient compression in practical storage and transmission scenarios. Although neural representation-based methods have shown promise in light field image compression, most approaches rely on direct coordinate-to-pixel mapping through implicit neural representation (INR), often neglecting the explicit modeling of scene structure. Moreover, they typically lack end-to-end rate-distortion optimization, limiting their compression efficiency. To address these limitations, we propose SANR, a Scene-Aware Neural Representation framework for light field image compression with end-to-end rate-distortion optimization. For scene awareness, SANR introduces a hierarchical scene modeling block that leverages multi-scale latent codes to capture intrinsic scene structures, thereby reducing the information gap between INR input coordinates and the target light field image. From a compression perspective, SANR is the first to incorporate entropy-constrained quantization-aware training (QAT) into neural representation-based light field image compression, enabling end-to-end rate-distortion optimization. Extensive experimental results demonstrate that SANR significantly outperforms state-of-the-art techniques in rate-distortion performance, achieving a 65.62% BD-rate saving against HEVC.
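In learned compression, end-to-end rate-distortion optimization is typically posed as a Lagrangian trade-off between reconstruction quality and estimated bitrate. The formulation below is the standard one from the learned-compression literature, given for orientation rather than quoted from the paper; in the INR setting, the rate term covers the quantized latent codes and network parameters that must be transmitted:

```latex
\mathcal{L} \;=\; \underbrace{D\bigl(x,\hat{x}\bigr)}_{\text{distortion}}
\;+\; \lambda\,\underbrace{\mathbb{E}\bigl[-\log_2 p\bigl(\hat{\theta}\bigr)\bigr]}_{\text{rate}},
\qquad \hat{x} = f_{\hat{\theta}}(\mathbf{c}), \quad \hat{\theta} = Q(\theta)
```

Here c is an input coordinate, Q a quantizer made differentiable during training (via straight-through estimation or additive uniform noise), p a learned entropy model, and λ the trade-off weight; sweeping λ traces out the rate-distortion curve.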

Paper Summary

Problem
Light field images capture both spatial and angular information of a scene, but their high-dimensional nature results in massive data volumes, posing significant challenges for storage, transmission, and processing. Traditional image codecs are ill-suited for light field data, and existing neural representation-based methods often neglect the explicit modeling of scene structure, limiting their compression efficiency.
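To make the data-volume problem concrete, here is a back-of-the-envelope calculation for one light field under assumed dimensions (13x13 sub-aperture views at 625x434, 8-bit RGB, roughly the EPFL lenslet convention); these numbers are illustrative, not taken from the paper:

```python
# Raw size of a single uncompressed light field image.
# All dimensions below are illustrative assumptions, not figures from the paper.
views_u, views_v = 13, 13          # angular resolution (sub-aperture grid)
width, height = 625, 434           # spatial resolution of each view
channels, bytes_per_sample = 3, 1  # 8-bit RGB

raw_bytes = views_u * views_v * width * height * channels * bytes_per_sample
print(f"raw size: {raw_bytes / 2**20:.1f} MiB")  # ~131 MiB for one scene
```

Over a hundred mebibytes for a single scene is why generic image codecs, which see only one view at a time, leave so much angular redundancy unexploited.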
Key Innovation
SANR, a Scene-Aware Neural Representation framework, addresses the limitations of existing methods by introducing a hierarchical scene modeling block that leverages multi-scale latent codes to capture intrinsic scene structures. SANR also incorporates entropy-constrained quantization-aware training (QAT) into neural representation-based light field image compression, enabling end-to-end rate-distortion optimization.
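A minimal sketch of how these two ideas can fit together, written in PyTorch with invented names (HierarchicalLatents, SceneAwareINR, ste_round); this illustrates the general technique, not the paper's actual architecture. Multi-scale latent grids are sampled at each query coordinate to condition the INR on scene content, the latents are quantized with a straight-through estimator so training is quantization-aware, and a simple entropy stand-in supplies the rate term of the loss. For brevity the coordinates are 2-D; a light field INR would take 4-D (u, v, x, y) inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal


def ste_round(y: torch.Tensor) -> torch.Tensor:
    """Round to integers with a straight-through gradient (identity backward)."""
    return y + (torch.round(y) - y).detach()


class HierarchicalLatents(nn.Module):
    """Learnable latent grids at several scales, bilinearly sampled at query
    coordinates; the multi-scale features carry explicit scene structure."""

    def __init__(self, channels: int = 16, base: int = 8, scales: int = 3):
        super().__init__()
        self.grids = nn.ParameterList(
            [nn.Parameter(torch.randn(1, channels, base * 2**s, base * 2**s))
             for s in range(scales)]
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (N, 2) in [-1, 1]; grid_sample expects a (1, N, 1, 2) grid.
        pts = coords.view(1, -1, 1, 2)
        feats = [
            F.grid_sample(ste_round(g), pts, align_corners=True)  # quantized!
            .squeeze(0).squeeze(-1).t()                           # -> (N, C)
            for g in self.grids
        ]
        return torch.cat(feats, dim=-1)

    def rate_bits(self) -> torch.Tensor:
        """Estimated bits to code the quantized latents: -log2 of the bin
        probability under an empirical Gaussian, a stand-in for the learned
        entropy model a real codec would train."""
        total = torch.zeros(())
        for g in self.grids:
            q = ste_round(g)
            model = Normal(g.mean().detach(), g.std().clamp_min(1e-6).detach())
            p_bin = (model.cdf(q + 0.5) - model.cdf(q - 0.5)).clamp_min(1e-9)
            total = total + (-torch.log2(p_bin)).sum()
        return total


class SceneAwareINR(nn.Module):
    """INR that maps (coordinate, multi-scale scene features) -> RGB."""

    def __init__(self, latents: HierarchicalLatents, hidden: int = 64):
        super().__init__()
        self.latents = latents
        in_dim = 2 + sum(g.shape[1] for g in latents.grids)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([coords, self.latents(coords)], dim=-1))


# One end-to-end rate-distortion step: loss = distortion + lambda * rate.
model = SceneAwareINR(HierarchicalLatents())
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
coords = torch.rand(1024, 2) * 2 - 1    # random query coordinates
target = torch.rand(1024, 3)            # stand-in for ground-truth pixels
lam = 1e-6                              # rate-distortion trade-off weight

opt.zero_grad()
loss = F.mse_loss(model(coords), target) + lam * model.latents.rate_bits()
loss.backward()
opt.step()
```

The key design point is that quantization and the rate estimate sit inside the training loop, so the optimizer sees the same representation the decoder will, and λ directly trades bits for distortion.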
Practical Impact
SANR has the potential to significantly improve the compression performance of light field images, enabling more efficient storage and transmission. With a 65.62% BD-rate saving against HEVC, SANR outperforms state-of-the-art techniques, making it a promising solution for practical applications such as 3D scene reconstruction, depth estimation, and virtual reality.
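BD-rate compares two codecs by the average bitrate difference at matched quality, so a 65.62% BD-rate saving means SANR needs roughly a third of HEVC's bits for the same PSNR. Below is a minimal sketch of the standard Bjontegaard computation; the rate-distortion points are invented for illustration, not the paper's measurements:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average % bitrate change at equal PSNR."""
    la, lt = np.log10(rate_anchor), np.log10(rate_test)
    # Fit log-rate as a cubic in PSNR, then integrate over the shared range.
    pa = np.polyfit(psnr_anchor, la, 3)
    pt = np.polyfit(psnr_test, lt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    it = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (it - ia) / (hi - lo)
    return (10 ** avg_diff - 1) * 100   # negative => test codec saves bits

# Made-up RD points (bitrate in bpp, quality in PSNR dB), illustration only.
hevc_rate, hevc_psnr = [0.05, 0.10, 0.20, 0.40], [30.1, 32.5, 34.8, 37.0]
sanr_rate, sanr_psnr = [0.02, 0.04, 0.08, 0.16], [30.5, 33.0, 35.2, 37.4]
print(f"BD-rate: {bd_rate(hevc_rate, hevc_psnr, sanr_rate, sanr_psnr):.2f}%")
```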
Analogy / Intuitive Explanation
Imagine trying to compress a high-resolution image of a complex scene, such as a cityscape. Traditional methods might focus on compressing individual pixels, but SANR takes a more holistic approach by modeling the scene's structure and geometry. This allows SANR to capture more information about the scene, resulting in more efficient compression that preserves the image's details and quality.
Paper Information
Categories: eess.IV, cs.CV, cs.MM
Published Date: October 2025
arXiv ID: 2510.15775v1