Innovative Integration of Residual Networks for Enhanced In-loop Filtering in VVC Using Deep Convolutional Neural Networks

Authors: Ibraheem M.K.I., Dvorkovich A.V., Al-Temimi A.M.S.

Journal: Компьютерная оптика (Computer Optics)

Section: Numerical methods and data analysis

Published in: No. 4, Vol. 49, 2025.

Open access

This paper explores the integration of Residual Networks (ResNets) into the in-loop filtering (ILF) process of the Versatile Video Coding (VVC) standard, aiming to enhance compression efficiency and video quality through the application of Deep Convolutional Neural Networks (DCNNs). The study introduces a novel architecture, the Residual Deep Convolutional Neural Network (RDCNN), designed to replace the conventional VVC in-loop filtering modules: the Deblocking Filter (DBF), Sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF). By leveraging the Rate-Distortion Optimization (RDO) technique, the RDCNN model is applied to every coding unit (CU) to optimize the balance between video quality and bitrate. The proposed methodology involves offline training with specific parameters on the TensorFlow-GPU platform, followed by feature extraction and prediction of the optimal filtering decision for each video frame during encoding. The results demonstrate the effectiveness of the proposed RDCNN in significantly reducing the bitrate while maintaining high visual quality, outperforming existing methods in compression efficiency and peak signal-to-noise ratio (PSNR) across various test videos in the YUV color space. Specifically, the RDCNN achieved a YUV PSNR of 41.2 dB and BD-rate reductions of –2.43% for the Y component, –6.96% for the U component, and –9.43% for the V component. These results underscore the potential of deep learning techniques, particularly ResNets, in addressing the complexities of video compression and enhancing the VVC standard. Evaluation on the YUV test sequences Stefan_cif, Soccer, Mobile, Harbour, Crew, and Bus showed consistently higher average YUV PSNR values than both VTM 22.2 and other related methods, indicating not only improved compression efficiency but also enhanced visual quality, which is crucial for diverse video processing tasks.
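The per-CU decision described above, whether to keep the default reconstruction or the RDCNN-filtered block, follows the standard Lagrangian rate-distortion rule J = D + λ·R. A minimal sketch of that selection logic (function names, the MSE distortion measure, and the one-bit signalling flag are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def rdo_select(original, reconstructed, filtered, lam, flag_bits=1):
    """Per-CU rate-distortion decision: pick the candidate with the lower
    Lagrangian cost J = D + lambda * R. The filtered path is assumed to
    cost `flag_bits` extra bits of signalling (a hypothetical choice)."""
    d_rec = np.mean((original - reconstructed) ** 2)  # distortion, no CNN filter
    d_flt = np.mean((original - filtered) ** 2)       # distortion, RDCNN-filtered
    j_rec = d_rec                                     # default path: no extra rate
    j_flt = d_flt + lam * flag_bits                   # filtered path pays the flag
    if j_flt < j_rec:
        return "filtered", filtered
    return "reconstructed", reconstructed

# Toy 4x4 CU where the filtered block is much closer to the original.
rng = np.random.default_rng(0)
orig = rng.random((4, 4))
rec = orig + 0.10 * rng.standard_normal((4, 4))   # coarse reconstruction
flt = orig + 0.01 * rng.standard_normal((4, 4))   # CNN output, lower distortion
choice, block = rdo_select(orig, rec, flt, lam=0.001)
```

Because the filtered block's distortion gain far outweighs the small signalling cost here, the decision falls on the filtered path; in a real encoder λ is derived from the quantization parameter.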


Keywords: Deep Learning, Residual Deep Convolutional Neural Network, Versatile Video Coding, Video Compression, VTM

Short URL: https://sciup.org/140310513

IDR: 140310513   |   DOI: 10.18287/2412-6179-CO-1572

Research article