Comparison of convolutional networks and transformers for generating 3D-models via binary space partitioning from a single object image

Authors: Gribanov D.N., Kilbas I.A., Mukhin A.V., Paringer R.A., Kupriyanov A.V.

Journal: Компьютерная оптика (Computer Optics)

Section: XI International Conference on Information Technology and Nanotechnology

Issue: Vol. 49, No. 6, 2025.

Free access

This study explores the use of a transformer architecture as the image encoder for the task of 3D mesh generation from a single image. Such tasks are traditionally performed by models based on an autoencoder architecture, where an encoder produces a latent representation that a decoder subsequently converts into a 3D model. When processing image-based input, the ResNet18 convolutional network is a commonly used encoder. In this paper, we investigate replacing the convolutional network with a transformer-based approach while using binary space partitioning (BSP) for 3D object generation. Our experiments demonstrate that a transformer-based architecture, specifically the Compact Convolutional Transformer (CCT), can achieve performance comparable to its convolutional counterpart and exceed it in both quantitative metrics and visual quality. The best CCT-based model achieves a Chamfer Distance (CD) of 1.59 and a Light Field Distance (LFD) of 3907, whereas the convolutional variant attains a CD of 1.64 and an LFD of 3981. The CCT-based model also demonstrates superior 3D reconstruction quality on test samples. Additionally, the transformer model requires four times fewer parameters to achieve these results, though its computational cost is roughly twice as high in terms of multiply-accumulate operations (MACs). These findings indicate that the transformer-based model is more parameter-efficient and can achieve superior results compared to traditional convolutional networks in single-view reconstruction tasks.
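To make the encoder-swap idea concrete, below is a minimal PyTorch sketch of the pipeline described in the abstract: an image encoder (a ResNet18 baseline or a simplified CCT-style transformer stand-in) maps a single image to a latent vector, which a BSP-style decoder would then convert into plane parameters for the 3D model. The latent dimension, tokenizer settings, layer counts, and the unsquared symmetric Chamfer Distance shown here are illustrative assumptions, not the authors' actual configuration or metric implementation.

```python
# Hedged sketch of the single-view reconstruction setup: encoder -> latent vector.
# All sizes and module choices are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.models import resnet18

LATENT_DIM = 256  # assumed latent size consumed by a BSP-style decoder


def make_resnet18_encoder(latent_dim: int = LATENT_DIM) -> nn.Module:
    """Convolutional baseline: ResNet18 with its classifier replaced by a latent head."""
    backbone = resnet18(weights=None)
    backbone.fc = nn.Linear(backbone.fc.in_features, latent_dim)
    return backbone


class TinyTransformerEncoder(nn.Module):
    """Stand-in for a CCT-style encoder: a convolutional tokenizer followed by
    transformer layers and mean pooling over the token sequence."""

    def __init__(self, latent_dim: int = LATENT_DIM, embed_dim: int = 128, depth: int = 4):
        super().__init__()
        # Convolutional tokenizer turns the image into a sequence of patch tokens.
        self.tokenizer = nn.Conv2d(3, embed_dim, kernel_size=7, stride=4, padding=3)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           dim_feedforward=256, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizer(x).flatten(2).transpose(1, 2)  # (B, N, embed_dim)
        tokens = self.blocks(tokens)
        return self.head(tokens.mean(dim=1))  # pooled tokens -> latent vector


def chamfer_distance(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer Distance between point clouds p (N, 3) and q (M, 3)."""
    d = torch.cdist(p, q)  # pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()


if __name__ == "__main__":
    image = torch.randn(1, 3, 128, 128)
    for encoder in (make_resnet18_encoder(), TinyTransformerEncoder()):
        latent = encoder(image)  # a BSP-style decoder would map this to plane parameters
        print(type(encoder).__name__, latent.shape)
    print("CD:", chamfer_distance(torch.rand(1024, 3), torch.rand(1024, 3)).item())
```

In this sketch the two encoders are interchangeable because both expose the same image-to-latent interface, which is the design choice the paper exploits when comparing the convolutional and transformer variants.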


Computer vision, 3D model, neural network, transformer, convolutional network, vector representation, latent vector

Short URL: https://sciup.org/140313288

IDR: 140313288   |   DOI: 10.18287/COJ1863