A Hybrid CNN-Transformer Model for Multimodal Fake News Detection Using Feature Fusion
Authors: Vineela Krishna Suri, Prasad GVSNRV
Journal: International Journal of Modern Education and Computer Science (IJMECS)
Issue: Vol. 18, No. 2, 2026.
Free access
The widespread distribution of fake news poses a critical societal challenge by influencing public opinion and shaping political discourse. Addressing this problem requires models that can capture multimodal cues beyond text alone. This work proposes a lightweight Multimodal Cross-attention Fusion–based Fake News Detection (MCAF-FND) model, which combines textual and visual features through a cross-attention strategy. The study evaluates MCAF-FND on the Fakeddit benchmark, a large-scale dataset comprising 682,996 multimodal samples collected from social media. Textual features are extracted using DistilBERT, while spatially aware image representations are derived from VGG-19 convolutional layers. The cross-attention module enables semantic alignment between text tokens and image patches, modeling inter-modal dependencies more effectively than conventional fusion strategies. The fused representation is classified using a Multilayer Perceptron (MLP) with softmax, ensuring contributions from both modalities. Experimental results demonstrate that MCAF-FND consistently outperforms unimodal baselines and traditional fusion methods, achieving 93.2% accuracy with strong precision, recall, and F1-score. Cross-attention-based visualizations illustrate how the model aligns textual cues with salient visual regions, enhancing interpretability. By combining computational efficiency with robust multimodal reasoning, the proposed approach provides a reliable and extensible solution for automated fake news detection.
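The cross-attention fusion described above can be sketched in a minimal, framework-free form: text-token embeddings act as queries over image-patch keys and values, yielding a text-conditioned visual representation. This is a hedged illustration only; the projection dimensions, weight initialization, and the toy feature sizes (768-d tokens, as in DistilBERT; a 3×3 grid of 512-d patches, as in a VGG-19 feature map) are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, image_patches, d_k=64, seed=0):
    """Text tokens (queries) attend over image patches (keys/values).

    text_tokens:   (T, d_t) token embeddings, e.g. from a text encoder
    image_patches: (P, d_v) patch features, e.g. from a CNN feature map
    Returns the fused (T, d_k) representation and the (T, P) attention map.
    Projection matrices here are random stand-ins for learned weights.
    """
    rng = np.random.default_rng(seed)
    d_t, d_v = text_tokens.shape[1], image_patches.shape[1]
    W_q = rng.standard_normal((d_t, d_k)) / np.sqrt(d_t)
    W_k = rng.standard_normal((d_v, d_k)) / np.sqrt(d_v)
    W_v = rng.standard_normal((d_v, d_k)) / np.sqrt(d_v)
    Q, K, V = text_tokens @ W_q, image_patches @ W_k, image_patches @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (T, P): each token's weights over patches
    return attn @ V, attn

# Toy inputs: 4 text tokens, 9 image patches.
rng = np.random.default_rng(42)
text = rng.standard_normal((4, 768))
img = rng.standard_normal((9, 512))
fused, attn = cross_attention(text, img)
print(fused.shape, attn.shape)  # (4, 64) (4, 9)
```

The attention map `attn` is what the paper's visualizations would render: each row shows how strongly one text token attends to each image region. In the full model, `fused` would be pooled and passed to the MLP classifier with softmax.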
Multimodal Data, Fake News Detection, Convolutional Neural Networks, Multimodal Fusion, Transformers
Short URL: https://sciup.org/15020236
IDR: 15020236 | DOI: 10.5815/ijmecs.2026.02.08