Multimodal Hoax News Detection Using OCR and a Multi-Kernel 1D-CNN Model
Keywords:
Hoax Detection, Fake News Classification, Multimodal OCR, Multi-Kernel 1D-CNN, Deep Learning
Abstract
This research develops an automated Indonesian hoax news detection system that leverages textual information from both news articles and text embedded in images or videos. The proposed approach combines Optical Character Recognition (OCR), which extracts text from visual data, with a Multi-Kernel One-Dimensional Convolutional Neural Network (Multi-Kernel 1D-CNN), which models linguistic patterns across multiple n-gram scales. The dataset is compiled from two credible Indonesian sources, Kompas.com for valid news and TurnBackHoax.Id for verified hoaxes, and totals 24,439 labeled samples with a nearly balanced class distribution (50.89% valid news and 49.11% hoax news). Model performance is assessed using an 80:20 hold-out split and 10-fold stratified cross-validation. Experiments show that the proposed Multi-Kernel 1D-CNN achieves an average accuracy of 99.918% ± 0.055, precision of 99.950% ± 0.058, recall of 99.883% ± 0.112, and F1-score of 99.917% ± 0.056, and that it consistently outperforms the single-kernel CNN baseline across test samples of varying length. The ROC and Precision–Recall curves also suggest almost perfect separability between the two classes. In summary, OCR combined with a Multi-Kernel 1D-CNN is an effective and efficient multimodal approach for detecting hoax news, and it can be used in real-time decision-support systems for Indonesian online news.
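As a rough illustration of the multi-kernel idea described in the abstract, the sketch below applies 1D convolution filters of several widths to a sequence of token embeddings, takes the ReLU and the maximum over time for each filter, and concatenates the results into one feature vector. It is a minimal pure-Python sketch, not the authors' implementation: the kernel sizes (2, 3, 4), the number of filters, the embedding dimension, the random weights, and the helper names `conv1d_max_pool`, `multi_kernel_features`, and `random_kernels` are all assumptions made for this example.

```python
import random


def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over the token axis, apply ReLU, then max-over-time pooling.

    embeddings: list of token vectors, shape (seq_len, emb_dim)
    kernel:     list of weight vectors, shape (kernel_size, emb_dim)
    """
    k = len(kernel)
    if len(embeddings) < k:
        raise ValueError("sequence shorter than kernel size")
    best = 0.0  # ReLU floor: the pooled activation never goes below zero
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        activation = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, activation)
    return best


def multi_kernel_features(embeddings, kernels_by_size):
    """Concatenate pooled outputs from filters of several widths (n-gram scales)."""
    features = []
    for size in sorted(kernels_by_size):
        for kernel in kernels_by_size[size]:
            features.append(conv1d_max_pool(embeddings, kernel))
    return features


def random_kernels(sizes, filters_per_size, emb_dim, seed=0):
    """Hypothetical helper: untrained random filters, for illustration only."""
    rng = random.Random(seed)
    return {
        size: [
            [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(size)]
            for _ in range(filters_per_size)
        ]
        for size in sizes
    }


# Usage: 10 tokens with 4-dimensional embeddings (random stand-ins for real ones).
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
kernels = random_kernels(sizes=(2, 3, 4), filters_per_size=2, emb_dim=4)
features = multi_kernel_features(tokens, kernels)
print(len(features))  # 3 kernel sizes x 2 filters = 6 pooled features
```

In a trained model the filter weights would be learned and the concatenated vector would feed a classification layer that separates valid news from hoaxes. Each kernel width responds to word patterns of a different span, which is what lets a single model cover multiple n-gram scales.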
License
Copyright (c) 2025 Seggy Ferdianza Agustin (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

