Intelligent Document Processing Based on OCR + Transformers and CNN for Food Assistance Document Verification

David Bagas Santoso(1*), Muhammad Fachrie Fachrie(2)
(1) 
(2) Universitas Teknologi Yogyakarta
(*) Corresponding Author
DOI : 10.35889/jutisi.v14i3.3351

Abstract

The rice assistance program is a government initiative to alleviate poverty by distributing rice to low-income households. This study aims to design and implement an Intelligent Document Processing system based on Convolutional Neural Networks (CNN) and hybrid Optical Character Recognition (OCR) to facilitate the verification of social assistance documents. The system integrates the LayoutLMv3 model for text entity extraction and a ResNet-18 CNN for signature-box classification, supported by conventional OCR and TrOCR for challenging text regions. The research methodology covers system design and model development. Experimental results show that the system can successfully convert PDF documents into images and accurately recognize key entities such as names and national ID numbers using LayoutLMv3 (validation accuracy 96.78%). Although ResNet-18 achieves a validation accuracy of 99.04%, it remains biased toward majority classes and is not yet reliable for verification decisions. The proposed system has the potential to accelerate and standardize document verification, with future work focusing on dataset rebalancing and improving the signature classification model.
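The gap between a 99.04% validation accuracy and a classifier that is still biased toward majority classes can be illustrated with a small worked example. The sketch below is not the authors' evaluation code. The class names ("signed", "empty") and the 990/10 split are hypothetical, chosen only to show how plain accuracy can look strong while the minority class is never recognized.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for every label that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical validation set (assumption): 990 majority, 10 minority samples.
y_true = ["signed"] * 990 + ["empty"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["signed"] * 1000

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)

print(f"accuracy          = {acc:.4f}")
print(f"per-class recall  = {recalls}")
print(f"balanced accuracy = {bal_acc:.4f}")
```

In this toy setting the accuracy is 0.99 while the recall on the minority class is 0, so the balanced accuracy drops to 0.5. Reporting per-class recall alongside validation accuracy is one way to make this kind of bias visible.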

Keywords: LayoutLMv3; ResNet-18; Optical Character Recognition; TrOCR

 

Abstract (Indonesian version)

The rice food assistance program is a government effort to alleviate poverty by distributing rice to low-income households. This study aims to design and implement Intelligent Document Processing based on a Convolutional Neural Network (CNN) and hybrid OCR to ease the verification of food assistance documents. The system integrates the LayoutLMv3 machine learning model for text entity extraction and a ResNet-18 CNN for signature classification, supported by Optical Character Recognition (OCR) and TrOCR. The research method covers system design and software development. Test results show that the system can convert PDF documents into images, recognize entities such as names and national ID numbers (NIK) with high accuracy using LayoutLMv3 (val_accuracy 96.78%), and classify signature boxes with ResNet-18 (val_accuracy 99.04%). However, ResNet-18 is still biased toward the majority class and therefore cannot yet be considered reliable for verification decisions. The proposed system has the potential to speed up the checking of food assistance documents, while further development will focus on balancing the data and refining the signature classification model.
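One common way to approach the data balancing mentioned as future work is to weight each class inversely to its frequency during training. The sketch below is only an illustration under stated assumptions: the label names and counts are hypothetical, and the abstract does not say which rebalancing method the authors intend to use. The formula n_samples / (n_classes * class_count) is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical training labels (assumption): 990 majority, 10 minority.
labels = ["signed"] * 990 + ["empty"] * 10
weights = inverse_frequency_weights(labels)

for label, weight in weights.items():
    print(f"{label}: weight = {weight:.3f}")
```

With these weights each class contributes the same total weight to the loss (500 per class in this toy example), so errors on the rare class are no longer drowned out by the majority class. Resampling the dataset is an alternative with a similar goal.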

 
