Feature Development in Intelligent Hoax Identification with Naïve Bayes and Decision Tree Classification

Muhammad Umar Shalih (1), Teja Endra Eng Tju (2*)
(1) Universitas Budi Luhur
(2) Universitas Budi Luhur
(*) Corresponding Author
DOI : 10.35889/jutisi.v13i1.1731

Abstract

Hoax identification is complex and challenging because hoaxes are diverse, their narratives change quickly, they spread rapidly and widely, they make use of sophisticated technology, and they are difficult to verify at scale. Given the impact of hoaxes on society, research on developing features for intelligent hoax identification is needed. The methodology is adapted from CRISP-DM and SKKNI No. 299 of 2020 and tailored to the needs of the research, resulting in five stages: data understanding, data preparation, modeling, evaluation, and deployment. The data were obtained from Mafindo and comprise 9,756 instances, divided into 7,804 training instances and 1,952 test instances. Six features are used: source, capital, keyword, sentiment, fact-check, and classification, with classification serving as the supervised label. The sentiment and fact-check features are built with the Multinomial Naïve Bayes method, after which the dataset is modeled with the Decision Tree method. Modeling is also carried out with dataset sizes of 2,000, 4,000, 6,000, and 8,000, along with a comparison that addresses the imbalanced-dataset problem. Evaluated with a Confusion Matrix, the model achieves an accuracy of 93.5% and an F1 score of 0.935. The results also show that dataset imbalance has little effect on accuracy and F1 score but contributes to model stability with respect to the quantity of data carrying a particular label.
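The evaluation step described above can be illustrated with a minimal sketch that derives accuracy and F1 from the four cells of a binary confusion matrix. The counts below are hypothetical, chosen only so that they sum to the 1,952 test instances; they are not the paper's results. The abstract also does not state how the F1 score was averaged, so the positive-class F1 used here is an assumption.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# The split reported in the abstract: 7,804 training + 1,952 test instances.
n_train, n_test = 7804, 1952
n_total = n_train + n_test  # 9,756

# Hypothetical confusion-matrix counts (assumption, not from the paper).
tp, fp, fn, tn = 900, 60, 70, 922
accuracy, precision, recall, f1 = metrics_from_confusion(tp, fp, fn, tn)

print(f"test share: {n_test / n_total:.3f}")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Accuracy and F1 land close together here because the hypothetical classes are nearly balanced; on a skewed test set the two can diverge, which is one reason to report both.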

Keywords: Classification and Regression Trees; SMOTE; Confusion Matrix; Fact Check; Mafindo 

 

Abstract (translated from Indonesian)

Hoax identification is quite complex and challenging, facing problems such as the diversity of hoaxes, rapidly changing narratives, fast and wide dissemination, the use of advanced technology, difficulty of verification, and challenges of scale. Out of concern for the impact of hoaxes on society, research on feature development for intelligent hoax identification needs to be carried out. The methodology is adopted from CRISP-DM and SKKNI No. 299 of 2020, adjusted to the needs of the research so that it consists of five stages: data understanding, data preparation, modeling, evaluation, and deployment. The data were obtained from Mafindo; 9,756 records were used, divided into 7,804 training records and 1,952 test records. There are six features, namely source, capital, keyword, sentiment, fact-check, and classification as the supervised label. The two features sentiment and fact-check are built with the Multinomial Naïve Bayes method, and the dataset is then modeled with the Decision Tree method. Modeling is also performed with dataset quantities of 2,000, 4,000, 6,000, and 8,000, as well as with a comparison of the imbalanced-dataset problem. The modeling results, evaluated with the Confusion Matrix technique, give an accuracy of 93.5% and an F1 score of 0.935, and show that the imbalanced dataset does not greatly affect accuracy and F1 score but provides model stability in terms of the quantity of data with a particular label.
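To make the feature-construction step more concrete, the following is a minimal pure-Python sketch of a Multinomial Naïve Bayes classifier with Laplace smoothing whose prediction is stored as a derived `sentiment` column. The toy texts, the labels, the whitespace tokenizer, and the row layout are all hypothetical; the paper's actual preprocessing and lexicons are not described in the abstract.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit word-count Naive Bayes; alpha is the Laplace smoothing constant."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like, vocab


def predict(model, text):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    log_prior, log_like, vocab = model
    scores = {}
    for c in log_prior:
        tokens = [w for w in text.lower().split() if w in vocab]
        scores[c] = log_prior[c] + sum(log_like[c][w] for w in tokens)
    return max(scores, key=scores.get)


# Hypothetical toy training data (assumption, not the Mafindo dataset).
docs = [
    "good great helpful",
    "great happy news",
    "bad terrible fraud",
    "terrible scary warning",
]
labels = ["positive", "positive", "negative", "negative"]
model = train_multinomial_nb(docs, labels)

# The prediction becomes one column of a feature row; the other values are
# placeholders for features such as source or capital.
row = {"source": 1, "capital": 0, "sentiment": predict(model, "great helpful news")}
print(row)
```

In a full pipeline, rows like this one, with a fact-check column built the same way, would form the table on which a decision tree is trained against the classification label.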

 


