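Pembangunan Fitur dalam Identifikasi Cerdas Hoaks dengan Naïve Bayes dan Klasifikasi Decision Tree (Feature Development in Intelligent Hoax Identification with Naïve Bayes and Decision Tree Classification)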

Muhammad Umar Shalih; Teja Endra Eng Tju

doi:10.35889/jutisi.v13i1.1731

Muhammad Umar Shalih (1), Teja Endra Eng Tju (2*)
(1) Universitas Budi Luhur
(2) Universitas Budi Luhur
(*) Corresponding Author

Abstract

Identifying hoaxes is complex and challenging because hoaxes are diverse, their narratives change quickly, they spread rapidly and widely, they exploit sophisticated technology, they are difficult to verify, and detection must work at scale. Given the societal impact of hoaxes, research on feature development for intelligent hoax identification is needed. The methodology is adapted from CRISP-DM and SKKNI No. 299 of 2020, tailored to the needs of the research, and comprises five stages: data understanding, data preparation, modeling, evaluation, and deployment. The data, obtained from Mafindo, comprise 9,756 instances divided into 7,804 training instances and 1,952 test instances. Six features are used: source, capital, keyword, sentiment, fact-check, and classification, the last serving as the supervised label. The sentiment and fact-check features are built with the Multinomial Naïve Bayes method, after which the dataset is modeled with the Decision Tree method. Modeling is also carried out with dataset sizes of 2,000, 4,000, 6,000, and 8,000 instances, and with a comparison addressing the imbalanced-dataset problem. Evaluated with the Confusion Matrix technique, the model achieves an accuracy of 93.5% and an F1 score of 0.935. The imbalanced dataset has little effect on accuracy and F1 score but contributes to model stability with respect to the quantity of data carrying specific labels.

Keywords: Classification and Regression Trees; SMOTE; Confusion Matrix; Fact Check; Mafindo
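The abstract states that the sentiment and fact-check features are built with Multinomial Naïve Bayes before the Decision Tree is trained. The following is a minimal sketch of that idea using only the Python standard library, not the authors' implementation. The whitespace tokenizer, the function names `train_mnb` and `predict_mnb`, the toy Indonesian phrases, and the `positive`/`negative` labels are all assumptions made for illustration; the paper's actual preprocessing and training data are not given on this page.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing."""
    word_counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    class_counts = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "counts": dict(word_counts),
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
        "log_prior": {c: math.log(k / len(labels)) for c, k in class_counts.items()},
    }


def predict_mnb(model, text):
    """Return the class with the highest log posterior for `text`."""
    alpha, vocab_size = model["alpha"], len(model["vocab"])
    # Tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        denom = model["totals"][label] + alpha * vocab_size
        scores[label] = log_prior + sum(
            math.log((model["counts"][label][t] + alpha) / denom) for t in tokens
        )
    return max(scores, key=scores.get)


# Hypothetical toy training data, not taken from the Mafindo dataset.
docs = [
    "kabar baik dan benar",
    "berita bagus sekali",
    "kabar buruk dan bohong",
    "berita palsu menyesatkan",
]
labels = ["positive", "positive", "negative", "negative"]

model = train_mnb(docs, labels)
# The predicted label becomes one derived column of the tabular dataset.
sentiment_feature = [predict_mnb(model, d) for d in ["kabar bohong", "berita bagus"]]
print(sentiment_feature)
```

Under this reading, the predicted label is stored as one categorical column alongside the other features, and the Decision Tree is then trained on the resulting table.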

Abstrak

Identifikasi hoaks cukup kompleks dan menantang karena menghadapi permasalahan seperti keanekaragaman hoaks, perubahan narasi yang cepat, kecepatan penyebaran yang luas, penggunaan teknologi canggih, kesulitan verifikasi, dan tantangan skala. Sebagai kepedulian terhadap dampak hoaks pada masyarakat, penelitian pembangunan fitur dalam identifikasi cerdas hoaks perlu dilakukan. Metodologi diadopsi dari CRISP-DM dan SKKNI No. 299 tahun 2020 yang disesuaikan dengan kebutuhan penelitian sehingga menjadi lima tahapan yaitu data understanding, data preparation, modeling, evaluation, dan deployment. Data diperoleh dari Mafindo dan digunakan sebanyak 9.756 data yang dibagi menjadi 7.804 data latih dan 1.952 data uji. Terdapat enam fitur yaitu sumber, kapital, keyword, sentimen, factcheck, dan klasifikasi sebagai label supervisi. Dua fitur sentimen dan factcheck dibangun dengan metode Multinomial Naïve Bayes, selanjutnya dilakukan pemodelan pada dataset dengan metode Decision Tree. Pemodelan dilakukan pula dengan variasi kuantitas dataset sebanyak 2.000, 4.000, 6.000, dan 8.000, juga dengan perbandingan masalah imbalance dataset. Hasil pemodelan dengan teknik Confusion Matrix memperoleh akurasi 93,5% dan skor F1 0,935. Diperoleh pula bahwa imbalance dataset tidak terlalu berpengaruh pada hasil akurasi dan skor F1 namun memberikan kestabilan model dalam hal kuantitas besarnya data dengan label tertentu.
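The split sizes and evaluation metrics reported in the abstract can be illustrated with a short worked example. This is a sketch under stated assumptions: the 80/20 ratio is inferred from the reported counts (7,804 and 1,952 out of 9,756) and is not stated in the abstract, the binary form of the F1 score is assumed, and the confusion-matrix counts below are hypothetical values chosen for readability, not results from the paper. The function names `split_sizes` and `accuracy_and_f1` are likewise illustrative.

```python
def split_sizes(n_total, train_percent=80):
    """Return (n_train, n_test) for an integer-percentage split (assumed 80/20)."""
    n_train = n_total * train_percent // 100
    return n_train, n_total - n_train


def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and binary F1 from the four confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Reproduces the training/test counts reported in the abstract.
print(split_sizes(9756))  # (7804, 1952)

# Hypothetical confusion-matrix counts for 100 test items (not from the paper).
accuracy, f1 = accuracy_and_f1(tp=45, fp=5, fn=3, tn=47)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

When the two classes are roughly balanced and errors are spread evenly, accuracy and F1 land close together, which is consistent with the near-identical 93.5% and 0.935 reported above.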

