Sentiment Analysis of Tinder Users Using the Synthetic Minority Oversampling Technique
Abstract
Imbalanced data tends to bias models such as Naive Bayes toward the majority class, which degrades model accuracy. Even when such a model reports high accuracy, it often misclassifies reviews. The Naive Bayes model calculates class probabilities from word frequencies and the prior probability of each class, so when the majority class dominates, the minority class is often overlooked. This study applies SMOTE (Synthetic Minority Over-sampling Technique) to address the class imbalance in Tinder app review data. SMOTE balances the number of samples in each class by synthesizing new samples for the minority class, which allows the model to be trained on a more representative distribution. Based on the evaluation results, overall model accuracy rose to 71%, while precision, recall, and F1-score for the majority class remained stable or improved slightly. However, although recall for the neutral class improved (to 0.19), its performance remains low (F1-score of 0.15), indicating that the semantic complexity of neutral reviews is not sufficiently captured by synthetic data augmentation alone.
Keywords: Naive Bayes; Synthetic Minority Over-sampling Technique; Tinder; Precision & recall; f1-score.
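The oversampling step described in the abstract can be sketched as follows. This is a minimal illustration of the interpolation idea behind SMOTE, not the implementation used in the study: the function name `smote`, the parameter `k`, and the toy feature vectors are all assumptions made for this example, and in practice a library implementation (for example the `SMOTE` class in imbalanced-learn) would normally be used on TF-IDF or similar features.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a sampled
    minority point and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy feature vectors, not the Tinder review data.
majority = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.9, 0.3], [0.8, 0.0], [0.6, 0.2]]
minority = [[0.1, 0.8], [0.2, 0.9], [0.3, 0.7]]

# Generate just enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a line segment between two existing minority samples, the method adds no information beyond what those samples already contain, which is consistent with the limited gain the abstract reports for the neutral class.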
Abstract (translated from Indonesian)
Imbalanced data tends to bias models such as Naive Bayes toward the majority class and affects model accuracy. A model may have high accuracy yet still be frequently wrong. The Naive Bayes model computes class probabilities from word frequencies and class prior probabilities. When the majority class has far more data, the minority class tends to be ignored. This study aims to apply the SMOTE method to handle imbalanced Tinder app review data. By applying SMOTE, the number of samples in each class is balanced by synthesizing new data for the minority class, so that the model can be trained on a more representative distribution. Model testing showed that overall accuracy increased to 71%. Meanwhile, precision, F1-score, and recall for the majority class remained stable or even increased slightly. However, although prediction performance on the neutral class improved in terms of recall (to 0.19), the result is still low (F1-score of 0.15), indicating that the semantic complexity of neutral reviews is not sufficiently captured by adding synthetic data alone.
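The abstract's point that Naive Bayes combines word frequencies with class priors, and that a dominant class can therefore drown out a minority class, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names (`train_nb`, `log_posteriors`, `predict`), the Laplace smoothing, and the toy reviews are hypothetical, and the balancing step uses plain duplication rather than SMOTE so that the effect of the prior is easy to follow.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model: class counts plus per-class word counts."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def log_posteriors(tokens, model):
    """Unnormalised log P(class) + sum of log P(word | class), Laplace-smoothed."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for c, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        total_words = sum(word_counts[c].values())
        for w in tokens:
            score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        scores[c] = score
    return scores


def predict(tokens, model):
    scores = log_posteriors(tokens, model)
    return max(scores, key=scores.get)


# Hypothetical toy reviews: eight negative, one neutral.
neg, neu = ["bad", "app"], ["okay", "app"]
imbalanced_model = train_nb([neg] * 8 + [neu], ["negative"] * 8 + ["neutral"])
# Balanced by simple duplication (random oversampling, not SMOTE).
balanced_model = train_nb([neg] * 8 + [neu] * 8, ["negative"] * 8 + ["neutral"] * 8)

query = ["okay"]  # a word seen only in the neutral class
print(predict(query, imbalanced_model), predict(query, balanced_model))
```

In the imbalanced model the prior of 8/9 for the negative class outweighs the word evidence, so a review containing only a neutral-class word is still labelled negative; once the classes are balanced, the same review is labelled neutral.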