Sentiment Analysis of Positive Covid-19 Cases Based on Media Coverage in Indonesia Using IndoBERT
Abstract
This study applies sentiment analysis with the IndoBERT model to news coverage of rising positive Covid-19 cases in Indonesia, with the aim of gauging how strongly that coverage influences public opinion in Indonesia. First, web scraping, labeling, and text pre-processing are used to collect and prepare data on the increase in Covid-19 cases in Indonesia. Second, IndoBERT is applied to classify the sentiment of news about the increase in positive Covid-19 cases. Finally, the performance of the sentiment analysis model is evaluated across different batch sizes. With a batch size of 16, the model performs consistently, with F1 scores ranging from 80.10% to 80.53%, whereas with a batch size of 32 the results vary. Increasing the number of epochs does not necessarily produce a significant improvement: performance rose in some cases and fell in others. Overall, the model performs well, with an F1 score above 0.80 and accuracy above 0.81, while loss tends to increase with the number of epochs. Further exploration is needed to understand in depth the factors that influence model performance.
Keywords: Web Scraping; Labeling; Text Pre-processing; Sentiment Analysis; Model Performance
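The evaluation described in the abstract rests on two metrics, accuracy and F1 score. The following is a minimal sketch of how they can be computed from reference and predicted labels. It is not the authors' evaluation code: the label names (positive, negative, neutral), the sample data, and the choice of macro averaging for F1 are all assumptions made for illustration, since the abstract does not state them.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Macro averaging is an assumption; the abstract does not say which
    averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[t] += 1   # class t was missed

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels, not data from the study.
    y_true = ["positive", "negative", "neutral", "positive", "negative"]
    y_pred = ["positive", "negative", "positive", "positive", "neutral"]
    print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
    print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Computing both metrics for each batch size and epoch setting gives the kind of comparison the abstract reports. Reporting them on a single scale (either fractions or percentages) makes the results easier to compare.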