Application of the Random Forest Algorithm Based on SHAP Feature Importance and GridSearchCV for Phishing Detection
Abstract
The rapid growth in the number of internet users in Indonesia has increased exposure to cyberattacks, particularly phishing. Phishing is a form of digital fraud in which links are disguised to resemble official websites in order to steal users' sensitive information. This study develops a phishing link detection model using a machine learning approach. The dataset consists of 11,430 URL entries from Mendeley Data, with features such as URL length, suspicious symbols, and subdomain levels. The Random Forest algorithm was chosen for its ability to handle high-dimensional data and its relative robustness to overfitting. Feature selection was performed using SHAP (SHapley Additive exPlanations) to assess each feature's contribution, while model hyperparameters were tuned with GridSearchCV. The best configuration, RF + GS + SHAP Threshold-P10, achieved an accuracy of 0.9650 and an F1-score of 0.9651, yielding a phishing detection model that is accurate, efficient, and interpretable.
Keywords: Phishing; Random Forest; GridSearchCV; SHAP; Machine Learning
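The SHAP-based selection step can be sketched as follows. The abstract does not define "Threshold-P10"; this sketch assumes it means keeping the features whose mean absolute SHAP value is at or above the 10th percentile of all features. The feature names and SHAP values are hypothetical placeholders. In a full pipeline they would come from a SHAP explainer applied to the tuned Random Forest, which is not shown here.

```python
from statistics import fmean


def mean_abs_shap(shap_rows):
    """Average |SHAP| per feature; each row holds one sample's SHAP values."""
    n_features = len(shap_rows[0])
    return [fmean(abs(row[j]) for row in shap_rows) for j in range(n_features)]


def percentile(values, pct):
    """Percentile with linear interpolation between ranks, pct in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def select_features(names, shap_rows, pct=10.0):
    """Keep features whose mean |SHAP| reaches the pct-th percentile cutoff.

    Assumption: this is one plausible reading of "Threshold-P10",
    not the authors' confirmed procedure.
    """
    importance = mean_abs_shap(shap_rows)
    cutoff = percentile(importance, pct)
    return [name for name, score in zip(names, importance) if score >= cutoff]


# Hypothetical feature names and SHAP values for two samples (illustrative only).
names = ["length_url", "nb_dots", "nb_subdomains", "nb_at", "nb_hyphens"]
shap_rows = [
    [0.30, -0.10, 0.05, 0.00, 0.20],
    [-0.50, 0.10, -0.15, 0.02, 0.40],
]

kept = select_features(names, shap_rows, pct=10.0)
print(kept)
```

The reduced feature list would then be used to retrain and re-evaluate the classifier. A percentile cutoff scales with the importance distribution itself, so no absolute SHAP threshold has to be chosen by hand.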