Comparison of Decision Tree and Logistic Regression Methods for Classifying Obesity Levels Based on Lifestyle
Abstract
This study compares two classification algorithms, Decision Tree and Logistic Regression, for predicting obesity levels from individual lifestyle patterns. The initial dataset contained 2,212 records with 17 attributes, which were narrowed down to 499 records with the nine most relevant attributes. After cleaning, 498 valid records remained, covering demographic information and daily habits, and these were used for modelling. Model performance was evaluated with stratified 5-fold cross-validation and with a separate held-out test set. In cross-validation, Logistic Regression consistently performed better, with a mean accuracy of 0.8755 and an F1-score of 0.8576, compared with an accuracy of 0.7851 and an F1-score of 0.7704 for Decision Tree. The test set showed the same pattern: Decision Tree reached an accuracy of 0.75 and an F1-score of 0.7534, while Logistic Regression reached an accuracy of 0.88 and an F1-score of 0.8664. Overall, Logistic Regression classified obesity levels more consistently and reliably, suggesting that it may be the better suited of the two methods for supporting analysis in the healthcare industry.
Keywords: Classification; Obesity; Lifestyle; Decision Tree; Logistic Regression
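The evaluation protocol named in the abstract (stratified 5-fold cross-validation scored by accuracy and F1) can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the class names, the toy label list, the majority-class predictor standing in for the two classifiers, and the use of macro-averaged F1 are all assumptions, since the abstract does not specify them.

```python
from collections import defaultdict
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions similar in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin; the running offset keeps fold sizes balanced.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_class(train_labels):
    """Hypothetical placeholder model: always predict the most frequent training label."""
    counts = defaultdict(int)
    for label in train_labels:
        counts[label] += 1
    return max(counts, key=counts.get)


# Toy labels with hypothetical class names; the study's real data is not reproduced here.
labels = ["normal"] * 20 + ["overweight"] * 15 + ["obese"] * 15

fold_acc, fold_f1 = [], []
for train_idx, test_idx in stratified_k_fold(labels, k=5):
    guess = majority_class([labels[i] for i in train_idx])
    y_true = [labels[i] for i in test_idx]
    y_pred = [guess] * len(y_true)
    fold_acc.append(accuracy(y_true, y_pred))
    fold_f1.append(macro_f1(y_true, y_pred))

mean_acc = sum(fold_acc) / len(fold_acc)
mean_f1 = sum(fold_f1) / len(fold_f1)
print(f"mean accuracy={mean_acc:.4f}, mean macro F1={mean_f1:.4f}")
```

In a real comparison, the placeholder predictor would be replaced by a fitted Decision Tree or Logistic Regression model, and the per-fold scores would be averaged in the same way to obtain figures comparable to those reported above.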