Comparison of the Naïve Bayes, Logistic Regression, and Support Vector Machine Algorithms for Classifying Android Malware Application Package Kit Files

Diana Diana(1*), Richardus Eko Indrajit(2), Erick Dazki(3)
(1) Universitas Pradita
(2) Universitas Pradita
(3) Universitas Pradita
(*) Corresponding Author
DOI : 10.35889/jutisi.v11i1.815

Abstract

Abstract. The continued growth of malware on the Android platform has led researchers to analyze malware using artificial intelligence techniques. The purpose of this research is to analyze Android APK (Application Package Kit) files by classifying them into malware families. The malware files serve as the training dataset for three machine learning algorithms: Naïve Bayes, Logistic Regression, and Support Vector Machine. The performance and accuracy of the three algorithms are measured and compared. The accuracy tests show that Naïve Bayes classifies malware families with an accuracy of 97.75%, Support Vector Machine reaches 96.75%, and Logistic Regression reaches 88.75%. Although these accuracies are not as high as those reported in previous studies, static analysis using the Permission and Intent features is a simple way to detect whether an Android APK file is malware or not.

Keywords: Android malware; Naïve Bayes; Logistic Regression; Support Vector Machine
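
The classification step summarized in the abstract can be illustrated with a minimal sketch. It assumes that each APK is encoded as a binary vector marking which Permission and Intent features it declares, and that the Naïve Bayes variant is Bernoulli with Laplace smoothing; the abstract states neither detail. The feature list, the class name `SimpleBernoulliNB`, the labels, and the six training samples are hypothetical and are not taken from the study's dataset.

```python
import math

# Hypothetical feature vocabulary: permissions and intent actions read from
# an APK's manifest during static analysis.
FEATURES = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
    "android.intent.action.BOOT_COMPLETED",
]


def to_vector(present):
    """Encode the set of features an APK declares as a 0/1 vector."""
    return [1 if name in present else 0 for name in FEATURES]


class SimpleBernoulliNB:
    """Bernoulli Naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.prob = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            # P(feature = 1 | class), smoothed so it is never 0 or 1.
            self.prob[c] = [
                (sum(column) + 1) / (len(rows) + 2) for column in zip(*rows)
            ]
        return self

    def _log_score(self, x, c):
        score = self.log_prior[c]
        for value, p in zip(x, self.prob[c]):
            score += math.log(p if value else 1.0 - p)
        return score

    def predict(self, X):
        return [
            max(self.classes, key=lambda c: self._log_score(x, c)) for x in X
        ]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy samples, for illustration only.
train_samples = [
    ({"android.permission.SEND_SMS",
      "android.intent.action.BOOT_COMPLETED"}, "malware"),
    ({"android.permission.SEND_SMS", "android.permission.READ_CONTACTS",
      "android.permission.INTERNET"}, "malware"),
    ({"android.permission.SEND_SMS", "android.permission.INTERNET",
      "android.intent.action.BOOT_COMPLETED"}, "malware"),
    ({"android.permission.INTERNET"}, "benign"),
    ({"android.permission.INTERNET",
      "android.permission.READ_CONTACTS"}, "benign"),
    (set(), "benign"),
]
X_train = [to_vector(features) for features, _ in train_samples]
y_train = [label for _, label in train_samples]

model = SimpleBernoulliNB().fit(X_train, y_train)
```

Calling `model.predict([to_vector({"android.permission.SEND_SMS"})])` classifies a new sample, and `accuracy` computes the kind of score the abstract reports. The same binary vectors could be passed to a Logistic Regression or Support Vector Machine implementation for the comparison.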

