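Comparison of the Naïve Bayes, Logistic Regression, and Support Vector Machine Algorithms for Classifying Android Malware Application Package Kit Files (original title: Komparasi Algoritma Naïve Bayes, Logistic Regression dan Support Vector Machine pada Klasifikasi File Application Package Kit Android Malware)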

Diana Diana; Richardus Eko Indrajit; Erick Dazki

doi:10.35889/jutisi.v11i1.815

Diana Diana (1*), Richardus Eko Indrajit (2), Erick Dazki (3)
(1) Universitas Pradita
(2) Universitas Pradita
(3) Universitas Pradita
(*) Corresponding Author

Abstract

The continued growth of malware on the Android platform has led researchers to analyze it with artificial intelligence techniques. This study analyzes Android APK (Application Package Kit) files by classifying malware families. The malware files form the dataset used to train three machine learning algorithms: Naïve Bayes, Logistic Regression, and Support Vector Machine. The performance and accuracy of the three algorithms are then compared. In the accuracy test, Naïve Bayes classified malware families with 97.75% accuracy, Support Vector Machine reached 96.75%, and Logistic Regression reached 88.75%. Although these accuracies are lower than those reported in previous studies, static analysis using the Permission and Intent features is a fairly simple way to determine whether an Android APK file is malware.

Keywords: Android malware; Naïve Bayes; Logistic Regression; Support Vector Machine
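The abstract describes training classifiers on Permission and Intent features obtained by static analysis. The following is a minimal sketch of that idea for one of the three algorithms, Naïve Bayes, written in plain Python. It is not the authors' implementation: the Bernoulli variant, the feature vocabulary, the two-class labels, and the toy samples are all assumptions chosen for illustration, and the page does not state which Naïve Bayes variant or feature set the study used.

```python
import math

# Hypothetical feature vocabulary (an assumption, not the study's feature set):
# three Android permissions and one intent action.
FEATURES = [
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
    "android.intent.action.BOOT_COMPLETED",
]


def vectorize(declared):
    """Map the set of strings declared by an app to a 0/1 feature vector."""
    return [1 if name in declared else 0 for name in FEATURES]


class BernoulliNB:
    """Bernoulli Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.p = {}  # p[c][j] = estimated P(feature j present | class c)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.p[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict_one(self, x):
        def score(c):
            # Log prior plus the log-likelihood of each present/absent feature.
            s = self.log_prior[c]
            for xj, pj in zip(x, self.p[c]):
                s += math.log(pj if xj else 1.0 - pj)
            return s

        return max(self.classes, key=score)

    def predict(self, X):
        return [self.predict_one(x) for x in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up training samples: (declared permissions/intents, label).
train = [
    ({"android.permission.SEND_SMS",
      "android.intent.action.BOOT_COMPLETED"}, "malware"),
    ({"android.permission.SEND_SMS", "android.permission.READ_CONTACTS",
      "android.permission.INTERNET"}, "malware"),
    ({"android.permission.INTERNET"}, "benign"),
    ({"android.permission.INTERNET",
      "android.permission.READ_CONTACTS"}, "benign"),
]
X_train = [vectorize(declared) for declared, _ in train]
y_train = [label for _, label in train]
model = BernoulliNB().fit(X_train, y_train)

# Made-up held-out samples used only to show the accuracy calculation.
test_apps = [
    ({"android.permission.SEND_SMS", "android.permission.INTERNET"}, "malware"),
    ({"android.permission.INTERNET"}, "benign"),
]
X_test = [vectorize(declared) for declared, _ in test_apps]
y_test = [label for _, label in test_apps]
y_pred = model.predict(X_test)
print(y_pred, accuracy(y_test, y_pred))
```

The same 0/1 vectors could be passed to a Logistic Regression or Support Vector Machine implementation to reproduce the kind of comparison the abstract reports. With malware family names as labels instead of "malware" and "benign", the sketch extends to family classification without code changes.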
