Malicious Web Links Detection - A Comparative Analysis of Machine Learning Algorithms

C.-I. Coste

doi:10.24193/subbi.2023.1.02

C.-I. Coste Department of Computer Science, Babes-Bolyai University, 1, M. Kogalniceanu Street, 400084, Cluj-Napoca, Romania

Abstract

Social engineering is one of the most challenging categories of threats circulating in the online world; it relies on malicious web links, fake news, clickbait, and other tactics. Malicious URLs are extremely dangerous because they are the main propagation vector for web malware. Detecting malicious web links is a challenging task because the detection mechanism should not degrade the user's online experience. Proposed solutions must be sensitive enough and fast enough to complete detection before the user accesses the link and downloads its content.

This paper has three goals. The main goal is to refine a methodology for malicious web link detection that can be used to experiment with machine learning algorithms. Second, we aim to use this methodology to train and compare several machine learning algorithms, such as Random Forest, Decision Tree, and K-Nearest Neighbor. The results are compared, justified, and placed in the context of the malicious web links literature. Third, we aim to identify the most relevant features and draw some observations about them.
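To make the kind of pipeline described above more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a handful of purely lexical URL features (the feature set, the function names `url_features` and `knn_predict`, and the toy URLs are all hypothetical and chosen only for illustration) and uses a small pure-Python K-Nearest Neighbor vote. Random Forest and Decision Tree are omitted here; in practice they would typically come from a machine learning library.

```python
import math
import re
from urllib.parse import urlparse

# Hypothetical check: host written as a dotted IPv4 address.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(url):
    """Map a URL to a small vector of assumed lexical features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                          # total URL length
        float(host.count(".")),                   # dots in the host name
        float(sum(ch.isdigit() for ch in url)),   # digit count
        1.0 if IPV4_HOST.match(host) else 0.0,    # raw IP used as host
        1.0 if parsed.scheme == "https" else 0.0, # HTTPS scheme
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy, hand-written examples (label 1 = malicious, 0 = benign).
# These are illustrative placeholders, not data from the paper.
TRAIN = [
    ("https://example.org/docs/index.html", 0),
    ("https://www.example.com/about", 0),
    ("https://news.example.net/2023/article", 0),
    ("http://192.0.2.15/login/update/verify.php?id=8839201", 1),
    ("http://secure-login.account.example.test.invalid/confirm?session=1234567890", 1),
    ("http://198.51.100.7/a/b/c/free-prize.exe", 1),
]

train_X = [url_features(u) for u, _ in TRAIN]
train_y = [label for _, label in TRAIN]

suspicious = knn_predict(
    train_X, train_y,
    url_features("http://203.0.113.9/bank/verify.php?id=55512345"),
)
ordinary = knn_predict(
    train_X, train_y,
    url_features("https://docs.example.org/guide"),
)
print(suspicious, ordinary)
```

The features are left unscaled to keep the sketch short, so URL length dominates the distance. A real experiment would normalize the features and evaluate on a held-out set before comparing classifiers.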
