Malicious Web Links Detection - A Comparative Analysis of Machine Learning Algorithms
Abstract
Social engineering, in the form of malicious web links, fake news, clickbait, and similar tactics, is one of the most challenging categories of threats circulating in the online world. Malware URLs are extremely dangerous because they are the main propagation vector for web malware. Detecting malicious web links is difficult because the detection mechanism must not degrade the user's online experience. A proposed solution must be both sensitive enough and fast enough to complete detection before the user accesses the link and downloads its content.
This paper has three goals. The main goal is to refine a methodology for malicious web link detection that can be used to experiment with machine learning algorithms. Second, we use this methodology to train and compare several machine learning algorithms, such as Random Forest, Decision Tree, and K-Nearest Neighbor; the results are compared, justified, and placed in the context of the malicious web link detection literature. Third, we identify the most relevant features and draw some observations about them.
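As an illustration of the kind of comparison described above, the following minimal sketch trains the three named classifiers on the same split and ranks features by importance. It is not the paper's pipeline: the use of scikit-learn, the synthetic data from `make_classification` (a stand-in for a labelled table of URL features), the hyperparameters, and the choice of accuracy and F1 as metrics are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labelled URL feature table (assumption: the
# abstract does not describe the real dataset or its features).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The three algorithms named in the abstract; hyperparameters are
# illustrative choices, not the paper's settings.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
}

# Fit every model on the same training split and score it on the same
# held-out split so the numbers are directly comparable.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f} "
          f"f1={results[name]['f1']:.3f}")

# Rank feature indices by the forest's impurity-based importance,
# most important first.
importances = models["Random Forest"].feature_importances_
ranking = sorted(range(len(importances)),
                 key=lambda i: importances[i], reverse=True)
print("Feature indices by importance:", ranking)
```

Impurity-based importance is only one possible way to identify relevant features; on a real URL dataset, the column indices would map to named features, and a metric such as recall on the malicious class may matter more than overall accuracy.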
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
When the article is accepted for publication, I, as the author and representative of the coauthors, hereby agree to transfer to Studia Universitatis Babes-Bolyai, Series Informatica, all rights, including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author specifically retains: the right to make further copies of all or part of the published article for my use in classroom teaching; the right to reuse all or part of this material in a review or in a textbook of which I am the author; the right to make copies of the published work for internal distribution within the institution that employs me.