Malware Analysis and Static Call Graph Generation with Radare2

A Mester

doi:10.24193/subbi.2023.1.01

A Mester, Department of Computer Science, Babeș-Bolyai University, 1 M. Kogălniceanu Street, 400084 Cluj-Napoca, Romania

Abstract

The static call graph of an executable file is a powerful feature in automated malware analysis. Several properties motivate its prevalence: it requires no sandbox environment, it can be extracted with a fast scan, and it captures function call patterns that go beyond instruction-level information. Processing and storing the static call graphs of malicious samples at scale facilitates the application of complex network analysis in malware research. IDA Pro, one of the leading disassemblers in the industry, can generate the call graph via its GenCallGdl and GenFuncGdl APIs, and it was the tool used in our previous works. This paper presents an alternative analysis method based on another disassembler, Radare2, an open-source, Unix-based tool that is also frequently used in this domain. Radare2 supports Python (among other languages) via the r2pipe package, thus enabling full scalability on Linux-based servers using containerized solutions. The paper offers a detailed technical description of how to use Radare2 to generate the static call graph of a PE file, a thorough comparison with the output of IDA Pro, and a public dataset on which the experiments were carried out.
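
As a rough illustration of the workflow the abstract describes, the following Python sketch uses r2pipe to ask Radare2 for a global call graph and converts the result into an adjacency mapping. It is a minimal sketch under stated assumptions, not the procedure from the paper: the choice of the `aaa` and `agCj` commands, the assumed JSON shape (a list of nodes with `name` and `imports` fields), and the helper names `call_graph_from_agcj` and `extract_call_graph` are all illustrative, and the sample data is invented.

```python
def call_graph_from_agcj(nodes):
    """Convert agCj-style nodes into {function: sorted list of callees}.

    Assumption: each node is a dict with a "name" and an optional
    "imports" list naming the functions it calls.
    """
    graph = {}
    for node in nodes:
        caller = node["name"]
        callees = node.get("imports", [])
        graph.setdefault(caller, set()).update(callees)
        # Make sure every callee also appears as a vertex.
        for callee in callees:
            graph.setdefault(callee, set())
    return {name: sorted(callees) for name, callees in graph.items()}


def extract_call_graph(path):
    """Analyze the file at `path` with Radare2 and return its call graph.

    Requires the third-party r2pipe package and a radare2 installation;
    the import is kept local so the helper above works without them.
    """
    import r2pipe

    r2 = r2pipe.open(path)
    try:
        r2.cmd("aaa")                   # run Radare2's automatic analysis
        nodes = r2.cmdj("agCj") or []   # global call graph as JSON
    finally:
        r2.quit()
    return call_graph_from_agcj(nodes)


# Invented sample in the assumed agCj shape, for illustration only.
sample_nodes = [
    {"name": "entry0", "size": 50,
     "imports": ["fcn.00401000", "sym.imp.KERNEL32.dll_ExitProcess"]},
    {"name": "fcn.00401000", "size": 120,
     "imports": ["sym.imp.KERNEL32.dll_CreateFileA"]},
]
sample_graph = call_graph_from_agcj(sample_nodes)
```

Keeping the JSON-to-graph conversion separate from the Radare2 call means the graph logic can be checked without a disassembler installed, and `extract_call_graph` can be run per sample inside a container when processing many files.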
