Software Maintainability and Refactorings Prediction Based on Technical Debt Issues
Abstract
Software maintainability is a crucial factor affecting the cost, time and resources allocated to software development. Code refactorings can substantially improve code quality, readability, understandability and extensibility. Accurate methods for predicting both maintainability and refactorings are therefore vital for the long-term sustainability and success of software projects. This article focuses on predicting software maintainability and the number of code refactorings needed, using technical debt data. Two approaches were explored. The first aggregates the technical debt issues of each software component into a single entry and applies machine learning algorithms such as ExtraTrees, Random Forest and Decision Trees, all of which achieved high predictive accuracy. The second retains the individual debt issue entries and uses a Recurrent Neural Network, which proved less effective. In addition to predicting the number of code refactorings required and the maintainability of individual software components, we conducted a comprehensive analysis of technical debt issues before and after the refactoring process. Of all the models employed, ExtraTrees yielded the best predictive results. These outcomes contribute to a dependable prediction system for maintainability and refactorings that can help the software community manage maintenance resources effectively. To the best of our knowledge, no other approach applying ML techniques to this problem has been reported in the literature.
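The two data representations contrasted in the abstract can be illustrated with a minimal Python sketch. It assumes each technical debt issue is available as a hypothetical `(component, issue_type)` pair and uses three illustrative issue types; the function names, issue labels and padding length are assumptions made for illustration, not the study's actual pipeline. `compress_issues` builds one count vector per component (tabular input of the kind tree ensembles consume), `issue_sequences` keeps every entry as a one-hot step padded to a fixed length (the kind of input a Recurrent Neural Network consumes), and `debt_delta` computes the per-type change in issue counts for a component before and after refactoring.

```python
from collections import Counter, defaultdict

# Illustrative issue types (an assumption; real tools report their own labels).
ISSUE_TYPES = ["CODE_SMELL", "BUG", "VULNERABILITY"]


def compress_issues(issues, issue_types=ISSUE_TYPES):
    """First approach (sketch): one fixed-length count vector per component.

    `issues` is an iterable of hypothetical (component, issue_type) pairs.
    """
    per_component = defaultdict(Counter)
    for component, issue_type in issues:
        per_component[component][issue_type] += 1
    # A fixed column order keeps rows comparable, giving a tabular feature matrix.
    return {
        component: [counts[t] for t in issue_types]
        for component, counts in per_component.items()
    }


def issue_sequences(issues, issue_types=ISSUE_TYPES, max_len=4):
    """Second approach (sketch): keep every entry as a one-hot time step.

    Each sequence is truncated to `max_len` steps and zero-padded at the end,
    so all components share the same shape for a recurrent model.
    """
    index = {t: i for i, t in enumerate(issue_types)}
    width = len(issue_types)
    steps = defaultdict(list)
    for component, issue_type in issues:
        one_hot = [0] * width
        one_hot[index[issue_type]] = 1
        steps[component].append(one_hot)
    result = {}
    for component, seq in steps.items():
        seq = seq[:max_len]
        seq += [[0] * width for _ in range(max_len - len(seq))]
        result[component] = seq
    return result


def debt_delta(before, after, issue_types=ISSUE_TYPES):
    """Per-type change in issue counts for one component across a refactoring.

    Negative values mean issues of that type were removed.
    """
    b = Counter(t for _, t in before)
    a = Counter(t for _, t in after)
    return {t: a[t] - b[t] for t in issue_types}


if __name__ == "__main__":
    issues = [
        ("Main.java", "CODE_SMELL"),
        ("Main.java", "CODE_SMELL"),
        ("Main.java", "BUG"),
        ("Util.java", "VULNERABILITY"),
    ]
    print(compress_issues(issues))
    # {'Main.java': [2, 1, 0], 'Util.java': [0, 0, 1]}
    print(issue_sequences(issues)["Util.java"])
    # [[0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    print(debt_delta(issues[:3], [("Main.java", "CODE_SMELL")]))
    # {'CODE_SMELL': -1, 'BUG': -1, 'VULNERABILITY': 0}
```

The count rows could then serve as the feature matrix `X` for a tree ensemble such as scikit-learn's `sklearn.ensemble.ExtraTreesRegressor`, with the number of refactorings per component as the target; the features, targets and hyperparameters actually used in the study are not stated in this section.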
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
When the article is accepted for publication, I, as the author and representative of the coauthors, hereby agree to transfer to Studia Universitatis Babes-Bolyai, Series Informatica, all rights, including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author specifically retains: the right to make further copies of all or part of the published article for my use in classroom teaching; the right to reuse all or part of this material in a review or in a textbook of which I am the author; the right to make copies of the published work for internal distribution within the institution that employs me.