Bug Severity Prediction using Keywords in Imbalanced Learning Environment
Author: Jayalath Ekanayake
Journal: International Journal of Information Technology and Computer Science (IJITCS)
Issue: Vol. 13, No. 3, 2021.
Open access
Reported bugs of software systems are classified into different severity levels before they are fixed. The number of bug reports may not be equally distributed across the severity levels. However, most severity prediction models developed in the literature assume that the underlying data distribution is even, which may not be correct in all instances; hence, the aim of this study is to develop bug classification models from unevenly distributed datasets and test them accordingly. To that end, the topics or keywords of the developer descriptions in bug reports are first extracted using the Rapid Automatic Keyword Extraction (RAKE) algorithm and then transformed into numerical attributes, which, combined with the severity levels, construct the datasets. These datasets are used to build classification models with the Naïve Bayes, Logistic Regression, and Decision Tree Learner algorithms. The models' prediction quality is measured using the Area Under the Receiver Operating Characteristic curve (AUC), as the models learn from highly skewed environments. According to the results, the prediction quality of the Logistic Regression model is 0.65 AUC, whereas the other two models recorded a maximum of 0.60 AUC. Though the datasets contain comparatively few instances from the high severity classes, Blocking and High, the Logistic Regression model predicts these two classes with a decent AUC value of 0.65. Hence, this project shows that models can be trained from highly skewed datasets such that their prediction quality is equally good over all classes regardless of the number of instances representing each class. Further, this project emphasizes that models should be evaluated using appropriate metrics when they are trained in imbalanced learning environments. Also, this work uncovers that the Logistic Regression model is as capable of classifying documents as Naïve Bayes, which is well known for this task.
Keywords: Bug report classification, bug severity level, topic modeling, candidate keywords, classification algorithms
Short address: https://sciup.org/15017762
IDR: 15017762 | DOI: 10.5815/ijitcs.2021.03.04
Published Online June 2021 in MECS
1. Introduction

Typically, software systems are released with defects as they become more and more complex. Usually, users and developers report software bugs to triagers through systems such as Bugzilla1 and Jira2. Such bugs may be resolved in future revisions. Reporting bugs helps developers fix them in the next release so that users get an improved version of the software. However, developers come under pressure if they receive too many bug reports from users or other parties.
Usually, the triagers receive a significant number of bug reports on a daily basis [1]. However, for many reasons, the triagers require a long time to fix the reported bugs [2]. According to Wang et al. [3], users report an average of 1,000 bug reports per day for software projects, and the developers spend two and a half days per week reading them. If all these reports are read and fixed in the order they are received, without categorizing them into different severity levels, some high severity bugs may reside in the software, causing negative implications for the project. Hence, categorizing bugs by severity level is essential so that high severity bugs are fixed immediately. A bug can be assigned to one of four severity levels, Blocking, High, Low, and Normal, as in the UNIX Kernels project. The impact of Blocking bugs is the highest, whereas that of Normal bugs is the lowest. Blocking bugs are given the highest priority when fixing, while Normal bugs may remain in a software project for a long time. Typically, the bug reporter assigns the severity level of the bug or leaves it blank. However, the reporter's decision may be incorrect, as his judgment of the bug severity may differ from the triager's perception.
Manual bug classification is not efficient, as bug triagers need to read the textual content of all bug reports and then compare the new reports with existing reports. In some bug reports the description is not sufficient to figure out the nature of the bug, and hence developers need to spend even more time classifying them. To that end, bug classification tools can be used to speed up the classification process so that high severity bugs can be fixed immediately. Automatic bug classification tools can be trained using machine learning algorithms. Typically, the severity level assigned in resolved bug reports is confirmed to be correct, and hence the data extracted from such reports can be used to train machine learning algorithms for classifying new bug reports. The severity is divided into four levels, and the number of bug reports assigned to the different severity levels may not be evenly distributed. Hence, classification models trained from such an imbalanced dataset can be biased towards the class containing the majority of instances. Consequently, the models mostly predict the instances of the majority class with decent accuracy, whereas the instances belonging to the other classes may not be correctly classified. In such a skewed learning environment, the testing sample may also follow the same distribution as the training sample. However, many of the existing bug severity prediction models trained from historical data neither mention the distribution of the datasets used to train the models nor acknowledge this issue [7,8,9,10,11,12,13,14]. This implies that they assumed the underlying data distribution is balanced, which may not be true in all real datasets, such as the ones used in this project. However, reference [10] addressed this issue to a certain extent.

1 https://www.bugzilla.org

2 https://www.atlassian.com
They proposed a cost-sensitive classification by defining two categories of classes, minority and majority, based on the number of instances representing them, where the misclassification cost of the minority class is higher than that of the majority class regardless of the significance of the classes. According to their definition, the misclassification cost of the Blocking and High severity classes could be smaller than that of Normal and Low if the dataset contained more instances from the Blocking and High classes. However, bugs from these two classes are more important to developers than those from the other two classes.
Hence, the objective of this project is to develop bug severity prediction models from imbalanced learning environments and to test them properly using an appropriate metric. Many models developed in the literature [7,8,9,10,11,12,13,14] used accuracy (correctly classified instances vs. the total number of instances presented) to evaluate prediction quality in skewed learning environments [4,5]. However, accuracy does not provide a fair judgment of the quality of models trained from unevenly distributed datasets like those used in this project. Alternatively, the Area Under the Receiver Operating Characteristic curve (AUC) can be used to evaluate such prediction models, as the AUC does not depend on the underlying data distribution [6]. Further, the prediction quality is evaluated per category, in contrast to the overall prediction quality calculated for many prediction models in the literature.
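The per-class AUC used here can be computed directly from predicted class probabilities, treating one severity class as positive and the rest as negative. The sketch below is a minimal pure-Python illustration of the rank-based (Mann-Whitney) formulation, independent of any toolkit; it is not the exact evaluation code used in the study:

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a randomly chosen positive
    instance receives a higher score than a randomly chosen negative one.
    labels: 1 for the class of interest, 0 otherwise; scores: predicted
    probabilities for that class."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive outranks negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

Because the statistic depends only on the ranking of scores, it is unaffected by how many instances each class contributes, which is why it suits the skewed datasets in this project.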
This project constructs three prediction models, using the Naïve Bayes, Logistic Regression, and Decision Tree Learner algorithms, to categorize bugs into one of four severity levels: Blocking, High, Low, and Normal. The three models predict the class probability for a given instance. Since the models are trained from skewed datasets, as shown in Table 1, the prediction quality is measured using the AUC. The proposed models thereby address this questionable point in most of the models developed in the literature.
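To illustrate the keyword-extraction step, the following is a minimal sketch of the RAKE idea: candidate phrases are delimited by stopwords and punctuation, individual words are scored by degree/frequency, and a phrase scores the sum of its word scores. The stopword list and example text here are illustrative assumptions, not the configuration used in the study:

```python
import re

# Tiny illustrative stopword list; RAKE normally uses a full stoplist.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on",
             "for", "with", "when", "it", "this", "that", "be", "as"}

def candidate_phrases(text):
    """Split a bug description on punctuation and stopwords into
    candidate keyword phrases (lists of content words)."""
    words = re.split(r"[^a-zA-Z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if not w or w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    return phrases

def rake_scores(text):
    """Score each candidate phrase by the sum of its words'
    degree/frequency scores, highest first."""
    phrases = candidate_phrases(text)
    freq, degree = {}, {}
    for phrase in phrases:
        for w in phrase:
            freq[w] = freq.get(w, 0) + 1
            # degree counts co-occurrences within phrases (incl. the word itself)
            degree[w] = degree.get(w, 0) + len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    return sorted(
        ((" ".join(p), sum(word_score[w] for w in p)) for p in phrases),
        key=lambda kv: -kv[1],
    )
```

The ranked phrase scores produced this way are the kind of numerical attributes that, combined with the severity label, form one row of the dataset.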
2. Related Works

Software repositories contain source code, bug reports, email archives, etc. Mining these repositories has become popular among the research community in recent years, as it extracts hidden patterns that may be non-trivial, previously unknown, and vital for developers.
There are models that predict bug severity using textual description in bug reports [7,8, 9,10,11,12,13,14].
References
- Xie T, Zhang L, Xiao X, Xiong YF, Hao D. Cooperative software testing and analysis: Advances and challenges. Journal of Computer Science and Technology. 2014 Jul 1;29(4):713-23.
- Xia X, Lo D, Wen M, Shihab E, Zhou B. An empirical study of bug report field reassignment. In 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE) 2014 Feb 3 (pp. 174-183). IEEE.
- Wang J, Wang S, Cui Q, Wang Q. Local-based active classification of test report to assist crowdsourced testing. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering 2016 Aug 25 (pp. 190-201).
- Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning 2006 Jun 25 (pp. 233-240).
- Lessmann S, Baesens B, Mues C, Pietsch S. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Transactions on Software Engineering. 2008 May 23;34(4):485-96.
- Provost F, Fawcett T. Robust classification for imprecise environments. Machine learning. 2001 Mar 1;42(3):203-31.
- Tian Y, Lo D, Sun C. Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In 2012 19th Working Conference on Reverse Engineering 2012 Oct 15 (pp. 215-224). IEEE.
- Roy NK, Rossi B. Towards an improvement of bug severity classification. In 2014 40th EUROMICRO Conference on Software Engineering and Advanced Applications 2014 Aug 27 (pp. 269-276). IEEE.
- Tan Y, Xu S, Wang Z, Zhang T, Xu Z, Luo X. Bug severity prediction using question-and-answer pairs from Stack Overflow. Journal of Systems and Software. 2020 Mar 2:110567.
- Sabor KK, Hamdaqa M, Hamou-Lhadj A. Automatic prediction of the severity of bugs using stack traces and categorical features. Information and Software Technology. 2020 Jul 1;123:106205.
- Arokiam J, Bradbury JS. Automatically predicting bug severity early in the development process. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results 2020 Jun 27 (pp. 17-20).
- Kumari M, Singh UK, Sharma M. Entropy Based Machine Learning Models for Software Bug Severity Assessment in Cross Project Context. In International Conference on Computational Science and Its Applications 2020 Jul 1 (pp. 939-953). Springer, Cham.
- Kudjo PK, Chen J, Mensah S, Amankwah R, Kudjo C. The effect of Bellwether analysis on software vulnerability severity prediction models. Software Quality Journal. 2020 Jan 7:1-34.
- Kukkar A, Mohana R, Kumar Y. Does bug report summarization help in enhancing the accuracy of bug severity classification?. Procedia Computer Science. 2020 Jan 1;167:1345-53.
- Kanwal J, Maqbool O. Bug prioritization to facilitate bug report triage. Journal of Computer Science and Technology. 2012 Mar 1;27(2):397-412.
- Alenezi M, Banitaan S. Bug reports prioritization: Which features and classifier to use?. In 2013 12th International Conference on Machine Learning and Applications 2013 Dec 4 (Vol. 2, pp. 112-116). IEEE.
- Tian Y, Lo D, Xia X, Sun C. Automated prediction of bug report priority using multi-factor analysis. Empirical Software Engineering. 2015 Oct 1;20(5):1354-83.
- Kumari M, Singh VB. An improved classifier based on entropy and deep learning for bug priority prediction. In International Conference on Intelligent Systems Design and Applications 2018 Dec 6 (pp. 571-580). Springer, Cham.
- Waqar A. Software Bug Prioritization in Beta Testing Using Machine Learning Techniques. Journal of Computers for Society 2020;1(1):24-34.
- Cheng X, Liu N, Guo L, Xu Z, Zhang T. Blocking Bug Prediction Based on XGBoost with Enhanced Features. In 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC) 2020 Jul 13 (pp. 902-911). IEEE.
- Sharma M, Kumari M, Singh VB. Bug Priority Assessment in Cross-Project Context Using Entropy-Based Measure. In Advances in Machine Learning and Computational Intelligence 2020 (pp. 113-128). Springer, Singapore.
- Ekanayake, J.B., 2021. Predicting Bug Priority Using Topic Modelling in Imbalanced Learning Environments. International Journal of Systems and Service-Oriented Engineering (IJSSOE), 11(1), pp.31-42.
- Rose S, Engel D, Cramer N, Cowley W. Automatic keyword extraction from individual documents. Text mining: applications and theory. 2010 Mar 26;1:1-20.
- Mihalcea R, Tarau P. Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing 2004 Jul (pp. 404-411).
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter. 2009 Nov 16;11(1):10-8.
- Mai Farag Imam, Amal Elsayed Aboutabl, Ensaf H. Mohamed, "Automating Text Simplification Using Pictographs for People with Language Deficits", International Journal of Information Technology and Computer Science(IJITCS), Vol.11, No.7, pp.26-34, 2019. DOI: 10.5815/ijitcs.2019.07.04.
- Pierre MOUKELI MBINDZOUKOU, Arsène Roland MOUKOUKOU, David NACCACHE, Nino TSKHOVREBASHVILI, "A Stochastic Model for Simple Document Processing", International Journal of Information Technology and Computer Science(IJITCS), Vol.11, No.7, pp.43-53, 2019. DOI: 10.5815/ijitcs.2019.07.06.
- Ahmed Iqbal, Shabib Aftab, "Prediction of Defect Prone Software Modules using MLP based Ensemble Techniques", International Journal of Information Technology and Computer Science(IJITCS), Vol.12, No.3, pp.26-31, 2020. DOI: 10.5815/ijitcs.2020.03.04
- Ekanayake J, Tappolet J, Gall HC, Bernstein A. Time variance and defect prediction in software projects. Empirical Software Engineering. 2012 Aug;17(4):348-89.
- Ekanayake J, Tappolet J, Gall HC, Bernstein A. Tracking concept drift of software projects using defect prediction quality. In2009 6th IEEE International Working Conference on Mining Software Repositories 2009 May 16 (pp. 51-60). IEEE.