A feature selection based ensemble classification framework for software defect prediction
Authors: Ahmed Iqbal, Shabib Aftab, Israr Ullah, Muhammad Salman Bashir, Muhammad Anwaar Saeed
Journal: International Journal of Modern Education and Computer Science (IJMECS)
Issue: Vol. 11, No. 9, 2019.
Software defect prediction is one of the emerging research areas of software engineering. The prediction of defects at an early stage of the development process can produce high quality software at lower cost. This research contributes by presenting a feature selection based ensemble classification framework which consists of four stages: 1) Dataset Selection, 2) Feature Selection, 3) Classification, and 4) Results. The proposed framework is implemented in two dimensions, one with feature selection and the second without feature selection. The performance is evaluated through various measures including Precision, Recall, F-measure, Accuracy, MCC and ROC. Twelve cleaned publicly available NASA datasets are used for the experiments. The results of both dimensions of the proposed framework are compared with other widely used classification techniques such as: “Naïve Bayes (NB), Multi-Layer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Machine (SVM), K Nearest Neighbor (KNN), kStar (K*), One Rule (OneR), PART, Decision Tree (DT), and Random Forest (RF)”. The results reflect that the proposed framework outperformed the other classification techniques on some of the used datasets; however, the class imbalance issue could not be fully resolved.
Ensemble Classifier, Hybrid Classifier, Random Forest, Software Defect Prediction, Feature Selection
Short address: https://sciup.org/15016880
IDR: 15016880 | DOI: 10.5815/ijmecs.2019.09.06
Published Online September 2019 in MECS DOI: 10.5815/ijmecs.2019.09.06
I. Introduction
Today, the production of high quality software at lower cost is challenging due to the large size and high complexity of required systems [1,2], [23]. However, this issue can be mitigated if we can predict in advance the particular software modules where defects are more likely to occur [3], [10]. The process of predicting defective modules is known as software defect prediction, in which future defects are predicted at the early stages of the software development life cycle (before testing). It is considered one of the challenging tasks of the quality assurance process. Identification of defective modules at an early stage is vital, as the cost of correction increases at later stages of the development life cycle. Software metrics extracted from historical software data are used to predict the defective modules [29,30,31,32].
Machine learning techniques have proved to be a promising way for effective and efficient software defect prediction. These techniques are categorized as 1) supervised, 2) unsupervised, and 3) hybrid. A supervised technique needs pre-classified (training) data in order to train the classifier; during training, rules are developed which are then used to classify unseen data (test data). In unsupervised techniques no training data is needed, as these techniques use a particular algorithm to identify and maintain the classes. The hybrid approach integrates both supervised and unsupervised techniques.
This paper proposes a feature selection based ensemble classification framework for software defect prediction. The framework is implemented in two dimensions, one with feature selection and the second without feature selection, so that the difference in results between both dimensions can be analyzed and discussed. Each dimension further uses two techniques, Bagging and Boosting, with Random Forest. Performance evaluation is performed with various measures: Precision, Recall, F-measure, Accuracy, MCC and ROC. The clean version of 12 publicly available NASA datasets is used in this research: “CM1, JM1, KC1, KC3, MC1, MC2, MW1, PC1, PC2, PC3, PC4 and PC5”. The results of the proposed framework are also compared with other widely used supervised classification techniques: “Naïve Bayes (NB), Multi-Layer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Machine (SVM), K Nearest Neighbor (KNN), kStar (K*), One Rule (OneR), PART, Decision Tree (DT), and Random Forest (RF)”. According to the results, the proposed framework showed higher performance on some of the used datasets, but the class imbalance problem is not fully resolved. The class imbalance issue in software defect datasets is one of the main reasons for the lower and biased performance of classifiers [22,23].
II. Related Work
Many researchers have used machine learning techniques to resolve classification problems in various areas including sentiment analysis [11,12,13,14,15,16], network intrusion detection [17], [18] (in press), [19], rainfall prediction [20,21], and software defect prediction [10], [29]. Some selected studies regarding software defect prediction are discussed here briefly.
In [10] the researchers compared the performance of various supervised machine learning techniques on software defect prediction and used 12 NASA datasets for the experiments. The authors highlighted that Accuracy and ROC did not react to the class imbalance issue, however Precision, Recall, F-Measure and MCC reacted to this issue by producing a “?” symbol in the results. In [24], the researchers used six classification techniques for software defect prediction and used the data of 27 academic projects for the experiment. The techniques used were: Discriminant Analysis, Principal Component Analysis (PCA), Logistic Regression (LR), Logical Classification, Holographic Networks, and Layered Neural Networks. The back-propagation learning technique was used to train the ANN. Performance evaluation was carried out using the following measures: Verification Cost, Predictive Validity, Achieved Quality and Misclassification Rate. The results reflected that no classification technique performed better on software defect prediction in the experiment. In [25] the researchers predicted software defects using SVM and compared its performance with other widely used prediction techniques including: Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Trees, Multilayer Perceptron (MLP), Bayesian Belief Networks (BBN), Radial Basis Function (RBF), Random Forest (RF), and Naïve Bayes (NB). For the experiments, the NASA datasets PC1, CM1, KC1 and KC3 were used. According to the results, SVM outperformed some of the other classification techniques. In [26] the researchers explored and discussed the significance of particular software metrics for the prediction of software defects. They identified the significant software metrics with the help of an ANN after training with historical data. The extracted and shortlisted metrics were then used to predict software defects through another ANN model. The performance of the proposed technique was compared with a Gaussian kernel SVM. The JM1 dataset from the NASA MDP repository was used for the experiment. According to the results, the SVM performed better than the ANN in binary defect classification. Researchers in [27] proposed a technique for software defect prediction which combines a novel Artificial Bee Colony (ABC) algorithm with an Artificial Neural Network in order to find the optimal weights. For the experiment, five publicly available datasets from the NASA MDP repository were used, and the results reflected higher performance of the proposed technique as compared to other classification techniques. In [28], the researchers introduced an approach which consists of a Hybrid Genetic Algorithm and a Deep Neural Network. The Hybrid Genetic Algorithm is used for the selection and optimization of features, whereas the Deep Neural Network performs classification by focusing on the selected features. The experiments were carried out on the PROMISE datasets, and the results showed higher performance of the proposed approach as compared to other defect prediction techniques.
III. Materials and Methods
This research proposes a feature selection based ensemble classification framework to predict the software defects.

Fig.1. Proposed Classification Framework.
The proposed framework (Fig. 1) consists of four stages: 1) Dataset Selection, 2) Feature Selection, 3) Classification, and 4) Results. The framework is implemented in two dimensions: in the first, the feature selection stage is skipped and the datasets are given directly to the ensemble classifiers, whereas in the second dimension the datasets go through the feature selection stage. The performance of both dimensions of the proposed framework is compared with other widely used classifiers such as: “Naïve Bayes (NB), Multi-Layer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Machine (SVM), K Nearest Neighbor (kNN), kStar (K*), One Rule (OneR), PART, Decision Tree (DT), and Random Forest (RF)”. All the experiments are performed in WEKA [5], which is a widely used data mining tool. It is developed in the Java language at the University of Waikato, New Zealand, and is widely accepted due to its portability, General Public License and ease of use.
Dataset selection is the first stage of the proposed framework. Twelve publicly available cleaned NASA datasets are used in this research for the experiments: “CM1, JM1, KC1, KC3, MC1, MC2, MW1, PC1, PC2, PC3, PC4 and PC5” (Table 2). Each dataset belongs to a particular NASA software system and consists of various quality metrics in the form of attributes along with a known output class. The output class, also known as the target class, is predicted on the basis of the other available attributes. The target/output class is known as the dependent attribute, whereas the other attributes which are used to predict it are known as independent attributes. The datasets used in this research have a dependent attribute with values of either “Y” or “N”: “Y” reflects that the particular instance (module) is defective and “N” means it is non-defective. The researchers in [4] provided two versions of cleaned datasets: D’ (“which included duplicate and inconsistent instances”) and D’’ (“which do not include duplicate and inconsistent instances”). Table 1 reflects the cleaning criteria implemented by [4]. We have used the D’’ version (Table 2) in this research, which is taken from [6]. These cleaned datasets have already been used and discussed by [7,8,9,10].
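For illustration only, the following minimal sketch shows how one of these cleaned datasets could be loaded and split into independent and dependent attributes. The ARFF file name and the dependent attribute name "Defective" are assumptions made for this example, not details confirmed by the paper.

```python
# Illustrative sketch only: assumes the cleaned ARFF files from the
# repository in [6] and a dependent attribute named "Defective" with
# values b"Y" / b"N" (both names are assumptions).
import pandas as pd
from scipy.io import arff

data, meta = arff.loadarff("CM1.arff")           # hypothetical local path
df = pd.DataFrame(data)

y = (df["Defective"] == b"Y").astype(int)        # 1 = defective, 0 = non-defective
X = df.drop(columns=["Defective"]).astype(float) # independent attributes (metrics)

print(X.shape, int(y.sum()), "defective modules out of", len(y))
```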
Table 1. Cleaning Criteria [4]
Criterion | Data Quality Category | Explanation
1 | Identical cases | “Instances that have identical values for all metrics including class label.”
2 | Inconsistent cases | “Instances that satisfy all conditions of Case 1, but where class labels differ.”
3 | Cases with missing values | “Instances that contain one or more missing observations.”
4 | Cases with conflicting feature values | “Instances that have 2 or more metric values that violate some referential integrity constraint. For example, LOC TOTAL is less than Commented LOC. However, Commented LOC is a subset of LOC TOTAL.”
5 | Cases with implausible values | “Instances that violate some integrity constraint. For example, value of LOC = 1.1.”
Table 2. NASA Cleaned Datasets D’’ [4], [7]
Dataset | Attributes | Modules | Defective | Non-Defective | Defective (%)
CM1 | 38 | 327 | 42 | 285 | 12.8
JM1 | 22 | 7,720 | 1,612 | 6,108 | 20.8
KC1 | 22 | 1,162 | 294 | 868 | 25.3
KC3 | 40 | 194 | 36 | 158 | 18.5
MC1 | 39 | 1,952 | 36 | 1,916 | 1.8
MC2 | 40 | 124 | 44 | 80 | 35.4
MW1 | 38 | 250 | 25 | 225 | 10.0
PC1 | 38 | 679 | 55 | 624 | 8.1
PC2 | 37 | 722 | 16 | 706 | 2.2
PC3 | 38 | 1,053 | 130 | 923 | 12.3
PC4 | 38 | 1,270 | 176 | 1,094 | 13.8
PC5 | 39 | 1,694 | 458 | 1,236 | 27.0
Feature selection is the second and the most significant stage of the proposed classification framework. This stage selects an optimal subset of features for effective classification results. Many researchers have reported that most datasets contain only a few independent features which can predict the target class effectively, whereas the remaining features contribute little and can even reduce the performance of the classifier if not removed. We have used Chi-Square as the attribute evaluator along with the Ranker search method as the feature selection technique.
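As a rough illustration of this stage (not the exact WEKA configuration used in the study), the sketch below ranks attributes by their Chi-Square score and keeps a top-ranked subset; the scikit-learn functions and the cut-off of 10 attributes are assumptions.

```python
# Illustrative sketch: rank attributes by Chi-Square score, analogous to
# WEKA's Chi-Square attribute evaluator with the Ranker search method.
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

X_scaled = MinMaxScaler().fit_transform(X)   # chi2 requires non-negative inputs

scores, _ = chi2(X_scaled, y)
ranking = sorted(zip(X.columns, scores), key=lambda t: t[1], reverse=True)
for name, score in ranking[:10]:
    print(f"{name:25s} {score:8.3f}")

# Keep only the k highest-ranked attributes (k = 10 is an assumed cut-off).
selector = SelectKBest(chi2, k=10).fit(X_scaled, y)
X_selected = selector.transform(X_scaled)
```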
The third stage deals with classification using ensemble classifiers. Besides feature selection, ensemble learning techniques have also been reported as an efficient way to improve classification results. Bagging and Boosting are the two widely used ensemble techniques provided by WEKA, also known as meta-learners. These techniques take a base learner as an argument and create a new learning algorithm by manipulating the training data. We have used Bagging and Boosting with Random Forest as the base classifier in the proposed framework.
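The sketch below outlines the two ensemble configurations (Bagging and Boosting with Random Forest as the base learner) using scikit-learn as a stand-in for the WEKA meta-learners actually used in the study; all hyper-parameters and the 70/30 split are assumptions.

```python
# Illustrative sketch of Bag-RF and Boost-RF (scikit-learn stand-ins for
# WEKA's Bagging and Boosting meta-learners; settings are assumptions).
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split

base_rf = RandomForestClassifier(n_estimators=100, random_state=1)
bag_rf = BaggingClassifier(base_rf, n_estimators=10, random_state=1)
boost_rf = AdaBoostClassifier(base_rf, n_estimators=10, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.3, stratify=y, random_state=1)

for name, model in [("Bag-RF", bag_rf), ("Boost-RF", boost_rf)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", round(model.score(X_test, y_test), 4))
```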
Finally, the fourth (results) stage reflects the classified modules along with the accuracy of the proposed framework. The results are analyzed and discussed in detail in the next section.
IV. Results and Discussion
This section reflects the performance of the proposed framework. The performance evaluation is carried out in terms of various measures generated from the confusion matrix (Fig. 2).
Fig.2. Confusion Matrix.

A confusion matrix consists of the following parameters.
True Positive (TP): “Instances which are actually positive and also classified as positive”.
False Positive (FP): “Instances which are actually negative but classified as positive”.
False Negative (FN): “Instances which are actually positive but classified as negative”.
True Negative (TN): “Instances which are actually negative and also classified as negative”.
The performance of both dimensions of the proposed framework is evaluated through the following measures: Precision, Recall, F-measure, Accuracy, MCC and ROC [22]. These measures are calculated from the parameters of the confusion matrix as shown below.

$Precision = \frac{TP}{TP + FP}$

$Recall = \frac{TP}{TP + FN}$

$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$

$AUC = \frac{1 + TPr - FPr}{2}$

$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
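For completeness, these measures can also be computed directly from a trained model's predictions; the following sketch uses scikit-learn's metric functions and assumes the models and train/test split from the earlier sketches.

```python
# Illustrative computation of the evaluation measures used in this study.
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

y_pred = bag_rf.predict(X_test)
y_prob = bag_rf.predict_proba(X_test)[:, 1]    # probability of the "Y" class

print("Precision:", precision_score(y_test, y_pred, zero_division=0))
print("Recall   :", recall_score(y_test, y_pred))
print("F-measure:", f1_score(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))
print("MCC      :", matthews_corrcoef(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))
```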
The proposed framework classified the datasets in two dimensions: 1) with feature selection and 2) without feature selection. In each dimension the Random Forest classifier is used with the Bagging and Boosting techniques, so there are a total of four techniques in the proposed framework: 1) Bagging-RF (Bag-RF), 2) Boosting-RF (Boost-RF), 3) Feature Selection-Bagging-RF (Bag-RF-FS), and 4) Feature Selection-Boosting-RF (Boost-RF-FS). Each results table also shows the scores of other classification techniques such as: “Naïve Bayes (NB), Multi-Layer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Machine (SVM), K Nearest Neighbor (KNN), kStar (K*), One Rule (OneR), PART, Decision Tree (DT), and Random Forest (RF)”. These results are taken from a published paper [10] in order to compare the performance of the proposed framework; the paper [10] used the same datasets (D’’) for its experiments. The results of Precision, Recall and F-Measure of each dataset for each class (Y and N) are reflected in Table 3 to Table 14, and the highest scores in each class are identified in the accompanying discussion.

Table 3. CM1 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.1670 | 0.2220 | 0.1900
NB | N | 0.9190 | 0.8880 | 0.9030
MLP | Y | 0.0000 | 0.0000 | 0.0000
MLP | N | 0.9040 | 0.9550 | 0.9290
RBF | Y | ? | 0.0000 | ?
RBF | N | 0.9080 | 1.0000 | 0.9520
SVM | Y | ? | 0.0000 | ?
SVM | N | 0.9080 | 1.0000 | 0.9520
kNN | Y | 0.0670 | 0.1110 | 0.0830
kNN | N | 0.9040 | 0.8430 | 0.8720
kStar | Y | 0.0670 | 0.1110 | 0.0830
kStar | N | 0.9040 | 0.8430 | 0.8720
OneR | Y | 0.0000 | 0.0000 | 0.0000
OneR | N | 0.9030 | 0.9440 | 0.9230
PART | Y | ? | 0.0000 | ?
PART | N | 0.9080 | 1.0000 | 0.9520
DT | Y | 0.1180 | 0.2220 | 0.1540
DT | N | 0.9140 | 0.8310 | 0.8710
RF | Y | 0.0000 | 0.0000 | 0.0000
RF | N | 0.9070 | 0.9890 | 0.9460
Boost-RF | Y | 0.0000 | 0.0000 | 0.0000
Boost-RF | N | 0.9070 | 0.9890 | 0.9460
Bag-RF | Y | 0.0000 | 0.0000 | 0.0000
Bag-RF | N | 0.9070 | 0.9890 | 0.9460
Boost-RF-FS | Y | 0.0000 | 0.0000 | 0.0000
Boost-RF-FS | N | 0.9070 | 0.9890 | 0.9460
Bag-RF-FS | Y | 0.0000 | 0.0000 | 0.0000
Bag-RF-FS | N | 0.9070 | 0.9890 | 0.9460
Results of the CM1 dataset are given in Table 3. The table reflects that, in Precision, NB performed better in both classes (Y and N). In Recall, NB and DT both performed better in the Y class, whereas RBF, SVM and PART showed better performance in the N class. Finally, in F-measure, NB showed better performance in the Y class, whereas RBF, SVM and PART performed better in the N class.

Table 4. JM1 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.5370 | 0.2260 | 0.3180
NB | N | 0.8230 | 0.9490 | 0.8820
MLP | Y | 0.7650 | 0.0810 | 0.1460
MLP | N | 0.8040 | 0.9930 | 0.8890
RBF | Y | 0.6940 | 0.1040 | 0.1810
RBF | N | 0.8070 | 0.9880 | 0.8890
SVM | Y | ? | 0.0000 | ?
SVM | N | 0.7920 | 1.0000 | 0.8840
kNN | Y | 0.3630 | 0.3340 | 0.3480
kNN | N | 0.8290 | 0.8460 | 0.8370
kStar | Y | 0.4030 | 0.3170 | 0.3550
kStar | N | 0.8300 | 0.8760 | 0.8530
OneR | Y | 0.3780 | 0.1510 | 0.2160
OneR | N | 0.8070 | 0.9350 | 0.8660
PART | Y | 0.8180 | 0.0190 | 0.0370
PART | N | 0.7950 | 0.9990 | 0.8850
DT | Y | 0.4960 | 0.2680 | 0.3480
DT | N | 0.8280 | 0.9290 | 0.8760
RF | Y | 0.5720 | 0.1890 | 0.2840
RF | N | 0.8190 | 0.9630 | 0.8850
Boost-RF | Y | 0.6010 | 0.1970 | 0.2970
Boost-RF | N | 0.8210 | 0.9660 | 0.8870
Bag-RF | Y | 0.6190 | 0.1780 | 0.2770
Bag-RF | N | 0.8180 | 0.9710 | 0.8880
Boost-RF-FS | Y | 0.6010 | 0.1970 | 0.2970
Boost-RF-FS | N | 0.8210 | 0.9660 | 0.8870
Bag-RF-FS | Y | 0.6190 | 0.1780 | 0.2770
Bag-RF-FS | N | 0.8180 | 0.9710 | 0.8880
Results of the JM1 dataset are reflected in Table 4. In Precision, PART performed better in the Y class whereas kStar performed better in the N class. In Recall, kNN performed better in the Y class and SVM performed better in the N class. In F-measure, kStar outperformed the others in the Y class, whereas MLP and RBF outperformed in the N class.

Table 5. KC1 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.4920 | 0.3370 | 0.4000
NB | N | 0.7950 | 0.8810 | 0.8360
MLP | Y | 0.6470 | 0.2470 | 0.3580
MLP | N | 0.7870 | 0.9540 | 0.8630
RBF | Y | 0.7780 | 0.2360 | 0.3620
RBF | N | 0.7890 | 0.9770 | 0.8730
SVM | Y | 0.8000 | 0.0450 | 0.0850
SVM | N | 0.7530 | 0.9960 | 0.8580
kNN | Y | 0.3980 | 0.3930 | 0.3950
kNN | N | 0.7930 | 0.7960 | 0.7950
kStar | Y | 0.4490 | 0.3930 | 0.4190
kStar | N | 0.8010 | 0.8350 | 0.8170
OneR | Y | 0.4440 | 0.1800 | 0.2560
OneR | N | 0.7670 | 0.9230 | 0.8380
PART | Y | 0.6670 | 0.1570 | 0.2550
PART | N | 0.7710 | 0.9730 | 0.8610
DT | Y | 0.5330 | 0.3600 | 0.4300
DT | N | 0.8030 | 0.8920 | 0.8450
RF | Y | 0.6150 | 0.3600 | 0.4540
RF | N | 0.8080 | 0.9230 | 0.8620
Boost-RF | Y | 0.5770 | 0.3370 | 0.4260
Boost-RF | N | 0.8010 | 0.9150 | 0.8550
Bag-RF | Y | 0.6440 | 0.3260 | 0.4330
Bag-RF | N | 0.8030 | 0.9380 | 0.8650
Boost-RF-FS | Y | 0.6350 | 0.3710 | 0.4680
Boost-RF-FS | N | 0.8110 | 0.9270 | 0.8650
Bag-RF-FS | Y | 0.6520 | 0.3370 | 0.4440
Bag-RF-FS | N | 0.8050 | 0.9380 | 0.8670
Results of the KC1 dataset are given in Table 5. It can be seen that in Precision, SVM outperformed the others in the Y class whereas RF showed better results in the N class. In Recall, kNN and kStar performed better in the Y class whereas SVM showed better performance in the N class. Finally, in F-measure, Boost-RF-FS performed better in the Y class and RBF outperformed in the N class.

Table 6. KC3 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.4440 | 0.4000 | 0.4210
NB | N | 0.8780 | 0.8960 | 0.8870
MLP | Y | 0.5000 | 0.3000 | 0.3750
MLP | N | 0.8650 | 0.9380 | 0.9000
RBF | Y | 0.0000 | 0.0000 | 0.0000
RBF | N | 0.8180 | 0.9380 | 0.8740
SVM | Y | ? | 0.0000 | ?
SVM | N | 0.8280 | 1.0000 | 0.9060
kNN | Y | 0.3330 | 0.4000 | 0.3640
kNN | N | 0.8700 | 0.8330 | 0.8510
kStar | Y | 0.3000 | 0.3000 | 0.3000
kStar | N | 0.8540 | 0.8540 | 0.8540
OneR | Y | 0.5000 | 0.3000 | 0.3750
OneR | N | 0.8650 | 0.9380 | 0.9000
PART | Y | 0.2500 | 0.1000 | 0.1430
PART | N | 0.8330 | 0.9380 | 0.8820
DT | Y | 0.3000 | 0.3000 | 0.3000
DT | N | 0.8540 | 0.8540 | 0.8540
RF | Y | 0.2860 | 0.2000 | 0.2350
RF | N | 0.8430 | 0.8960 | 0.8690
Boost-RF | Y | 0.3330 | 0.2000 | 0.2500
Boost-RF | N | 0.8460 | 0.9170 | 0.8800
Bag-RF | Y | 0.4000 | 0.2000 | 0.2670
Bag-RF | N | 0.8490 | 0.9380 | 0.8910
Boost-RF-FS | Y | 0.4170 | 0.5000 | 0.4550
Boost-RF-FS | N | 0.8910 | 0.8540 | 0.8720
Bag-RF-FS | Y | 0.2000 | 0.1000 | 0.1330
Bag-RF-FS | N | 0.8300 | 0.9170 | 0.8710
Results of the KC3 dataset are reflected in Table 6. In Precision, MLP and OneR showed the highest performance in the Y class whereas Boost-RF-FS performed better in the N class. In Recall, Boost-RF-FS performed better in the Y class and, in the N class, SVM outperformed the others. In F-measure, Boost-RF-FS performed better in the Y class whereas SVM performed better in the N class.

Table 7. MC1 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.1560 | 0.3570 | 0.2170
NB | N | 0.9840 | 0.9530 | 0.9680
MLP | Y | ? | 0.0000 | ?
MLP | N | 0.9760 | 1.0000 | 0.9880
RBF | Y | ? | 0.0000 | ?
RBF | N | 0.9760 | 1.0000 | 0.9880
SVM | Y | ? | 0.0000 | ?
SVM | N | 0.9760 | 1.0000 | 0.9880
kNN | Y | 0.4000 | 0.2860 | 0.3330
kNN | N | 0.9830 | 0.9900 | 0.9860
kStar | Y | 0.2500 | 0.1430 | 0.1820
kStar | N | 0.9790 | 0.9900 | 0.9840
OneR | Y | 0.3330 | 0.1430 | 0.2000
OneR | N | 0.9790 | 0.9930 | 0.9860
PART | Y | 0.4000 | 0.2860 | 0.3330
PART | N | 0.9830 | 0.9900 | 0.9860
DT | Y | ? | 0.0000 | ?
DT | N | 0.9760 | 1.0000 | 0.9880
RF | Y | 0.0000 | 0.0000 | 0.0000
RF | N | 0.9760 | 0.9980 | 0.9870
Boost-RF | Y | 0.3330 | 0.0710 | 0.1180
Boost-RF | N | 0.9780 | 0.9970 | 0.9870
Bag-RF | Y | ? | 0.0000 | ?
Bag-RF | N | 0.9760 | 1.0000 | 0.9880
Boost-RF-FS | Y | 0.5000 | 0.0710 | 0.1250
Boost-RF-FS | N | 0.9780 | 0.9980 | 0.9880
Bag-RF-FS | Y | ? | 0.0000 | ?
Bag-RF-FS | N | 0.9760 | 1.0000 | 0.9880
Results of the MC1 dataset are reflected in Table 7. In Precision, Boost-RF-FS showed better performance in the Y class whereas NB performed better in the N class. In Recall, NB performed better in the Y class whereas MLP, RBF, SVM, DT, Bag-RF and Bag-RF-FS performed better in the N class. In F-measure, kNN and PART performed better in the Y class whereas MLP, RBF, SVM, DT, Bag-RF, Boost-RF-FS, and Bag-RF-FS performed better in the N class.

Table 8. MC2 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.8330 | 0.3850 | 0.5260
NB | N | 0.7420 | 0.9580 | 0.8360
MLP | Y | 0.5000 | 0.5380 | 0.5190
MLP | N | 0.7390 | 0.7080 | 0.7230
RBF | Y | 0.8000 | 0.3080 | 0.4400
RBF | N | 0.7190 | 0.9580 | 0.8210
SVM | Y | 0.4000 | 0.1540 | 0.2220
SVM | N | 0.6560 | 0.8750 | 0.7500
kNN | Y | 0.6670 | 0.4620 | 0.5450
kNN | N | 0.7500 | 0.8750 | 0.8080
kStar | Y | 0.4000 | 0.3080 | 0.3480
kStar | N | 0.6670 | 0.7500 | 0.7060
OneR | Y | 0.5000 | 0.2310 | 0.3160
OneR | N | 0.6770 | 0.8750 | 0.7640
PART | Y | 0.7270 | 0.6150 | 0.6670
PART | N | 0.8080 | 0.8750 | 0.8400
DT | Y | 0.5000 | 0.3850 | 0.4350
DT | N | 0.7040 | 0.7920 | 0.7450
RF | Y | 0.5000 | 0.4620 | 0.4800
RF | N | 0.7200 | 0.7500 | 0.7350
Boost-RF | Y | 0.4550 | 0.3850 | 0.4170
Boost-RF | N | 0.6920 | 0.7500 | 0.7200
Bag-RF | Y | 0.5000 | 0.4620 | 0.4800
Bag-RF | N | 0.7200 | 0.7500 | 0.7350
Boost-RF-FS | Y | 0.5000 | 0.4620 | 0.4800
Boost-RF-FS | N | 0.7200 | 0.7500 | 0.7350
Bag-RF-FS | Y | 0.5380 | 0.5380 | 0.5380
Bag-RF-FS | N | 0.7500 | 0.7500 | 0.7500
Table 8 reflects the results of the MC2 dataset. It can be observed that in Precision, NB performed better in the Y class whereas PART performed better in the N class. In Recall, PART performed better in the Y class and NB and RBF performed better in the N class. Finally, in F-measure, PART showed the highest results in both classes.
N
0.9610
0.8920
0.9250
OneR
Y
0.3330
0.1000
0.1540
N
0.9550
0.9900
0.9720
PART
Y
0.3750
0.6000
0.4620
N
0.9790
0.9480
0.9630
DT
Y
0.3890
0.7000
0.5000
N
0.9840
0.9430
0.9630
RF
Y
0.7500
0.3000
0.4290
N
0.9650
0.9950
0.9800
Boost-RF
Y
0.6000
0.3000
0.4000
N
0.9650
0.9900
0.9770
Bag-RF
Y
1.0000
0.2000
0.3330
N
0.9600
1.0000
0.9800
Boost-RF-FS
Y
0.6000
0.3000
0.4000
N
0.9650
0.9900
0.9770
Bag-RF-FS
Y
1.0000
0.2000
0.3330
N
0.9600
1.0000
0.9800
Table 9. MW1 Data Results
Classifier
Class
Precision
Recall
F-Measure
NB
Y
0.3330
0.6250
0.4350
N
0.9500
0.8510
0.8980
MLP
Y
0.5450
0.7500
0.6320
N
0.9690
0.9250
0.9470
RBF
Y
?
0.0000
?
N
0.8930
1.0000
0.9440
SVM
Y
?
0.0000
?
N
0.8930
1.0000
0.9440
kNN
Y
0.4000
0.5000
0.4440
N
0.9380
0.9100
0.9240
kStar
Y
0.1430
0.1250
0.1330
N
0.8970
0.9100
0.9040
OneR
Y
0.5000
0.1250
0.2000
N
0.9040
0.9850
0.9430
PART
Y
0.2500
0.1250
0.1670
N
0.9010
0.9550
0.9280
DT
Y
0.2500
0.1250
0.1670
N
0.9010
0.9550
0.9280
RF
Y
0.3330
0.1250
0.1820
N
0.9030
0.9700
0.9350
Boost-RF
Y
0.5000
0.2500
0.3330
N
0.9150
0.9700
0.9420
Bag-RF
Y
0.5000
0.1250
0.2000
N
0.9040
0.9850
0.9430
Boost-RF-FS
Y
0.5000
0.2500
0.3330
N
0.9150
0.9700
0.9420
Bag-RF-FS
Y
0.5000
0.1250
0.2000
N
0.9040
0.9850
0.9430
Table 9 reflects the result of MW1 dataset. It can be seen that in Precision, MLP performed better in both the classes. In Recall, MLP performed better in Y class whereas RBF and SVM performed better in in N class. In F-measure, MLP performed better in both the classes. Table 10. PC1 Data Results
Classifier
Class
Precision
Recall
F-Measure
NB
Y
0.2800
0.7000
0.4000
N
0.9830
0.9070
0.9440
MLP
Y
1.0000
0.3000
0.4620
N
0.9650
1.0000
0.9820
RBF
Y
0.3330
0.1000
0.1540
N
0.9550
0.9900
0.9720
SVM
Y
?
0.0000
?
N
0.9510
1.0000
0.9750
kNN
Y
0.2730
0.3000
0.2860
N
0.9640
0.9590
0.9610
kStar
Y
0.1250
0.3000
0.1760
Results of PC1 datasets are shown in Table 10. It can be seen that in Precision, MLP, Bag-RF, Boost-RF-FS, and Bag-RF-FS performed better in Y class whereas DT performed better in N class. In Recall, NB and DT performed better in Y class whereas MLP, SVM, Bag-RF, Boost-RF-FS, and Bag-RF-FS both performed better in N class. In F-measure, DT performed better in Y class whereas MLP performed better in N class. Table 11. PC2 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.0000 | 0.0000 | 0.0000
NB | N | 0.9760 | 0.9670 | 0.9720
MLP | Y | 0.0000 | 0.0000 | 0.0000
MLP | N | 0.9770 | 0.9910 | 0.9840
RBF | Y | ? | 0.0000 | ?
RBF | N | 0.9770 | 1.0000 | 0.9880
SVM | Y | ? | 0.0000 | ?
SVM | N | 0.9770 | 1.0000 | 0.9880
kNN | Y | 0.0000 | 0.0000 | 0.0000
kNN | N | 0.9770 | 0.9910 | 0.9840
kStar | Y | 0.1430 | 0.2000 | 0.1670
kStar | N | 0.9810 | 0.9720 | 0.9760
OneR | Y | 0.0000 | 0.0000 | 0.0000
OneR | N | 0.9770 | 0.9950 | 0.9860
PART | Y | 0.0000 | 0.0000 | 0.0000
PART | N | 0.9770 | 0.9910 | 0.9840
DT | Y | ? | 0.0000 | ?
DT | N | 0.9770 | 1.0000 | 0.9880
RF | Y | ? | 0.0000 | ?
RF | N | 0.9770 | 1.0000 | 0.9880
Boost-RF | Y | ? | 0.0000 | ?
Boost-RF | N | 0.9770 | 1.0000 | 0.9880
Bag-RF | Y | ? | 0.0000 | ?
Bag-RF | N | 0.9770 | 1.0000 | 0.9880
Boost-RF-FS | Y | 0.0000 | 0.0000 | 0.0000
Boost-RF-FS | N | 0.9770 | 0.9950 | 0.9860
Bag-RF-FS | Y | ? | 0.0000 | ?
Bag-RF-FS | N | 0.9770 | 1.0000 | 0.9880
Results of the PC2 dataset are shown in Table 11. According to the results, in Precision, kStar performed well in both classes. In Recall, kStar performed well in the Y class whereas RBF, SVM, DT, RF, Boost-RF, Bag-RF, and Bag-RF-FS performed well in the N class. In F-measure, kStar performed well in the Y class, however RBF, SVM, DT, RF, Boost-RF, Bag-RF, and Bag-RF-FS performed well in the N class.

Table 12. PC3 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.1500 | 0.9070 | 0.2570
NB | N | 0.9290 | 0.1900 | 0.3160
MLP | Y | 0.3460 | 0.2090 | 0.2610
MLP | N | 0.8830 | 0.9380 | 0.9090
RBF | Y | ? | 0.0000 | ?
RBF | N | 0.8640 | 1.0000 | 0.9270
SVM | Y | ? | 0.0000 | ?
SVM | N | 0.8640 | 1.0000 | 0.9270
kNN | Y | 0.4800 | 0.2790 | 0.3530
kNN | N | 0.8930 | 0.9520 | 0.9220
kStar | Y | 0.3130 | 0.2330 | 0.2670
kStar | N | 0.8840 | 0.9190 | 0.9010
OneR | Y | 0.6000 | 0.1400 | 0.2260
OneR | N | 0.8790 | 0.9850 | 0.9290
PART | Y | ? | 0.0000 | ?
PART | N | 0.8640 | 1.0000 | 0.9270
DT | Y | 0.5000 | 0.2790 | 0.3580
DT | N | 0.8940 | 0.9560 | 0.9240
RF | Y | 0.6000 | 0.1400 | 0.2260
RF | N | 0.8790 | 0.9850 | 0.9290
Boost-RF | Y | 0.4440 | 0.0930 | 0.1540
Boost-RF | N | 0.8730 | 0.9820 | 0.9240
Bag-RF | Y | 0.5710 | 0.0930 | 0.1600
Bag-RF | N | 0.8740 | 0.9890 | 0.9280
Boost-RF-FS | Y | 0.6670 | 0.1400 | 0.2310
Boost-RF-FS | N | 0.8790 | 0.9890 | 0.9310
Bag-RF-FS | Y | 0.8000 | 0.0930 | 0.1670
Bag-RF-FS | N | 0.8750 | 0.9960 | 0.9320
Results of the PC3 dataset are reflected in Table 12. It can be seen that in Precision, Bag-RF-FS performed better in the Y class, however NB performed better in the N class. In Recall, NB performed better in the Y class whereas RBF, SVM and PART performed better in the N class. In F-measure, DT performed better in the Y class whereas Bag-RF-FS performed better in the N class.

Table 13. PC4 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.4860 | 0.3460 | 0.4040
NB | N | 0.9010 | 0.9420 | 0.9210
MLP | Y | 0.6760 | 0.4810 | 0.5620
MLP | N | 0.9220 | 0.9640 | 0.9420
RBF | Y | 0.6670 | 0.1540 | 0.2500
RBF | N | 0.8810 | 0.9880 | 0.9310
SVM | Y | 0.8180 | 0.1730 | 0.2860
SVM | N | 0.8840 | 0.9940 | 0.9360
kNN | Y | 0.4770 | 0.4040 | 0.4380
kNN | N | 0.9080 | 0.9300 | 0.9190
kStar | Y | 0.3330 | 0.3270 | 0.3300
kStar | N | 0.8940 | 0.8970 | 0.8950
OneR | Y | 0.6500 | 0.2500 | 0.3610
OneR | N | 0.8920 | 0.9790 | 0.9330
PART | Y | 0.4640 | 0.5000 | 0.4810
PART | N | 0.9200 | 0.9090 | 0.9140
DT | Y | 0.5150 | 0.6730 | 0.5830
DT | N | 0.9460 | 0.9000 | 0.9220
RF | Y | 0.7780 | 0.4040 | 0.5320
RF | N | 0.9120 | 0.9820 | 0.9460
Boost-RF | Y | 0.7880 | 0.5000 | 0.6120
Boost-RF | N | 0.9250 | 0.9790 | 0.9510
Bag-RF | Y | 0.8570 | 0.3460 | 0.4930
Bag-RF | N | 0.9060 | 0.9910 | 0.9460
Boost-RF-FS | Y | 0.8330 | 0.4810 | 0.6100
Boost-RF-FS | N | 0.9230 | 0.9850 | 0.9530
Bag-RF-FS | Y | 0.9050 | 0.3650 | 0.5210
Bag-RF-FS | N | 0.9080 | 0.9940 | 0.9490
Results of the PC4 dataset are shown in Table 13. It can be seen that in Precision, Bag-RF-FS performed better in the Y class whereas DT performed better in the N class. In Recall, DT performed better in the Y class whereas SVM and Bag-RF-FS performed better in the N class. Finally, in F-measure, Boost-RF performed better in the Y class whereas Boost-RF-FS performed better in the N class.

Table 14. PC5 Data Results
Classifier | Class | Precision | Recall | F-Measure
NB | Y | 0.6760 | 0.1680 | 0.2690
NB | N | 0.7590 | 0.9700 | 0.8520
MLP | Y | 0.5600 | 0.2040 | 0.2990
MLP | N | 0.7620 | 0.9410 | 0.8420
RBF | Y | 0.7600 | 0.1390 | 0.2350
RBF | N | 0.7560 | 0.9840 | 0.8550
SVM | Y | 0.8750 | 0.0510 | 0.0970
SVM | N | 0.7400 | 0.9970 | 0.8500
kNN | Y | 0.5000 | 0.4960 | 0.4980
kNN | N | 0.8150 | 0.8170 | 0.8160
kStar | Y | 0.4390 | 0.4230 | 0.4310
kStar | N | 0.7900 | 0.8010 | 0.7950
OneR | Y | 0.4550 | 0.3360 | 0.3870
OneR | N | 0.7760 | 0.8520 | 0.8120
PART | Y | 0.6460 | 0.2260 | 0.3350
PART | N | 0.7700 | 0.9540 | 0.8520
DT | Y | 0.5370 | 0.5260 | 0.5310
DT | N | 0.8260 | 0.8330 | 0.8300
RF | Y | 0.5880 | 0.3650 | 0.4500
RF | N | 0.7940 | 0.9060 | 0.8460
Boost-RF | Y | 0.5880 | 0.3430 | 0.4330
Boost-RF | N | 0.7900 | 0.9110 | 0.8460
Bag-RF | Y | 0.6430 | 0.3280 | 0.4350
Bag-RF | N | 0.7900 | 0.9330 | 0.8550
Boost-RF-FS | Y | 0.5880 | 0.3430 | 0.4330
Boost-RF-FS | N | 0.7900 | 0.9110 | 0.8460
Bag-RF-FS | Y | 0.6430 | 0.3280 | 0.4350
Bag-RF-FS | N | 0.7900 | 0.9330 | 0.8550
Results of the PC5 dataset are presented in Table 14. It can be seen that in Precision, SVM performed better in the Y class whereas DT performed better in the N class. In Recall, DT performed better in the Y class whereas SVM performed better in the N class. Finally, in F-measure, DT performed better in the Y class whereas RBF, Bag-RF and Bag-RF-FS outperformed in the N class.

Table 15. Accuracy Results
Dataset | NB | MLP | RBF | SVM | kNN | kStar | OneR | PART | DT | RF | Boost-RF | Bag-RF | Boost-RF-FS | Bag-RF-FS
CM1 | 82.6531 | 86.7347 | 90.8163 | 90.8163 | 77.5510 | 77.5510 | 85.7143 | 90.8163 | 77.5510 | 89.7959 | 89.7959 | 89.7959 | 89.7959 | 89.7959
JM1 | 79.8359 | 80.3541 | 80.3972 | 79.1883 | 73.9637 | 75.9931 | 77.1589 | 79.4905 | 79.1019 | 80.1813 | 80.5699 | 80.6131 | 80.5699 | 80.6131
KC1 | 74.2120 | 77.3639 | 78.7966 | 75.3582 | 69.3410 | 72.2063 | 73.3524 | 76.5043 | 75.6447 | 77.9370 | 76.7900 | 78.2235 | 78.5100 | 78.5100
KC3 | 81.0345 | 82.7586 | 77.5862 | 82.7586 | 75.8621 | 75.8621 | 82.7586 | 79.3103 | 75.8621 | 77.5862 | 79.3103 | 81.0345 | 79.3103 | 77.5862
MC1 | 93.8567 | 97.6109 | 97.6109 | 97.6109 | 97.2696 | 96.9283 | 97.2696 | 97.2696 | 97.6109 | 97.4403 | 97.4403 | 97.6109 | 97.6109 | 97.6109
MC2 | 75.6757 | 64.8649 | 72.9730 | 62.1622 | 72.9730 | 59.4595 | 64.8649 | 78.3784 | 64.8649 | 64.8649 | 62.1622 | 64.8649 | 64.8649 | 67.5676
MW1 | 82.6667 | 90.6667 | 89.3333 | 89.3333 | 86.6667 | 82.6667 | 89.3333 | 86.6667 | 86.6667 | 88.0000 | 89.3333 | 89.3333 | 89.3333 | 89.3333
PC1 | 89.7059 | 96.5686 | 94.6078 | 95.0980 | 92.6471 | 86.2745 | 94.6078 | 93.1373 | 93.1373 | 96.0784 | 95.5882 | 96.0784 | 96.0784 | 96.0784
PC2 | 94.4700 | 96.7742 | 97.6959 | 97.6959 | 96.7742 | 95.3917 | 97.2350 | 96.7742 | 97.6959 | 97.6959 | 97.6959 | 97.6959 | 97.2350 | 97.6959
PC3 | 28.7975 | 83.8608 | 86.3924 | 86.3924 | 86.0759 | 82.5949 | 87.0253 | 86.3924 | 86.3924 | 87.0253 | 86.0759 | 86.7089 | 87.3418 | 87.3418
PC4 | 86.0892 | 89.7638 | 87.4016 | 88.1890 | 85.8268 | 81.8898 | 87.9265 | 85.3018 | 86.8766 | 90.2887 | 91.3386 | 90.2887 | 91.6010 | 90.8136
PC5 | 75.3937 | 74.2126 | 75.5906 | 74.2126 | 73.0315 | 69.8819 | 71.2598 | 75.7874 | 75.0000 | 75.9843 | 75.7874 | 76.9685 | 75.7874 | 76.9685
Table 16. ROC Area Results
Dataset | NB | MLP | RBF | SVM | kNN | kStar | OneR | PART | DT | RF | Boost-RF | Bag-RF | Boost-RF-FS | Bag-RF-FS
CM1 | 0.7030 | 0.6340 | 0.7020 | 0.5000 | 0.4770 | 0.5380 | 0.4720 | 0.6100 | 0.3780 | 0.7610 | 0.7650 | 0.7370 | 0.6600 | 0.6830
JM1 | 0.6630 | 0.7020 | 0.7130 | 0.5000 | 0.5910 | 0.5720 | 0.5430 | 0.7140 | 0.6710 | 0.7380 | 0.7360 | 0.7460 | 0.7360 | 0.7460
KC1 | 0.6940 | 0.7360 | 0.7130 | 0.5210 | 0.5950 | 0.6510 | 0.5510 | 0.6360 | 0.6060 | 0.7510 | 0.7510 | 0.7570 | 0.7510 | 0.7500
KC3 | 0.7690 | 0.7330 | 0.7350 | 0.5000 | 0.6170 | 0.5280 | 0.6190 | 0.7880 | 0.5700 | 0.8070 | 0.7850 | 0.8150 | 0.8340 | 0.8670
MC1 | 0.8260 | 0.8050 | 0.7810 | 0.5000 | 0.6380 | 0.6310 | 0.5680 | 0.6840 | 0.5000 | 0.8640 | 0.8350 | 0.8470 | 0.8270 | 0.8830
MC2 | 0.7950 | 0.7530 | 0.7660 | 0.5140 | 0.6680 | 0.5100 | 0.5530 | 0.7240 | 0.6150 | 0.6460 | 0.6650 | 0.6700 | 0.6460 | 0.6570
MW1 | 0.7910 | 0.8430 | 0.8080 | 0.5000 | 0.7050 | 0.5430 | 0.5550 | 0.3140 | 0.3140 | 0.7660 | 0.7260 | 0.7420 | 0.7260 | 0.7610
PC1 | 0.8790 | 0.7790 | 0.8750 | 0.5000 | 0.6290 | 0.6730 | 0.5450 | 0.8890 | 0.7180 | 0.8580 | 0.8960 | 0.9210 | 0.9240 | 0.9100
PC2 | 0.7510 | 0.7460 | 0.7240 | 0.5000 | 0.4950 | 0.7910 | 0.4980 | 0.6230 | 0.5790 | 0.7310 | 0.6560 | 0.7740 | 0.4890 | 0.5630
PC3 | 0.7730 | 0.7960 | 0.7950 | 0.5000 | 0.6160 | 0.7490 | 0.5620 | 0.7900 | 0.6640 | 0.8550 | 0.8360 | 0.8390 | 0.8500 | 0.8410
PC4 | 0.8070 | 0.8980 | 0.8620 | 0.5830 | 0.6670 | 0.7340 | 0.6140 | 0.7760 | 0.8340 | 0.9450 | 0.9450 | 0.9530 | 0.9520 | 0.9550
PC5 | 0.7250 | 0.7510 | 0.7320 | 0.5240 | 0.6570 | 0.6290 | 0.5940 | 0.7390 | 0.7030 | 0.8050 | 0.7990 | 0.8050 | 0.7990 | 0.8050
Table 17. MCC Results
Dataset | NB | MLP | RBF | SVM | kNN | kStar | OneR | PART | DT | RF | Boost-RF | Bag-RF | Boost-RF-FS | Bag-RF-FS
CM1 | 0.0970 | -0.0660 | ? | ? | -0.0370 | -0.0370 | -0.0740 | ? | 0.0410 | -0.0320 | -0.0320 | -0.0320 | -0.0320 | -0.0320
JM1 | 0.2510 | 0.2060 | 0.2150 | ? | 0.1860 | 0.2120 | 0.1260 | 0.1040 | 0.2520 | 0.2440 | 0.2620 | 0.2560 | 0.2620 | 0.2560
KC1 | 0.2500 | 0.2960 | 0.3470 | 0.1510 | 0.1900 | 0.2380 | 0.1470 | 0.2390 | 0.2910 | 0.3460 | 0.3090 | 0.3440 | 0.3640 | 0.3550
KC3 | 0.3090 | 0.2950 | -0.1070 | ? | 0.2180 | 0.1540 | 0.2950 | 0.0560 | 0.1540 | 0.1110 | 0.1450 | 0.1850 | 0.3300 | 0.0220
MC1 | 0.2080 | ? | ? | ? | 0.3250 | 0.1740 | 0.2060 | 0.3250 | ? | -0.0060 | 0.1450 | ? | 0.1820 | ?
MC2 | 0.4440 | 0.2430 | 0.3710 | 0.0400 | 0.3740 | 0.0620 | 0.1370 | 0.5120 | 0.1890 | 0.2160 | 0.1410 | 0.2160 | 0.2160 | 0.2880
MW1 | 0.3670 | 0.5890 | ? | ? | 0.3730 | 0.0380 | 0.2110 | 0.1100 | 0.1100 | 0.1500 | 0.3020 | 0.2110 | 0.3020 | 0.2110
PC1 | 0.4000 | 0.5380 | 0.1610 | ? | 0.2470 | 0.1280 | 0.1610 | 0.4400 | 0.4900 | 0.4590 | 0.4050 | 0.4380 | 0.4380 | 0.4380
PC2 | -0.0280 | -0.0150 | ? | ? | -0.0150 | 0.1460 | -0.0100 | 0.0150 | ? | ? | ? | ? | -0.0100 | ?
PC3 | 0.0880 | 0.1830 | ? | ? | 0.2940 | 0.1730 | 0.2450 | ? | 0.3040 | 0.2450 | 0.1540 | 0.1910 | 0.2650 | 0.2460
PC4 | 0.3340 | 0.5150 | 0.2790 | 0.3420 | 0.3590 | 0.2250 | 0.3520 | 0.3960 | 0.5140 | 0.5160 | 0.5840 | 0.5070 | 0.5930 | 0.5410
PC5 | 0.2450 | 0.2160 | 0.2510 | 0.1730 | 0.3140 | 0.2270 | 0.2090 | 0.2740 | 0.3610 | 0.3220 | 0.3100 | 0.3360 | 0.3100 | 0.3360
We have considered the F-measure of the ‘Y’ class for the analysis of Table 3 to Table 14. F-measure is selected as it combines Precision and Recall (as their harmonic mean), and the ‘Y’ class predicts the probability of defective modules. It has been observed from the F-measure results that the proposed framework outperformed the other techniques only in three datasets: KC1, KC3 and PC4. In Accuracy (Table 15), the proposed framework performed better in four datasets: JM1, PC3, PC4, and PC5. In the remaining datasets the result is either lower than or equal to one or more of the other classification techniques. It has also been noted that NB, kNN, and kStar were not able to perform better in any of the datasets. In ROC Area (Table 16), higher performance is reflected in the following datasets: CM1, JM1, KC1, KC3, MC1, PC1, and PC4; however, the results in the remaining datasets show either lower or equal performance when compared to the other classification techniques. It has also been observed that RBF, SVM, kNN, OneR, PART, and DT were not able to perform better in any of the datasets. In MCC (Table 17), the proposed framework showed higher performance in the following datasets: JM1, KC1, KC3 and PC4. In the remaining datasets the scores are either lower or equal, as compared to the other classification techniques. It has also been noted that RBF, SVM, OneR, RF, Bag-RF, and Bag-RF-FS were not able to perform better in any of the datasets. As discussed by [10], F-measure and MCC react to the issue of class imbalance; however, it has been observed in this study that our proposed framework could not fully resolve that issue either.

V. Conclusion

This research proposed and implemented a feature selection based ensemble classification framework. The proposed framework consists of four stages: 1) Dataset Selection, 2) Feature Selection, 3) Classification, and 4) Results. Two different dimensions are used in the framework, one with feature selection and the second without feature selection. Each dimension further uses two ensemble techniques with the Random Forest classifier: Bagging and Boosting. Performance of the proposed framework is evaluated through Precision, Recall, F-measure, Accuracy, MCC and ROC. For the experiments, 12 cleaned publicly available NASA datasets are used, and the results of both dimensions are compared with other widely used classification techniques such as: “Naïve Bayes (NB), Multi-Layer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Machine (SVM), K Nearest Neighbor (KNN), kStar (K*), One Rule (OneR), PART, Decision Tree (DT), and Random Forest (RF)”. Results showed that the proposed classification framework outperformed the other classification techniques in some of the datasets; however, the class imbalance issue could not be resolved, which is the main reason for the lower and biased performance of classification techniques. It is suggested for future work that resampling techniques should be included in the proposed framework to resolve the class imbalance issue in the datasets as well as to achieve higher performance.
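As one concrete illustration of that future direction, the minimal sketch below oversamples the training data with SMOTE from the imbalanced-learn package before retraining an ensemble; this is only an assumed example of a resampling technique, not part of the framework evaluated in this paper.

```python
# Illustrative sketch of rebalancing the training data with SMOTE
# (imbalanced-learn); a possible future extension, not the framework
# evaluated in this paper.
from collections import Counter
from imblearn.over_sampling import SMOTE

X_res, y_res = SMOTE(random_state=1).fit_resample(X_train, y_train)
print("before:", Counter(y_train), "after:", Counter(y_res))

bag_rf.fit(X_res, y_res)   # retrain an ensemble on the balanced training data
```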
References
- S. Huda, S. Alyahya, M. M. Ali, S. Ahmad, J. Abawajy, J. Al-Dossari, and J. Yearwood, “A Framework for Software Defect Prediction and Metric Selection,” IEEE Access, vol. 6, pp. 2844–2858, 2018.
- E. Erturk and E. Akcapinar, “A comparison of some soft computing methods for software fault prediction,” Expert Syst. Appl., vol. 42, no. 4, pp. 1872–1879, 2015.
- Y. Ma, G. Luo, X. Zeng, and A. Chen, “Transfer learning for cross-company software defect prediction,” Inf. Softw. Technol., vol. 54, no. 3, Mar. 2012.
- M. Shepperd, Q. Song, Z. Sun and C. Mair, “Data Quality: Some Comments on the NASA Software Defect Datasets,” IEEE Trans. Softw. Eng., vol. 39, pp. 1208–1215, 2013.
- I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed. Morgan Kaufmann, 2005.
- “NASA Defect Dataset.” [Online]. Available: https://github.com/klainfo/NASADefectDataset. [Accessed: 01-July-2019].
- B. Ghotra, S. McIntosh, and A. E. Hassan, “Revisiting the impact of classification techniques on the performance of defect prediction models,” Proc. - Int. Conf. Softw. Eng., vol. 1, pp. 789–800, 2015.
- G. Czibula, Z. Marian, and I. G. Czibula, “Software defect prediction using relational association rule mining,” Inf. Sci. (Ny)., vol. 264, pp. 260–278, 2014.
- D. Rodriguez, I. Herraiz, R. Harrison, J. Dolado, and J. C. Riquelme, “Preliminary comparison of techniques for dealing with imbalance in software defect prediction,” in Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. ACM, p. 43, 2014.
- A. Iqbal, S. Aftab, U. Ali, Z. Nawaz, L. Sana, M. Ahmad, and A. Husen “Performance Analysis of Machine Learning Techniques on Software Defect Prediction using NASA Datasets,” Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 5, 2019.
- M. Ahmad, S. Aftab, I. Ali, and N. Hameed, “Hybrid Tools and Techniques for Sentiment Analysis: A Review,” Int. J. Multidiscip. Sci. Eng., vol. 8, no. 3, 2017.
- M. Ahmad, S. Aftab, and S. S. Muhammad, “Machine Learning Techniques for Sentiment Analysis: A Review,” Int. J. Multidiscip. Sci. Eng., vol. 8, no. 3, p. 27, 2017.
- M. Ahmad, S. Aftab, and I. Ali, “Sentiment Analysis of Tweets using SVM,” Int. J. Comput. Appl., vol. 177, no. 5, pp. 25–29, 2017.
- M. Ahmad and S. Aftab, “Analyzing the Performance of SVM for Polarity Detection with Different Datasets,” Int. J. Mod. Educ. Comput. Sci., vol. 9, no. 10, pp. 29–36, 2017.
- M. Ahmad, S. Aftab, M. S. Bashir, N. Hameed, I. Ali, and Z. Nawaz, “SVM Optimization for Sentiment Analysis,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 4, 2018.
- M. Ahmad, S. Aftab, M. S. Bashir, and N. Hameed, “Sentiment Analysis using SVM: A Systematic Literature Review,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 2, 2018.
- A. Iqbal and S. Aftab, “A Feed-Forward and Pattern Recognition ANN Model for Network Intrusion Detection,” Int. J. Comput. Netw. Inf. Secur., vol. 11, no. 4, pp. 19–25, 2019.
- A. Iqbal, S. Aftab, I. Ullah, M. A. Saeed, and A. Husen, “A Classification Framework to Detect DoS Attacks,” Int. J. Comput. Netw. Inf. Secur., vol. 11, no. 9, pp. 40-47, 2019.
- S. Behal, K. Kumar, and M. Sachdeva, “D-FAC: A novel ϕ-Divergence based distributed DDoS defense system,” J. King Saud Univ. - Comput. Inf. Sci., 2018.
- S. Aftab, M. Ahmad, N. Hameed, M. S. Bashir, I. Ali, and Z. Nawaz, “Rainfall Prediction in Lahore City using Data Mining Techniques,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 4, 2018.
- S. Aftab, M. Ahmad, N. Hameed, M. S. Bashir, I. Ali, and Z. Nawaz, “Rainfall Prediction using Data Mining Techniques: A Systematic Literature Review,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 5, 2018.
- S. Wang and X. Yao, “Using class imbalance learning for software defect prediction,” IEEE Transactions on Reliability, vol. 62, no. 2, pp. 434–443, 2013.
- J. C. Riquelme, R. Ruiz, D. Rodríguez, and J. Moreno, “Finding defective modules from highly unbalanced datasets,” Actas de los Talleres de las Jornadas de Ingeniería del Software y Bases de Datos, vol. 2, no. 1, pp. 67–74, 2008.
- F. Lanubile, A. Lonigro, and G. Visaggio, “Comparing Models for Identifying Fault-Prone Software Components,” Proc. Seventh Int’l Conf. Software Eng. and Knowledge Eng., pp. 312–319, June 1995.
- K. O. Elish and M. O. Elish, “Predicting defect-prone software modules using support vector machines,” J. Syst. Softw., vol. 81, no. 5, pp. 649–660, 2008.
- I. Gondra, “Applying machine learning to software fault-proneness prediction,” J. Syst. Softw., vol. 81, no. 2, pp. 186–195, 2008.
- O. F. Arar and K. Ayan, “Software defect prediction using cost-sensitive neural network,” Applied Soft Computing, vol. 33, pp. 263–277, 2015.
- C. Manjula and L. Florence, “Deep neural network based hybrid approach for software defect prediction using software metrics,” Cluster Comput., pp. 1–17, 2018.
- R. Moser, W. Pedrycz, and G. Succi, “A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction”, In Robby, editor, ICSE, pp. 181–190. ACM, 2008.
- E. Giger, M. D’Ambros, M. Pinzger, and H. C. Gall, “Method-level bug prediction,” in ESEM ’12, pp. 171–180, 2012.
- S. E. S. Taba, F. Khomh, Y. Zou, A. E. Hassan, and M. Nagappan, “Predicting bugs using antipatterns,” in Proc. of the 29th Int’l Conference on Software Maintenance, pp. 270–279, 2013.
- K. Herzig, S. Just, A. Rau, and A. Zeller, “Predicting defects using change genealogies,” in Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on, pp. 118–127, 2013.