Feature Selection based Breast Cancer Prediction
Authors: Rakibul Hasan, A.S.M. Shafi
Journal: International Journal of Image, Graphics and Signal Processing (IJIGSP)
Issue: Vol. 15, No. 2, 2023.
Breast cancer is one of the leading causes of mortality for women around the world. This mortality rate could be reduced if breast cancer were diagnosed at an early stage. It is hard to determine the causes that may lead to the development of breast cancer, but predicting the probability of its occurrence is still important. The likelihood of breast cancer can be assessed using machine learning algorithms and routine diagnostic data. Although a variety of patient attributes are stored in cancer datasets, not all of them are relevant for predicting cancer. In such situations, feature selection approaches can be applied to retain the pertinent feature set. In this research, a comprehensive analysis of Machine Learning (ML) classification algorithms with and without feature selection is performed for breast cancer prediction on the Wisconsin Breast Cancer Original (WBCO), Wisconsin Diagnosis Breast Cancer (WDBC), and Wisconsin Prognosis Breast Cancer (WPBC) datasets. We employed wrapper-based feature selection and three different classifiers: Logistic Regression (LR), Linear Support Vector Machine (LSVM), and Quadratic Support Vector Machine (QSVM). The experimental results show that the LR classifier with feature selection performs significantly better, with accuracies of 97.1% and 83.5% on the WBCO and WPBC datasets, respectively. On the WDBC dataset, the QSVM classifier without feature selection achieved an accuracy of 97.9%. These results outperform the existing methods.
Keywords: Breast Cancer Prediction, Machine Learning, Feature Selection, Classification
Short address: https://sciup.org/15018750
IDR: 15018750 | DOI: 10.5815/ijigsp.2023.02.02
1. Introduction
According to the statistics of the World Health Organization (WHO) in 2019, cancer is the first or second leading cause of human death around the world [1]. In 2020, approximately 10 million cancer deaths occurred, and female breast cancer (11.7%) exceeded lung cancer (11.4%) as the most commonly diagnosed cancer. It is also reported that the leading cause of cancer-related death in women is breast cancer [2]. Breast cancer develops when some breast tissues begin to grow abnormally. Breast cancer prevention methods have yet to be identified; therefore, scientists have focused on creating new and better approaches to treat the disease after it has developed. Furthermore, efforts have been concentrated on the early identification of breast cancer in women through screening, so that more lives can be saved through treatment.
For at least 30 years, researchers have been investigating breast cancer screening methods such as mammography, clinical breast examination, and biopsy. Although mammography is one of the most useful methods for screening women's breast cancer, radiologists' interpretations of mammograms can vary significantly [3]. Surgical biopsy is more accurate than mammography, but it is a costly and invasive procedure [4]. As a result, it is essential to build better breast cancer detection systems. These detection methods can assist in classifying patients into benign (noncancerous) and malignant (cancerous) groups. Early detection of breast cancer increases the likelihood of a patient's survival. To achieve this goal, clinicians need diagnostic systems with high levels of predictability and reliability that can assist them in distinguishing between benign and malignant breast tumors.
Various studies based on machine learning and data mining have been developed for breast cancer prediction. Some of them focus on improving learning models, some concentrate on data pre-processing steps [5, 6], while others focus on feature selection to identify relevant features from a dataset and build a more effective classification system [7, 8]. Filters, wrappers, and embedded methods are the three main types of feature selection methods. In this study, a wrapper-based feature selection method has been used. The wrapper method finds optimal features by repeatedly evaluating subset combinations against the prediction model's accuracy. Although it is computationally more expensive, it achieves high accuracy [9].
In this study, a breast cancer prediction approach is developed from the Wisconsin breast cancer datasets with the assistance of machine learning. Feature selection is used to identify the most relevant features. In our exploration, the LR, LSVM, and QSVM classifiers are applied to construct the final prediction.
The rest of this paper is structured as follows. Section 2 reviews related work. The methodology and breast cancer datasets are described in Section 3. Sections 4 and 5 present the experimental results and discussion. Finally, Section 6 draws the conclusions and outlines future directions of this work.
2. Literature Review
In medical science, a significant amount of machine-learning-based research has been initiated and is ongoing to achieve a deeper understanding of successful breast cancer diagnosis using breast cancer datasets from the University of California, Irvine (UCI) machine learning repository.
E. A. Bayrak et al. [10] used Support Vector Machine (SVM) and Artificial Neural Network (ANN) classifiers to predict breast cancer from the WBC dataset. They utilized the Sequential Minimal Optimization (SMO) and LibSVM algorithms for the SVM classification, and the Multilayer Perceptron (MLP) and Voted Perceptron methods for the ANN classification, in WEKA software. They obtained the highest accuracy of 96.9957% using SMO-SVM with 10-fold cross-validation.
Authors [11] presented a breast cancer diagnosis scheme using K-Nearest Neighbour (KNN), Naive Bayes (NB), and Fast Decision Tree (FDT) classifiers. They used Particle Swarm Optimization (PSO) for feature selection. All three classifiers were evaluated on WPBC data and obtained the highest accuracy of 81.3% using the NB classifier. The behavior of two platforms (Spark and Weka) was compared by AlGhunaim et al. [12]. According to their experimental results, they showed that the SVM classifier outperformed the other classifiers. For gene expression and DNA methylation datasets, they obtained an accuracy of 99.68% and 98.73 %, respectively. They also combined the two datasets and achieved an accuracy of 97.33%. Authors [13] employed a deep learning algorithm with multiple activation functions such as Rectifier, Maxout, Tanh, and Exprectifier to classify breast cancer. They attained the highest classification accuracy of 96.99% by utilizing the Exprectifier function.
A neural network with a feed-forward backpropagation algorithm was constructed in [14] for the classification of breast cancer and obtained a correct classification rate of 96.63% when applied to the WBC dataset. The Microsoft Azure machine learning (AzureML) platform was utilized by K. Alshouiliy et al. [15] to analyze the WDBC dataset for breast cancer prediction. They applied decision trees and decision jungles for classification purposes. Their findings revealed that the decision jungle (97%) outperformed the decision tree (95%). In [16], the authors developed an ensemble-based stacking classifier. They implemented different classification methods over the WDBC dataset and fine-tuned their parameters to achieve a better classification rate. By integrating the findings of those classifiers, they obtained 97.20% accuracy.
Six different machine-learning-based classification algorithms (Random Forest, KNN, Decision Trees, NB, LR, and SVM) were applied to the WDBC dataset for the classification of breast cancer [17]. By combining Linear Discriminant Analysis (LDA) and LR, they improved their accuracy. The authors of [18] suggested a technique for predicting and analyzing the WDBC dataset. Using Principal Component Analysis (PCA), they selected the top 6 and 10 features. Their suggested model reached 97.52% accuracy when applying a Random Forest (RF) classifier with the top 10 selected features. For the prediction of the WPBC dataset, A. I. Pritom et al. [19] used NB, SVM, and the C4.5 Decision Tree. After ranking all the features, the authors selected the 11 top-ranked features for classification and gained 76.26% accuracy for the NB classifier. A good set of parameters for the KNN algorithm was proposed in [20] for breast cancer prognosis from the WPBC dataset. Aalaei Sh. et al. [21] developed a Genetic Algorithm (GA) based feature selection technique for breast cancer diagnosis on the WBC, WDBC, and WPBC datasets. The authors used three renowned classifiers, namely PS, ANN, and GA, to analyze the efficacy of their suggested technique. They also compared their outcomes with and without the feature selection approach. Based on their experimental results, feature selection with the ANN classifier (97.3% and 79.2%) performed better than the other two classifiers on the WDBC and WPBC datasets. The feature selection-based PS-classifier (96.9%), on the other hand, obtained the highest accuracy on the WBC dataset.
On the WBC dataset, Vikas Chaurasia et al. [22] presented a technique for predicting breast cancer survivability using three prominent data mining approaches (NB, RBF Network, J48). After applying 10-fold cross-validation, their experimental results indicated that the NB classifier produced better classification results (97.36%) than the RBF network (96.7%) and J48 (93.41%). The authors of [23] presented a model for breast cancer classification on the WDBC dataset comparing the performance of three types of Bayes classifiers: Tree Augmented Naive Bayes (TAN), Boosted Augmented Naive Bayes (BAN), and Bayes Belief Network (BBN). When compared to the other networks, they found that the TAN Bayes classifier with the Gradient Boosting (GB) technique improved classification accuracy (94.11%). Huang et al. [24] compared the performance of SVM and SVM ensembles in breast cancer prediction. The authors showed that feature selection based on GA with a linear-SVM bagging method and GA with an RBF-kernel SVM boosting method can provide good prediction models for the WBC dataset.
3. Materials and Methodology
This section discusses the structure of the proposed approach along with the corresponding datasets.
3.1 Dataset Description
In this article, we have used three breast cancer datasets: WBCO, WDBC, and WPBC acquired from the UCI machine learning repository [25]. Table 1 contains a summary of the sample dataset.
Table 1. Sample dataset
| Data Set | Short Name | No. of Attributes | No. of Instances | No. of Classes |
|---|---|---|---|---|
| Wisconsin Breast Cancer Original | WBCO | 10 | 699 | 2 (B=Benign, M=Malignant) |
| Wisconsin Diagnosis Breast Cancer | WDBC | 32 | 569 | 2 (B=Benign, M=Malignant) |
| Wisconsin Prognosis Breast Cancer | WPBC | 34 | 198 | 2 (N=Non-Recur, R=Recur) |
3.2 Methodology
Fig. 1 depicts the architecture of the proposed method.

Fig. 1. Architecture of the proposed approach
3.3 Pre-processing
We have removed the instances with missing values from the datasets. Table 2 summarizes the pre-processed datasets.
Table 2. Pre-processed table
| Datasets | No. of instances (original) | No. of missing values | No. of instances (pre-processed) | Dataset distribution |
|---|---|---|---|---|
| WBCO | 699 | 16 | 683 | B=444, M=239 |
| WDBC | 569 | None | 569 | B=357, M=212 |
| WPBC | 198 | 4 | 194 | N=148, R=46 |
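As an illustration of this pre-processing step, the following minimal Python/pandas sketch loads the raw WBCO file from the UCI repository listed in [25], treats the '?' entries as missing values, and drops the affected rows. The column names and the exact file path are assumptions based on the dataset documentation, not part of the paper.

```python
# Pre-processing sketch for WBCO: read the raw UCI file, mark '?' as
# missing, and drop incomplete rows (699 -> 683 instances, cf. Table 2).
import pandas as pd

# Column names assumed from the WBCO documentation.
cols = ["id", "clump_thickness", "uniform_cell_size", "uniform_cell_shape",
        "marginal_adhesion", "single_epithelial_cell_size", "bare_nuclei",
        "bland_chromatin", "normal_nucleoli", "mitoses", "class"]

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/breast-cancer-wisconsin.data")  # assumed file name
df = pd.read_csv(url, names=cols, na_values="?")

df = df.dropna().drop(columns=["id"])            # removes the 16 incomplete rows
df["class"] = df["class"].map({2: 0, 4: 1})      # 2 = benign, 4 = malignant
print(df.shape)                                  # expected: (683, 10)
```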
3.4 Feature Selection
The purpose of feature selection, or attribute selection, is to reduce the number of features by deleting irrelevant and unreliable features while increasing the classifier's effectiveness. The benefits of feature selection are to reduce the cost of running the classifier in terms of speed and memory, as well as to mitigate the curse of feature dimensionality. For feature selection, we have employed the WEKA machine learning tools. We have used ClassifierAttributeEval as the attribute evaluator, which applies a user-defined classifier to determine the value of an attribute, and we have utilized the Logistic function as the user-specified classifier. For the search method and attribute selection mode, we have used BestFirst and the full training set. All other parameters have been kept at their defaults in the WEKA environment.
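The wrapper idea described above can also be reproduced outside WEKA. The sketch below is an analogous (not identical) setup in Python/scikit-learn: a logistic-regression model is wrapped in a greedy forward search that scores candidate feature subsets by cross-validated accuracy. The file name is hypothetical, and the subset size is fixed at six only because six WBCO features are retained in this study (Table 5).

```python
# Wrapper-based feature selection sketch: a greedy forward search driven
# by the cross-validated accuracy of a logistic-regression model.
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Hypothetical file: a pre-processed WBCO table with nine features
# and a binary 'class' column (0 = benign, 1 = malignant).
df = pd.read_csv("wbco_preprocessed.csv")
X, y = df.drop(columns=["class"]), df["class"]

wrapped = LogisticRegression(max_iter=1000)
selector = SequentialFeatureSelector(
    wrapped,
    n_features_to_select=6,   # six WBCO features are kept in this paper
    direction="forward",      # greedy forward search (a BestFirst analogue)
    scoring="accuracy",
    cv=10,
)
selector.fit(X, y)
print("Selected features:", list(X.columns[selector.get_support()]))
```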
3.5 Machine Learning Classification Algorithms
Machine learning is an application of Artificial Intelligence (AI) that enables computers to learn automatically from past data without being explicitly programmed. Machine learning algorithms start from a dataset, analyze it, and use the trained model to make predictions on new data.
For breast cancer prediction, we have used two important machine learning classification algorithms, namely logistic regression and support vector machines (with linear and quadratic kernels). All the classifier experiments described in this work have been carried out with MATLAB tools.
3.5.1 Logistic Regression
Logistic Regression (LR) is a powerful supervised machine learning algorithm. It models the association between an outcome variable (label) and each of the variables that influence it (features). Individual variable contributions to the final fit can be easily understood, and the fitted outputs can be directly interpreted as probabilities [26]. In contrast to linear regression, the explanatory variables in logistic regression can be categorical or continuous. A logistic regression output is more informative than that of many other classification techniques, and continuous data is not strictly required by the model.
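For illustration only (scikit-learn in Python rather than the MATLAB tools used in this work), the following sketch fits a logistic-regression classifier and inspects the per-sample probabilities that make its output easy to interpret; scikit-learn's bundled breast-cancer data is a copy of the WDBC feature set.

```python
# Logistic-regression sketch: fit, score, and read off class probabilities.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)            # WDBC features/labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Standardization helps the solver converge; max_iter raised for the same reason.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]              # probability of class 1
print("Test accuracy:", round(model.score(X_test, y_test), 3))
print("First five class-1 probabilities:", proba[:5].round(3))
```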
3.5.2 Support Vector Machine
SVM is a well-known supervised machine learning technique that may be used for both classification and regression. The main goal of the SVM is to find the optimal decision boundary that separates two or more classes with the maximum margin, so that new data points can be classified correctly. It offers a high accuracy rate when predicting on large datasets and is a widely used algorithm for both 2D and 3D modeling problems [27]. SVM algorithms utilize a set of mathematical functions known as kernels. A kernel function takes data as input and transforms it into the required form. In our study, we have applied linear and quadratic kernel SVMs to classify the datasets. The QSVM uses a nonlinear kernel with excellent mathematical adaptability and a direct geometric interpretation [28] that may outperform the LSVM. The LSVM and QSVM kernel functions are shown in Eq. (1) and (2) [29], and their difference is depicted in Fig. 2.
K_Linear(x, y) = (x · y) + b    (1)

K_Quadratic(x, y) = ((x · y) + b)^2    (2)

where x and y are the n-dimensional input feature vectors, b is the kernel parameter, and K is the kernel function.

Fig. 2. (a) A linear and (b) a quadratic kernel
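To make the difference between Eq. (1) and Eq. (2) concrete, the sketch below (again a scikit-learn stand-in for the MATLAB classifiers, with default regularization assumed) trains a linear-kernel and a degree-2 polynomial-kernel SVM on the same data and compares their 10-fold cross-validated accuracy; here `coef0` plays the role of the constant b in the kernels.

```python
# Linear vs. quadratic (degree-2 polynomial) SVM kernels under 10-fold CV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # WDBC features/labels

lsvm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
qsvm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, coef0=1.0))

for name, clf in [("LSVM", lsvm), ("QSVM", qsvm)]:
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean 10-fold accuracy = {scores.mean():.3f}")
```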
4. Experimental Evaluation
We have analyzed the performance of our proposed method with the help of different performance indexes such as sensitivity (Sen), specificity (Spe), precision (Pre), miss rate, false discovery rate, and accuracy (Acc) (Table 4). These evaluation metrics have been calculated using the confusion matrix shown in Table 3. We have also used the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) to assess the performance. Ten-fold cross-validation (10-FCV) is performed to determine the validity of this research.
Table 3. Confusion matrix
| Actual class | Predicted class: Positive | Predicted class: Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
Table 4. Performance indexes
Evaluation metrics |
Formula |
Sensitivity/Recall/True Positive Rate (TPR) |
TP TP + FN |
Miss Rate/False Negative Rate (FNR) |
FN FN + TP |
Precision/Positive Predictive Value (PPV) |
TP TP + FP |
False Discovery Rate (FDR) |
FP FP + TP |
Specificity/True Negative Rate (TNR) |
TN TN + FN |
Accuracy/Overall Accuracy |
TP + TN TP + TN + FP + FN |
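The per-class values reported in Tables 6-11 follow directly from a confusion matrix. As a hedged illustration (scikit-learn instead of the MATLAB tooling, with an assumed logistic-regression model), the sketch below builds a single confusion matrix from out-of-fold predictions under 10-fold cross-validation and evaluates the Table 4 metrics together with the ROC AUC.

```python
# Table 4 metrics computed from a 10-fold cross-validated confusion matrix.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

y_pred = cross_val_predict(clf, X, y, cv=cv)          # out-of-fold labels
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()

sensitivity = tp / (tp + fn)                          # TPR / recall
miss_rate   = fn / (fn + tp)                          # FNR
precision   = tp / (tp + fp)                          # PPV
fdr         = fp / (fp + tp)
specificity = tn / (tn + fp)                          # TNR
accuracy    = (tp + tn) / (tp + tn + fp + fn)

proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
print(f"Acc={accuracy:.3f} Sen={sensitivity:.3f} Spe={specificity:.3f} "
      f"Pre={precision:.3f} FNR={miss_rate:.3f} FDR={fdr:.3f} AUC={auc:.3f}")
```

The "Weighted Measure" rows in Tables 6-11 appear to be support-weighted averages of the per-class values; in scikit-learn terms this corresponds to, e.g., `precision_score(y, y_pred, average="weighted")`.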
We have applied our suggested feature selection approach on the Wisconsin breast cancer datasets. Table 5 displays the features that are found to be significant.
Table 5. Optimal feature selection result
| Dataset | Selected features |
|---|---|
| WBCO | Clump thickness, Uniform cell size, Uniform cell shape, Bare nuclei, Normal nucleoli, Mitoses |
| WDBC | Texture_mean, Area_se, Smoothness_se, Concavity_se, Fractal_dimension_se, Perimeter_worst, Smoothness_worst |
| WPBC | Time, Mean_radius, Mean_symmetry, SE_area, Worst_radius, Worst_Perimeter |
Case Study I:
We have applied the LR and QSVM classifiers with feature selection on the WBCO dataset. Figs. 3 and 4 depict the confusion matrices and ROC curves. Tables 6 and 7 present the summarized results.

Fig. 3. Confusion matrix of (a) LR and (b) QSVM classifier on the WBCO dataset
Table 6. Calculation of performance metrics using LR classifier on the WBCO dataset
| Class | TP | FP | TN | FN | Sensitivity | FNR | Precision | FDR | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Benign | 436 | 12 | 227 | 8 | 98.2 | 1.8 | 97.3 | 2.7 | 95.0 | |
| Malignant | 227 | 8 | 436 | 12 | 95.0 | 5.0 | 96.6 | 3.4 | 98.2 | 97.1 |
| Weighted Measure | | | | | 97.1 | | 97.1 | | 96.1 | |
Table 7. Calculation of performance metrics using QSVM classifier on the WBCO dataset
| Class | TP | FP | TN | FN | Sensitivity | FNR | Precision | FDR | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Benign | 432 | 9 | 230 | 12 | 97.3 | 2.7 | 98.0 | 2.0 | 96.2 | |
| Malignant | 230 | 12 | 432 | 9 | 96.2 | 3.8 | 95.0 | 5.0 | 97.3 | 96.9 |
| Weighted Measure | | | | | 96.9 | | 97.0 | | 96.5 | |

Fig. 4. ROC curve of (a) LR and (b) QSVM classifier on the WBCO dataset
Case Study II:
On the WDBC dataset, we have applied the stated classifiers (LR and QSVM) with feature selection. The confusion matrices and ROC curves of these classifiers are shown in Figs. 5 and 6. Tables 8 and 9 report the corresponding performance indexes.

Fig. 5. Confusion matrix of (a) LR and (b) QSVM classifier on the WDBC dataset
Table 8. Calculation of performance metrics using LR classifier on the WDBC dataset
| Class | TP | FP | TN | FN | Sensitivity | FNR | Precision | FDR | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Benign | 352 | 11 | 201 | 5 | 98.6 | 1.4 | 97.0 | 3.0 | 94.8 | |
| Malignant | 201 | 5 | 352 | 11 | 94.8 | 5.2 | 97.6 | 2.4 | 98.6 | 97.2 |
| Weighted Measure | | | | | 97.2 | | 97.2 | | 96.2 | |
Table 9. Calculation of performance metrics using QSVM classifier on the WDBC dataset

| Class | TP | FP | TN | FN | Sensitivity | FNR | Precision | FDR | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Benign | 351 | 9 | 203 | 6 | 98.3 | 1.7 | 97.5 | 2.5 | 95.8 | |
| Malignant | 203 | 6 | 351 | 9 | 95.8 | 4.2 | 97.1 | 2.9 | 98.3 | 97.4 |
| Weighted Measure | | | | | 97.4 | | 97.3 | | 96.7 | |

Fig. 6. ROC curve of (a) LR and (b) QSVM classifier on the WDBC dataset
Case Study III:
Figs. 7 and 8 show the confusion matrices and ROC curves of the described classifiers with feature selection on the WPBC dataset. Tables 10 and 11 summarize the results of the performance metrics.

Fig. 7. Confusion matrix of (a) LR and (b) QSVM classifier on the WPBC dataset
Table 10. Calculation of performance metrics using LR classifier on the WPBC dataset
| Class | TP | FP | TN | FN | Sensitivity | FNR | Precision | FDR | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-Recur | 143 | 27 | 19 | 5 | 96.6 | 3.4 | 84.1 | 15.89 | 41.3 | |
| Recur | 19 | 5 | 143 | 27 | 41.3 | 58.7 | 79.2 | 20.8 | 96.6 | 83.5 |
| Weighted Measure | | | | | 83.5 | | 83.0 | | 54.4 | |
Table 11. Calculation of performance metrics using QSVM classifier on the WPBC dataset
| Class | TP | FP | TN | FN | Sensitivity | FNR | Precision | FDR | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-Recur | 140 | 29 | 17 | 8 | 95.6 | 4.4 | 82.8 | 17.2 | 37.0 | |
| Recur | 17 | 8 | 140 | 29 | 37.0 | 63.0 | 68.0 | 32.0 | 95.6 | 80.9 |
| Weighted Measure | | | | | 81.7 | | 79.3 | | 50.9 | |

Fig. 8. ROC curve of (a) LR and (b) QSVM classifier on the WPBC dataset
In this study, we have also applied the LSVM classifier with and without feature selection on the WBCO, WDBC, and WPBC datasets, and the LR and QSVM classifiers on all three datasets with the full feature set. Table 12 presents a comparative analysis of the three classifiers with and without feature selection on the three datasets.
Table 12. Comparative analysis of the LR, LSVM, and QSVM classifiers on the three datasets with and without feature selection (FS)

| Dataset | Setting | Classifier | Sen | Spe | Pre | Acc |
|---|---|---|---|---|---|---|
| WBCO | Without FS | LR | 96.7 | 95.9 | 96.8 | 96.8 |
| WBCO | Without FS | LSVM | 98 | 95 | 97.3 | 96.9 |
| WBCO | Without FS | QSVM | 97.0 | 96.8 | 97.0 | 96.9 |
| WBCO | With FS | LR | 97.1 | 96.1 | 97.1 | 97.1 |
| WBCO | With FS | LSVM | 97.8 | 95.8 | 97.8 | 97.1 |
| WBCO | With FS | QSVM | 96.9 | 96.5 | 97.0 | 96.9 |
| WDBC | Without FS | LR | 94.9 | 94.6 | 95.0 | 94.9 |
| WDBC | Without FS | LSVM | 97 | 99 | 99.4 | 97.7 |
| WDBC | Without FS | QSVM | 97.9 | 96.8 | 97.9 | 97.9 |
| WDBC | With FS | LR | 97.2 | 96.2 | 97.2 | 97.2 |
| WDBC | With FS | LSVM | 95.9 | 97.5 | 98.6 | 96.5 |
| WDBC | With FS | QSVM | 97.4 | 96.7 | 97.3 | 97.4 |
| WPBC | Without FS | LR | 78.4 | 60.3 | 78.0 | 78.4 |
| WPBC | Without FS | LSVM | 80 | 57.9 | 95.6 | 77.8 |
| WPBC | Without FS | QSVM | 74.2 | 53.1 | 73.9 | 74.2 |
| WPBC | With FS | LR | 83.5 | 54.4 | 82.9 | 83.5 |
| WPBC | With FS | LSVM | 77.4 | 75 | 99.3 | 77.3 |
| WPBC | With FS | QSVM | 81.7 | 50.9 | 79.3 | 80.9 |
5. Discussion
In this research, a wrapper-based feature selection model is used to recognize significant attributes. Wrapper methods provide a good prediction rate since they select features based on their relevance and redundancy. They also find the best subset of features and appear to be less vulnerable to overfitting than other feature selection models.
According to Table 12, without feature selection both the LSVM and QSVM classifiers outperform the LR classifier on the WBCO dataset (96.9% vs. 96.8%). It is also observed from Table 12 that both the LR and LSVM classifiers achieve the highest classification accuracy with feature selection (97.1%). Table 13 compares the accuracy of the LR, LSVM, and QSVM classifiers on the WBCO dataset with other published works using various feature selection approaches.
Table 13. Accuracy comparison on the WBCO dataset
References
- World Health Organization (WHO). Global Health Estimates 2020: Deaths by Cause, Age, Sex, by Country and by Region, 2000-2019.
- Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F. (2021). Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin, 71: 209-249. https://doi.org/10.3322/caac.21660.
- Elmore, JG., Wells, CK., Lee, CH., Howard, DH., Feinstein, AR. (1994). Variability in radiologists' interpretations of mammograms, N Engl J Med. 331:1493-1499.
- Vimpeli, SM., Saarenmaa, I., Huhtala, H., Soimakallio, S. (2008). Large-core needle biopsy versus fine-needle aspiration biopsy in solid breast lesions: comparison of costs and diagnostic value, Acta Radiol. 49(8):863-9. doi: 10.1080/02841850802235751. PMID: 18618302.
- Zhang, Y.D., Satapathy, S.C., Guttery, D.S., Gorriz, J.M., Wang, S.H. (2021). Improved breast cancer classification through combining graph convolutional network and convolutional neural network, Inf. Process. Manag. 58, 102439.
- Zhang, Y.D., Pan, C., Chen, X., Wang, F. (2018). Abnormal breast identification by nine-layer convolutional neural network with parametric rectified linear unit and rank-based stochastic pooling, J. Comput. Sci. 27, 57–68.
- Chandrashekar, G., Sahin, F. (2014). A survey on feature selection methods. Comput. Electr. Eng, 40, 16–28.
- Saeys, Y., Inza, I., Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics. 23, 2507–2517.
- Babiker, M., Karaarslan, E., Hoscan, Y. (2019). A hybrid feature-selection approach for finding the digital evidence of web application attacks, Turkish J. Electr. Eng. Comput. Sci., 27, 4102-4117.
- Bayrak, E.A., Kırcı, P., Ensari, T. (2019). Comparison of Machine Learning Methods for Breast Cancer Diagnosis. Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), Istanbul, Turkey, pp. 1-3, doi: 10.1109/EBBT.2019.8741990.
- Sakri, S.B., Abdul Rashid N.B., Muhammad Zain, Z. (2018). Particle Swarm Optimization Feature Selection for Breast Cancer Recurrence Prediction, IEEE Access, vol. 6, pp. 29637-29647. doi: 10.1109/ACCESS.2018.2843443.
- Alghunaim, S., Al-Baity, H.H. (2019). On the scalability of machine-learning algorithms for breast cancer prediction in big data context, IEEE Access, vol. 7, pp. 91535-91546.
- Mekha, P., Teeyasuksaet, N. (2019). Deep Learning Algorithms for Predicting Breast Cancer Based on Tumor Cells. Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT-NCON), pp. 343-346, doi: 10.1109/ECTI-NCON.2019.8692297.
- Azmi, MSBM., and Cob, Z.C. (2010). Breast Cancer prediction based on Backpropagation Algorithm. IEEE Student Conference on Research and Development (SCOReD), pp. 164-168, doi: 10.1109/SCORED.2010.5703994.
- Alshouiliy, K., Shivanna, A., Ray, S., AlGhamdi, A., Agrawal, D.P. (2019). Analysis and Prediction of Breast Cancer using AzureML Platform. IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 0212-0218, doi: 10.1109/IEMCON.2019.8936294.
- Basunia, M.R., Pervin, I.A., Al Mahmud, M., Saha, S., Arifuzzaman, M. (2020). On Predicting and Analyzing Breast Cancer using Data Mining Approach. IEEE Region 10 Symposium (TENSYMP), pp. 1257-1260, doi: 10.1109/TENSYMP50017.2020.9230871.
- Kaya, S., Yağanoğlu, M. (2020). An Example of Performance Comparison of Supervised Machine Learning Algorithms Before and After PCA and LDA Application: Breast Cancer Detection. Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1-6, doi: 10.1109/ASYU50717.2020.9259883.
- Ray, S, AlGhamdi, A., Alshouiliy, K., Agrawal, D.P. (2020). Selecting Features for Breast Cancer Analysis and Prediction. International Conference on Advances in Computing and Communication Engineering (ICACCE), pp. 1-6, doi: 10.1109/ICACCE49060.2020.9154919.
- Pritom, A.I., Munshi, M.A.R., Sabab, S.A., and Shihab, S. (2016). Predicting breast cancer recurrence using effective classification and feature selection technique. 19th International Conference on Computer and Information Technology (ICCIT), pp. 310-314, doi: 10.1109/ICCITECHN.2016.7860215.
- Pawlovsky, A. P., and Nagahashi, M. (2014). A method to select a good setting for the kNN algorithm when using it for breast cancer prognosis. IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), pp. 189-192, doi: 10.1109/BHI.2014.6864336.
- Aalaei, Sh., Shahraki, H., Rowhanimanesh, AR., Eslami, S. (2016). Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets, Iran J Basic Med Sci; 19:476-482.
- Chaurasia, V., Pal, S., Tiwari, B. (2018). Prediction of benign and malignant breast cancer using data mining techniques. Journal of Algorithms & Computational Technology, 119-126. doi:10.1177/1748301818756225.
- Banu, AB., Subramanian, PT. (2018). Comparison of Bayes classifiers for breast cancer classification. Asian Pac J Cancer Prev (APJCP). 19(10):2917–20.
- Huang, MW., Chen, CW., Lin, WC., Ki, SW., Tsai, CF. (2017). SVM and SVM ensembles in breast cancer prediction, PLoS One, 12:1–14.
- UCI Breast Cancer Wisconsin (Diagnostic) Dataset, https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/, Last Access: 12.04.2021.
- Nadkarni, P. (2016). Core Technologies: Machine Learning and Natural Language Processing, Clinical Research Computing, Academic Press, Pages 85-114, ISBN 9780128031308, https://doi.org/10.1016/B978-0-12-803130-8.00004-X.
- Tran, H. (2019). A survey of machine learning and data mining techniques used in multimedia system, Dept. Comput. Sci., Univ. Texas Dallas Richardson, Richardson, TX, USA, Tech. Rep.
- Zhang, Y.-D., Wu, L. (2012). An MR brain images classifier via principal component analysis and kernel support vector machine. Prog. Electromagn. Res. 2012, 130, 369–388.
- Attallah, O., Sharkas, M. A., & Gadelkarim, H. (2020). Deep learning techniques for automatic detection of embryonic neurodevelopmental disorders. Diagnostics, 10(1), 27.