Using Artificial Immune Recognition Systems in Order to Detect Early Breast Cancer

Authors: C.D. Katsis, I. Gkogkou, C.A. Papadopoulos, Y. Goletsis, P.V. Boufounou, G. Stylios

Journal: International Journal of Intelligent Systems and Applications (IJISA)

Issue: No. 2, Vol. 5, 2013.

Free access

In this work, a decision support system for early breast cancer detection is presented. In hard-to-diagnose cases, different examinations (i.e. mammography, ultrasonography and magnetic resonance imaging) provide contradictory findings and the patient is referred for biopsy to obtain a definitive result. The proposed method employs a Correlation Feature Selection procedure and an Artificial Immune Recognition System (AIRS) and is evaluated using real data collected from 53 subjects with contradictory diagnoses. Comparative results with commonly used artificial intelligence classifiers support the suitability of the AIRS classifier. Such an approach could reduce the number of unnecessary biopsies.


Keywords: Artificial Immune Recognition System, Breast Cancer, Correlation Feature Selection, Decision Trees, Multilayer Perceptron Artificial Neural Networks, Support Vector Machines

Short URL: https://sciup.org/15010363

IDR: 15010363

Full text of the research article: Using Artificial Immune Recognition Systems in Order to Detect Early Breast Cancer

Published Online January 2013 in MECS

Cancer begins with the uncontrolled division of a single cell and results in a visible mass called a tumor. A tumor can be benign or malignant. A malignant tumor grows rapidly and invades the surrounding tissues, damaging them. Breast cancer is a malignant tumor that originates in breast tissue. Its symptoms include the presence of a breast mass, changes in the shape and size of the breast, changes in the color of the breast skin, and breast pain. Breast cancer is the second leading cause of cancer deaths in women today (after lung cancer) [1] and is the most common cancer among women, excluding non-melanoma skin cancers.

During the last decade, breast cancer outcomes have improved with the development of more effective diagnostic techniques and improvements in treatment. The long-term survival rate for women whose breast cancer has not metastasized has increased, with the majority of women surviving many years after diagnosis and treatment. A key factor in this trend is early detection and accurate diagnosis of the disease [2]. For that reason, women undergo screening by means of mammography (MG). In many cases, the lesions discovered need further evaluation, which is carried out by means of Ultrasonography (US) and Contrast-Enhanced Magnetic Resonance Imaging (CE-MRI). With each of these modalities, the underlying lesions are evaluated to determine the likelihood of malignancy. During routine imaging, lesions are characterized using specific features related to breast cancer risk. In some cases, the modalities yield occult or conflicting findings, so that the assessment of the lesion is equivocal and the patient undergoes an unnecessary core or open breast biopsy. Especially in these cases of diagnostic dilemma between the MG, US and CE-MRI modalities, there is a lack of evidence regarding the correlation of these features with breast cancer.

Over the last decade, the use of classification systems in medical diagnosis has been increasing gradually. There is no doubt that the evaluation of patient data and the decisions of experts are the most important factors in diagnosis. Expert systems and various artificial intelligence techniques for classification also help experts a great deal. On the one hand, classification systems help to minimize errors that a fatigued or inexperienced physician might make; on the other hand, they allow medical data to be examined in less time and in more detail. Automated diagnostic systems have been applied to, and are of interest for, a variety of medical data, such as electrocardiograms (ECGs), electromyograms (EMGs), electroencephalograms (EEGs), ultrasound signals/images, X-rays, and computed tomographic images [3-13]. Moreover, the economic and social value of breast cancer diagnosis is very high. The problem has therefore recently attracted many researchers in the area of computational intelligence [14-19].

Several examples of Artificial Immune System based data mining systems in bioinformatics can be found in the literature. Artificial Immune System-derived algorithms have been employed in familiarity profiling and prognosis prediction [26] in breast cancer. De Castro and colleagues focused on the use of the Hierarchical Artificial Immune Network paradigm for the problem of gene expression clustering [28-29] and for the rearrangement study of gene expression [30]. An AIS/k-NN hybrid data mining algorithm has been tested for cancer classification in [31]. PCA-AIRS hybrid systems have been employed in the diagnosis of lung cancer [32-33]. For a brief comparative overview of the performance of these kinds of systems, the reader is referred to [27]. An extended literature review of Artificial Immune System applications in the computational biology domain is provided in [34].

In this work, we propose a methodology that ranks the multimodal features extracted from the lesions and acts as a decision support system that provides a prognosis of malignancy. In the following sections, we first outline the steps of our methodology. We then present our experimental results. Finally, we compare our proposed method, which is based on the Artificial Immune Recognition System (AIRS), with other commonly used artificial intelligence classifiers.

II. Proposed Methodology

The proposed methodology uses a Correlation Feature Selection (CFS) procedure to rank the extracted multimodal features and an Artificial Immune Recognition System (AIRS) classifier to support breast cancer diagnosis. Table 1 lists the lesion features extracted from the MG, US and CE-MRI modalities. It should be noted that our methodology requires no special attributes to be extracted, since the same features are used by physicians in daily clinical routine to diagnose breast cancer. The overall methodology schema is illustrated in Figure 1.

Table 1: Features extracted from the MG, US and CE-MRI modalities

Modality	MG	US	CE-MRI
Extracted features	density	US-margins	CE-MRI margins
	margins	Acoustic shadow	Time signal intensity curve
	Architectural distortion	vascularity	-
	microcalcifications	-	-

A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task (here, breast cancer diagnosis). In this framework, a feature selection method has been applied in order to reduce the set of features to those that efficiently describe the dataset, and in this way to provide a simpler classification model. The CFS algorithm, proposed by Hall [20], is based on the central hypothesis that good feature sets contain features that are highly correlated with the class (malignancy or benignity), yet uncorrelated with each other. CFS is a filter approach [21]: it is independent of the classification algorithm and considers the individual predictive ability of each feature along with the degree of redundancy between the features. Subsets of features that are highly correlated with the class while having low intercorrelation are preferred.
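The hypothesis above is usually expressed as a subset "merit" score, k·r_cf / sqrt(k + k(k-1)·r_ff), where k is the number of features in the subset, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The following is a minimal sketch of that score, not the authors' implementation. It assumes numerically encoded features and class labels and uses the absolute Pearson correlation as the correlation measure, which is an illustrative choice; the function names `pearson` and `cfs_merit` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(vx * vy)


def cfs_merit(columns, labels):
    """Merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    columns: list of feature columns (one list of values per feature)
    labels:  numerically encoded class labels (assumption: 0/1)
    """
    k = len(columns)
    # Mean absolute feature-class correlation
    r_cf = sum(abs(pearson(c, labels)) for c in columns) / k
    # Mean absolute feature-feature correlation (0 for a single feature)
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data: f1 tracks the class, f2 duplicates f1, f3 is unrelated to the class.
labels = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]
f2 = [0, 0, 1, 1]
f3 = [0, 1, 0, 1]
merit_single = cfs_merit([f1], labels)
merit_redundant = cfs_merit([f1, f2], labels)
merit_noisy = cfs_merit([f1, f3], labels)
```

On this toy data the redundant copy `f2` does not raise the merit and the unrelated feature `f3` lowers it, which is the behaviour the hypothesis asks for. A search procedure (for example greedy forward selection) would then look for the subset with the highest merit.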

Once trained, AIRS classifies new instances using an unweighted k-Nearest Neighbor approach [24, 35]. According to this approach, a new instance is assigned the class of the majority of its k nearest training examples.

Proximity of examples is calculated using a measure of distance, commonly Euclidean distance in the case of continuous variables and Manhattan distance in the case of nominal variables. Algorithm 1 summarizes the training procedure of the AIRS algorithm [25]. The evaluation results achieved by the proposed methodology are provided next.
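The unweighted k-Nearest Neighbor vote described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: features are assumed to be continuous so that Euclidean distance applies, and the names `euclidean` and `knn_predict` as well as the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Return the majority class among the k training examples nearest to query."""
    # Indices of the training examples, nearest first
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tie, most_common keeps the class that was encountered first,
    # i.e. the tied class with the nearer neighbour.
    return votes.most_common(1)[0][0]


# Toy example with two well-separated groups
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
train_y = ["benign", "benign", "benign", "malignant", "malignant"]
pred_a = knn_predict(train_X, train_y, (0.2, 0.1), k=3)
pred_b = knn_predict(train_X, train_y, (5.0, 5.4), k=1)
```

The `distance` parameter is left pluggable so that a different measure can be substituted for nominal variables.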

Fig. 1: The proposed methodology schema. For each patient, the lesion features are extracted through the data acquisition module, which consists of the Mammography, Ultrasound and Magnetic Resonance modalities. To provide a simpler classification model, a CFS feature selection module is used. The classification module is based on the AIRS classifier and supports breast cancer diagnosis (malignancy or benignity).

The natural immune system is a complex, robust, biological system within an organism that protects against disease by identifying and killing pathogens. It is able to distinguish the organism's own healthy cells and tissues from a wide variety of viruses and parasitic worms. It is adaptive, complex, and capable of maintaining memory of previous encounters, to name just a few of its more attractive computational properties. AIRS is a supervised, immune-inspired learning algorithm [22-23]. The aim of the AIRS algorithm is to prepare a pool of memory (recognition) cells that is representative of the training data the model is exposed to and suitable for classifying unseen data. The recognition cells in the memory pool are stimulated by an antigen, and each cell is allocated a stimulation value. The memory cell with the greatest stimulation is selected as the best-match memory cell for use in the affinity maturation process.

III. Results

To evaluate our methodology, we have gathered data from 53 subjects out of 4726 cases. These subjects presented lesions that were not highly suggestive of benignity or malignancy when evaluated on every modality used. In all cases a biopsy was conducted, and the biopsy results were used as the gold standard to validate our methodology. The constructed dataset consists of the features presented in Table 1 as well as the biopsy results (malignancy or benignity) for all 53 subjects. All data were collected at the University Hospital of Ioannina, Greece.

The performance of the AIRS classifier on the above dataset has been evaluated. The parameters of the classifier have been selected according to the literature and experimentally. Specifically, the initial ARB cell pool size was set to 1, the number of mutated clones created per ARB was set to 80, and the maximum number of resources that can be allocated to ARBs in the ARB pool was set to 300. According to [35], the usual values of k are in the range from 1 to 7. For this reason, 7 variations of AIRS have been tested, one for each value of k. After applying Correlation-based Feature Selection, the selected features are: (i) Mammography architectural distortion, (ii) Ultrasound margins, and (iii) Ultrasound acoustic shadow.

/* training procedure */
Create Antigens                    /* load training examples */
Calculate affinity                 /* for all antigens */
Initialize ARB population          /* random antigens from the training vector */
Initialize set of Memory Cells
FOR each Antigen
    Present Antigen to ARB pool
    Compute Stimulation values of ARBs
    Select ARBs with max Stimulation
    Update set of Memory Cells
    Remove least stimulated ARB
    FOR all ARBs
        WHILE number of clones is below the limit
            Clone and mutate
            Compute Stimulation values for mutant
            number of clones = number of clones + 1
        END WHILE
        Candidate Memory Cell = Mutant with max Stimulation
        IF Stimulation(Candidate Memory Cell) > Stimulation(ARB)
        THEN Update set of Memory Cells   /* add Candidate Memory Cell to the set of Memory Cells */
        END IF
    END FOR
    Remove similar and least stimulated ARB
END FOR
/* end of training procedure */

/* classify according to a k-nn schema */
Present new case to set of Memory Cells
Compute Stimulation values
Classify new case using a majority vote of the outputs of the k most stimulated memory cells

Algorithm 1: The AIRS pseudocode
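To make the clone-and-mutate and classification steps of Algorithm 1 more concrete, the following is a heavily simplified sketch and not the AIRS implementation used in the study. It omits the ARB pool and the resource competition entirely and only keeps a memory-cell pool. It assumes features scaled to [0, 1], takes stimulation to be 1 minus the normalised Euclidean distance, and uses a hypothetical mutation scheme that resamples each feature with a fixed probability. All function names and the toy data are illustrative.

```python
import math
import random
from collections import Counter


def stimulation(cell, antigen):
    """1 - normalised Euclidean distance (assumes features scaled to [0, 1])."""
    d = math.sqrt(sum((c - a) ** 2 for c, a in zip(cell, antigen)))
    return 1.0 - d / math.sqrt(len(antigen))


def mutate(cell, rng, rate=0.2):
    """Hypothetical mutation: resample each feature with probability `rate`."""
    return tuple(rng.random() if rng.random() < rate else v for v in cell)


def train(antigens, labels, n_clones=80, seed=0):
    """Build a pool of (vector, label) memory cells from the training antigens."""
    rng = random.Random(seed)
    memory = []
    for antigen, label in zip(antigens, labels):
        same_class = [m for m in memory if m[1] == label]
        if not same_class:
            # First antigen of a class seeds the memory pool
            memory.append((tuple(antigen), label))
            continue
        # Best-match memory cell: most stimulated cell of the same class
        best = max(same_class, key=lambda m: stimulation(m[0], antigen))
        # Clone and mutate, then keep the most stimulated mutant as candidate
        clones = [mutate(best[0], rng) for _ in range(n_clones)]
        candidate = max(clones, key=lambda c: stimulation(c, antigen))
        if stimulation(candidate, antigen) > stimulation(best[0], antigen):
            memory.append((candidate, label))
    return memory


def classify(memory, case, k=3):
    """Majority vote of the k most stimulated memory cells."""
    ranked = sorted(memory, key=lambda m: stimulation(m[0], case), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Toy data: two classes in opposite corners of the unit square
antigens = [(0.1, 0.1), (0.15, 0.2), (0.2, 0.1), (0.9, 0.9), (0.85, 0.8), (0.8, 0.9)]
labels = ["benign"] * 3 + ["malignant"] * 3
memory = train(antigens, labels)
pred_low = classify(memory, (0.12, 0.12), k=1)
pred_high = classify(memory, (0.88, 0.88), k=1)
```

The default of 80 clones mirrors the parameter value reported in the text. In this toy example k=1 is used because the memory pool may contain very few cells per class.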

The importance of the selected features was evaluated by two experienced breast radiologists. According to them, the Mammography architectural distortion feature is of great importance, since invasive carcinoma distorts the interfaces between fat and normal breast parenchyma due to the response of host tissues to the malignancy. Especially in the very dense breast, the tumor mass can be so obscured by adjacent benign tissues as to be invisible, leaving an area of focal architectural distortion as the only indication of underlying malignancy [36]. Moreover, the Ultrasound margins have been the most commonly reported findings in the literature during the past 20 years. The presence of angular margins is a hard finding, indicative of invasive malignancy in most instances. Finally, the Acoustic shadow is a finding that reflects the reaction of the surrounding tissue induced by malignant masses. It can occur with either invasive malignancy or ductal carcinoma in situ [37].

We have compared the AIRS performance (for k=1 to 7) with the results obtained by widely used classification methodologies, namely Multilayer Perceptron (MLP) Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Decision Trees (C4.5 algorithm). AIRS and the comparative classification schemes are evaluated in two modes: (i) using the full set of features, and (ii) using a subset of features obtained by applying the Correlation-based Feature Selection (CFS) method. In order to minimize the bias associated with the random sampling of the training and testing data samples, 10-fold cross-validation is applied. The obtained results are provided in Tables 2 and 3.
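The 10-fold cross-validation protocol mentioned above can be sketched as follows. This is a generic illustration rather than the evaluation code used in the study. In particular, the text does not state how the reported standard deviations were computed; the sketch reports the spread of accuracy across folds, which is an assumption. The names `k_fold_indices` and `cross_validate` and the trivial majority-class predictor are hypothetical.

```python
import random
import statistics
from collections import Counter


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validate(X, y, fit_predict, n_folds=10, seed=0):
    """Return (mean accuracy %, std of accuracy %) over the folds.

    fit_predict(train_X, train_y, test_X) must return one label per test row.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(100.0 * correct / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def majority_class(train_X, train_y, test_X):
    """Trivial baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


# 53 samples, as in the dataset size reported in the text (labels are dummies)
X = [[i] for i in range(53)]
y = ["benign"] * 53
folds = k_fold_indices(len(X))
mean_acc, std_acc = cross_validate(X, y, majority_class)
```

With 53 samples, each fold holds 5 or 6 cases, so the accuracy of a single fold moves in steps of roughly 17 to 20 percentage points, which is worth keeping in mind when reading the standard deviations in the tables.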

Table 2: AIRS classification results obtained for different k values

	Using the full set of features	Using the subset of CFS-selected features
Classifier	Accuracy (%) ± STD	Accuracy (%) ± STD
AIRS (k=1)	73.67 ± 1.99	68.00 ± 5.92
AIRS (k=2)	67.67 ± 3.52	59.67 ± 5.75
AIRS (k=3)	83.33 ± 6.63	81.32 ± 5.34
AIRS (k=4)	75.33 ± 5.41	60.00 ± 7.20
AIRS (k=5)	81.00 ± 8.19	67.67 ± 10.66
AIRS (k=6)	73.67 ± 6.21	69.67 ± 6.66
AIRS (k=7)	77.33 ± 5.77	77.00 ± 2.36

Table 3: The AIRS, MLP, SVM and C4.5 classification results

	Using the full set of features	Using the subset of CFS-selected features
Classifier	Accuracy (%) ± STD	Accuracy (%) ± STD
AIRS	83.33 ± 6.63	81.32 ± 5.34
MLP	73.67 ± 4.32	70.90 ± 4.81
SVM	70.00 ± 6.33	68.92 ± 6.97
Decision trees (C4.5)	66.57 ± 4.21	66.15 ± 3.18

As can be seen from Table 3, the AIRS classifier (using 3-NN) achieves a higher classification accuracy than the MLP ANN, SVM and C4.5 approaches, both with the full set of features and with the CFS-selected subset.

IV. Conclusion

In this work, we have presented a methodology that evaluates the multimodal features extracted from the lesions and provides information to the radiologist regarding breast cancer prognosis. Moreover, we have constructed a dataset containing exclusively equivocal findings between the MG, US, and CE-MRI modalities. The applicability and performance of the Artificial Immune Recognition System on our dataset were examined. The classification accuracy of the AIRS algorithm was superior to that of conventional classification schemes. A direct comparison with other methodologies is not feasible since, to our knowledge, there is no published work using a combination of the MG, US and CE-MRI modalities in obscured findings. The initial results are promising, keeping in mind that our dataset consists exclusively of equivocal cases. Our future work will focus on enriching the constructed database with more equivocal findings and on providing a decision support system that is useful in clinical practice, with the aim of decreasing the number of unnecessary biopsies and thereby reducing the cost and the rate of complications.

References

[1] D. Max Parkin, Freddie Bray, J. Ferlay and Paola Pisani, Global Cancer Statistics, 2002, CA Cancer J Clin vol.55, (2005), pp. 74-108.
[2] D. West, P. Mangiameli, R. Rampal, V. West, Ensemble strategies for a medical diagnosis decision support system: a breast cancer diagnosis application, Eur. J. Oper. Res. 162, (2005), pp. 532–551.
[3] C. Papaloukas, D.I. Fotiadis, A. Likas, L.K. Michalis, An ischemia detection method based on artificial neural networks, Artificial Intelligence in Medicine, Vol. 24, Issue 2, (2002), pp. 167-178.
[4] T.P. Exarchos, C. Papaloukas, D.I. Fotiadis, L.K. Michalis, An association rule mining-based methodology for automated detection of ischemic ECG beats, IEEE Transactions on Biomedical Engineering, Vol. 53, Issue 8, (2006), pp. 1531-1540.
[5] Y. Goletsis, C. Papaloukas, D.I. Fotiadis, A. Likas, L.K. Michalis, A multicriteria decision based approach for ischaemia detection in long duration ECGs, 4th International IEEE EMBS Special Topic Conference on Information Technology Applications in Biomedicine, (2003), pp. 173-176.
[6] I. Guler, E.D. Ubeyli, ECG beat classifier designed by combined neural network model, Pattern Recognition, Vol. 38, Issue 2, (2005), pp. 199–208.
[7] A.T. Tzallas, P.S. Karvelis, C.D. Katsis, D.I. Fotiadis, S. Giannopoulos, S. Konitsiotis, A Method for Classification of Transient Events in EEG Recordings: Application to Epilepsy Diagnosis, Methods of Information in Medicine, Vol. 49, Issue 6, (2006), pp. 610-621.
[8] C.D. Katsis, Y. Goletsis, A. Likas, D.I. Fotiadis, I. Sarmas, A novel method for automated EMG decomposition and MUAP classification, Artificial Intelligence in Medicine, Vol. 37, Issue 1, (2006), pp. 55-64.
[9] C.D. Katsis, T.P. Exarchos, C. Papaloukas, Y. Goletsis, D.I. Fotiadis, I. Sarmas, A two-stage method for MUAP classification based on EMG decomposition, Computers in Biology and Medicine, Vol. 37, Issue 9, (2007), pp. 1232-1240.
[10] C.I. Christodoulou, C.S. Pattichis, Unsupervised pattern recognition for the classification of EMG signals, IEEE Transactions on Biomedical Engineering, Vol. 46, Issue 2, (1999), pp. 169–178.
[11] E.D. Ubeyli, I. Guler, Improving medical diagnostic accuracy of ultrasound Doppler signals by combining neural network models, Computers in Biology and Medicine, Vol. 35, Issue 6, (2005), pp. 533–554.
[12] E.D. Ubeyli, I. Guler, Feature extraction from Doppler ultrasound signals for automated diagnostic systems, Computers in Biology and Medicine, Vol. 35, Issue 9, (2005), pp. 735–764.
[13] S. AlZubi, A. Amira, 3D Medical Volume Segmentation Using Hybrid Multiresolution Statistical Approaches, Advances in Artificial Intelligence, Volume 2010.
[14] R. Setiono, Generating concise and accurate classification rules for breast cancer diagnosis, Artificial Intelligence in Medicine, Vol. 18, Issue 3, (2000), pp. 205–219.
[15] D. West, V. West, Model selection for a medical diagnostic decision support system: a breast cancer detection case, Artificial Intelligence in Medicine, Vol. 20, Issue 3, (2000), pp. 183–204.
[16] H.A. Abbass, An evolutionary artificial neural networks approach for breast cancer diagnosis, Artificial Intelligence in Medicine, Vol. 25, Issue 3, (2002), pp. 265-281.
[17] E.D. Ubeyli, Implementing automated diagnostic systems for breast cancer detection, Expert Systems with Applications, Vol. 33, (2007), pp. 1054–1062.
[18] M. Karabatak, C. Ince, An expert system for detection of breast cancer based on association rules and neural network, Expert Systems with Applications, Vol. 36, Issue 2, (2009), pp. 3465-3469.
[19] S. Belciug, E. El-Darzi, A partially connected neural network-based approach with application to breast cancer detection and recurrence, 5th IEEE International Conference Intelligent Systems, (2010), pp. 191–196.
[20] M.A. Hall, Correlation-based Feature Subset Selection for Machine Learning. Hamilton, New Zealand, 1998.
[21] M.A. Hall and G. Holmes, Benchmarking attribute selection techniques for discrete class data mining, IEEE Transactions in Knowledge and Data Engineering, Vol. 15, (2003), pp. 1437-1447.
[22] A. Watkins, A resource limited artificial immune classifier. Master's thesis, Mississippi State University, MS, USA, December 2001.
[23] A. Watkins, J. Timmis, L. Boggess, Artificial immune recognition system (AIRS): An immune-inspired supervised learning algorithm, Genetic Programming and Evolvable Machines, Vol. 5, Issue 3, (2004), pp. 291-317.
[24] E. Fix, J.L. Hodges, Discriminatory analysis, nonparametric discrimination: Consistency properties. Technical Report 4, USAF School of Aviation Medicine, Randolph Field, Texas, 1951.
[25] Y. Goletsis, T.P. Exarchos, C.D. Katsis, Bio-Inspired Intelligence for Credit Scoring, Special Issue on Computational Methods in Financial Engineering, International Journal of Financial Markets and Derivatives, Vol. 2, No. 1/2, (2011), pp. 32–49.
[26] F. Menolascina, R.T. Alves, S. Tommasi, P. Chiarappa, M. Delgado, V. Bevilacqua, G. Mastronardi, A.A. Freitas, A. Paradiso, Improving Female Breast Cancer Prognosis by means of Fuzzy Rule Induction with Artificial Immune Systems, Proceedings of the International Conference on Life System Modeling and Simulation, 2007.
[27] F. Menolascina, S. Tommasi, P. Chiarappa, V. Bevilacqua, G. Mastronardi, A. Paradiso, Data mining techniques in aCGH-based breast cancer subtype profiling: an immune perspective with comparative study. BMC Systems Biology 1, 2007.
[28] G.B. Bezerra, G.M.A Cado, M. Menossi, L.N. de Castro, F.J. von Zuben, Recent advances in gene expression data clustering: a case study with comparative results, Genet. Mol. Res. Vol. 4, Issue 3, (2005), pp. 514–524.
[29] E.R. Hruschka, R.J. Campello, L.N. de Castro, Evolving clusters in gene expression data. Inf. Sci. Vol. 176, Issue 13, (2006), pp. 1898–1927.
[30] J.S. de Sousa, C.T. de Gomes, G.B. Bezerra, L.N. de Castro, F.J. von Zuben, An Immune-Evolutionary Algorithm for Multiple Rearrangements of Gene Expression Data, Genetic Programming and Evolvable Machines, Vol. 5, Issue 2, (2004), pp. 157–179.
[31] S. Sahan, K. Polat, H. Kodaz, S. Gunes, A new hybrid method based on fuzzy artificial immune system and k-nn algorithm for breast cancer diagnosis. Computers in Biology and Medicine, Vol. 37, Issue 3, (2007), pp. 415–423.
[32] K. Polat, S. Gunes, Principles component analysis, fuzzy weighting preprocessing and artificial immune recognition system based diagnostic system for diagnosis of lung cancer, Expert Systems with Applications, Vol. 34, Issue 1, 2008.
[33] K. Polat, S. Gunes, Computer aided medical diagnosis system based on principal component analysis and artificial immune recognition system classifier algorithm, Expert Systems with Applications, Vol. 34, Issue 1, 2008.
[34] V. Bevilacqua, F. Menolascina, R.T. Alves, S. Tommasi, G. Mastronardi, M. Delgado, A. Paradiso, G. Nicosia, A.A. Freitas, Artificial Immune Systems in Bioinformatics, Computational Intelligence in Biomedicine and Bioinformatics, Volume 151, (2008), pp. 271-295.
[35] J. Brownlee, Artificial Immune Recognition System (AIRS) - A Review and Analysis [Technical Report], Centre for Intelligent Systems and Complex Processes, Faculty of Information and Communication Technologies, Swinburne University of Technology, Victoria, Australia, Technical Report ID: 1-01, 2005.
[36] E.A. Sickles, Mammographic features of “Early” breast cancer, American Journal of Roentgenology, (1984), pp 143-464.
[37] A.T. Stavros, C.L. Rapp, S.H. Parker, Breast ultrasound, Lippincott Williams & Wilkins editors, 2004.


Research article