Critical Analysis of Data Mining Techniques on Medical Data
Authors: Zahid Ullah, Muhammad Fayaz, Asif Iqbal
Journal: International Journal of Modern Education and Computer Science (IJMECS)
Published in: vol. 8, no. 2, 2016.
Open access
The use of data mining techniques on medical data is rising sharply, because they uncover useful information for decision making and diagnosis. The data mining techniques most widely used in the healthcare domain are classification, clustering, regression, association rule mining, and classification and regression trees (CART). An appropriate choice of data mining algorithm can improve the quality of prediction, diagnosis and disease classification. The main goal of this paper is to evaluate data mining techniques for medical data, in particular for locally frequent diseases such as heart ailments, breast cancer and lung cancer. We examine how well these techniques discover locally frequent patterns in terms of cost, performance, speed and accuracy.
Keywords: Classification, clustering, regression, association rule mining, data mining
Short URL: https://sciup.org/15014838
IDR: 15014838
Full text of the article: Critical Analysis of Data Mining Techniques on Medical Data
Published Online February 2016 in MECS DOI: 10.5815/ijmecs.2016.02.05
In the last decade data mining has become one of the most valuable tools for manipulating and extracting data and for discovering patterns that yield important information for decision making. Because of a lack of awareness, structures and materials in our surroundings (such as oil and water infrastructure, buildings and sewage pipes) have failed, even though recording earlier problems could have helped prevent later occurrences. Everyday activities follow the same pattern. Whether the activity is advertising, trade, sales, finance, banking, employment, human migration, population studies, production, health care, science, education, or the monitoring of humans or machines, each has some way of recording what is known; the problem is the lack of an accurate tool for handling future uncertainty when that recorded information is applied.
Advances in data collection technology, such as bar code scanners in commercial settings and sensors in manufacturing and scientific settings, have led to the production of very large quantities of data [1]. Improved methods and tools are urgently needed that can intelligently and automatically turn this huge amount of data into useful information. For example, NASA's Earth observation system, which was expected to return data at a rate of several gigabytes per hour by the end of the century, has created new requirements for putting this volume of information to use and supporting good decisions [2]. These requirements include the automated summarization of information, the extraction of the essence of stored data, and the discovery of patterns in raw data. Part of this can be achieved through basic data analysis, which includes simple string matching, simple queries, or mechanisms for displaying data [3]. Such analysis methods involve transforming and aggregating data, extracting its structure, and examining it for patterns in order to make predictions.
Image mining is an interdisciplinary endeavor that draws on expertise in fields such as computer vision, image retrieval, pattern recognition and matching. It encompasses two different approaches. The first mines image databases or collections of images alone. The second mines a combination of associated alphanumeric data and collections of images. Research in image mining can be broadly divided into two main directions: (1) domain-specific applications and (2) general applications. Both extract the most relevant image features and then generate image models. A huge amount of image data is produced every day in fields such as medicine, sports and astronomy, and in every kind of photography. Image mining is still a growing research field and remains at an experimental stage. A lack of knowledge about image mining research is a hurdle to its rapid growth [4]. Image data plays a significant role in many areas, such as business, engineering and hospitals.
II. Literature Review
In [5], several techniques are used to detect cancerous tissue in mammograms. The first is image mining, which can extract the information in an image, including hidden information that is not apparent in the image. The main purpose of the paper is to segment breast mammograms with an image mining technique in order to identify the affected area. Each mammogram is classified into one of three classes: normal, benign or malignant. In developed countries breast cancer is very common in women, and it can lead to death if it is not detected at an early stage. Mammography is currently the most effective tool for detecting breast cancer, yet it still misses 10-30% of breast cancer cases. The classification method used in the paper is a decision tree classifier, which involves a training phase and a testing phase. In the training phase, the important information carried by the image features is extracted and a model is built from the labeled training classes. In the testing phase, the resulting partition of the feature space is used to classify the image. Mammograms are very difficult to interpret, but preprocessing helps by reducing noise and improving image quality.
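The training and testing phases of a decision tree classifier can be illustrated with a minimal Python sketch. It is not the method of [5]: it fits a single split on one numeric feature by minimizing weighted Gini impurity, uses two classes instead of three for brevity, and all feature values and names (`gini`, `best_split`, `train_x`, and so on) are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split on one feature."""
    best_t, best_score = None, float("inf")
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Training phase on hypothetical data: one texture feature per image region.
train_x = [0.10, 0.15, 0.20, 0.60, 0.65, 0.70]
train_y = ["normal", "normal", "normal", "malignant", "malignant", "malignant"]
threshold, impurity = best_split(train_x, train_y)

# Each side of the split predicts its majority training label.
left_label = Counter(y for x, y in zip(train_x, train_y) if x <= threshold).most_common(1)[0][0]
right_label = Counter(y for x, y in zip(train_x, train_y) if x > threshold).most_common(1)[0][0]

def predict(x):
    """Classify a feature value using the learned partition of the feature space."""
    return left_label if x <= threshold else right_label

# Testing phase on held-out hypothetical samples.
test_x = [0.12, 0.68]
test_y = ["normal", "malignant"]
accuracy = sum(predict(x) == y for x, y in zip(test_x, test_y)) / len(test_y)
```

A full decision tree applies the same split search recursively to each side until the nodes are pure or a stopping rule is met.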
Data mining techniques have been used widely in medicine over roughly the last ten years. In [6], a Multiple Knot Spline Smooth Support Vector Machine (MKS-SSVM) is proposed. MKS-SSVM is an enhanced SSVM that approximates the plus function with a multiple knot spline function instead of the integral of the sigmoid function. The main objective of the work is to present a new study of data mining techniques for medical diagnosis problems. Two medical data sets (heart disease and diabetes) were used to evaluate the technique. MKS-SSVM proved very useful on these data sets, for which the accuracy of previous results was below 90%. Support vector machines are now in common use worldwide. The multiple knot spline smooth support vector machine reached 93.20%, and the accuracies obtained with the uniform design method were 96.62% and 96.58% respectively. A large body of literature exists on the medical diagnosis of diabetes, but most of the reported accuracies are modest. With Least Square Support Vector Machine (LS-SVM) and Generalized Discriminant Analysis (GDA), the reported classification accuracy was 78.21% for LS-SVM and 79.16% for GDA-LS-SVM. With Principal Component Analysis (PCA) and an Adaptive Neuro-Fuzzy Inference System (ANFIS) the accuracy was 89.47%. A General Regression Neural Network (GRNN) achieved 80.21%, a Multilayer Neural Network (MLNN) with the LM algorithm 77.08%, MLNN 79.62%, the conventional validation method 82.37%, and PNN 78.05%. These figures indicate that MKS-SSVM, with an accuracy of 94.15%, is very effective compared with the other techniques. Further investigation of MKS-SSVM may produce more interesting results.
Breast cancer is very common in women and is the second leading cause of cancer death. Breast cancer diagnosis and prognosis have been studied with a range of data mining techniques. Unfortunately, over the last decade the incidence of breast cancer has risen sharply in women, especially in developed countries. The paper [7] covers breast cancer diagnosis and prognosis and surveys recent research that uses data mining techniques to improve them. Data mining plays a vital role in discovering information, and different techniques extract different kinds of information. There is no primary prevention for breast cancer, since its cause is not yet understood. Early detection is the most effective way to reduce breast cancer mortality. The paper discusses several techniques that could be useful for breast cancer classification. The decision tree achieves a prediction accuracy of 93.63%, and this predictor could be used in future to build a web-based application. Decision trees are used extensively for classification.
Medical data mining remains a very active area, particularly for the diagnosis of heart disease, and many researchers are working to improve medical decision support systems that assist physicians. The paper [8] uses the C4.5 decision tree algorithm to recognize heart disease and evaluates the accuracy of this algorithm. Heart disease here typically means coronary artery disease (CAD). CAD patients often feel chest pain and fatigue, which occur when the heart does not receive enough oxygen. However, almost 50% of patients have no symptoms before a heart attack occurs. Many factors increase the risk of coronary artery disease, for example lack of exercise, high blood pressure, high cholesterol, smoking, obesity and cardiovascular disease. Decision tree induction algorithms have been widely used for years. A decision tree produces a set of useful rules, which makes it a very helpful technique for classification. Decision trees require two types of data: training data and testing data. The training set is usually the larger part, and a larger training set generally gives a better model, while the testing set is used to estimate the accuracy of the decision tree. Three algorithms are used in the paper to recognize heart disease: the C4.5 algorithm, bagging with Naïve Bayes and bagging with the C4.5 decision tree. Bagging with Naïve Bayes gives good results among the tested methods.
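Bagging (bootstrap aggregating) trains several models on bootstrap resamples of the training set and combines them by majority vote. The following is a minimal Python sketch of that idea, not the setup of [8]: a one-feature threshold classifier stands in for the Naïve Bayes and C4.5 base learners, and the blood pressure readings, labels and function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-feature threshold classifier with the fewest training errors."""
    best = None
    for t in sorted({x for x, _ in samples}):
        left = [y for x, y in samples if x <= t]
        right = [y for x, y in samples if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        # An empty right side falls back to the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagging(samples, n_models=15, seed=0):
    """Train n_models stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # A bootstrap sample has the same size as the data, drawn with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_stump(bootstrap))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical training data: (systolic blood pressure, diagnosis).
data = [(110, "healthy"), (115, "healthy"), (120, "healthy"), (125, "healthy"),
        (150, "disease"), (155, "disease"), (160, "disease"), (165, "disease")]
classify = bagging(data)
print(classify(118), classify(158))
```

Because each base model sees a slightly different resample, the vote tends to be more stable than any single model, which is the usual motivation for bagging.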
The technique proposed in [9] focuses on the analysis of brain tumors in CT (computerized tomography) brain images. The work includes preprocessing, which is a vital step in data mining. Feature selection is performed with association rule mining. Association rule mining and the decision tree give the best results when combined in the proposed technique. Decision trees are influential in classification because they produce accurate results. Image preprocessing is useful in medical imaging, computer vision and related fields. To examine the properties of an object, the object must first be separated from the rest of the image, which is a difficult problem. Feature selection is likewise a classical problem in machine learning and statistics. CT brain images are very difficult to interpret, so preprocessing is required to obtain a reliable representation of the brain image. The proposed technique has two phases, training and testing. The decision tree and association rule mining performed comparatively well on the transactional database. CT brain images are one of the effective ways of assessing brain tumors. The shape priori algorithm is a new method for preprocessing the images, and it provides effective features for storage in the transactional database. The decision tree classifies the rules, which helps physicians make the best decision.
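Association rule mining over a transactional database rests on two measures, support and confidence. The Python sketch below shows how they are computed; it is an illustration only, and the discretized image features and labels in `transactions` are hypothetical rather than taken from [9].

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical transactional database: one set of discretized features per image.
transactions = [
    {"high_contrast", "irregular_shape", "tumor"},
    {"high_contrast", "irregular_shape", "tumor"},
    {"high_contrast", "regular_shape", "normal"},
    {"low_contrast", "regular_shape", "normal"},
]

rule_support = support(transactions, {"high_contrast", "irregular_shape", "tumor"})
rule_confidence = confidence(transactions, {"high_contrast", "irregular_shape"}, {"tumor"})
print(rule_support, rule_confidence)
```

Rules whose support and confidence exceed chosen thresholds are kept; features that appear in strong rules are natural candidates for feature selection.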
Magnetic resonance imaging (MRI) is often the medical imaging technique of choice when soft tissue must be delineated. The paper [10] presents an advanced approach to automatic diagnosis based on the classification of magnetic resonance images. It proposes feature extraction and classification stages. The results of the proposed method compare very well with other recent work. MRI is an effective and advanced medical imaging method for the human brain. The information that MRI provides has considerably improved the quality of treatment. A significant benefit of MRI is that it is non-invasive. The use of computer technology in medicine is notable, especially for cancer, heart disease and brain tumors. The wavelet transform is a useful tool for feature extraction because it allows the image to be analyzed at different levels of resolution. However, this method requires a large amount of storage, so an additional technique is used for dimensionality reduction. The k-NN classifier produced outstanding results for the best values of k. The experimental results show that the proposed method is very effective for classifying human brain images as normal or abnormal. The classification accuracy was over 90% for FP-ANN and 99% for k-NN. The classification performance shows that these techniques are fast, simple to use and inexpensive.
K-Nearest Neighbor (KNN) is a well-known algorithm for pattern recognition. The researchers in [11] note that the KNN algorithm achieves excellent results. Unfortunately, it also has drawbacks: its computational complexity is high, its performance depends entirely on the training set, and it makes no weight distinction among samples. To overcome these drawbacks, an improved version of KNN is proposed. It combines a genetic algorithm with KNN to improve classification performance instead of considering the whole training sample when finding the k neighbors. The combined algorithm, called Genetic KNN (GKNN), is designed to overcome the limitations of traditional KNN. The work is compared with KNN, CART and SVM.
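For reference, plain KNN can be sketched in a few lines of Python. This is the traditional algorithm, not the GKNN variant of [11], and the two-dimensional feature vectors and labels are hypothetical. The sketch makes the drawbacks listed above visible: every query computes the distance to every training sample, and each of the k neighbors casts an equally weighted vote.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training samples."""
    # Distances to all training samples are computed for every query.
    neighbors = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    # Each neighbor votes with equal weight, regardless of its distance.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (feature vector, class label).
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((0.8, 1.1), "normal"),
    ((3.0, 3.2), "abnormal"), ((3.1, 2.9), "abnormal"), ((2.9, 3.0), "abnormal"),
]

print(knn_predict(train, (1.1, 1.0)), knn_predict(train, (3.0, 3.0)))
```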
This paper [12] proposes a technique for automatic 3D segmentation of human brain computed tomography scans using data mining techniques. The technique consists of several steps. It is implemented as 3D image processing for the RapidMiner platform and is made freely available. The brain slices are processed in both 2D and 3D. Post-processing is applied to obtain the best results and to remove misclassified pixels from the image. It consists of two steps. First, each 2D slice is segmented and the largest white region in the slice is selected. Second, a 3D median filter with a radius of three pixels is applied. This filter removes small protrusions, while the mean filter smooths the resulting image. Post-processing improves the accuracy of the segmentation.
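The effect of a 3D median filter on a binary segmentation can be sketched in plain Python. This is an illustration under stated assumptions, not the RapidMiner implementation of [12]: the volume is a nested list indexed as `volume[z][y][x]`, the window is a cube clipped at the borders, and the example uses a radius of one voxel on a tiny hypothetical volume rather than the radius of three mentioned above.

```python
from statistics import median_low

def median_filter_3d(volume, radius=1):
    """Replace each voxel by the median of its cubic neighborhood (clipped at borders)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0] * width for _ in range(height)] for _ in range(depth)]
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                window = [
                    volume[zz][yy][xx]
                    for zz in range(max(0, z - radius), min(depth, z + radius + 1))
                    for yy in range(max(0, y - radius), min(height, y + radius + 1))
                    for xx in range(max(0, x - radius), min(width, x + radius + 1))
                ]
                # median_low always returns a value that occurs in the window,
                # so a binary mask stays binary.
                out[z][y][x] = median_low(window)
    return out

# Hypothetical 3x3x3 mask with a single isolated (misclassified) voxel.
mask = [[[0] * 3 for _ in range(3)] for _ in range(3)]
mask[1][1][1] = 1
cleaned = median_filter_3d(mask, radius=1)
print(sum(v for plane in cleaned for row in plane for v in row))
```

An isolated voxel is outvoted by its neighbors and disappears, while large connected regions are left unchanged.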
Information extraction and image retrieval have become an intriguing research area because of the rapid growth in the capacity of digital image databases. A vast amount of information is now available in visual form, so it has become necessary to search for images by content. Image mining is used extensively in fields such as medical diagnosis and space research. This paper [13] is concerned with identifying the correct images when extracting data from an image, using the Lorenz Information Measure image matching method together with neural networks. The procedure is independent of many constraints, which makes its results robust. The main aim of the work is to obtain an innovative technique for recognizing images. An efficient method, the Lorenz Information Measure (LIM), is used to extract features from images for retrieval. The research proposes a novel method for recognizing an image, in which LIM is combined with the Discrete Cosine Transform to produce the result. Image and video storage and retrieval in multimedia is an area that is growing rapidly. The proposed LIM-based image matching method was evaluated and implemented in Matlab.
The recognition of skin regions plays a vital part in applications such as face recognition, signal recognition and adult image filtering. Several definitions concerning digital imaging are presented, along with some image preprocessing and normalization techniques. This research paper [14] handles two sub-problems: skin detection and face detection. Manually assigning a set of labels to multimedia data, storing them and then comparing them is not an efficient method, and it is sometimes very hard to describe media content in words. A more advanced technique is therefore applied to obtain effective results on multimedia data. In this second technique the content of the multimedia data is retrieved automatically; current research on image retrieval concentrates on content based image retrieval (CBIR). Skin recognition is a hard task, which makes it an interesting application for data mining techniques. A flood of data has appeared in almost every field, including science, business and government, and the capacity for gathering and accumulating data has grown beyond control. Conventional data analysis techniques are therefore no longer effective for handling such huge data sets. The basic concern is to extract information in an understandable form from a large quantity of data. Data mining extracts precise information from databases that contain large quantities of data on operations and performance.
Automated classification of medical images is a significant tool for physicians in their day-to-day work. Data mining classifiers are the proposed method for medical image classification. J48 decision tree and Random Forest (RF) classifiers are used in this paper. CT brain images are divided into three groups: tumor, stroke and inflammatory. The proposed classification method [15] depends entirely on the efficient use of the texture information in the images. Computer Aided Diagnosis systems that use CBIR to search for clinically relevant and visually similar images showing suspicious lesions are a recent and still fascinating research topic. In computer aided diagnosis, automated classification techniques are required to assist physicians, through both conventional and CBIR-based approaches, in the analysis of complex diseases. Researchers have worked hard to solve image classification problems with various pattern recognition techniques such as Support Vector Machine (SVM), Bayesian Network (BN) and Artificial Neural Network (ANN). Data mining methods currently prove to be superior classifiers for medical images. The paper discusses a classification technique for CT brain images.
In [16], several techniques are used to detect cancerous tissue in mammograms. The first is image mining, which can extract the information in an image, including hidden information that is not apparent in the image. The main purpose of the paper is to segment breast mammograms with an image mining technique in order to identify the affected area. Each mammogram is classified into one of three classes: normal, benign or malignant. The paper explains a complete method that defines a uniform terminology, general properties and the need for local methods, and it helps the reader choose the method best suited to the task of finding microcalcifications in mammogram images. Some progress has been made so far, but this is still an open research area with challenges to be solved: enhancement and segmentation techniques, better preprocessing, better feature extraction, selection and classification algorithms, combinations of classifiers that minimize both false negatives and false positives, and the exploration of 3-D mammograms.
As multimedia applications have become popular, the volume of real-life video and images has grown enormously, along with storage techniques. Over the past several years content-based image retrieval has become a very active subject. The technique proposed in [17] is an enhanced image classification method that uses multilevel association rules based on image objects. Over the last decade or so the internet and multimedia applications have become very popular, and more and more digital multimedia connected to everyday life is being produced. In addition, data storage capacity is increasing dramatically. Retrieving items from these large quantities of multimedia data is therefore a growing issue. To address the retrieval problem, Content-Based Image Retrieval (CBIR) is receiving considerable research attention. Image classification is an extremely important problem in supporting CBIR and other multimedia applications, because an effective classification method allows a huge quantity of multimedia data to be organized efficiently. It is very helpful when building a multimedia database, and it also improves the performance of multimedia data mining. The paper proposes an enhanced technique that derives image classification rules from the hierarchical association relations between image objects.
In [18] a new method is proposed that consists of three stages. First, features are extracted by applying the discrete wavelet transform (DWT) to MRI images. The DWT compresses the image and returns only the approximation image, which is very informative for classification but still very large. After feature extraction, PCA is used for feature reduction to remove redundant features. In the last stage, KNN and ANN classifiers are used for classification; both are simple classification methods.
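The first stage, keeping only the approximation image of a wavelet decomposition, can be sketched with the Haar wavelet in plain Python. This is an assumption made for illustration: the source does not say which wavelet [18] uses, and the 4x4 image below is hypothetical. The PCA and classification stages are omitted.

```python
def haar_approximation(image):
    """One level of the 2D Haar DWT, returning only the approximation (LL) band.

    Each output value is the sum of a 2x2 block divided by 2, which is the
    orthonormal Haar low-pass filter applied along rows and then columns.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 2
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

# Hypothetical 4x4 grayscale image.
image = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]

level1 = haar_approximation(image)   # 2x2 approximation image
level2 = haar_approximation(level1)  # 1x1 approximation image
features = [value for row in level1 for value in row]  # flattened feature vector
print(level1, level2, features)
```

Each level halves both image dimensions, so the number of coefficients passed on to feature reduction drops by a factor of four per level.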
References
- P. Lyman, and R.V. Hal, "How much storage is enough," Storage, pp.1-4, 2003.
- W. Jay, and E.A. Smith, "Evolution of Synthetic Aperture Radar Systems and Their Progression to the EOS SAR," IEEE Trans, Geoscience and Remote Sensing, vol. 29, no.6, pp. 962-985, 1991.
- M.F. Usama, "Data-Mining and Knowledge Discovery Making Sense out of Data," Microsoft Research IEEE Expert, vol.11, no.5, pp.2025-985, 1996.
- A. Berson, K. Thearling, and J. Stephen, Building Data Mining Applications for CRM, CRM, USA, McGraw-Hill, 1999.
- A.K. Mohanty, S.K. Lenka, "Efficient Image Mining Technique for Classification of Mammograms to Detect Breast Cancer," International Journal of Computer Science and Communication Technologies, vol. 2, no.3, pp.99-106, 2010.
- S.W. Purnami, J.M. Zain, & A. Embong, "Data Mining Technique for Medical Diagnosis Using a New Smooth Support Vector Machine," Communications in Computer and Information Science, 2010, pp.15-27.
- S. Kharya, "Using Data Mining Techniques for Diagnosis and Prognosis of Cancer Disease," International Journal of Computer Science, Engineering and Information Technology, vol.2, no.2, pp.55-66, 2012.
- M.C. Tu, D. Shin, D. Shin, "A Comparative Study of Medical Data Classification Methods Based on Decision Tree and Bagging Algorithms," International Conference on Dependable, Autonomic and Secure Computing, 2009, pp.183-187.
- P. Rajendran, M. Madheswaran, K. Naganandhini, "An Improved Pre-Processing Technique with Image Mining Approach for the Medical Image Classification," Second International conference on Computing, Communication and Networking Technologies, 2010, pp.183-187.
- N.H. Rajini, R. Bhavani, "Classification of MRI Brain Images using k- Nearest Neighbor and Artificial Neural Network", International Conference on Recent Trends in Information Technology, 2011, pp.863-868.
- N. Suguna, K. Thanushkodi, "An Improved k-Nearest Neighbor Classification Using Genetic Algorithm," International Journal of Computer Science Issues, vol.7, no.4, pp.18-21, 2010.
- V. Uher, R. Burget, "Automatic 3D segmentation of human brain images using data-mining techniques," International Conference on Telecommunications and Signal Processing, 2012, pp.578-580.
- C.L. Devasena, M. Hemalatha, "A Hybrid Image Mining Technique using LIM-based Data Mining Algorithm," International Journal of Computer Applications, vol.25, no.2, pp.26-3, 2011.
- W. Moudani, A.R. Sayed, "Efficient Image Classification using Data Mining," International Journal of Combinatorial Optimization Problems and Informatics, vol.2 no.1, pp. 27-44, 2010.
- B.G. Prasad, A.N. Krishna, "Classification of medical image using data-mining techniques," Advances in Communication, Network, and Computing, 2012, pp.54-59.
- A.K. Mohanty, M.R. Senapati, S.K. Lenka, "An improved data mining technique for classification and detection of breast cancer from mammograms," International Journal of Neural Computing and Applications, vol. 22 no.6, pp.61-71, 2012.
- V.S. Tseng, M.H. Wang, J.H. Su, "A New Method for Image Classification by Using Multilevel Association Rules," Data Engineering Workshops, 2005, pp.1180-1187.
- Y. Zhang, and Z. Dong, "A hybrid method for MRI brain image classification," Expert Systems with Applications, pp. 10049-10053, 2011.
- A. H. Gondal, and M. N. A. Khan, "A review of fully automated techniques for brain tumor detection from MR images," International Journal of Modern Education and Computer Science (IJMECS), vol. 5, no. 2, pp. 55, 2013.
- A. Zia, and M. Khan, "A Scheme to Reduce Response Time in Cloud Computing Environment," International Journal of Modern Education and Computer Science (IJMECS), vol. 5, no. 6, pp. 56, 2013.
- A. Zia, and M. N. A. Khan, "Identifying key challenges in performance issues in cloud computing," International Journal of Modern Education and Computer Science (IJMECS), vol. 4, no. 10, pp. 59, 2012.
- M. A. Masood, and M. Khan, "Clustering Techniques in Bioinformatics," International Journal of Modern Education and Computer Science (IJMECS), vol. 7, no. 1, pp. 38, 2015.
- Abdul Salam Shah, M.N.A. Khan and Asadullah Shah, "An Appraisal of Off-line Signature Verification Techniques", International Journal of Modern Education and Computer Sciences (IJMECS), vol.7, no.4, pp. 67-75, 2015.