A Comparative Study on the Performance of Fuzzy Rule Base and Artificial Neural Network towards Classification of Yeast Data
Authors: Shrayasi Datta, J. Paulchoudhury
Journal: International Journal of Information Technology and Computer Science (IJITCS) @ijitcs
Article in issue: No. 5, Vol. 7, 2015.
Open access
Classification of yeast data plays an important role in the formulation of medicines and of various chemical components. If the type of yeast can be recognized at an early stage from its initial characteristics, many technical procedures can be avoided in the preparation of chemical and medical products. In this paper, the performance of two classification methodologies, namely artificial neural network and fuzzy rule base, has been compared for the classification of proteins. The objective of this work is to classify proteins into their respective cellular localization sites based on their amino acid sequences. The yeast dataset from the UCI machine learning repository has been used for this purpose. The results show that classification using the artificial neural network gives better prediction than the fuzzy rule base on the basis of average error.
Keywords: Protein Localization, Classification, Neural Network, Fuzzy Rule Base, Yeast Dataset
Short URL: https://sciup.org/15012283
IDR: 15012283
Full text of the article: A Comparative Study on the Performance of Fuzzy Rule Base and Artificial Neural Network towards Classification of Yeast Data
Published Online April 2015 in MECS
A cell usually contains approximately 1 billion ($10^9$) protein molecules [1], [2]. These protein molecules reside in various compartments of the cell, usually called "protein subcellular locations". Information about these subcellular locations helps in understanding the functions of the cell and the biological processes it executes. This information has also been used for the identification of drug targets ([3], [4]). Determining the subcellular localization of a protein through biochemical experiments is a laborious and time-consuming task. With the development of machine learning techniques [5], together with a growing dataset of proteins of known localization, fast and accurate localization predictions have been made successfully for many organisms. This is due to the nature of machine learning approaches, which perform well in domains where there is a vast collection of data but little theory, which describes the situation in bioinformatics well [5]. Among the various prokaryotic and eukaryotic organisms, yeast is important because it is widely used in medicine and in food technology. The biological structure of yeast has also attracted the attention of researchers for many years because of its similarity to the human cell.
The first approach for predicting the subcellular localization of yeast proteins was developed by Kanehisa and Nakai ([6], [7]). Horton and Nakai [8] have proposed a probabilistic model in which an expert identifies the features and the model learns its parameters from a set of training data. The authors have also implemented and tested three machine learning techniques, namely the k-nearest neighbor algorithm, the binary decision tree and the naïve Bayes classifier, on the yeast and E. coli datasets [9]. The performance of these three techniques has been compared with that of the probabilistic method [8], and it has been shown that the k-nearest neighbor algorithm performs best among the four. Chen [10] has implemented three machine learning classification algorithms (decision tree, perceptron and two-layer feed-forward network) for predicting the subcellular localization site of a protein in the yeast and E. coli datasets, and has concluded that the three techniques have similar performance on these two datasets. Qasim, Begum, Jahan, Ashrafi, Idris and Rahman [11] have proposed an automated fuzzy inference system for protein subcellular localization. Jin, Tang, Zhang, Lu and Weber [12] have proposed and designed an SVM with a fuzzy hybrid kernel based on the TSK fuzzy model and have shown that the fuzzy hybrid kernel achieves better performance in SVM classification. Further work on the prediction of protein subcellular localization has been reported in ([13]-[16]); of these, support vector machine techniques have been used in ([13]-[15]). Considerable work has also been done on web server design for subcellular prediction ([17]-[20]). Algorithms based on the fuzzy rule base technique have been proposed for heart disease diagnosis and for packet delivery time ([21]-[23]).
Classification has been done with some widely used machine learning techniques, such as KNN, multilayered feed forward neural network and SVM ([6]-[16]), but most of that work is based on comparison across several datasets, such as E. coli and fungi. These studies have mostly concentrated on the algorithm, i.e. which algorithm is best suited for the classification of medical datasets in general. Which algorithm is most efficient for one particular dataset has not been checked, and that is the motivation for the work described in this paper. Here, a popular and important protein subcellular localization dataset, yeast, has been taken for classification, and the multilayered feed forward neural network and the fuzzy rule base technique have been used and compared. The yeast dataset from the UCI machine learning repository has been used. Each input of the dataset corresponds to a protein, and the output is the predicted localization site of that protein. After implementation, the performance of the two techniques has been evaluated and compared on the basis of average error.
In this research work, the yeast dataset obtained from the UCI machine learning repository has been used [24]. The objective of this dataset is to determine the cellular localization of yeast proteins. The yeast dataset, representing the eukaryote kingdom, consists of 9 features (8 attributes and 1 sequence name). The attributes are mcg, gvh, alm, mit, erl, pox, vac and nuc. Each attribute used to classify the localization site of a protein is a score (between 0 and 1) corresponding to a certain feature of the protein sequence. The higher the score, the more likely the protein sequence is to have that feature. Proteins are classified into 10 classes: cytosolic or cytoskeletal (CYT), nuclear (NUC), mitochondrial (MIT), membrane protein without N-terminal signal (ME3), membrane protein with uncleaved signal (ME2), membrane protein with cleaved signal (ME1), extracellular (EXC), vacuolar (VAC), peroxisomal (POX) and endoplasmic reticulum lumen (ERL).
The paper is organized as follows. Section 1 presents the importance of this research work and a brief literature review. Section 2 gives a brief theoretical introduction to the techniques used in this work, together with a description of the dataset. Section 3 deals with the detailed procedure of the work and its results with error calculation. Finally, Section 4 concludes the paper.
-
II. Methodology
-
A. Artificial Neural Network.
An artificial neural network (ANN) follows a computational paradigm inspired by the structure and functionality of the brain. The ANN consists of an interconnected group of artificial neurons that process information to compute the result.
-
B. Multilayered Feed Forward Neural Network
A multilayer feed-forward ANN (MLFFNN) is made of multiple layers. It possesses an input layer and an output layer, and also has one or more intermediary layers called hidden layers (Fig. 1). The computational units of a hidden layer are known as hidden neurons or hidden units.

Fig. 1. A Multilayered feed forward neural network
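As a rough illustration of how a signal moves through such a network, the sketch below computes one forward pass in plain Python. The layer sizes, the random weights, the placeholder input and the helper names (`make_layer`, `forward`) are assumptions made for this example; it is not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_inputs, n_units, rng):
    """Create one layer: a weight vector (plus a trailing bias) per unit."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]

def forward(layers, inputs):
    """Propagate the inputs through every layer in order."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit[:-1], activations)) + unit[-1])
            for unit in layer
        ]
    return activations

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical sizes: 8 inputs -> 10 hidden units -> 1 output unit.
network = [make_layer(8, 10, rng), make_layer(10, 1, rng)]
sample = [0.5] * 8  # placeholder input, not a real record
output = forward(network, sample)
```

Adding further hidden layers only means appending more entries to `network`; the loop in `forward` stays unchanged.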
-
C. Fuzzy Inference System
A fuzzy inference system (FIS) is a system that transforms a given input into an output with the help of fuzzy logic (Fig. 2). The procedure followed by a fuzzy inference system is known as the fuzzy inference mechanism, or simply fuzzy inference.

Fig. 2. A fuzzy inference system
The entire fuzzy inference process consists of five steps: fuzzification of the input variables, application of the fuzzy operators to the antecedent parts of the rules, evaluation of the fuzzy rules, aggregation of the fuzzy sets across the rules, and defuzzification of the resulting aggregate fuzzy set.
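The five steps can be sketched in a few lines of Python. Everything specific in this example is an assumption chosen for illustration: the two inputs, the fuzzy set parameters, the two rules, the use of minimum for AND and for implication, maximum for aggregation, centroid defuzzification, and the midpoint fallback when no rule fires. The paper does not state which operators were used.

```python
def triangular(x, a, m, b):
    """Triangular membership: 0 outside (a, b), peak of 1 at m."""
    if x <= a or x >= b:
        return 0.0
    if x <= m:
        return (x - a) / (m - a)
    return (b - x) / (b - m)

# Hypothetical fuzzy sets on [0, 1] for two inputs and one output.
INPUT_SETS = {"low": (-0.5, 0.0, 0.6), "high": (0.4, 1.0, 1.5)}
OUTPUT_SETS = {"class_a": (0.0, 0.25, 0.5), "class_b": (0.5, 0.75, 1.0)}
# Each rule: (set for input 1, set for input 2, output set).
RULES = [("low", "low", "class_a"), ("high", "high", "class_b")]

def infer(x1, x2, steps=100):
    """Run the five inference steps and return a crisp output in [0, 1]."""
    # Step 1: fuzzification of the crisp inputs.
    degrees1 = {name: triangular(x1, *p) for name, p in INPUT_SETS.items()}
    degrees2 = {name: triangular(x2, *p) for name, p in INPUT_SETS.items()}
    # Step 2: fuzzy AND (minimum) over each rule's antecedents.
    strengths = [(min(degrees1[s1], degrees2[s2]), out) for s1, s2, out in RULES]
    numerator = 0.0
    denominator = 0.0
    for i in range(steps + 1):
        y = i / steps
        # Step 3: rule evaluation clips each output set at its rule strength.
        clipped = [min(strength, triangular(y, *OUTPUT_SETS[out]))
                   for strength, out in strengths]
        # Step 4: aggregation takes the maximum across all rules.
        mu = max(clipped)
        numerator += y * mu
        denominator += mu
    # Step 5: centroid defuzzification of the aggregate set.
    if denominator == 0.0:
        return 0.5  # assumption: midpoint of the output range if no rule fires
    return numerator / denominator

crisp_low = infer(0.1, 0.1)
crisp_high = infer(0.9, 0.9)
```

With these toy sets, two low inputs activate only the first rule and two high inputs only the second, so the crisp outputs fall at the centres of `class_a` and `class_b` respectively.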
-
D. Fuzzy Membership Function
A fuzzy membership function determines the degree of membership of an object in a fuzzy set of a variable. A membership function provides a measure of the degree of similarity of an element to a fuzzy set. Membership functions come in different shapes: triangular, trapezoidal, piecewise-linear, Gaussian, bell-shaped, etc.
-
a. Trapezoidal Membership Function
It is defined by a lower limit a, an upper limit d, a lower support limit b, and an upper support limit c, where a < b < c < d.
$$
\mu_A(x) =
\begin{cases}
0 & \text{if } x < a \text{ or } x > d \\
\dfrac{x - a}{b - a} & \text{if } a \le x \le b \\
1 & \text{if } b \le x \le c \\
\dfrac{d - x}{d - c} & \text{if } c \le x \le d
\end{cases}
\tag{1}
$$
-
b. Gaussian Membership Function
It is defined by a central value m and a standard deviation k > 0. The smaller k is, the narrower the "bell" is.
$$
\mu_A(x) = e^{-\frac{(x - m)^2}{2k^2}} \tag{2}
$$
-
c. Triangular Membership function
It is defined by a lower limit a, an upper limit b, and a value m, where a < m < b.
$$
\mu_A(x) =
\begin{cases}
0 & \text{if } x \le a \\
\dfrac{x - a}{m - a} & \text{if } a < x \le m \\
\dfrac{b - x}{b - m} & \text{if } m < x < b \\
0 & \text{if } x \ge b
\end{cases}
\tag{3}
$$
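The three membership functions of equations (1) to (3) translate directly into code. The sketch below is a plain Python rendering; the function names and the parameter values in the usage lines are assumptions for illustration only.

```python
import math

def trapezoidal(x, a, b, c, d):
    """Equation (1): rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, m, k):
    """Equation (2): bell curve centred at m with spread k > 0."""
    return math.exp(-((x - m) ** 2) / (2.0 * k ** 2))

def triangular(x, a, m, b):
    """Equation (3): rises on (a, m], falls on (m, b), zero elsewhere."""
    if x <= a or x >= b:
        return 0.0
    if x <= m:
        return (x - a) / (m - a)
    return (b - x) / (b - m)

# Hypothetical parameters on the unit interval.
rising_edge = trapezoidal(0.1, 0.0, 0.2, 0.8, 1.0)   # halfway up the left slope
plateau = trapezoidal(0.5, 0.0, 0.2, 0.8, 1.0)       # inside the flat top
bell_peak = gaussian(0.5, 0.5, 0.1)                  # at the centre m
half_triangle = triangular(0.25, 0.0, 0.5, 1.0)      # halfway up the left side
```

A narrower Gaussian is obtained simply by passing a smaller `k`, which matches the remark that a smaller k gives a narrower bell.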
-
E. Error Analysis
The performance of the two classification methods has been evaluated using the estimated error and the average error.
The estimated error ($E_i$) of an individual instance i is given by (4):
$$
E_i = \frac{|P_i - T_i|}{T_i} \tag{4}
$$
where $P_i$ is the output class value estimated for a given instance and $T_i$ is the actual output class value for that instance.
The average error is derived using (5):
$$
A = \frac{1}{n} \sum_{i=1}^{n} E_i \tag{5}
$$
where $E_i$ is the estimated error and n is the number of instances.
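A small sketch of the two error measures follows. The relative-error form |P - T| / T used for the estimated error is an assumption: it is the form that reproduces the entries of Tables 11 and 12 (for example, an estimate of 0.196 against an actual value of 0.1 gives 0.96). The function names are hypothetical.

```python
def estimated_error(predicted, target):
    """Relative error of one instance (assumed form of equation (4))."""
    return abs(predicted - target) / abs(target)

def average_error(predictions, targets):
    """Equation (5): mean of the per-instance estimated errors."""
    errors = [estimated_error(p, t) for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)

# Two (estimated, actual) pairs taken from rows 5 and 8 of Table 11,
# Trapezoidal-Triangular column.
predictions = [0.196, 0.5]
targets = [0.1, 0.2]
row5_error = estimated_error(predictions[0], targets[0])
mean_error = average_error(predictions, targets)
```

Because the error is divided by the actual value, the same absolute deviation weighs more heavily for classes coded with small numbers than for classes coded with large ones.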
-
III. Implementation and Result
-
A. Implementation.
-
a. Dataset Preprocessing.
Step 1.
As stated previously, the yeast dataset [24] consists of 10 columns (the sequence name, 8 attributes and the class name). First, the first column (sequence name) is discarded, as it is not necessary for the classification task.
Step 2.
The output class names are non-numeric, for example MIT, CYT, VAC, etc. These are replaced by the numeric values 1, 2, 3, etc. The class names with their numeric values are listed in Table 1.
Table 1. Class name and numerical value
| Class name | Numerical value |
|---|---|
| MIT | 1 |
| NUC | 2 |
| CYT | 3 |
| ME1 | 4 |
| EXC | 5 |
| ME2 | 6 |
| ME3 | 7 |
| VAC | 8 |
| POX | 9 |
| ERL | 10 |
Now the dataset consists of 9 attributes, of which 8 are taken as inputs and the last one as the class. All class names have been changed to numerical values as furnished in Table 1. The dataset is now ready to be classified using both the artificial neural network and the fuzzy rule base.
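The two preprocessing steps can be sketched as follows. The class-to-number mapping is the one in Table 1; the whitespace-separated record layout, the function name `preprocess_record` and the sample record (including its sequence name and values) are assumptions for illustration, not data quoted from the paper.

```python
# Mapping from Table 1.
CLASS_TO_NUMBER = {
    "MIT": 1, "NUC": 2, "CYT": 3, "ME1": 4, "EXC": 5,
    "ME2": 6, "ME3": 7, "VAC": 8, "POX": 9, "ERL": 10,
}

def preprocess_record(line):
    """Turn one raw record into (8 numeric attributes, numeric class)."""
    fields = line.split()
    # Step 1: discard the sequence name (first field), keep the 8 attributes.
    attributes = [float(value) for value in fields[1:9]]
    # Step 2: replace the class name (last field) by its numeric value.
    label = CLASS_TO_NUMBER[fields[9]]
    return attributes, label

# Hypothetical record: sequence name, mcg, gvh, alm, mit, erl, pox, vac, nuc, class.
record = "SEQ_0001  0.58  0.61  0.47  0.13  0.50  0.00  0.48  0.22  MIT"
features, label = preprocess_record(record)
```

An unknown class name raises a `KeyError`, which makes malformed records visible instead of silently mapping them to a default class.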
-
b. Classification Using Fuzzy Rule Base.
Step 1.
One fuzzy inference system (FIS) with 8 inputs and 1 output has been used.
Step 2.
The ranges of the input and output variables are first retrieved and then decomposed into fuzzy sets based on the range of their values. These are furnished in Tables 2 to 8. Note that there are 8 attributes: mcg, gvh, alm, mit, erl, vac, nuc and pox. Of these, the attribute pox has not been used, since it contains the value 0.00 in all the data items.
Table 2. Classification of Attribute 1 (mcg)
| Range | Fuzzy set value |
|---|---|
| 0.42 to 0.64 | Low1 |
| 0.33 to 0.61 | Low2 |
| 0.40 to 0.73 | Low3 |
| 0.91 to 0.70 | Medium1 |
| 0.49 to 0.89 | Medium2 |
| 0.54 to 0.94 | Medium3 |
| 0.28 to 0.54 | High1 |
| 0.28 to 0.80 | High2 |
| 0.32 to 0.68 | High3 |
| 0.7 to 0.86 | Very high |
Table 3. Classification of Attribute 2 (gvh)
| Range | Fuzzy set value |
|---|---|
| 0.40 to 0.67 | Low1 |
| 0.31 to 0.60 | Low2 |
| 0.39 to 0.63 | Low3 |
| 0.66 to 0.88 | Medium1 |
| 0.39 to 0.87 | Medium2 |
| 0.42 to 0.75 | Medium3 |
| 0.24 to 0.58 | High1 |
| 0.32 to 0.82 | High2 |
| 0.27 to 0.68 | High3 |
| 0.56 to 0.92 | Very high |
Table 4. Classification of Attribute 3 (alm)
| Range | Fuzzy set value |
|---|---|
| 0.45 to 0.66 | Low1 |
| 0.43 to 0.69 | Low2 |
| 0.42 to 0.60 | Low3 |
| 0.30 to 0.47 | Medium1 |
| 0.36 to 0.58 | Medium2 |
| 0.33 to 0.58 | Medium3 |
| 0.21 to 0.42 | High1 |
| 0.26 to 0.57 | High2 |
| 0.43 to 0.59 | High3 |
| 0.38 to 0.58 | Very high |
Table 5. Classification of Attribute 4 (mit)
| Range | Fuzzy set value |
|---|---|
| 0.13 to 0.65 | Low1 |
| 0.13 to 0.43 | Low2 |
| 0.11 to 0.35 | Low3 |
| 0.23 to 0.78 | Medium1 |
| 0.23 to 0.37 | Medium2 |
| 0.4 to 0.49 | Medium3 |
| 0.12 to 0.31 | High1 |
| 0.08 to 0.28 | High2 |
| 0.10 to 0.49 | High3 |
| 0.25 to 0.40 | Very high |
Table 6. Classification of Attribute 5 (erl)
| Range | Fuzzy set value |
|---|---|
| 0.00 to 0.1 | low |
| 1.00 to 1.11 | high |
Table 7. Classification of Attribute 7 (vac)
| Range | Fuzzy set value |
|---|---|
| 0.22 only | Low1 |
| 0.22 to 0.34 | Low2 |
| 0.22 to 0.40 | Low3 |
| 0.22 to 0.63 | Medium1 |
| 0.22 only | Medium2 |
| 0.22 to 0.35 | Medium3 |
| 0.22 to 0.66 | High1 |
| 0.22 to 0.40 | High2 |
| 0.22 to 0.41 | High3 |
| 0.53 to 0.58 | Very high |
Table 8. Classification of Attribute 8 (nuc)
| Range | Fuzzy set value |
|---|---|
| 0.46 to 0.53 | Low1 |
| 0.47 to 0.68 | Low2 |
| 0.49 to 0.58 | Low3 |
| 0.43 to 0.58 | Medium1 |
| 0.39 to 0.56 | Medium2 |
| 0.40 to 0.59 | Medium3 |
| 0.43 to 0.55 | High1 |
| 0.39 to 0.60 | High2 |
| 0.40 to 0.54 | High3 |
| 0.53 to 0.58 | Very high |
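The paper does not spell out how the ranges in Tables 2 to 8 were obtained. One plausible reading, assumed here and not confirmed by the source, is that each fuzzy set spans the minimum-to-maximum values of an attribute among the records of one output class. The sketch below computes such per-class ranges; the function name `class_ranges` and the toy rows are hypothetical (the rows were chosen so that the result matches the first two ranges of Tables 2 and 3, and they are not real records).

```python
def class_ranges(rows):
    """Return {class_label: [(min, max) for each attribute]}."""
    grouped = {}
    for attributes, label in rows:
        grouped.setdefault(label, []).append(attributes)
    ranges = {}
    for label, members in grouped.items():
        # zip(*members) iterates over attribute columns instead of records.
        ranges[label] = [(min(column), max(column)) for column in zip(*members)]
    return ranges

# Toy rows: ([mcg, gvh], numeric class). Values are illustrative only.
rows = [
    ([0.42, 0.40], 1),
    ([0.64, 0.67], 1),
    ([0.33, 0.31], 2),
    ([0.61, 0.60], 2),
]
ranges = class_ranges(rows)
```

Under this reading, ranges of different classes may overlap heavily, which is also what Tables 2 to 8 show.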
Based on the input and output data, a rule base has been created, which is furnished in Table 9.
Membership functions have then been applied to all input variables and to the output variable. Four combinations of membership functions for the input and output variables have been applied; these are listed in Table 10. In Table 10, serial no. 1 uses the Gaussian2 membership function for both input and output. This means that the Gaussian2 membership function has been used for all 8 input attributes in each rule. The same notation is used for the other combinations.
Table 9. Rule base
| Rule no. | Rule |
|---|---|
| 1 | If (att1 is low1) and (att2 is low1) and (att3 is low1) and (att4 is low1) and (att5 is a5) and (att6 is a6) and (att7 is low1) and (att8 is lowc1) then (output1 is class1) (1) |
| 2 | If (att1 is low2) and (att2 is low2) and (att3 is low2) and (att4 is low2) and (att5 is a5) and (att6 is a6) and (att7 is low2) and (att8 is low2) then (output1 is class2) (1) |
| 3 | If (att1 is low3) and (att2 is low3) and (att3 is low3) and (att4 is low3) and (att5 is a5) and (att6 is a6) and (att7 is low3) and (att8 is low3) then (output1 is class3) (1) |
| 4 | If (att1 is medium1) and (att2 is medium1) and (att3 is medium1) and (att4 is medium1) and (att5 is a5) and (att6 is a6) and (att7 is medium1) and (att8 is medium1) then (output1 is class4) (1) |
| 5 | If (att1 is medium2) and (att2 is medium2) and (att3 is medium2) and (att4 is medium2) and (att5 is a5) and (att6 is a6) and (att7 is medium2) and (att8 is lowc1) then (output1 is class5) (1) |
| 6 | If (att1 is medium3) and (att2 is medium3) and (att3 is medium3) and (att4 is medium3) and (att5 is a5) and (att6 is a6) and (att7 is medium3) and (att8 is medium3) then (output1 is class6) (1) |
| 7 | If (att1 is high1) and (att2 is high1) and (att3 is high1) and (att4 is high1) and (att5 is a5) and (att6 is a6) and (att7 is high1) and (att8 is high1) then (output1 is class7) (1) |
| 8 | If (att1 is high2) and (att2 is high2) and (att3 is high2) and (att4 is high2) and (att5 is a5) and (att6 is a6) and (att7 is high2) and (att8 is high2) then (output1 is class8) (1) |
| 9 | If (att1 is high3) and (att2 is high3) and (att3 is high3) and (att4 is high3) and (att5 is a5) and (att6 is a6) and (att7 is high3) and (att8 is high3) then (output1 is class9) (1) |
| 10 | If (att1 is very_high) and (att2 is very_high) and (att3 is very_high) and (att4 is very_high) and (att5 is a5c10) and (att6 is a6) and (att7 is very_high) and (att8 is very_high) then (output1 is class10) (1) |
Table 10. Input and Output membership functions
| Sl. No. | Membership function for Input variable | Membership function for Output variable |
|---|---|---|
| 1 | Gaussian2 | Gaussian2 |
| 2 | Gaussian2 | Triangular |
| 3 | Trapezoidal | Trapezoidal |
| 4 | Trapezoidal | Triangular |
The estimated output has been calculated for all 50 data items using each combination of membership functions listed in Table 10 and the fuzzy rule base furnished in Table 9. The output is furnished in Table 11.
Based on the actual output (available in the dataset) and the estimated output (as calculated), the estimated error has been calculated for all input-output membership function combinations and is furnished in Table 12.
The average error for each combination of input-output membership functions has been calculated and is furnished in Table 13.
Table 11. Input and Output fuzzy values
| Index no. | Best output value in FIS | Estimated output, Trapezoidal-Triangular (input-output) | Estimated output, Trapezoidal-Trapezoidal (input-output) | Estimated output, Gaussian2-Gaussian2 (input-output) | Estimated output, Gaussian2-Triangular (input-output) |
|---|---|---|---|---|---|
| 1 | 0.1 | 0.1 | 0.1 | 0.473 | 0.463 |
| 2 | 0.1 | 0.1 | 0.1 | 0.396 | 0.411 |
| 3 | 0.1 | 0.1 | 0.1 | 0.475 | 0.48 |
| 4 | 0.1 | 0.1 | 0.1 | 0.465 | 0.462 |
| 5 | 0.1 | 0.196 | 0.195 | 0.483 | 0.484 |
| 6 | 0.1 | 0.1 | 0.1 | 0.472 | 0.472 |
| 7 | 0.2 | 0.2 | 0.2 | 0.517 | 0.523 |
| 8 | 0.2 | 0.5 | 0.5 | 0.398 | 0.399 |
| 9 | 0.2 | 0.317 | 0.315 | 0.555 | 0.555 |
| 10 | 0.2 | 0.2 | 0.2 | 0.465 | 0.465 |
| 11 | 0.2 | 0.462 | 0.461 | 0.524 | 0.526 |
| 12 | 0.2 | 0.345 | 0.347 | 0.59 | 0.489 |
| 13 | 0.2 | 0.622 | 0.628 | 0.59 | 0.569 |
| 14 | 0.2 | 0.3 | 0.3 | 0.59 | 0.561 |
| 15 | 0.3 | 0.341 | 0.34 | 0.451 | 0.449 |
| 16 | 0.3 | 0.2 | 0.2 | 0.453 | 0.448 |
| 17 | 0.3 | 0.333 | 0.333 | 0.447 | 0.446 |
| 18 | 0.3 | 0.3 | 0.3 | 0.548 | 0.547 |
| 19 | 0.3 | 0.34 | 0.342 | 0.541 | 0.543 |
| 20 | 0.3 | 0.346 | 0.345 | 0.463 | 0.445 |
| 21 | 0.3 | 0.2 | 0.2 | 0.463 | 0.552 |
| 22 | 0.3 | 0.391 | 0.39 | 0.463 | 0.552 |
| 23 | 0.4 | 0.4 | 0.4 | 0.54 | 0.538 |
| 24 | 0.4 | 0.4 | 0.4 | 0.54 | 0.489 |
| 25 | 0.4 | 0.4 | 0.4 | 0.54 | 0.552 |
| 26 | 0.5 | 0.5 | 0.5 | 0.52 | 0.526 |
| 27 | 0.5 | 0.5 | 0.5 | 0.604 | 0.602 |
| 28 | 0.5 | 0.5 | 0.5 | 0.604 | 0.602 |
| 29 | 0.6 | 0.6 | 0.6 | 0.612 | 0.612 |
| 30 | 0.6 | 0.559 | 0.559 | 0.484 | 0.482 |
| 31 | 0.6 | 0.457 | 0.455 | 0.505 | 0.499 |
| 32 | 0.7 | 0.2 | 0.2 | 0.708 | 0.711 |
| 33 | 0.7 | 0.2 | 0.2 | 0.663 | 0.661 |
| 34 | 0.7 | 0.2 | 0.2 | 0.662 | 0.66 |
| 35 | 0.7 | 0.2 | 0.2 | 0.662 | 0.66 |
| 36 | 0.7 | 0.2 | 0.2 | 0.637 | 0.631 |
| 37 | 0.7 | 0.5 | 0.5 | 0.745 | 0.747 |
| 38 | 0.7 | 0.7 | 0.7 | 0.698 | 0.703 |
| 39 | 0.7 | 0.3 | 0.3 | 0.504 | 0.498 |
| 40 | 0.7 | 0.5 | 0.5 | 0.748 | 0.749 |
| 41 | 0.8 | 0.5 | 0.5 | 0.61 | 0.604 |
| 42 | 0.8 | 0.561 | 0.565 | 0.551 | 0.551 |
| 43 | 0.8 | 0.5 | 0.5 | 0.523 | 0.518 |
| 44 | 0.8 | 0.5 | 0.5 | 0.562 | 0.557 |
| 45 | 0.9 | 0.5 | 0.5 | 0.5 | 0.5 |
| 46 | 0.9 | 0.5 | 0.5 | 0.5 | 0.5 |
| 47 | 0.9 | 0.5 | 0.5 | 0.5 | 0.5 |
| 48 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 |
| 49 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 |
| 50 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 |
Table 12. Estimated error for input-output membership function combination
| Index no. | Estimated error, Trapezoidal-Triangular (input-output) | Estimated error, Trapezoidal-Trapezoidal (input-output) | Estimated error, Gaussian2-Gaussian2 (input-output) | Estimated error, Gaussian2-Triangular (input-output) |
|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 3.73 | 3.63 |
| 2 | 0.0 | 0.0 | 2.96 | 3.10 |
| 3 | 0.0 | 0.0 | 3.75 | 3.8 |
| 4 | 0.0 | 0.0 | 3.65 | 3.61 |
| 5 | 0.96 | 0.95 | 3.83 | 3.84 |
| 6 | 0.0 | 0.0 | 3.71 | 3.71 |
| 7 | 0.0 | 0.0 | 1.585 | 1.615 |
| 8 | 1.499 | 1.49 | 0.99 | 0.995 |
| 9 | 0.585 | 0.575 | 1.775 | 1.775 |
| 10 | 0.0 | 0.0 | 1.325 | 1.325 |
| 11 | 1.31 | 1.306 | 1.61 | 1.63 |
| 12 | 0.72 | 0.73 | 1.949 | 1.44 |
| 13 | 2.11 | 2.13 | 1.949 | 1.84 |
| 14 | 0.499 | 0.49 | 1.949 | 1.805 |
| 15 | 0.136 | 0.133 | 0.50 | 0.49 |
| 16 | 0.333 | 0.333 | 0.51 | 0.49 |
| 17 | 0.11 | 0.11 | 0.49 | 0.48 |
| 18 | 0.0 | 0.14 | 0.82 | 0.82 |
| 19 | 0.133 | 0.0 | 0.80 | 0.81 |
| 20 | 0.15 | 0.149 | 0.54 | 0.48 |
| 21 | 0.333 | 0.333 | 0.54 | 0.84 |
| 22 | 0.30 | 0.30 | 0.54 | 0.84 |
| 23 | 0.0 | 0.0 | 0.35 | 0.345 |
| 24 | 0.0 | 0.0 | 0.35 | 0.222 |
| 25 | 0.0 | 0.0 | 0.35 | 0.38 |
| 26 | 0.0 | 0.0 | 0.04 | 0.052 |
| 27 | 0.0 | 0.0 | 0.207 | 0.203 |
| 28 | 0.0 | 0.0 | 0.207 | 0.203 |
| 29 | 0.0 | 0.0 | 0.02 | 0.02 |
| 30 | 0.068 | 0.06 | 0.19 | 0.19 |
| 31 | 0.23 | 0.24 | 0.15 | 0.16 |
| 32 | 0.71 | 0.71 | 0.011 | 0.01 |
| 33 | 0.71 | 0.71 | 0.05 | 0.055 |
| 34 | 0.71 | 0.71 | 0.054 | 0.0571 |
| 35 | 0.71 | 0.71 | 0.054 | 0.0571 |
| 36 | 0.71 | 0.71 | 0.08 | 0.0985 |
| 37 | 0.28 | 0.28 | 0.06 | 0.0671 |
| 38 | 0.0 | 0.0 | 0.00 | 0.004 |
| 39 | 0.571 | 0.571 | 0.27 | 0.2885 |
| 40 | 0.28 | 0.28 | 0.06 | 0.070 |
| 41 | 0.375 | 0.375 | 0.23 | 0.245 |
| 42 | 0.298 | 0.293 | 0.3112 | 0.311 |
| 43 | 0.375 | 0.375 | 0.346 | 0.3525 |
| 44 | 0.375 | 0.375 | 0.2975 | 0.3037 |
| 45 | 0.44 | 0.44 | 0.44 | 0.44 |
| 46 | 0.44 | 0.44 | 0.44 | 0.44 |
| 47 | 0.44 | 0.44 | 0.44 | 0.44 |
| 48 | 0.5 | 0.5 | 0.5 | 0.5 |
| 49 | 0.5 | 0.5 | 0.5 | 0.5 |
| 50 | 0.5 | 0.5 | 0.5 | 0.5 |
Table 13. Average error for input and output membership function
| Sl. No. | Membership function for Input variable | Membership function for Output variable | Average Error |
|---|---|---|---|
| 1 | Trapezoidal | Triangular | 0.36806 |
| 2 | Trapezoidal | Trapezoidal | 0.3751 |
| 3 | Gaussian2 | Gaussian2 | 0.92 |
| 4 | Gaussian2 | Triangular | 2.59 |
From Table 13, it has been observed that the average error is minimum when the Trapezoidal membership function is used for the input variables and the Triangular membership function for the output variable. Therefore, the Trapezoidal-Triangular input-output membership function combination should be used for the classification of yeast data when using a fuzzy rule base.
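The selection described above amounts to taking the minimum over the four average errors. The short sketch below uses the values reported in Table 13; the dictionary layout and variable names are assumptions for illustration.

```python
# Average errors from Table 13, keyed by (input MF, output MF).
AVERAGE_ERRORS = {
    ("Trapezoidal", "Triangular"): 0.36806,
    ("Trapezoidal", "Trapezoidal"): 0.3751,
    ("Gaussian2", "Gaussian2"): 0.92,
    ("Gaussian2", "Triangular"): 2.59,
}

# Pick the combination whose average error is smallest.
best_combination = min(AVERAGE_ERRORS, key=AVERAGE_ERRORS.get)
best_error = AVERAGE_ERRORS[best_combination]
```

The two Trapezoidal-input combinations differ only in the third decimal place, so the choice of output membership function matters far less here than the choice of input membership function.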
-
c. Classification Using Multi-Layered Feed Forward Artificial Neural Network.
Step 1.
In order to improve the performance, a feed-forward back-propagation neural network (8 input nodes, 10 hidden nodes and 1 output node) has been used.
Table 14. Neural Network characteristics
| Characteristic | Value |
|---|---|
| Architecture | Multilayer feedforward neural network (MLFNN) |
| Training method | Backpropagation training algorithm |
| Learning method | Supervised learning |
| Activation function | Sigmoid |
Note that, out of 1484 samples, 154 samples have been taken for training and 102 samples for testing. From those, the estimated output and estimated error of 50 samples are furnished in Table 15. The average error has been found to be 0.3416.
Table 15. Estimated output and Estimated error using MLFFNN
| Index no. | Best output value in neural network | Estimated output | Estimated error using ANN |
|---|---|---|---|
| 1 | 0.1 | 0.0916 | 0.08 |
| 2 | 0.1 | 0.3117 | 2.116 |
| 3 | 0.1 | -0.0183 | 1.183 |
| 4 | 0.1 | 0.0774 | 0.22 |
| 5 | 0.1 | 0.4646 | 3.646 |
| 6 | 0.1 | -0.0085 | 1.085 |
| 7 | 0.2 | 0.3071 | 0.535 |
| 8 | 0.2 | 0.2609 | 0.3045 |
| 9 | 0.2 | 0.3909 | 0.9545 |
| 10 | 0.2 | 0.4819 | 1.409 |
| 11 | 0.2 | 0.2217 | 0.108 |
| 12 | 0.2 | 0.1435 | 0.28 |
| 13 | 0.2 | 0.2452 | 0.225 |
| 14 | 0.2 | 0.2648 | 0.323 |
| 15 | 0.3 | 0.3335 | 0.111 |
| 16 | 0.3 | 0.3989 | 0.329 |
| 17 | 0.3 | 0.3121 | 0.040 |
| 18 | 0.3 | 0.2569 | 0.14 |
| 19 | 0.3 | 0.2970 | 0.01 |
| 20 | 0.3 | 0.2764 | 0.07 |
| 21 | 0.3 | 0.3558 | 0.186 |
| 22 | 0.3 | 0.3626 | 0.208 |
| 23 | 0.4 | 0.40680 | 0.016 |
| 24 | 0.4 | 0.4393 | 0.09 |
| 25 | 0.4 | 0.4638 | 0.15 |
| 26 | 0.5 | 0.5002 | 0.00 |
| 27 | 0.5 | 0.5035 | 0.00 |
| 28 | 0.5 | 0.5035 | 0.00 |
| 29 | 0.6 | 0.6093 | 0.01 |
| 30 | 0.6 | 0.6284 | 0.04 |
| 31 | 0.6 | 0.6322 | 0.05 |
| 32 | 0.7 | 0.6951 | 0.006 |
| 33 | 0.7 | 0.7229 | 0.032 |
| 34 | 0.7 | 0.7563 | 0.080 |
| 35 | 0.7 | 0.7563 | 0.080 |
| 36 | 0.7 | 0.7507 | 0.072 |
| 37 | 0.7 | 0.7302 | 0.0431 |
| 38 | 0.7 | 0.7112 | 0.016 |
| 39 | 0.7 | 0.7728 | 0.104 |
| 40 | 0.7 | 0.7071 | 0.0101 |
| 41 | 0.8 | 0.7115 | 0.11 |
| 42 | 0.8 | 0.6794 | 0.15 |
| 43 | 0.8 | 0.6817 | 0.14 |
| 44 | 0.8 | 0.5080 | 0.365 |
| 45 | 0.9 | 0.8808 | 0.021 |
| 46 | 0.9 | 0.8702 | 0.033 |
| 47 | 0.9 | 0.9029 | 0.003 |
| 48 | 1.0 | 0.99 | 0.01 |
| 49 | 1.0 | 1.01 | 0.01 |
| 50 | 1.0 | 1.09 | 0.09 |
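To make the training procedure more concrete, here is a minimal sketch of backpropagation for a network with 8 inputs, 10 sigmoid hidden units and 1 sigmoid output unit. It is not the authors' MATLAB setup: the toy dataset, the learning rate, the number of epochs, the per-sample update scheme, the squared-error loss and all function names are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_in, n_hidden, seed=0):
    """Random small weights; the last entry of each vector is the bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_hidden, w_out

def forward(x, w_hidden, w_out):
    hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, out

def train_epoch(samples, w_hidden, w_out, lr):
    """One pass of per-sample gradient descent on the squared error."""
    for x, target in samples:
        hidden, out = forward(x, w_hidden, w_out)
        # Error signal at the output unit (derivative of sigmoid included).
        delta_out = (out - target) * out * (1.0 - out)
        # Error signals at the hidden units, using the old output weights.
        delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            w_out[j] -= lr * delta_out * h
        w_out[-1] -= lr * delta_out
        for j, row in enumerate(w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]

def mean_squared_error(samples, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2
               for x, t in samples) / len(samples)

# Toy data (hypothetical): 8 identical inputs, target grows with the input.
n_samples = 20
samples = []
for k in range(n_samples):
    level = k / (n_samples - 1)
    samples.append(([level] * 8, 0.1 + 0.8 * level))

w_hidden, w_out = init_network(n_in=8, n_hidden=10)
initial_loss = mean_squared_error(samples, w_hidden, w_out)
for _ in range(500):
    train_epoch(samples, w_hidden, w_out, lr=0.5)
final_loss = mean_squared_error(samples, w_hidden, w_out)
```

Note that a sigmoid output unit can only produce values strictly between 0 and 1, whereas Table 15 contains estimates slightly outside that interval, so the output layer in the reported experiment may have been configured differently from this sketch.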
-
B. Result.
A comparative study has been made on the basis of the average error of the fuzzy rule base using the Trapezoidal-Triangular (input-output) membership function combination and of the neural network. The result is furnished in Table 16. It has been observed that the multilayer feed-forward back-propagation neural network is preferable to the fuzzy rule base. Therefore, the multilayer feed-forward back-propagation neural network can be used for classification of yeast data.
Table 16. Methodology versus average error
| Methodology | Average Error |
|---|---|
| Fuzzy rule base with Trapezoidal membership function for input and Triangular membership function for output | 0.36806 |
| Multilayered feed forward neural network | 0.34158 |
-
IV. Conclusion and Future Scope
In this work, two methods for classifying the yeast dataset have been evaluated using MATLAB, and it is concluded that the multilayered feed forward neural network is more suitable for this classification. For the fuzzy rule base, it has further been observed that the Trapezoidal membership function for input combined with the Triangular membership function for output is preferable to the other combinations of membership functions. The same technique may be used in other classification problems.
References
[1] B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, J.D. Watson, Molecular Biology of the Cell, Garland, New York, 1994.
[2] H. Lodish, D. Baltimore, A. Berk, S.L. Zipursky, P. Matsudaira, J. Darnell, Molecular Cell Biology, Scientific American Books, New York, 1995.
[3] Z.-P. Feng, "An overview on predicting the subcellular location of a protein", In Silico Biol., 2 (3), pp. 291-303, 2002.
[4] Q. Cui, T. Jiang, B. Liu, S. Ma, "Esub8: a novel tool to predict protein subcellular localizations in eukaryotic organisms", BMC Bioinformatics, 5 (1), pp. 1-7, 2004.
[5] J. Shavlik, L. Hunter, D. Searls, "Introduction", Machine Learning, 21, pp. 5-10, 1995.
[6] Nakai and Kanehisa, "Expert system for predicting protein localization sites in gram negative bacteria", PROTEINS: Structure, Function, and Genetics, 11, pp. 95-110, 1991.
[7] Nakai and Kanehisa, "A knowledge base for predicting protein localization sites in eukaryotic cells", Genomics, 14, pp. 897-911, 1992.
[8] Horton and Nakai, "A probabilistic classification system for predicting the cellular localization sites of proteins", in Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, St. Louis, AAAI Press, pp. 109-115, 1996.
[9] Paul Horton, Kenta Nakai, "Better Prediction of Protein Cellular Localization Sites with the k Nearest Neighbors Classifier", Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pp. 147-152, June 21-26, 1997.
[10] Yetian Chen, "Predicting the Cellular Localization Sites of Proteins Using Decision Tree and Neural Networks", http://www.cs.iastate.edu/~yetianc/cs572/files/CS572_Project_YETIANCHEN.pdf, unpublished.
[11] R. Qasim, K. Begum, N. Jahan, T. Ashrafi, S. Idris, R.M. Rahman, "Subcellular localization of proteins using automated fuzzy inference system", 2013 International Conference on Informatics, Electronics & Vision (ICIEV), May 2013, pp. 1-5.
[12] Bo Jin, Yuchun Tang, Yan-Qing Zhang, Chung-Dar Lu and Irene Weber, "Support Vector Machine with the Fuzzy Hybrid Kernel for Protein Subcellular Localization Classification", The 2005 IEEE International Conference on Fuzzy Systems, pp. 420-423.
[13] X.-B. Zhou, C. Chen, Z.-C. Li, and X.-Y. Zou, "Improved prediction of subcellular location for apoptosis proteins by the dual-layer support vector machine", Amino Acids, 35, pp. 383-388, 2008.
[14] Ana Carolina Lorena, André C.P.L.F. de Carvalho, "Protein cellular localization prediction with Support Vector Machines and Decision Trees", Computers in Biology and Medicine, 37, pp. 115-125, 2007.
[15] Jing Huang and Feng Shi, "Support vector machines for predicting apoptosis proteins types", Acta Biotheoretica, 53, pp. 39-47, Springer, 2005.
[16] Ru-Ping Liang, Shu-Yun Huang, Shao-Ping Shi, Xing-Yu Sun, Sheng-Bao Suo, Jian-Ding Qiu, "A novel algorithm combining support vector machine with the discrete wavelet transform for the prediction of protein subcellular localization", Computers in Biology and Medicine, 42, pp. 180-187, 2012.
[17] K.C. Chou and H.B. Shen, "Euk-Mploc: A Fusion Classifier for Large-Scale Eukaryotic Protein Subcellular Location Prediction by Incorporating Multiple Sites", J. Proteome Research, vol. 6, no. 5, pp. 1728-1734, 2007.
[18] H.B. Shen and K.C. Chou, "Nuc-Ploc: A New Web-Server for Predicting Protein Subnuclear Localization by Fusing Pseaa Composition and Psepssm", Protein Eng. Design and Selection, vol. 20, no. 11, pp. 561-567, 2007.
[19] K.C. Chou and H.B. Shen, "Large-Scale Plant Protein Subcellular Location Prediction", J. Cellular Biochemistry, vol. 100, no. 3, pp. 665-678, 2007.
[20] H.B. Shen and K.C. Chou, "Gpos-Ploc: An Ensemble Classifier for Predicting Subcellular Localization of Gram-Positive Bacterial Proteins", Protein Eng. Design and Selection, vol. 20, no. 1, pp. 39-46, 2007.
[21] P.S. Banerjee, J. Palchoudhury, S.R. Bhadra Choudhury, "Fuzzy membership function as a Trust Based AODV for MANET", I.J. Computer Network and Information Security, 2013, 12, pp. 27-34.
[22] M. Barman, J. Palchoudhury, S. Biswas, "A Framework for the Neuro Fuzzy Rule Base System in the diagnosis of heart disease", International Journal of Scientific and Engineering Research, vol. 4, issue 11, November 2013.
[23] M. Barman, J. Palchoudhury, "A Framework for Selection of Membership Function Using Fuzzy Rule Base System for the Diagnosis of Heart Disease", I.J. Information Technology and Computer Science, vol. 5, no. 11, October 2013, pp. 62-70.
[24] UCI machine learning repository, http://archive.ics.uci.edu/ml.