Prediction of Adsorption of Cadmium by Hematite Using Fuzzy C-Means Clustering Technique

Authors: Satyendra Nath Mandal, Suhit Sinha, Saptarisha Chatterjee, Sankha Subhra Mullick, Sriparna Das

Journal: International Journal of Intelligent Systems and Applications (IJISA)

Issue: Vol. 4, No. 12, 2012

Open access

Clustering is the partitioning of a data set into subsets (clusters) so that the data in each subset share some common trait. In this paper, an algorithm based on the fuzzy c-means clustering technique is proposed for predicting the adsorption of cadmium by hematite. The original data elements are used to cluster a random data set, which is generated between the minimum and maximum values of the test data. The proposed algorithm is applied to the random data set with the original data set as the initial cluster centers. A threshold value defines the boundary around each cluster center. After the algorithm has run, a modified cluster center is computed from each initial cluster center, and the modified cluster centers are treated as the predicted data set. The algorithm has been tested on the prediction of adsorption of cadmium by hematite, and the error between the original and predicted data has been calculated. The proposed algorithm gave a lower average error than the previously applied methods.


Clustering, Fuzzy C-means Clustering, Random data set, Cluster center, Membership function, Time series prediction, Error analysis

Short URL: https://sciup.org/15010343

IDR: 15010343


Published Online November 2012 in MECS

I. Introduction

Clustering is the classification of similar objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset share some common trait, often proximity according to some defined distance measure. Machine learning typically regards data clustering as a form of unsupervised learning [1]. Cluster analysis, or data clustering, is a branch of data analysis and covers a family of algorithms for unsupervised classification [2]. Cluster analysis is the organization of a collection of patterns into clusters based on similarity [1]. Clustering is important for mining databases to construct relations among data and to transform those relations into knowledge in the form of fuzzy rules [3]. Clustering is useful in several exploratory pattern-analysis, grouping, decision-making, and machine-learning situations, including data mining, document retrieval, image segmentation, and pattern classification. However, in many such problems there is little prior information available about the data, and the decision-maker must make as few assumptions about the data as possible. It is under these restrictions that clustering methodology is particularly appropriate for exploring the interrelationships among the data points and assessing their structure.

On the other hand, fuzzy sets have played a prominent role in modeling uncertainty in the processing of data and information [4]. Tanaka et al. [5] adapted the fuzzy idea and proposed fuzzy regression, a non-parametric approach for evaluating the relation between independent and dependent variables. Fuzzy regression for time series analysis has been used in forecasting by Watada [6]. Song and Chissom [7]-[8] and Sullivan and Woodall [9] developed fuzzy time series models and applied them to forecasting. Kim et al. [10] reported that the forecasting error of a fuzzy model is lower than that of statistical regression. Chang [11]-[13] used the fuzzy regression model in seasonal analysis.

Tseng et al. [14] obtained a reliable forecasting interval by using the fuzzy ARIMA (Auto-Regressive Integrated Moving Average) method. Hwang et al. [15] modified the fuzzy time series models for forecasting university enrollments. Castillo and Melin [16] forecast financial and economic time series data with a fuzzy fractal method. Serafin, Bertò and Zardi [17] divided the whole set of available data into a series of geographical subsets, each made up of elements with a similar precipitation pattern. Many researchers have used other fuzzy methods to predict data ([17]-[22]).

In this paper, the fuzzy c-means clustering method is used to predict the adsorption of cadmium by hematite. First, a search space is created from random numbers restricted to the universe of discourse of the given data set. The original data set is placed within the search space and taken as the initial cluster centers. The proposed algorithm is applied to the search space starting from these initial cluster centers, and the cluster centers are modified. The modified cluster centers are treated as the predicted data. The mean absolute percentage error between the original data and the predicted data is then calculated. The proposed method gives a lower error than the other methods applied to this data set. To the authors' knowledge, this approach to prediction based on fuzzy clustering has not been used before, which is the motivation for this paper.

The next section covers the basic concepts and principles of time series prediction, clustering and, more specifically, fuzzy c-means clustering, together with the error analysis method used in this paper. Section 3 discusses the methodology used for predicting the adsorption of cadmium by hematite. The detailed, step-by-step implementation of the method in the applied field is presented in Section 4. Section 5 compares the results of the method with those of the other applied methods. Finally, Section 6 gives the conclusion and the scope for future extension of the method.

II. Theory

2.1 Time Series

A time series consists of quantities that represent the values taken by a variable over a period such as a month, quarter, or year. Time series data are a series of statistical data related to a specific instant or a specific time period. A time series plotted from monthly booking data for an airline is shown in Figure 1.

Fig. 1: Monthly bookings for an airline

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series data have a natural temporal ordering. This makes time series analysis distinct from other common data analysis problems, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their education level, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis, where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A time series model will generally reflect the fact that observations close together in time are more closely related than observations further apart. In addition, time series models often make use of the natural one-way ordering of time, so that values for a given period are expressed as deriving in some way from past values rather than from future values. When estimating the future values of a series, most authors use the terms 'forecasting' and 'prediction' interchangeably, and we follow this convention.

2.2 Clustering

Clustering is the classification of similar objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Cluster analysis is the organization of a collection of patterns into clusters; its stages are shown in Figure 2.

Fig. 2: Stages in Clustering

Clustering-based prediction models are appealing because clustering time series data captures relations and granular representations, whereas linear, statistical and neural network models capture function. Clustering-based prediction methods are therefore more general than conventional methods. Clustering is useful in several exploratory pattern-analysis, grouping, decision-making, and machine-learning situations, including data mining, document retrieval, image segmentation, and pattern classification.

1) Fuzzy C-means Clustering (FCM)

Traditional clustering approaches generate partitions; in a partition, each data point belongs to one and only one cluster. Hence, the clusters in a hard clustering are disjoint. Fuzzy clustering (also known as soft clustering) extends this notion by associating each data point with every cluster through a membership function [18]. The output of such algorithms is a clustering, but not a partition. Fuzzy set theory was first applied to clustering by Ruspini [19]. Fuzzy clustering is a technique that integrates fuzzy logic with the concept of clustering. The most popular fuzzy clustering algorithm is the fuzzy c-means (FCM) algorithm [3]. FCM was developed by Dunn [20] and improved by Bezdek [21], and many variations have been studied since.

The FCM algorithm attempts to partition a finite collection of n elements $X = \{x_1, \ldots, x_n\}$ into a collection of c fuzzy clusters with respect to some given criterion. Given a finite set of data, the algorithm returns a list of c cluster centers $C = \{c_1, \ldots, c_c\}$ and a partition matrix

$$U = [u_{ij}], \quad u_{ij} \in [0, 1], \quad i = 1, \ldots, n, \quad j = 1, \ldots, c,$$

where each element $u_{ij}$ tells the degree to which element $x_i$ belongs to cluster $c_j$. Like the k-means algorithm, FCM aims to minimize an objective function, the membership-weighted sum of squared distances $\sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, d(c_j, x_i)^2$, where d is the chosen distance measure. For fixed centers, the memberships that minimize it are:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{d(c_j, x_i)}{d(c_k, x_i)} \right)^{2/(m-1)}}$$

The objective function differs from the k-means objective function by the addition of the membership values $u_{ij}$ and the fuzzifier m. The fuzzifier m determines the level of cluster fuzziness. A large fuzzifier value results in smaller memberships $u_{ij}$ and hence fuzzier clusters. In the limit $m \to 1$, the memberships $u_{ij}$ converge to 0 or 1, which implies a crisp partitioning. In the absence of experimentation or domain knowledge, m is commonly set to 2. The basic FCM algorithm takes as input n data points $(x_1, \ldots, x_n)$ to be clustered, the number of clusters c with centers $(c_1, \ldots, c_c)$, and the level of cluster fuzziness m.
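The alternating updates of standard FCM can be sketched in a few lines. The following is a minimal illustration for scalar data, not the authors' implementation: the function name `fcm_1d`, its parameters, the stopping tolerance and the toy data are all assumptions made for this example. It alternates the membership update given in this section with the usual center update, in which each center becomes the mean of the data weighted by the memberships raised to the power m.

```python
def fcm_1d(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on scalar data (illustrative sketch).

    Returns (centers, u) where u[i][k] is the membership of points[i]
    in cluster k. All names and defaults here are hypothetical.
    """
    centers = list(init_centers)
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        u = []
        for x in points:
            d = [abs(x - ck) for ck in centers]
            if 0.0 in d:
                # Point coincides with one or more centers: share full
                # membership among those centers to avoid dividing by zero.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dk / dj) ** power for dj in d) for dk in d]
            u.append(row)
        # Center update: mean of the points weighted by u^m
        new_centers = []
        for k in range(len(centers)):
            w = [u[i][k] ** m for i in range(len(points))]
            new_centers.append(sum(wi * x for wi, x in zip(w, points)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Toy data with two obvious groups, used only for illustration
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, u = fcm_1d(data, init_centers=[0.0, 6.0])
print(centers)
```

Passing the initial centers explicitly keeps the sketch deterministic and mirrors the idea, used later in the paper, of starting from known values rather than from random centers.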

2.3 Error analysis

Error analysis is an important part of prediction. A time series forecast cannot be expected to be perfect; it will always carry some prediction error. Calculating the error helps in analyzing the result obtained by the applied method and in summarizing the accuracy of the forecasts. In this paper, the predicted error and the corresponding average predicted error are calculated.

1) Predicted Error and Average Predicted Error

The predicted error and the average predicted error are calculated using the formulas:

Predicted error = |predicted value - actual value| / (actual value) * 100 %

Average predicted error = (sum of predicted errors) / (total number of errors).
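The same two definitions can be written compactly in symbols. The notation below is introduced here for illustration and does not appear in the original: $y_i$ is the actual value of instance $i$, $\hat{y}_i$ the predicted value, $e_i$ the predicted error in percent, and $N$ the number of instances.

```latex
e_i = \frac{\lvert \hat{y}_i - y_i \rvert}{y_i} \times 100\%,
\qquad
\bar{e} = \frac{1}{N} \sum_{i=1}^{N} e_i
```

As a worked instance, an actual value of 103 and a predicted value of 102.9267 (the first row of Table 2 for the proposed method) give $e_1 = 0.0733 / 103 \times 100\% \approx 0.071\%$. The average $\bar{e}$ is what is usually called the mean absolute percentage error.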

III. Methodology

We used a modified fuzzy c-means clustering technique to design our algorithm; the basic outer structure of the technique is illustrated by the flowchart in Fig. 3.

The process takes a time series data set as input and generates a predicted data set for that time series as output.

Fig. 3: Flowchart of our algorithm
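The paper does not list the algorithm's steps in code, so the following is only one possible reading of the description: random values are drawn within the range of the original data, each original value acts as an initial cluster center, only random values within a threshold of a center contribute to it, and a single FCM-style weighted-mean update yields the modified center. The function name `predict_by_fcm`, the threshold of 2.0, the number of random points and the seed are all assumptions chosen for illustration, and the output is not expected to match the authors' reported predictions.

```python
import random


def predict_by_fcm(original, n_random=2000, threshold=2.0, m=2.0, seed=1):
    """One FCM-style center update with the original values as initial centers.

    Hypothetical sketch: every parameter default here is an assumption.
    """
    rng = random.Random(seed)
    lo, hi = min(original), max(original)
    # Search space: random values inside the range of the original data
    space = [rng.uniform(lo, hi) for _ in range(n_random)]
    centers = sorted(set(original))  # distinct initial cluster centers
    power = 2.0 / (m - 1.0)
    updated = {}
    for ck in centers:
        num = den = 0.0
        for x in space:
            dk = abs(x - ck)
            if dk > threshold:
                continue  # outside the boundary around this center
            dists = [abs(x - cj) for cj in centers]
            if dk == 0.0:
                u = 1.0   # x sits exactly on this center
            elif 0.0 in dists:
                u = 0.0   # x sits exactly on another center
            else:
                u = 1.0 / sum((dk / dj) ** power for dj in dists)
            w = u ** m
            num += w * x
            den += w
        # Keep the original value if no random point contributed
        updated[ck] = num / den if den > 0 else ck
    return [updated[v] for v in original]


# Adsorption values from Table 1
table_1 = [103, 103, 61, 57, 52, 62, 66, 54, 43, 55, 60, 66, 55, 58, 67]
predicted = predict_by_fcm(table_1)
print([round(p, 3) for p in predicted])
```

Because each modified center is a weighted mean of points lying within the threshold of its initial center, every predicted value in this sketch stays within the threshold of the corresponding original value. The threshold therefore bounds the absolute error directly, which is worth keeping in mind when reading the reported error figures.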

IV. Implementation

To implement the algorithm, the data have been taken from T. Singh, V. Singh and S. Sinha [22]. In this paper, the adsorption of cadmium by hematite is predicted. The adsorption depends on cadmium concentration, temperature, pH, agitation rate, and the particle size of the hematite. The experiment was set up fifteen times with different parameter values. The adsorption of cadmium for each setup is given in Table 1.

Table 1: Data set: adsorption of cadmium by hematite

Instance	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Adsorption	103	103	61	57	52	62	66	54	43	55	60	66	55	58	67

The proposed algorithm has been implemented as follows:

Step 1: The original data set has been placed in the search space.

Fig. 4: Original data within search space

Step 2: A random data set has been generated and placed in the search space.

Fig. 5: Random data set in the search space

Step 3: Using the proposed algorithm, the adsorption of cadmium has been predicted and the predicted data set has been placed in the search space. Two different colors have been used for the original and predicted values.

Fig. 6: Predicted, original and random data

Step 4: The original and predicted adsorption of cadmium have been placed in the search space together, in different colors, using the two-term Gaussian curve equation a1*exp(-((x-b1)/c1)^2) + a2*exp(-((x-b2)/c2)^2).

Fig. 7: Predicted data and original data

Step 5: The graphs generated using the single-term Gaussian formula, i.e. a1*exp(-((x-b1)/c1)^2), are shown in Fig. 8.

Fig. 8: Curve representation of original and predicted data
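The Gaussian curve equations in Steps 4 and 5 are sums of terms of the form a*exp(-((x-b)/c)^2). A small sketch of how such a curve is evaluated is given below. The coefficients are hypothetical and chosen only to show the shape, since the paper does not report the fitted values of a1, b1, c1, a2, b2 and c2.

```python
import math


def gaussian_sum(x, terms):
    """Evaluate the sum of a*exp(-((x-b)/c)^2) over (a, b, c) triples."""
    return sum(a * math.exp(-((x - b) / c) ** 2) for a, b, c in terms)


# Hypothetical coefficients (a, b, c), not fitted to the paper's data
two_term = [(100.0, 1.5, 2.0), (60.0, 9.0, 8.0)]

# Curve values at the fifteen instance numbers used on the x-axis
curve = [gaussian_sum(x, two_term) for x in range(1, 16)]
print([round(v, 2) for v in curve])
```

In each term, a is the peak height, b the peak position and c the width, so a single-term curve reaches its maximum a at x = b and falls to a/e at x = b ± c.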

Step 6: Finally, the original data and the values predicted by the single-layer feed-forward network (ANN), the adaptive neuro-fuzzy inference system (ANFIS) and the proposed method are given in Table 2.

Table 2: Original data and values predicted by each method

Instance	Original Data values	Predicted data values by ANN	Predicted data values by ANFIS	Predicted data values by Proposed Algorithm
1	103	103.15	106.953	102.9267
2	103	104.797	109.427	102.5854
3	61	66.1753	58.638	60.9115
4	57	63.9655	60.6183	55.82025
5	52	50.0257	52.2948	51.92866
6	62	53.0571	62.3739	62.67114
7	66	46.7472	46.2464	67.78139
8	54	44.251	54.2082	53.52619
9	43	43.943	44.2244	42.19383
10	55	65.1223	54.6764	54.33279
11	60	67.0971	59.4698	59.91554
12	66	69.3485	65.9554	67.75711
13	55	66.2673	53.6896	54.66099
14	58	59.0288	48.3139	56.49532
15	67	63.0959	58.3298	67.22738

V. Result

T. Singh, V. Singh and S. Sinha [22] used a single-layer feed-forward network and an adaptive neuro-fuzzy inference system to predict cadmium adsorption by hematite. The average errors of these two methods and of the proposed method are given in Table 3.

Table 3: The average error of different methods

Model	Average Error
Single Layer Feed Forward Network	10.24%
Adaptive Neuro-Fuzzy Inference System	5.87%
Proposed method	1.128%
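The averages in Table 3 can be checked against the per-instance values in Table 2 by applying the error formulas from Section 2.3. The sketch below does this. The data are copied from Table 2, while the function name `mape` and the dictionary layout are choices made for this example.

```python
# Original values and per-method predictions, copied from Table 2
original = [103, 103, 61, 57, 52, 62, 66, 54, 43, 55, 60, 66, 55, 58, 67]
predicted = {
    "ANN": [103.15, 104.797, 66.1753, 63.9655, 50.0257, 53.0571, 46.7472,
            44.251, 43.943, 65.1223, 67.0971, 69.3485, 66.2673, 59.0288,
            63.0959],
    "ANFIS": [106.953, 109.427, 58.638, 60.6183, 52.2948, 62.3739, 46.2464,
              54.2082, 44.2244, 54.6764, 59.4698, 65.9554, 53.6896, 48.3139,
              58.3298],
    "Proposed": [102.9267, 102.5854, 60.9115, 55.82025, 51.92866, 62.67114,
                 67.78139, 53.52619, 42.19383, 54.33279, 59.91554, 67.75711,
                 54.66099, 56.49532, 67.22738],
}


def mape(actual, pred):
    """Average of |predicted - actual| / actual, expressed in percent."""
    return sum(abs(p - a) / a for a, p in zip(actual, pred)) / len(actual) * 100.0


for name, values in predicted.items():
    print(f"{name}: {mape(original, values):.3f}%")
```

Run on the values above, this should reproduce the averages in Table 3 to within rounding, which confirms that the reported figures are mean absolute percentage errors over the fifteen instances.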

VI. Conclusion and Future Work

In this paper, an algorithm based on the fuzzy c-means clustering technique has been proposed. The algorithm has been applied to the cadmium adsorption data used by T. Singh, V. Singh and S. Sinha [22], and the results are given in Table 3. They show that the proposed algorithm gives a lower average error than the other methods applied to the same problem. To establish the method, more data sets and other methods will be tested in future. The approach can be extended to time series data such as weather prediction, industrial processes, financial data and stock market analysis.

Acknowledgements

The authors would like to thank the All India Council for Technical Education (F.No-1-51/RID/CA/28/2009-10) for funding this research work.

References

[1] S. Miyamoto et al., "Algorithms for Fuzzy Clustering", STUDFUZZ 229, pp. 1-7, 2008. springerlink.com
[2] Song Q., Chissom B.S., "Forecasting enrollments with fuzzy time series: part I", J Fuzzy Sets Syst 54: pp. 1-9, 1993.
[3] A.K. Jain, M.N. Murty and P.J. Flynn, "Data Clustering: A Review", ACM Computing Surveys, Vol. 31, No. 3, September 1999.
[4] Zadeh L.A., "Fuzzy sets", Inform. Control 8(3): pp. 338-353, 1965.
[5] Hisao Ishibuchi, Ken Nozaki, Hideo Tanaka, "Efficient fuzzy partition of pattern space for classification problems", Elsevier, Volume 59, Issue 3, pp. 295-304, 10 November 1993.
[6] Watada J., "Fuzzy time series analysis and forecasting of sales volume", 1992.
[7] Song Q., Chissom B.S., "Fuzzy time series and its models", Fuzzy Sets Syst 54: pp. 269-277, 1993.
[8] Song Q., Chissom B.S., "Forecasting enrollments with fuzzy time series: part II", 62: pp. 1-8, 1994.
[9] Sullivan J.H., Woodall W.H., "A Comparison of Fuzzy Forecasting and Markov Modeling", Fuzzy Sets and Systems 64(3): pp. 279-293, 1994.
[10] Kim M.J., Min S.H., Han I.G., "An evolutionary approach to the combination of multiple classifiers to predict a stock price index", Expert Systems with Applications 31: pp. 241-247, 2006.
[11] Chang P.T., "Fuzzy seasonality forecasting", Fuzzy Sets and Systems 90(1): pp. 1-10.
[12] Chang P.T., Lee E.S., Konz S.A., "Applying fuzzy linear regression to VDT legibility", Fuzzy Sets and Systems 80(2): pp. 197-204, 1996.
[13] Chang S.C., "The TFT-LCD industry in Taiwan: competitive advantages and future developments", Technology in Society 27(2): pp. 199-215, 2005.
[14] Tseng F.M., Tzeng G.H., Yu H.C., Yuan Benjamin J.C., "Fuzzy ARIMA model for forecasting the foreign exchange market", Fuzzy Sets and Systems 118(1): pp. 9-19, 2001.
[15] Jeng-Ren Hwang, Shyi-Ming Chen, Chia-Hoang Lee, "Handling forecasting problems using fuzzy time series", Elsevier, Volume 100, Issues 1-3, pp. 217-228, 16 November 1998.
[16] Castillo O., Melin P.A., "New-fractal approach for forecasting financial and economic time series", J IEEE, pp. 929-934.
[17] Serafin Stefano, Bertò Alessio, Zardi Dino, "Application of Cluster Analysis Techniques to the Verification of Quantitative Precipitation Forecasts", pp. 395-398, http://www.map.meteoswiss.ch/map-doc/icam2005/pdf/poster-sesion-c/C18.pdf, date of access 15.04.2012.
[18] Song Q., Chissom B.S., "Forecasting enrollments with fuzzy time series: part II", 62: pp. 1-8, 1994.
[19] Enrique H. Ruspini, "A new approach to clustering", Space Biology Laboratory, University of California, Los Angeles, USA, July 1969.
[20] J.C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics 3: pp. 32-57, 1973.
[21] J.C. Bezdek, "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York, 1981.
[22] T. Singh, V. Singh and S. Sinha, "Prediction of Cadmium Removal Using an Artificial Neural Network and a Neuro-Fuzzy Technique", Mine Water and the Environment, Volume 25, Number 4, pp. 214-219, 20
