Performance Improvement of Plant Identification Model based on PSO Segmentation
Author: Heba F. Eid
Journal: International Journal of Intelligent Systems and Applications (IJISA)
Published in issue: No. 2, Vol. 8, 2016
Plant identification has been a challenging task for many researchers. Several studies have proposed techniques for plant identification based on leaf shape. However, image segmentation is an essential and critical part of analyzing leaf images. This paper proposes an efficient plant species identification model using digital images of leaves. The proposed identification model adopts particle swarm optimization for segmenting the leaf images. Then, a feature selection process using information gain and a discretization process are applied to the segmented images' features. The proposed model was evaluated on the Flavia dataset. Experimental results with different kinds of classifiers show an improvement in identification accuracy of up to 98.7%.
Keywords: Plant identification, Segmentation, Particle Swarm Optimization, Information Gain, Discretization
Short address: https://sciup.org/15010796
IDR: 15010796
Text of the scientific article "Performance Improvement of Plant Identification Model based on PSO Segmentation"
Published Online February 2016 in MECS

I. Introduction
Plants play an important role in preserving the earth's ecology and the balance of the environment. However, identifying plant species is a challenging task given the large number of existing species. Plant identification is also difficult and time consuming because of similarity and variability among species.
Leaves are among the most important organs of a plant. Leaf-based plant identification models are preferred because leaves are easy to access, carry and process. Several approaches have been proposed for plant identification using leaves [1, 2].
However, a key issue in developing such plant identification models lies in extracting leaf features that can discriminate well among the different species. To reach the highest identification performance, image segmentation is an important part of extracting information from leaf images; thus, the selection of the image segmentation technique is critical [3].
Swarm intelligence (SI) has been applied in numerous fields, including optimization. In particular, particle swarm optimization (PSO) has been successfully applied to many optimization problems [4].
This paper presents a plant identification model based on the information extracted from leaf images. The proposed model adopts PSO segmentation to segment the digital leaf images. Then, two preprocessing phases are performed before classification: feature selection and discretization. The effectiveness of the proposed plant identification model is evaluated through several experiments on the Flavia dataset using three different classifiers.
The rest of this paper is organized as follows: Section II gives an overview of image segmentation and particle swarm optimization (PSO). Sections III and IV discuss the concepts of feature selection and discretization, respectively. Section V describes the proposed framework of the plant identification model. Section VI presents the Flavia dataset. The experimental results and conclusions are presented in Sections VII and VIII, respectively.
II. Image Segmentation
Image segmentation is a processing task that aims to locate the different objects and boundaries in the image content [5]. Its goal is to partition an image into multiple segments (sets of pixels) that are more meaningful to analyze [6]. Typically, the image is divided into two parts, background and foreground, where the foreground contains the objects of interest and the background is the rest of the image. All pixels in the foreground are similar with respect to a specific characteristic, such as intensity, color, or texture [7].
Image segmentation methods have been classified into numerous approaches: threshold-based segmentation [8], region-based segmentation [9], edge-based segmentation [10] and cluster-based segmentation [11].
Swarm intelligence techniques have been used in a number of applications and have achieved good performance [12-15]. Particle swarm optimization (PSO) can be applied to threshold-based segmentation, where the PSO technique is used to search for the optimal threshold for the segmentation process.
A. Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) is an evolutionary computation method developed by Kennedy and Eberhart in 1995 [16]. PSO simulates the social behavior of bird flocking. The swarm is initialized with a random population of particles, where each particle of the swarm represents a candidate solution in the search space. To find the best solution, each particle adjusts its search direction according to two factors: its individual best previous position (pbest), represented by $P_i = (p_{i1}, p_{i2}, \ldots, p_{id})$, and the global best position of the swarm (gbest), $G_i = (g_{i1}, g_{i2}, \ldots, g_{id})$ [17].
The position of particle $i$ at iteration $t$ in a $d$-dimensional search space can be represented as

$$X_i^t = (x_{i1}^t, x_{i2}^t, \ldots, x_{id}^t) \qquad (1)$$

while the velocity of particle $i$ at iteration $t$ is given by

$$V_i^t = (v_{i1}^t, v_{i2}^t, \ldots, v_{id}^t) \qquad (2)$$
The particle updates its velocity according to:

$$v_{id}^{t+1} = w\, v_{id}^{t} + c_1 r_1 \left(p_{id}^{t} - x_{id}^{t}\right) + c_2 r_2 \left(g_{id}^{t} - x_{id}^{t}\right), \quad d = 1, 2, \ldots, D \qquad (3)$$

where $w$ is the inertia weight, $r_1$ and $r_2$ are random numbers uniformly distributed in the range [0, 1], and the positive constants $c_1$ and $c_2$ are the cognition learning factor and the social learning factor. $p_{id}^{t}$ denotes the best previous position found so far by the $i$-th particle and $g_{id}^{t}$ denotes the global best position found so far [18]. Each particle in the swarm then moves to its new potential position according to:

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}, \quad d = 1, 2, \ldots, D \qquad (4)$$
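To make the PSO-based threshold search concrete, the following minimal Python sketch (an illustration, not the author's original implementation) searches for a gray-level threshold that maximizes the between-class variance of a leaf image histogram, using the update rules of Eqs. (3) and (4). Function and parameter names such as pso_threshold and n_particles are assumptions for this sketch.

```python
import numpy as np

def between_class_variance(hist, t):
    """Otsu-style fitness: between-class variance for an integer threshold t in [0, 255]."""
    p = hist / hist.sum()
    w0, w1 = p[:t + 1].sum(), p[t + 1:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    levels = np.arange(256)
    mu0 = (levels[:t + 1] * p[:t + 1]).sum() / w0
    mu1 = (levels[t + 1:] * p[t + 1:]).sum() / w1
    return w0 * w1 * (mu0 - mu1) ** 2

def pso_threshold(hist, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Search for a segmentation threshold with PSO, following Eqs. (3) and (4)."""
    rng = np.random.default_rng(0)
    x = rng.uniform(1, 254, n_particles)            # particle positions (candidate thresholds)
    v = np.zeros(n_particles)                       # particle velocities
    pbest = x.copy()
    pbest_fit = np.array([between_class_variance(hist, int(t)) for t in x])
    gbest = pbest[pbest_fit.argmax()]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (3)
        x = np.clip(x + v, 1, 254)                                   # Eq. (4)
        fit = np.array([between_class_variance(hist, int(t)) for t in x])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()]
    return int(gbest)

# Usage sketch: hist = np.bincount(gray_leaf_image.ravel(), minlength=256)
# t = pso_threshold(hist); mask = gray_leaf_image <= t   # darker leaf pixels as foreground
```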
III. Feature Selection

Feature selection (FS) can be used as a preprocessing phase before classification. It aims to improve the classification performance by removing redundant and irrelevant features. FS methods select a new subset of features from the original ones [19]. Based on the evaluation criteria, feature selection methods fall into two categories [20]: the filter approach and the wrapper approach. Filter approaches evaluate the new set of features according to the general characteristics of the data, ranking features based on certain statistical criteria. A frequently used filter method is information gain (IG) [21].

A. Information Gain

The information gain (IG) [22] of an attribute X with respect to the class attribute Y, denoted $IG(Y \mid X)$ (5), is the reduction in uncertainty about the value of Y after observing the values of X. The uncertainty about the value of Y is measured by its entropy; when Y and X are discrete variables taking values $y_1, \ldots, y_k$ and $x_1, \ldots, x_n$:

$$H(Y) = -\sum_{i=1}^{k} P(y_i)\log_2 P(y_i) \qquad (6)$$

where $P(y_i)$ are the prior probabilities of the values of Y. The uncertainty about the value of Y after observing the values of X is given by the conditional entropy of Y given X:

$$H(Y \mid X) = -\sum_{j=1}^{n} P(x_j)\sum_{i=1}^{k} P(y_i \mid x_j)\log_2 P(y_i \mid x_j) \qquad (7)$$

where $P(y_i \mid x_j)$ are the posterior probabilities of Y given the values of X. Thus, the information gain is:

$$IG(Y \mid X) = H(Y) - H(Y \mid X) \qquad (8)$$

Thereby, attribute X is regarded as more correlated to class Y than attribute Z if $IG(Y \mid X) > IG(Y \mid Z)$. A small computation sketch is given below.
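As a minimal illustration (not part of the original paper), the following Python sketch ranks discrete feature columns by information gain using Eqs. (6)-(8); names such as information_gain and the selection of 20 features are assumptions for this sketch.

```python
import numpy as np

def entropy(labels):
    """H(Y) as in Eq. (6), with probabilities estimated from label counts."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, labels):
    """IG(Y|X) = H(Y) - H(Y|X), Eqs. (6)-(8), for one discrete feature column."""
    h_y = entropy(labels)
    values, counts = np.unique(feature, return_counts=True)
    h_y_given_x = sum(
        (c / len(feature)) * entropy(labels[feature == v])   # inner term of Eq. (7)
        for v, c in zip(values, counts)
    )
    return h_y - h_y_given_x                                  # Eq. (8)

# Usage sketch: rank the columns of a (samples x features) discrete matrix X against classes y
# gains = [information_gain(X[:, j], y) for j in range(X.shape[1])]
# selected = np.argsort(gains)[::-1][:20]   # keep the 20 best-ranked features
```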
IV. Discretization

Discretization converts a continuous feature space into a nominal one [23]. The goal of the discretization process is to find a set of cut-points that partition the range into a small number of intervals. A cut-point is a real value within the range of continuous values that divides the range into two intervals; thus, a continuous interval [a, b] is partitioned into [a, c] and (c, b], where c is a cut-point [24]. Discretization is usually performed as a pre-processing phase for classification.

Fayyad et al. [25] proposed the Information Entropy Maximization (IEM) discretization method, which is based on the information entropy. For a set of instances S, a feature A and a partition boundary T, the class information entropy E(A, T; S) is given by:

$$E(A, T; S) = \frac{|S_1|}{|S|}\,Ent(S_1) + \frac{|S_2|}{|S|}\,Ent(S_2) \qquad (9)$$

where Ent(S) is the class entropy of a subset S for k classes $C_1, \ldots, C_k$:

$$Ent(S) = -\sum_{i=1}^{k} P(C_i, S)\log P(C_i, S) \qquad (10)$$

A small sketch of a single IEM-style split is given below.
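The following Python sketch is an assumption of how a single IEM-style binary split can be computed, not the author's code: it scans candidate cut-points placed between instances of different class labels and picks the boundary minimizing Eq. (9). The full IEM method of Fayyad and Irani applies such splits recursively with an MDL-based stopping criterion, which is omitted here.

```python
import numpy as np

def class_entropy(labels):
    """Ent(S) as in Eq. (10)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_cut_point(values, labels):
    """Return the boundary T minimizing the class information entropy E(A, T; S), Eq. (9).

    Candidate cut-points are midpoints between consecutive values whose class labels differ,
    following the IEM criterion.
    """
    values, labels = np.asarray(values), np.asarray(labels)
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best_t, best_e = None, np.inf
    for i in range(1, len(v)):
        if y[i] == y[i - 1] or v[i] == v[i - 1]:
            continue                                  # only boundaries between different classes
        t = (v[i] + v[i - 1]) / 2.0
        left, right = y[:i], y[i:]
        e = (len(left) / len(y)) * class_entropy(left) + \
            (len(right) / len(y)) * class_entropy(right)   # Eq. (9)
        if e < best_e:
            best_t, best_e = t, e
    return best_t

# Usage sketch: t = best_cut_point(feature_column, class_labels)
# discretized = (feature_column > t).astype(int)   # the two intervals [a, t] and (t, b]
```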
V. Implementation of the Proposed Plant Leaf Identification Model

The framework of the proposed plant leaf identification model is shown in Fig. 1. It comprises the following four fundamental phases: (1) image PSO segmentation; (2) data reduction by IG feature selection; (3) IEM discretization; and (4) classification.

Fig. 1. The Proposed Plant Leaf Identification Model

In the first phase, the plant leaf digital images are resized and segmented by PSO segmentation. Then, the Histogram of Oriented Gradients (HOG) [26] is used to extract the features of the segmented images. HOG features are calculated by counting the occurrences of gradient orientations of edge intensity in localized portions of an image. Secondly, the IG algorithm is applied as a feature selection method to reduce the dimensionality of the features extracted by HOG. In the third phase, the Information Entropy Maximization (IEM) discretization method proposed by Fayyad et al. [25] is applied to the IG-selected features; its criteria are based on the information entropy, and all cut points are placed between points with different class labels. Finally, the discretized features are passed to the classifier to identify the plant species. An end-to-end sketch of this pipeline is given below.
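The following Python sketch outlines the four phases under stated assumptions; it is not the paper's implementation. It uses scikit-image's hog for feature extraction and, as stand-ins only, scikit-learn's mutual_info_classif for IG ranking and KBinsDiscretizer for discretization, since the exact IG/IEM steps are not available off the shelf. The helper name leaf_features and all parameter values are illustrative.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

def leaf_features(gray_image, threshold):
    """Phase 1: segment the leaf with a PSO-found threshold, then extract HOG features."""
    mask = (gray_image <= threshold).astype(float)        # leaf (dark) vs. white background
    patch = resize(mask, (128, 128), anti_aliasing=True)  # resize before HOG extraction
    return hog(patch, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

# Phases 2-4: feature selection, discretization and classification.
# mutual_info_classif approximates IG ranking; KBinsDiscretizer replaces IEM in this sketch.
model = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=20)),
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")),
    ("classify", SVC(kernel="linear")),
])

# Usage sketch (X_imgs: grayscale leaf images, thresholds from pso_threshold, y: species labels):
# X = np.array([leaf_features(img, t) for img, t in zip(X_imgs, thresholds)])
# model.fit(X, y)
```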
VI. Flavia Dataset

The Flavia dataset helps researchers to judge the performance of their systems. It was collected by Wu et al. [27] and contains 1907 RGB leaf images of 32 species, with 40 to 60 sample leaves per species. Each image in the Flavia dataset has a white background and no leafstalk. Samples of the Flavia dataset images are given in Fig. 2, and the Latin and common names of the Flavia species are given in Table 1.

Table 1. Latin and common names of the Flavia dataset
Label | Plant Latin Name | Plant Common Name
1 | Phyllostachys edulis (Carr.) Houz. | pubescent bamboo
2 | Aesculus chinensis | Chinese horse chestnut
3 | Berberis anhweiensis Ahrendt | Anhui Barberry
4 | Cercis chinensis | Chinese redbud
5 | Indigofera tinctoria L. | true indigo
6 | Acer Palmatum | Japanese maple
7 | Phoebe nanmu (Oliv.) Gamble | Nanmu
8 | Kalopanax septemlobus (Thunb. ex A.Murr.) Koidz. | castor aralia
9 | Cinnamomum japonicum Sieb. | Chinese cinnamon
10 | Koelreuteria paniculata Laxm. | goldenrain tree
11 | Ilex macrocarpa Oliv. | Big-fruited Holly
12 | Pittosporum tobira (Thunb.) Ait. f. | Japanese cheesewood
13 | Chimonanthus praecox L. | wintersweet
14 | Cinnamomum camphora (L.) J. Presl | camphortree
15 | Viburnum awabuki K.Koch | Japan Arrowwood
16 | Osmanthus fragrans Lour. | sweet osmanthus
17 | Cedrus deodara (Roxb.) G. Don | deodar
18 | Ginkgo biloba L. | ginkgo, maidenhair tree
19 | Lagerstroemia indica (L.) Pers. | Crape myrtle, Crepe myrtle
20 | Nerium oleander L. | oleander
21 | Podocarpus macrophyllus (Thunb.) Sweet | yew plum pine
22 | Prunus serrulata Lindl. var. lannesiana auct. | Japanese Flowering Cherry
23 | Ligustrum lucidum Ait. f. | Glossy Privet
24 | Tonna sinensis M. Roem. | Chinese Toon
25 | Prunus persica (L.) Batsch | peach
26 | Manglietia fordiana Oliv. | Ford Woodlotus
27 | Acer buergerianum Miq. | trident maple
28 | Mahonia bealei (Fortune) Carr. | Beale's barberry
29 | Magnolia grandiflora L. | southern magnolia
30 | Populus ×canadensis Moench | Canadian poplar
31 | Liriodendron chinense (Hemsl.) Sarg. | Chinese tulip tree
32 | Citrus reticulata Blanco | tangerine

Fig. 2. Samples of the Flavia dataset images

VII. Experimental Results and Analysis

The Flavia dataset is used for the evaluation of the proposed plant leaf identification model: 10 leaves per species were used for testing and 30 leaves per species for training. All experiments were performed on an Intel Core i3 processor with 3 GB of RAM.

A. Performance Evaluation Measures

To evaluate the proposed plant leaf identification model, the accuracy of three different classifiers, J48, naive Bayes (NB) and support vector machine (SVM), is measured. The classification performance is measured by Recall, Precision and F-measure [28]:

$$Recall = \frac{TP}{TP + FN}, \qquad Precision = \frac{TP}{TP + FP}, \qquad F\text{-}measure = \frac{2 \cdot Recall \cdot Precision}{Recall + Precision}$$

where true positives (TP) are correct predictions of the classifier, and false positives (FP) and false negatives (FN) correspond to the classifier's incorrect predictions. A small computation sketch of these measures is given at the end of this section.

B. Results and Analysis

To evaluate the proposed plant identification model, we first analyze the effect of PSO segmentation before applying IG feature selection and discretization. Each digital image is segmented by PSO segmentation, and the HOG feature vectors are extracted from the segmented images. Then, three different classifiers, J48, naive Bayes and SVM, are used to measure the classification performance. The accuracy measures of the classifiers are given in Table 2.

Table 2. Classification accuracy of PSO-segmented images
Classifier | TP Rate | FP Rate | Precision | Recall
J48 | 0.959 | 0.004 | 0.960 | 0.959
NB | 0.821 | 0.020 | 0.849 | 0.821
SVM | 0.915 | 0.010 | 0.923 | 0.915

Case Study 1: Applying IG Feature Selection

Table 3 shows the accuracy of the three classifiers after applying IG feature selection to the HOG feature vectors.

Table 3. Classification accuracy after applying IG feature selection
Classifier | TP Rate | FP Rate | Precision | Recall
J48 | 0.962 | 0.004 | 0.962 | 0.962
NB | 0.824 | 0.020 | 0.841 | 0.824
SVM | 0.802 | 0.029 | 0.765 | 0.802

A comparison of the effect of applying IG feature selection to the HOG features is given in Table 4: the identification speed is improved and the classification accuracy achieved with J48 increases to 96.15%.

Table 4. Comparison of F-measures and speed when applying IG feature selection
Classifier | F-measure (no FS, 64 features) | Time (s) | F-measure (IG, 20 features) | Time (s)
J48 | 95.87% | 0.33 | 96.15% | 0.08
NB | 82.14% | 0.04 | 82.42% | 0.03
SVM | 91.48% | 0.31 | 80.21% | 0.19

Case Study 2: Applying IG Feature Selection and Discretization

The accuracy of the three classifiers after applying IEM discretization to the IG-selected feature vectors is shown in Table 5.

Table 5. Classification accuracy after applying IG feature selection and discretization
Classifier | TP Rate | FP Rate | Precision | Recall
J48 | 0.942 | 0.008 | 0.945 | 0.942
NB | 0.904 | 0.011 | 0.911 | 0.904
SVM | 0.997 | 0.001 | 0.997 | 0.997

Table 6 compares the impact of applying IEM discretization. The identification performance of the naive Bayes and SVM classifiers increases to 90.38% and 98.72%, respectively, while the identification speed improves for all three classifiers.

Table 6. Comparison of F-measures and speed when applying IG and discretization
Classifier | F-measure (no pre-processing) | Time (s) | F-measure (IG + discretization) | Time (s)
J48 | 95.87% | 0.33 | 94.23% | 0.03
NB | 82.14% | 0.04 | 90.38% | 0.02
SVM | 91.48% | 0.31 | 98.72% | 0.27

For the proposed plant identification model, Fig. 3 shows the F-measures of the three classifiers J48, naive Bayes and SVM after applying PSO segmentation alone (without any pre-processing), after applying IG feature selection, and after combining IG feature selection with IEM discretization. Fig. 4 gives the corresponding time comparison of the three classifiers.

Fig. 3. F-measure comparison of the proposed plant identification model

Fig. 4. Time comparison of the proposed plant identification model
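As a small worked illustration (not from the paper), the per-class measures of Section VII.A can be computed from a confusion matrix as follows; the function name per_class_scores and the example counts are assumptions.

```python
import numpy as np

def per_class_scores(confusion):
    """Recall, precision and F-measure per class from a square confusion matrix.

    confusion[i, j] counts instances of true class i predicted as class j.
    """
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp          # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp          # of the class, but missed
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure

# Usage sketch with two classes: 9 of 10 leaves of class 0 and 8 of 10 of class 1 identified correctly
# conf = np.array([[9, 1], [2, 8]]); print(per_class_scores(conf))
```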
VIII. Conclusions

This paper develops an efficient computational model for plant species identification using digital images of plant leaves. The proposed identification model adopts particle swarm optimization for segmenting the digital leaf images. To enhance the identification accuracy, several case studies are conducted: a feature selection process using information gain (IG) is applied to the segmented image features, and IG is then combined with a discretization process. The proposed model was evaluated on the Flavia dataset, with experiments conducted on three different classifiers: J48, naive Bayes and SVM. The experimental results show an improvement of the F-measure of up to 90.38% for naive Bayes and 98.7% for SVM.
References
- A. Kadir, LE. Nugroho, A. Susanto, and PI. Santosa,"Neural Network Application on Foliage Plant Identification", International Journal of Computer Applications, vol. 29, pp.15-22, 2011.
- T. Suk, J. Flusser, and P. Novotny, "Comparison of Leaf Recognition by Moments and Fourier Descriptors", Computer Analysis of Images and Patterns Lecture Notes in Computer Science, vol. 8047 , pp. 221-228 , 2013.
- J. Acharya, S. Gadhiya, and K. Raviya, "Segmentation techniques for image analysis: A review", International Journal of Computer Science and Management Research, vol. 2, pp. 2278-733, 2013.
- Vivek G, and V. Shetty, "Survey on Swarm Intelligence Based Optimization Technique for Image Compression",Int. J. of Innovative Research in Computer and Communication Engineering, vol. 3, pp.1058-1063,2015.
- R.C. Gonzalez, and R.E. Woods, "Digital Image Processing", Prentice-Hall, Englewood Cliffs, NJ, 2002.
- L. Shapiro, and G. C. Stockman, "Computer Vision", New Jersey, Prentice-Hall, 2001.
- C. Hoi, and M. Lyu, "A novel log based relevance feedback technique in content based image retrieval", In Proc. ACM Multimedia, 2004.
- A. Kaur, and N. Kaur, "Image Segmentation Techniques", International Research Journal of Engineering and Technology, vol. 2, pp. 944-947, 2015.
- H. G. Kaganami, and Z. Beij, "Region based detection versus edge detection", IEEE Transactions on Intelligent Information Hiding and Multimedia Signal Processing, pp. 1217-1221, 2009.
- S. Lakshmi, and D. V. Sankaranarayanan, "A study of edge detection techniques for segmentation computing approaches", IJCA Special Issue on Computer Aided Soft Computing Techniques for Imaging and Biomedical Applications (CASCT), 2010.
- Barghout, Lauren, and J. Sheynin. "Real-world scene perception and perceptual organization: Lessons from Computer Vision", Journal of Vision, vol. 13 pp.709-709 ,2013.
- D. Karaboga, and B. Basturk. "On the performance of artificial bee colony (ABC) algorithm", Appl. Soft Comput., Vol. 8, pp. 687–697, 2008.
- I. Brajevic, M. Tuba, and M. Subotic, "Performance of the improved artificial bee colony algorithm on standard engineering constrained problems", International journal of mathematics and computers in simulation, vol.5, pp. 135-143, 2011.
- N. Ibrahim, H. E. M. Attia, Hossam E.A. Talaat, A. H. Alaboudy, “Modified Particle Swarm Optimization Based Proportional-Derivative Power System Stabilizer”, International Journal of Intelligent Systems and Applications, vol. 3, pp.62-76, 2015.
- Hardiansyah, Junaidi, Yohannes MS, “Solving Economic Load Dispatch Problem Using Particle Swarm Optimization Technique”, International Journal of Intelligent Systems and Applications, vol. 12, pp.12-18, 2012.
- R. Eberhart , and J. Kennedy," A new optimizer using particle swarm theory", In Proc. of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, pp.39-43,1995.
- G. Venter, and J. Sobieszczanski-Sobieski, "Particle Swarm Optimization," AIAA Journal, vol. 41, pp. 1583-1589, 2003.
- Y. Liu, G. Wang, H. Chen, and H. Dong, "An improved particle swarm optimization for feature selection", Journal of Bionic Engineering, vol.8, pp.191-200, 2011.
- L. Yu and H. Liu, "Feature selection for high-dimensional data: a fast correlation-based filter solution," In Proceedings of the twentieth International Conference on Machine Learning, pp. 856-863, 2003.
- H. F. Eid, M. A. Salama, and A. Hassanien, "A Feature Selection Approach for Network Intrusion Classification: The Bi-Layer Behavioral Based", International Journal of Computer Vision and Image Processing, vol. 3, pp. 51-59, 2013.
- M. Ben-Bassat, "Pattern recognition and reduction of dimensionality," Handbook of Statistics II, North-Holland, Amsterdam, vol. 1, 1982.
- T. Mitchell. Machine Learning. McGraw-Hill, 1997.
- M. Mizianty, L. Kurgan, and M. Ogiela, "Discretization as the enabling technique for the Naïve Bayes and semi-Naïve Bayes-based classification", The Knowledge Engineering Review, vol. 25, pp. 421-449, 2010.
- S. Kotsiantis, and D. Kanellopoulos, “Discretization Techniques: A recent survey",GESTS International Transactions on Computer Science and Engineering, vol.32, pp. 47-58, 2006.
- U. Fayyad, and K. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning", In Proceedings of the International Joint Conference on Uncertainty in AI. Morgan Kaufmann, San Francisco, CA, USA, pp. 1022–1027, 1993.
- N. Dalal, and B. Triggs, "Histograms of oriented gradients for human detection", Computer Vision and Pattern Recognition, 2005.
- S. Wu, S. Bao, E. Xu, X. Wang, F. Chang, and Q. L. Xiang, "A Leaf Recognition Algorithm for Plant Classification using Probabilistic Neural Network", The 7th IEEE International Symposium on Signal Processing and Information Technology,Cairo, Egypt,2007.
- R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification", JohnWiley & Sons, USA, 2nd edition, 2001.