Comparative study of convolutional neural network with word embedding technique for text classification
Authors: Amol C. Adamuthe, Sneha Jagtap
Journal: International Journal of Intelligent Systems and Applications (IJISA)
Issue: Vol. 11, No. 8, 2019
This paper presents an investigation of the convolutional neural network (CNN) with the Word2Vec word embedding technique for text classification. The performance of CNN is tested on seven benchmark datasets with different numbers of classes and different numbers of training and testing samples. Text classification results obtained with the proposed CNN are compared with the results of CNN models and other classifiers reported in the literature. The investigation shows that CNN models are better suited for text classification than other techniques. The main objective of the paper is to identify the best-fitting values of batch size, number of epochs, activation function, dropout rate and number of feature maps. Results of the proposed CNN are better than those of many other classification techniques reported in the literature for the Yelp Review Polarity and Amazon Review Polarity datasets. For all seven datasets, the accuracy obtained by the proposed CNN is close to the best-known results from the literature.
Keywords: Convolutional Neural Network, Text Classification, Text Mining, Word2Vec
DOI: 10.5815/ijisa.2019.08.06
Published Online August 2019 in MECS
In recent years, the volume of text data has been increasing rapidly due to digitalization [1]. Application fields such as life science, social media, business intelligence and healthcare need text mining [2]. The purposes of text mining are multi-fold, namely feature selection [1,3], feature extraction [4], web opinion mining [3,5] etc. Text classification applications are present in various domains, namely social media [6,7], healthcare [8], business intelligence and image processing. In a text classification system, words related to one specific concept are organized in a document. The main purpose of text classification is to assign a label to a document based on the words present in it. Text classification is designed to assign predefined classes to text documents [9].
The increase in the volume of unstructured text has created the need for efficient and effective text mining algorithms such as classification and clustering [9]. Text classification is a challenging open problem in natural language processing [10]. A broad outline of classification methods is presented in [11]. Yang et al. reported the significance of five techniques for text categorization [12]. Techniques such as the support vector machine [13], KNN [14], association-based classification [5], the term graph model [15], Bayesian classification [16,17] and neural networks [18] have been used for text classification. Each classification technique has limitations. The support vector machine has difficulty scaling up with an increasing number of documents. KNN requires more computation time and struggles with high dimensionality and large numbers of samples. Association-based classification is prone to overfitting. Constructing the graph representation is a computationally expensive task for text classification using the term graph model. Naive Bayes works well only for low dimensions [18].
A better ability to learn complex feature representations is the main advantage of CNN over traditional machine learning approaches [8]. CNN approaches have shown good results in many research domains. CNN is effective for natural language processing (NLP) applications and has achieved superior results in sentence modeling, search query retrieval, semantic parsing and other traditional NLP tasks [19]. CNN has given better results for many problems in the medical domain [8], social networks [6], image classification and image analysis [7]. CNN works better for image processing applications than many other neural network architectures and is found superior for document classification [19]. CNN gives outstanding performance in computer vision. An important property of CNN is its use as a feature extractor [20]. CNN has been used to extract character- to sentence-level features for sentiment analysis [21].
Text classification is a supervised learning task in which labels are assigned to text documents [12]. The ability to learn complex feature representations distinguishes CNN from traditional machine learning approaches [13]. In the literature, CNN has effectively solved many applications in different domains, and the application of CNN to text classification is a well-studied topic.
The paper presents a convolutional neural network with a word embedding technique for text classification. The main objective is divided into three sub-objectives:
1. To confirm the performance of CNN for text classification.
2. To compare CNN with other classifiers for text classification. Researchers have suggested investigating new methods for improving the accuracy of text classification, and this comparison identifies the need for improvement in text classifiers.
3. To tune the hyperparameters of CNN to improve the results of text classification.
To test the performance of the CNN models, seven benchmark datasets with the number of classes varying from 2 to 14 are used. The performance of the proposed CNN model is compared with CNN models and other text classifiers reported in the literature.
In the literature, different classifiers have been investigated for text classification, such as naïve Bayes, support vector machine, logistic regression, stochastic gradient descent, long short-term memory and hybrid models of these. Results indicate that CNN models give the best accuracy for all datasets except dataset 1. The performance of CNN is good for text classification.
The paper presents results of the convolutional neural network with variations of hyperparameters, namely activation function, batch size, number of epochs, dropout rate and number of feature maps, for text classification. Results show that accuracy increases with the number of epochs and with the dropout rate up to a certain point, after which it decreases. The ReLU activation function performs well on all the datasets. CNN accuracy increases with the number of feature maps.
The rest of this paper is organized as follows. Section II discusses related work on text classification using different machine learning algorithms, including the convolutional neural network. Section III describes the proposed methodology. Experimental details, datasets, results and discussion are presented in Section IV. Section V presents conclusions.
II. Related Work
In the literature, many techniques have been experimented with for text classification. Researchers have used different datasets for performance testing of text classification algorithms.
In the literature, many researchers have used the IMDB dataset with 2 classes for performance testing of proposed algorithms. Results of different techniques for text classification on the IMDB dataset are available in [3, 5, 23-37]. The IMDB dataset for web opinion mining is investigated in [5]; the paper shows the results of a lexicon method with 71% classification accuracy. Y. He et al. [23] tested self-training from labeled features on a multi-domain sentiment dataset and movie review data. Oracle labeling and naïve Bayes achieved the best results for sentiment analysis with 81.36% and 82.53% respectively, better than existing weakly supervised sentiment classification. Various techniques and approaches for sentiment analysis and opinion mining are also reported in [23]; that paper shows results using maximum entropy, naïve Bayes and support vector machine techniques, with SVM achieving the highest classification accuracy of 82.9%. B. Pang et al. [18] show the results of a unigram support vector machine method for sentiment analysis, which achieved 86.9% classification accuracy. Paper [3] used text classification for web forum opinions in multiple languages, namely English and Arabic; a support vector machine method tested on the IMDB dataset achieved 88.04% classification accuracy, improving on the classification results of [5, 22-24]. A. Maas et al. [24] present a technique to learn word vectors that capture semantics by using a mixture of supervised and unsupervised techniques. The paper reports results for various methods, namely bag of words, Latent Semantic Analysis, Latent Dirichlet Allocation etc.; the semantic bag-of-words model achieved 88.28% classification accuracy, better than the others. Paper [25] addresses sentiment classification on the IMDB dataset and implements a comprehensive attention recurrent neural network technique; the Comprehensive Attention Long Short-Term Memory and Comprehensive Attention Gated Recurrent Network methods show 90.1% classification accuracy, improving classification performance compared with standard recurrent methods. Y. Long et al. [26] implemented a cognition-based attention model for sentiment analysis and tested it on four datasets, namely IMDB, Yelp14, Yelp13 and IMDB2, with different numbers of classes. Paper [27] tested a recurrent convolutional neural network with highway layers on the IMDB dataset and achieved 90.3% accuracy, better than earlier results. Naïve Bayes and support vector machine classifiers were tested on nine benchmark datasets in [28]; a support vector machine with naïve Bayes features (bidirectional) achieved an improved classification result of 91.2%. R. Johnson et al. [29] evaluated a one-hot bidirectional long short-term memory technique on four benchmark datasets, namely IMDB, 20-newsgroup, Elec and RCV1, and report an improved classification accuracy of 91.86%. Paper [30] presents a fast dropout training technique for multilayer neural networks, regression and classification; evaluated on the IMDB dataset, it achieved a 91.1% classification result. Jiachen Du et al. [31] introduced a CNN attention approach combined with an RNN, tested on five datasets with different class labels, vocabulary sizes and average lengths; the pretrained convolutional-recurrent attention network gives a 92.1% result on the IMDB dataset. P. Liu et al. [32] implemented a recurrent neural network (RNN) with multi-task learning for text classification and achieved 91.3% classification accuracy on the IMDB dataset. Q. Le et al. [33] show that paragraph vectors perform better than bag of words and other text representation techniques and achieve good results on several text analysis and text classification tasks, with 92.7% classification accuracy on the IMDB dataset. Paper [34] presented adversarial training methods for semi-supervised text classification, tested on five datasets, namely IMDB, RCV1, Rotten Tomatoes, DBpedia and Elec; the virtual adversarial method achieved 94.09% accuracy, improving text classification performance over previously reported papers. J. Hong et al. [35] surveyed and implemented various algorithms, namely averaged paragraph vector, long short-term memory, logistic regression with paragraph vector and a 2-layer multi-layer perceptron with paragraph vector. The algorithms were tested on three text datasets and achieved better performance for text classification; the 2-layer multi-layer perceptron with paragraph vector gives a 94.5% result, better than the literature.
In the literature, many researchers have used the Yelp review polarity dataset with 2 classes for performance testing of proposed algorithms. Papers [36-38] show the results of different techniques for text classification on the Yelp review polarity dataset. D. Yogatama et al. [36] presented results of different naïve Bayes and long short-term memory models, tested on six datasets with different numbers of class labels, and showed better performance than other literature; the discriminative long short-term memory model achieved 92.6% classification accuracy on the Yelp review polarity dataset. Paper [37] evaluated various techniques on three datasets, namely TripAdvisor, Yelp and Yelp*, with different numbers of classes; the n-gram consistency activity* model achieved 91.09%, better than previously reported literature. Salinca [38] implemented various approaches for automatic sentiment classification using four machine learning models and two feature extraction methods; a unigram stochastic gradient descent method with stop-word removal, punctuation removal and negation handling gives 92.6% classification results on the Yelp review polarity dataset.
In the literature, many researchers have used the Amazon review polarity dataset with 2 classes for performance testing of proposed algorithms. Papers [39-42] show the results of different techniques for text classification on the Amazon review polarity dataset. B. Nguy [39] used text classification for evaluating the helpfulness of Amazon reviews. The paper shows results of deep learning models, namely a recurrent neural network and long short-term memory; a one-layer recurrent neural network and a two-layer LSTM achieved a 65% result on the Amazon review polarity dataset. Paper [40] implemented support vector machine and random forest models for regression and classification; tested on the Amazon review polarity dataset, SVM and random forest achieved 57.7% and 75.4% classification accuracy respectively. M. Agarwal et al. [41] used Amazon reviews for predicting the usefulness of product reviews. The paper shows results of various methods, namely a multinomial model, stemming with stop words removed, logistic regression, and bigrams and trigrams, with 66.02%, 78.11%, 79.66% and 54.65% classification results respectively. J. Vijayabhaskar et al. [42] addressed sentiment classification of user reviews and tested models on the Amazon review polarity dataset. The paper shows results of four methods, namely multinomial naïve Bayes, logistic regression, a stochastic gradient descent classifier and a linear support vector machine, with 84%, 92%, 91% and 93% classification results respectively.
In the literature, many researchers have used the AG's News Topic Classification, Yahoo! Answers Topic Classification and DBpedia Ontology Classification datasets, with 4, 10 and 14 classes respectively, for performance testing of proposed algorithms. Paper [36] shows results of different techniques for text classification on these datasets. D. Yogatama et al. [36] present results of different naïve Bayes and long short-term memory models, tested on six datasets with different numbers of class labels, showing better performance than other literature. The discriminative long short-term memory model achieved 92.1%, 73.7% and 98.7% classification accuracy on the AG's News Topic Classification, Yahoo! Answers Topic Classification and DBpedia Ontology Classification datasets respectively.
In the literature, many researchers have used the Amazon Review Polarity, Yelp Review Polarity, AG's News Topic Classification, Yahoo! Answers Topic Classification and DBpedia Ontology Classification datasets, with 2, 2, 4, 10 and 14 classes respectively, for performance testing of proposed algorithms. Papers [43-46] show results of different techniques for text classification on these datasets. C. Qiao et al. [43] implemented two region embedding methods for CNN-based text classification, namely word-context region embedding and context-word region embedding, tested them on five datasets and achieved better performance than other methods from the literature. Context-word region embedding achieved better results for the 2-, 4- and 14-class datasets, while word-context region embedding achieved better results for the 2- and 10-class datasets. Paper [44] implemented text classification with FastText and a convolutional neural network and shows better results on various datasets; performance was evaluated with two sub-techniques, namely h=10 and bigram h=10. Papers [45-46] implemented CNN using character-level vectors for text classification and achieved better results on various datasets.
In the literature, many researchers have used the IMDB dataset for performance testing of proposed algorithms. Papers [10, 47-50] show the effect of different parameters of CNN for text classification on the IMDB dataset. Paper [51] presents results of CNN (random, static and non-static) on the MR, SST-1, SST-2, Subj, TREC, Irony, Opi, Tweet and Polite datasets, with different hyperparameters of CNN tuned for better accuracy; a simple convolutional neural network with one convolution layer, trained on top of Word2Vec embeddings, performs remarkably well on sentence classification tasks. Paper [52] presented an application of recurrent neural networks and CNN, tested on the Reuters-21578 and RCV1-V2 datasets; the CNN model is used for extracting text features and the RNN for multi-label prediction.
III. Methodology and Implementation Details
This section presents algorithmic and implementation details of the convolutional neural network and the word embedding technique (Word2Vec) for text classification. The Deeplearning4j library is used for implementing the techniques. It is a distributed, open-source library for deep learning algorithms in the Java programming language.
A. Word Embedding Technique: Word2Vec
A classification algorithm requires input in numeric format, so to implement CNN we need to convert words into numeric form. A simple word embedding assigns a one-hot vector to each word in the document: a vector of all zeros except a one at a unique index for each word [51]. In the literature, different word embedding techniques, namely random vectors, Word2Vec, GloVe (Global Vectors for word representation), FastText [48] and character vectors [46], have been used to construct the input matrix for a CNN classifier. Word2Vec is a predictive model [51]. Word2Vec is a two-layer neural network that converts text into numeric vectors. The goal of Word2Vec is to place the vectors of similar words close together in vector space. The resulting vectors are used as input for a deep learning network or queried to detect relationships between words. In this paper, the skip-gram model is used, which predicts source context words from the target word [51]. Algorithm 1 describes the working of the Word2Vec technique.
Parameters, namely batch size, minimum word frequency, layer size and learning rate, are initialized. Batch size refers to the number of words the algorithm processes at a time. Minimum word frequency is the minimum number of times a word must appear in the corpus. Layer size equals the number of dimensions in the feature space.
Algorithm 1: Working of the Word2Vec technique
Input: D = input text document.
Output: W = word vector document.
Step 1. Load D
Step 2. while (D != null) do
    Whitespace stripping.
    Tokenization.
    Initialize the model parameters.
    Fit the model to the given input.
    Store vectors into the W file.
Algorithm 1. Working of the Word2Vec technique.
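As an illustration of Algorithm 1, the following is a minimal Deeplearning4j sketch of the same steps: load the document, tokenize and strip whitespace, initialize the model parameters, fit the model and store the vectors. The file path, window size, batch size, minimum word frequency and learning rate shown here are illustrative assumptions; only the layer size of 300 matches the configuration reported in Section III.B.

```java
import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Word2VecSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: load the input text document D (one sentence per line; path is illustrative)
        SentenceIterator iter = new BasicLineIterator("data/train.txt");

        // Step 2: whitespace stripping and tokenization
        TokenizerFactory tokenizer = new DefaultTokenizerFactory();
        tokenizer.setTokenPreProcessor(new CommonPreprocessor());

        // Initialize the model parameters (batch size, min word frequency, layer size, learning rate)
        Word2Vec vec = new Word2Vec.Builder()
                .batchSize(1000)          // words processed at a time (illustrative)
                .minWordFrequency(5)      // minimum corpus frequency of a word (illustrative)
                .layerSize(300)           // 300-dimensional vectors, as used in the paper
                .learningRate(0.025)      // learning rate (illustrative)
                .windowSize(5)            // skip-gram context window (illustrative)
                .iterate(iter)
                .tokenizerFactory(tokenizer)
                .build();

        // Fit the model to the given input
        vec.fit();

        // Store the learned vectors into the W file
        WordVectorSerializer.writeWord2VecModel(vec, "w2v_vectors.bin");

        // Example query: the words closest to "good" in the learned vector space
        System.out.println(vec.wordsNearest("good", 5));
    }
}
```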
Algorithm 2: Convolutional neural network for text classification
Parameters: E = number of epochs, B = batch size, F = number of feature maps, V = size of word vectors, Fs = filter size.
Input: DTr = {d1, d2, d3, ..., dn} = set of n training documents.
       DTe = {d1, d2, d3, ..., dn} = set of n testing documents.
       W = word vector document.
Output: C = class label
Step 1. for each (Dj in DTr) do
    for E epochs do
        while not eof(Dj) do
            Transfer sentences into a matrix considering B and W
            Generate F feature maps from V and the kernel sizes Fs on the matrix
            Perform max-pooling on each feature map
            Apply the fully connected layer with dropout
            Save Dj into the trained model
Step 2. for each (Dj in DTe) do
    while not eof(Dj) do
        Transfer sentences into a matrix with B and W
        Load the trained model
        Evaluate Dj on the trained model.
Algorithm 2. Working of the convolutional neural network for text classification.
B. Convolutional Neural Network
The output of Word2Vec, in numeric format, is given as input to the convolutional neural network. Tokenization, stemming and stop-word removal operations are performed on the input text. The proposed convolutional neural network model for text classification works at the word level. The output layer provides a class label for the given input file and the accuracy for training/testing. Algorithm 2 describes the working of the convolutional neural network for text classification.
The convolutional neural network has been experimented with using various parameters. The proposed method uses 3 convolutional layers with window sizes 3, 4 and 5 respectively and a stride of 2. After that, a max-pooling layer is applied to the whole sequence for each filter. Finally, the outputs of all kernels are merged into a single vector and fed to a fully connected layer. The L2 regularization and learning rate values are 0.1 and 0.001 respectively. The Word2Vec vector size is 300. The Adam optimizer is used for experimentation. Experiments are conducted with batch sizes of 10, 20, 40, 60, 80 and 100 and with 10, 50 and 100 epochs.
The Sigmoid, Tanh, Softplus, SoftMax and ReLU activation functions are tested. The best number of feature maps depends on the dataset; feature map values of 10, 50, 100, 200 and 300 are used for each region size.
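To make the above configuration concrete, the following is a minimal Deeplearning4j sketch of a comparable word-level CNN built as a computation graph: three parallel convolution branches with window sizes 3, 4 and 5 and stride 2 over the Word2Vec matrix, max-pooling over the whole sequence of each feature map, and a fully connected softmax output with dropout. This is not the authors' exact code; the method and variable names (buildModel, vectorSize, maxSentenceLength, numClasses, featureMaps) and the ReLU/0.5-dropout choice are illustrative assumptions, while the Adam optimizer, learning rate (0.001) and L2 value (0.1) follow the values reported above.

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.graph.MergeVertex;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.GlobalPoolingLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.PoolingType;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TextCnnSketch {
    public static ComputationGraph buildModel(int vectorSize, int maxSentenceLength,
                                              int numClasses, int featureMaps) {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Adam(0.001))                  // learning rate 0.001, as in the paper
            .l2(0.1)                                   // L2 regularization 0.1, as in the paper
            .convolutionMode(ConvolutionMode.Same)     // equal output lengths so the branches can be merged
            .graphBuilder()
            .addInputs("input")
            // three parallel convolution branches with window sizes 3, 4 and 5 over the word vectors
            .addLayer("cnn3", new ConvolutionLayer.Builder()
                .kernelSize(3, vectorSize).stride(2, vectorSize)
                .nIn(1).nOut(featureMaps).activation(Activation.RELU).build(), "input")
            .addLayer("cnn4", new ConvolutionLayer.Builder()
                .kernelSize(4, vectorSize).stride(2, vectorSize)
                .nIn(1).nOut(featureMaps).activation(Activation.RELU).build(), "input")
            .addLayer("cnn5", new ConvolutionLayer.Builder()
                .kernelSize(5, vectorSize).stride(2, vectorSize)
                .nIn(1).nOut(featureMaps).activation(Activation.RELU).build(), "input")
            // merge the outputs of the three branches into a single tensor
            .addVertex("merge", new MergeVertex(), "cnn3", "cnn4", "cnn5")
            // max-pooling over the whole sequence for each feature map
            .addLayer("pool", new GlobalPoolingLayer.Builder()
                .poolingType(PoolingType.MAX).build(), "merge")
            // fully connected softmax output with dropout
            .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).dropOut(0.5)
                .nIn(3 * featureMaps).nOut(numClasses).build(), "pool")
            .setOutputs("out")
            // input: one channel of (words x Word2Vec dimensions)
            .setInputTypes(InputType.convolutional(maxSentenceLength, vectorSize, 1))
            .build();

        ComputationGraph model = new ComputationGraph(conf);
        model.init();
        return model;
    }
}
```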
IV. Experimental Details, Results, and Discussion
This section presents the experimental details for the different datasets, the obtained results and a discussion. To test the performance of the proposed methodology, seven benchmark datasets are used.
A. Dataset
The datasets vary with respect to the number of classes and the number of training and testing samples. The datasets are collected from [53]. The details are given in Table 1.
Table 1. Datasets used for text classification
| Dataset | Name of Dataset | Number of Classes | Number of Training Samples | Number of Test Samples |
| Dataset 1 | IMDB | 2 | 25,000 | 25,000 |
| Dataset 2 | Amazon review full score | 2 | 3,000,000 | 650,000 |
| Dataset 3 | Amazon review polarity | 2 | 3,000,000 | 650,000 |
| Dataset 4 | Yelp review polarity | 2 | 560,000 | 38,000 |
| Dataset 5 | AG news topic classification | 4 | 120,000 | 7,600 |
| Dataset 6 | Yahoo! Answers topic classification | 10 | 1,400,000 | 60,000 |
| Dataset 7 | DBpedia ontology classification | 14 | 1,800,000 | 200,000 |
The performance of the classifiers is tested using the accuracy measure (equation 1). Accuracy depends on the true positive, true negative, false positive and false negative values.

Accuracy_i = (TP_i + TN_i) / (TP_i + TN_i + FP_i + FN_i)    (1)

where TP is True Positive, TN is True Negative, FP is False Positive and FN is False Negative.
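For clarity, equation (1) can be computed directly from the confusion-matrix counts of a class. The small helper below is a hypothetical illustration of this calculation, not code from the paper.

```java
public final class Metrics {
    /** Per-class accuracy as in equation (1): (TP + TN) / (TP + TN + FP + FN). */
    public static double accuracy(long tp, long tn, long fp, long fn) {
        return (double) (tp + tn) / (double) (tp + tn + fp + fn);
    }
}
```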
B. Experimental Results of CNN
Experiment No. 1: Identifying the suitable number of epochs and batch size
An epoch is one forward pass and one backward pass over all the training samples. Table 2 shows the accuracy on the seven benchmark datasets with varying batch sizes and numbers of epochs. For datasets 1, 4 and 5 the best text classification accuracy is obtained at the 100th epoch; for the remaining datasets, the best accuracy is obtained at the 50th epoch. The best accuracies obtained for the seven datasets are 90.01%, 62.45%, 94.28%, 93.54%, 88.16%, 72.77% and 96.67% respectively. Accuracy increases with the number of epochs up to a certain point and then decreases.
Datasets 1, 3 and 7 got their best results with a batch size of 100. Datasets 2 and 4 achieved their best results with a batch size of 40, and datasets 5 and 6 with batch sizes of 20 and 60 respectively. There is no significant impact of batch size on the accuracy of the classifier. Batch size and the number of epochs affect the accuracy on individual datasets, but the effect is not common to all datasets.
Experiment No. 2: Comparison of different activation functions
The performance of CNN is tested with different activation functions, namely Sigmoid, Tanh, SoftMax, ReLU and Softplus. Table 3 presents the effect of the activation functions on the accuracy for the seven datasets. Results show that all five activation functions give good accuracy for all datasets except datasets 2 and 6.
For datasets 1, 3, 5 and 6 the best results are obtained with the ReLU activation function. For the remaining three datasets, the hyperbolic tangent (Tanh) activation function gives the best result. The performance of the ReLU activation function is good for all the datasets; it is the best or close to the best in all cases.
Experiment No. 3: Effect of dropout rate on the performance of the CNN model
Dropout is a regularization method used to reduce the overfitting problem in the fully connected layer of the convolutional neural network. Table 4 shows the results of the convolutional neural network with the dropout rate varying from 0.1 to 0.7. For datasets 1, 3 and 4 the best accuracy is obtained with a dropout rate of 0.2, and for datasets 2, 5 and 7 with a dropout rate of 0.5. Accuracy increases with the dropout rate up to a certain point and then decreases.
Table 2. Accuracy (in %) for various batch sizes and numbers of epochs

| No. of Epochs | Batch Size | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 | Dataset 5 | Dataset 6 | Dataset 7 |
| 10 | 10 | 76.56 | 54.00 | 76.23 | 74.34 | 76.34 | 61.36 | 83.34 |
| 10 | 20 | 77.20 | 55.65 | 77.87 | 75.87 | 76.89 | 62.38 | 84.04 |
| 10 | 40 | 76.43 | 54.20 | 78.20 | 77.85 | 78.43 | 64.43 | 86.64 |
| 10 | 60 | 76.33 | 52.34 | 79.26 | 78.65 | 80.60 | 65.22 | 88.95 |
| 10 | 80 | 77.60 | 53.10 | 82.20 | 79.98 | 81.00 | 65.23 | 89.17 |
| 10 | 100 | 78.89 | 54.89 | 85.11 | 83.42 | 82.87 | 66.85 | 92.53 |
| 50 | 10 | 80.55 | 55.55 | 87.13 | 84.92 | 84.10 | 69.33 | 92.94 |
| 50 | 20 | 82.10 | 60.10 | 87.32 | 85.44 | 85.00 | 71.83 | 95.44 |
| 50 | 40 | 83.45 | 62.45 | 88.23 | 89.68 | 85.76 | 68.84 | 95.81 |
| 50 | 60 | 85.76 | 57.70 | 90.52 | 90.84 | 86.80 | 72.77 | 94.29 |
| 50 | 80 | 88.32 | 60.25 | 93.65 | 90.44 | 87.90 | 69.55 | 94.35 |
| 50 | 100 | 88.30 | 61.31 | 94.28 | 92.34 | 87.43 | 64.56 | 96.67 |
| 100 | 10 | 88.44 | 56.60 | 94.23 | 91.17 | 88.10 | 60.56 | 94.87 |
| 100 | 20 | 89.20 | 62.33 | 92.45 | 92.69 | 88.16 | 69.23 | 96.46 |
| 100 | 40 | 89.61 | 60.20 | 91.44 | 93.54 | 87.30 | 69.57 | 95.72 |
| 100 | 60 | 89.50 | 60.76 | 90.33 | 90.78 | 87.40 | 70.45 | 95.34 |
| 100 | 80 | 89.40 | 60.25 | 90.76 | 90.78 | 87.00 | 68.84 | 95.24 |
| 100 | 100 | 90.01 | 60.30 | 89.25 | 89.76 | 86.23 | 67.66 | 94.34 |
Table 3. Comparison of five activation functions (accuracy in %)

| Datasets | Sigmoid | Tanh | Softplus | SoftMax | ReLU |
| Dataset 1 | 89.45 | 88.22 | 89.75 | 89.12 | 90.01 |
| Dataset 2 | 62.15 | 62.58 | 62.35 | 62.45 | 62.06 |
| Dataset 3 | 89.46 | 91.26 | 93.57 | 93.28 | 94.27 |
| Dataset 4 | 92.58 | 93.53 | 91.25 | 92.65 | 93.05 |
| Dataset 5 | 84.25 | 85.06 | 85.52 | 86.80 | 88.16 |
| Dataset 6 | 70.45 | 71.89 | 70.85 | 70.55 | 72.77 |
| Dataset 7 | 94.56 | 96.67 | 95.88 | 95.76 | 94.98 |
Table 4. Comparison of accuracy (in %) with different dropout rates

| Datasets | Dropout 0.1 | Dropout 0.2 | Dropout 0.3 | Dropout 0.5 | Dropout 0.7 |
| Dataset 1 | 89.92 | 90.01 | 86.75 | 88.74 | 89.15 |
| Dataset 2 | 61.15 | 61.89 | 60.20 | 62.45 | 61.59 |
| Dataset 3 | 93.56 | 94.28 | 93.88 | 93.76 | 93.98 |
| Dataset 4 | 92.58 | 93.53 | 91.25 | 92.65 | 93.05 |
| Dataset 5 | 84.25 | 85.06 | 85.52 | 88.16 | 86.22 |
| Dataset 6 | 72.45 | 72.52 | 72.57 | 72.77 | 74.27 |
| Dataset 7 | 93.46 | 94.26 | 93.57 | 96.67 | 94.27 |
Table 5. Effect of the number of feature maps on accuracy (%) of the CNN model

| Datasets | 10 maps | 50 maps | 100 maps | 200 maps | 300 maps |
| Dataset 1 | 89.89 | 89.94 | 90.00 | 90.00 | 90.01 |
| Dataset 2 | 62.21 | 62.35 | 62.55 | 62.56 | 62.60 |
| Dataset 3 | 94.01 | 94.12 | 94.28 | 94.31 | 94.30 |
| Dataset 4 | 93.23 | 93.34 | 93.54 | 93.57 | 93.59 |
| Dataset 5 | 88.04 | 88.12 | 88.16 | 88.18 | 88.19 |
| Dataset 6 | 72.65 | 72.71 | 72.77 | 72.89 | 72.76 |
| Dataset 7 | 96.21 | 96.38 | 96.67 | 96.70 | 96.73 |
Table 6. Parameter details of the CNN model for the best results (parameter value, accuracy of the obtained best result in %)

| Datasets | Epochs | Batch Size | Activation Function | Dropout Rate | Feature Maps |
| Dataset 1 | (100, 90.01) | (100, 90.01) | (ReLU, 90.01) | (0.2, 90.01) | (300, 90.01) |
| Dataset 2 | (50, 62.45) | (40, 62.45) | (Tanh, 62.58) | (0.5, 62.45) | (300, 62.6) |
| Dataset 3 | (50, 94.28) | (100, 94.28) | (ReLU, 94.27) | (0.2, 94.28) | (200, 94.31) |
| Dataset 4 | (100, 93.54) | (40, 93.54) | (Tanh, 93.53) | (0.2, 93.53) | (300, 93.59) |
| Dataset 5 | (100, 88.16) | (20, 88.16) | (ReLU, 88.16) | (0.5, 88.16) | (300, 88.19) |
| Dataset 6 | (50, 72.77) | (60, 72.77) | (ReLU, 72.77) | (0.7, 74.27) | (200, 72.78) |
| Dataset 7 | (50, 96.67) | (100, 96.67) | (Tanh, 96.68) | (0.5, 96.67) | (300, 96.73) |
Experiment No. 4: Selection of the number of feature maps
Table 5 reports the classification accuracy of the convolutional neural network for different numbers of feature maps. Results show that the performance of CNN changes with the number of feature maps; the best results are obtained with 200 and 300 feature maps. Table 5 indicates that performance increases with the number of feature maps, but more feature maps take longer to train the model.
Table 6 summarizes the best results obtained with the different CNN parameters experimented with in this section.
C. Comparison of results with techniques other than CNN
This subsection presents a comparison of CNN with techniques reported in the literature.
Table 7. Comparison of results for dataset 1 (IMDB dataset)
References
- M. Ikonomakis, S. Kotsiantis and V. Tampakas, “Text Classification Using Machine Learning Techniques”, WSEAS transactions on computers, vol. 4, no. 8, pp. 966-974, 2005
- R. Talib, M. Kashif Hanif, S. Ayesha and F. Fatima, “Text Mining: Techniques, Applications and Issues”, International Journal of Advanced Computer Science and Applications, vol. 7, no. 11, 2016. http://dx.doi.org/10.14569/IJACSA.2016.071153
- A. Abbasi, H. Chen and A. Salem, “Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums”, ACM Transactions on Information Systems (TOIS), vol. 26, no. 3, p. 12, 2008. http://dx.doi.org/10.1145/1361684.1361685
- S. Günal, S. Ergin, M. Gülmezoğlu and Ö. Gerek, “On Feature Extraction for Spam E-Mail Detection”, International Workshop on Multimedia Content Representation, Classification and Security, 2006, pp. 635--642. http://dx.doi.org/10.1007/11848035_84
- A. Harb, M. Plantié, G. Dray, M. Roche, F. Trousset and P. Poncelet, “Web Opinion Mining: How to extract opinions from blogs?”, Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology, pp. 211--217, 2008. http://dx.doi.org/10.1145/1456223.1456269
- X. Zhu, J. Huang, Z. Zhou and Y. Han, “Chinese Article Classification Oriented to Social Network Based on Convolutional Neural Networks”, Data Science in Cyberspace (DSC), IEEE International Conference on, pp. 33--36, 2016. http://dx.doi.org/10.1109/DSC.2016.28
- S. Liaoa, J. Wangb, R. Yu, K. Sato and Z. Cheng, “CNN for situations understanding based on sentiment analysis of twitter data”, Procedia computer science, vol. 111, pp. 376--381, 2017. http://dx.doi.org/10.1016/j.procs.2017.06.037
- M. Hughes, I. Li, S. Kotoulas and T. Suzumura, “Medical text classification using convolutional neural networks”, Stud Health Technol Inform, vol. 235, pp. 246--50, 2017.
- M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. Trippe, J. Gutierrez and K. Kochut, “A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques”, arXiv preprint arXiv:1707.02919, 2017.
- C. Liu, S. Zhao and M. Volkovs, “Learning Document Embeddings With CNNs.” arXiv preprint arXiv:1711.04168, 2017.
- R. Duda, P. Hart and D. Stork, “Pattern Classification”, 2012.
- Y. Yang and X. Liu, "A re-examination of text categorization methods", Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 42--49, 1999. http://dx.doi.org/10.1145/312624.312647
- P. Bednár, “Active learning of SVM and decision tree classifiers for text categorization”, Fourth Slovakian-Hungarian Joint Symposium on Applied Machine Intelligence, Herlany, Slovakia, 2006.
- W. Wang, D. Do and X. Lin, “Term graph model for text classification”, International Conference on Advanced Data Mining and Applications, pp. 19--30, 2005. http://dx.doi.org/10.1007/11527503_5
- V. Korde and C. Mahender, “Text classification and classifiers: A survey”, International Journal of Artificial Intelligence & Applications, vol. 3, no. 2, p. 85, 2012.
- S. Kamruzzaman, F. Haider and A. Hasan, “Text classification using data mining”, arXiv preprint arXiv:1009.4987, 2010.
- B. Pang and L. Lee, “Opinion mining and sentiment analysis”, Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, pp. 1--135, 2008. http://dx.doi.org/10.1561/1500000011
- V. Gupta, “Recent trends in text classification techniques”, International Journal of Computer Applications, vol. 35, no. 6, 2011.
- Y. Kim, “Convolutional neural networks for sentence classification”, arXiv preprint arXiv:1408.5882, 2014. http://dx.doi.org/10.3115/v1/D14-1181
- A. Hassan and A. Mahmood, “Deep learning for sentence classification”, Systems, Applications and Technology Conference (LISAT), 2017 IEEE Long Island, 2017, pp. 1--5. http://dx.doi.org/10.1109/LISAT.2017.8001979
- C. dos Santos and M. Gatti, “Deep convolutional neural networks for sentiment analysis of short texts”, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 69--78, 2014.
- Y. He and D. Zhou, “Self-training from labeled features for sentiment analysis”, Information Processing & Management, vol. 47, no. 4, pp. 606--616, 2011. http://dx.doi.org/10.1016/j.ipm.2010.11.003
- B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts”, Proceedings of the 42nd annual meeting on Association for Computational Linguistics, 2004, p. 271, http://dx.doi.org/10.3115/1218955.1218990
- A. Maas, R. Daly, P. Pham, D. Huang, A. Ng, and C. Potts, “Learning word vectors for sentiment analysis”, Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies-volume 1, 2011, pp. 142--150.
- Y. Zhang, M. Er, R. Venkatesan, N. Wang and M. Pratama, “Sentiment classification using comprehensive attention recurrent models”, Neural Networks (IJCNN), 2016 International Joint Conference on, pp. 1562--1569, 2016. http://dx.doi.org/10.1109/IJCNN.2016.7727384
- Y. Long, Q. Lu, R. Xiang, M. Li and C. Huang, “A cognition based attention model for sentiment analysis”, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 462--471, 2017.
- Y. Wen, W. Zhang, R. Luo and J. Wang, “Learning text representation using recurrent convolutional neural network with highway layers”, arXiv preprint arXiv:1606.06905, 2016.
- S. Wang and C. Manning, “Baselines and bigrams: Simple, good sentiment and topic classification”, Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, 2012, pp. 90--94.
- R. Johnson and T. Zhang, “Supervised and semi-supervised text categorization using LSTM for region embeddings”, arXiv preprint arXiv:1602.02373, 2016.
- S. Wang and C. Manning, “Fast dropout training”. international conference on machine learning, 2013, pp. 118--126.
- J. Du, L. Gui, R. Xu and Y. He, “A Convolutional Attention Model for Text Classification”, National CCF Conference on Natural Language Processing and Chinese Computing, pp. 183--195, 2017. http://dx.doi.org/10.1007/978-3-319-73618-1_16
- P. Liu, X. Qiu and X. Huang, “Recurrent neural network for text classification with multi-task learning”, arXiv preprint arXiv:1605.05101, 2016.
- Q. Le and T. Mikolov, “Distributed representations of sentences and documents”, International Conference on Machine Learning, pp. 1188--1196, 2014.
- T. Miyato, A. Dai and I. Goodfellow, “Adversarial training methods for semi-supervised text classification”, arXiv preprint arXiv:1605.07725, 2016.
- J. Hong and M. Fang, “Sentiment analysis with deeply learned distributed representations of variable length texts”, Technical report, Stanford University, 2015.
- D. Yogatama, C. Dyer, W. Ling, and P. Blunsom, “Generative and discriminative text classification with recurrent neural networks”, arXiv preprint arXiv:1703.01898, 2017.
- S. Mukherjee, S. Dutta, and G. Weikum, “Credible Review Detection with Limited Information using Consistency Analysis”, arXiv preprint arXiv:1705.02668, 2017.
- A. Salinca, “Business reviews classification using sentiment analysis”, 2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2015, pp. 247--250. http://dx.doi.org/10.1109/SYNASC.2015.46
- B. Nguy, “Evaluate Helpfulness in Amazon Reviews Using Deep Learning.”
- K. Jónsson and D. Platt, “CSE 255 Assignment 1: Helpfulness in Amazon Reviews.”
- M. Agarwal and M. Bhand, “Classification of Amazon Reviews.”, 2009.
- J. Vijayabhaskar, R. Sridhar and P. Vijayaragavan, “User Review Sentiment Classification and Aspect Extraction”, Imperial Journal of Interdisciplinary Research, vol. 3, no. 4, 2017.
- C. Qiao, B. Huang, G. Niu, D. Li, D. Dong, W. He, D. Yu and H. Wu, “A new method of region embedding for text classification.”, ICLR, 2018
- A. Joulin, E. Grave, P. Bojanowski and T. Mikolov, “Bag of tricks for efficient text classification.” arXiv preprint arXiv:1607.01759, 2016. http://dx.doi.org/10.18653/v1/E17-2068
- Y. Xiao and K. Cho, “Efficient character-level document classification by combining convolution and recurrent layers.” arXiv preprint arXiv:1602.00367, 2016.
- X. Zhang, J. Zhao and Y. LeCun, “Character-level convolutional networks for text classification”, Advances in neural information processing systems, 2015, pp. 649--657.
- G. Lee, J. Jeong, S. Seo, C. Kim and P. Kang, “Sentiment classification with word localization based on weakly supervised learning with a convolutional neural network”, Knowledge-Based Systems, vol. 152, pp. 70--82, 2018. http://dx.doi.org/10.1016/j.knosys.2018.04.006
- G. Lee, J. Jeong, S. Seo, C. Kim and P. Kang, “Sentiment Classification with Word Attention based on Weakly Supervised Learning with a Convolutional Neural Network”, arXiv preprint arXiv:1709.09885, 2017.
- H. Bedi and N. Cheke, “Sentiment Analysis of Movie Reviews”, 2012.
- S. Chintala, “Sentiment Analysis using neural architectures.” New York University, 2012.
- A. Mandelbaum and A. Shalev, “Word embeddings and their use in sentence classification tasks”, arXiv preprint arXiv:1610.08229, 2016.
- G. Chen, D. Ye, Z. Xing, J. Chen and E. Cambria, “Ensemble application of convolutional and recurrent neural networks for multi-label text categorization”, Neural Networks (IJCNN), 2017 International Joint Conference on, pp. 2377--2383, 2017. http://dx.doi.org/10.1109/IJCNN.2017.7966144
- https://drive.google.com/drive/u/0/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M