Bruteporter: a hybrid approach

Balamurugan Mahalingam; Kannan S.; Vairaprakash Gurusamy

doi:10.5815/ijeme.2018.05.02

Journal: International Journal of Education and Management Engineering @ijeme

Article in issue: vol. 8, no. 5, 2018.

Stemming extracts the root form of an inflected word, called the stem. Adding a prefix or suffix to a stem yields words with different meanings, and the stem itself need not be a meaningful word. Lemmatization, by contrast, reduces an inflected word to its lemma, which must be a valid dictionary form. Natural language processing, information retrieval, and text mining all use stemming as a preprocessing step. Stemming reduces the size of a document and also removes ambiguity, which simplifies downstream tasks such as information retrieval, semantic analysis, and text categorization. Existing stemming algorithms still need linguistic improvement: all of them fail in some cases to produce the exact root word, and they cannot handle informal verbs. Hence, the Bruteporter hybrid approach is proposed to improve the linguistic process of stemming in English texts. It combines WordNet with a modified Porter algorithm. WordNet is a lexical database compiled by linguists, and the modified Porter algorithm supports both suffix removal and suffix substitution. The proposed approach can extract root words from both inflected words and informal verbs. In this paper, an experiment is conducted on the proposed algorithm and its accuracy is calculated.
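To make the hybrid idea concrete, here is a minimal Python sketch. It assumes a dictionary lookup is tried first and rule-based suffix handling is used only for words the dictionary does not cover; the abstract does not state the actual lookup order or rule set. The names `LEXICON`, `SUFFIX_RULES`, `undouble`, `rule_stem`, and `hybrid_stem` are hypothetical: the tiny dictionary stands in for WordNet, and the handful of rules is a heavily simplified stand-in for the modified Porter algorithm. It also illustrates the difference between suffix removal (empty replacement) and suffix substitution (non-empty replacement).

```python
"""Toy hybrid stemmer: dictionary lookup first, suffix rules as fallback.

LEXICON and SUFFIX_RULES are illustrative assumptions, not the paper's data.
"""

# Hypothetical stand-in for a WordNet query: maps irregular or informal
# forms directly to a dictionary root.
LEXICON = {
    "went": "go",
    "better": "good",
    "gonna": "go",    # informal verb ("going to")
    "wanna": "want",  # informal verb ("want to")
}

# Hypothetical, heavily simplified Porter-style rules, checked in order.
# An empty replacement means suffix removal; a non-empty one means
# suffix substitution.
SUFFIX_RULES = [
    ("ational", "ate"),  # substitution: relational -> relate
    ("ization", "ize"),  # substitution: organization -> organize
    ("ies", "y"),        # substitution: studies -> study
    ("ing", ""),         # removal: jumping -> jump
    ("ed", ""),          # removal: jumped -> jump
    ("s", ""),           # removal: cats -> cat
]

MIN_STEM_LEN = 2  # crude substitute for Porter's "measure" condition


def undouble(stem: str) -> str:
    """Drop one letter of a final double consonant (runn -> run), except l, s, z."""
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
        return stem[:-1]
    return stem


def rule_stem(word: str) -> str:
    """Apply the first matching suffix rule that leaves a long enough stem."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            stem = word[: -len(suffix)]
            if suffix in ("ing", "ed"):
                stem = undouble(stem)
            return stem + replacement
    return word  # no rule applied: return the word unchanged


def hybrid_stem(word: str) -> str:
    """Try the dictionary first; fall back to suffix rules for unknown words."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    return rule_stem(w)


if __name__ == "__main__":
    for w in ["went", "gonna", "relational", "studies", "running"]:
        print(w, "->", hybrid_stem(w))
    # went -> go
    # gonna -> go
    # relational -> relate
    # studies -> study
    # running -> run
```

The dictionary step handles forms that no suffix rule can reach ("went", "gonna"), while the rules cover regular inflections without needing a dictionary entry for every word. In a real system the lookup could be backed by WordNet, for example through NLTK's `wordnet.morphy`, and the fallback by a complete Porter implementation such as NLTK's `PorterStemmer`; the sketch avoids those dependencies so it runs with the standard library alone.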

Keywords: Porter, Inflection, WordNet, Stemming
