Part-of-speech Tagging for Marathi using Maximum Entropy Markov Model

Authors: Swati Prakash Sonawane, Kavita Tukaram Patil, R.P. Bhavsar, B.V. Pawar

Journal: International Journal of Information Technology and Computer Science @ijitcs

Published in: Issue 3, Vol. 18, 2026.

Free access

Part-of-Speech (POS) tagging is an essential pre-processing step for many Natural Language Processing (NLP) applications, and its importance is especially evident for morphologically rich languages such as Marathi. This research investigates POS tagging for Marathi using the Maximum Entropy Markov Model (MEMM). MEMM combines the strengths of conditional probability modelling and sequence prediction, allowing the integration of rich contextual features. The features used include word forms, suffixes, prefixes, and neighboring tags, which help address the challenges posed by inflectional variation and ambiguity in Marathi. Experimental results show that the MEMM-based POS tagger achieves an accuracy of 83.72%. This performance marks a notable advance in Marathi POS tagging, given the linguistic diversity and the scarcity of annotated data. Error analysis highlights issues such as ambiguity in homonyms and out-of-vocabulary words, and points to further improvement through enriched datasets and more sophisticated modelling techniques. This study supports NLP applications such as machine translation, spell checking, and sentiment analysis for Indian languages and offers a solid foundation for future research in Marathi POS tagging.
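
As a rough illustration of the approach summarized above, the following minimal sketch shows how an MEMM tagger might combine word-form, prefix, suffix, and previous-tag features. It is not the authors' implementation. The names `extract_features`, `MEMMTagger`, `START`, and `TOY_DATA`, the affix lengths (1 to 3), the gradient-ascent training loop, and the Viterbi decoder are all assumptions made for illustration. The romanized toy sentences and their BIS-style tags are invented examples, not data from the paper's corpus.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical start-of-sentence pseudo-tag


def extract_features(words, i, prev_tag):
    """Word form, short prefixes/suffixes, and the previous tag for position i."""
    w = words[i]
    feats = ["bias", "word=" + w, "prev_tag=" + prev_tag]
    for k in (1, 2, 3):  # affix lengths are an assumption
        if len(w) > k:
            feats.append("pre%d=%s" % (k, w[:k]))
            feats.append("suf%d=%s" % (k, w[-k:]))
    return feats


class MEMMTagger:
    def __init__(self):
        self.weights = defaultdict(float)  # (feature, tag) -> weight
        self.tags = []

    def _log_probs(self, feats):
        """Log of the softmax distribution P(tag | features)."""
        scores = {t: sum(self.weights[(f, t)] for f in feats) for t in self.tags}
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        return {t: s - log_z for t, s in scores.items()}

    def train(self, sentences, epochs=30, lr=0.5):
        """Stochastic gradient ascent on the conditional log-likelihood."""
        self.tags = sorted({t for sent in sentences for _, t in sent})
        for _ in range(epochs):
            for sent in sentences:
                words = [w for w, _ in sent]
                prev = START
                for i, (_, gold) in enumerate(sent):
                    feats = extract_features(words, i, prev)
                    log_p = self._log_probs(feats)
                    for t in self.tags:
                        # Gradient: observed indicator minus model probability.
                        grad = (1.0 if t == gold else 0.0) - math.exp(log_p[t])
                        for f in feats:
                            self.weights[(f, t)] += lr * grad
                    prev = gold  # condition on the gold previous tag in training

    def tag(self, words):
        """Viterbi search for the most probable tag sequence."""
        if not words:
            return []
        first = self._log_probs(extract_features(words, 0, START))
        best = {t: (first[t], [t]) for t in self.tags}
        for i in range(1, len(words)):
            new_best = {}
            for prev, (score, path) in best.items():
                log_p = self._log_probs(extract_features(words, i, prev))
                for t in self.tags:
                    cand = score + log_p[t]
                    if t not in new_best or cand > new_best[t][0]:
                        new_best[t] = (cand, path + [t])
            best = new_best
        return max(best.values(), key=lambda item: item[0])[1]


# Invented toy data: romanized Marathi with BIS-style tags (illustrative only).
TOY_DATA = [
    [("mi", "PR_PRP"), ("pustak", "N_NN"), ("vachato", "V_VM")],
    [("to", "PR_PRP"), ("ghari", "N_NN"), ("jato", "V_VM")],
    [("ti", "PR_PRP"), ("shalet", "N_NN"), ("jate", "V_VM")],
]

tagger = MEMMTagger()
tagger.train(TOY_DATA)
print(tagger.tag(["mi", "ghari", "jato"]))
```

Because each local decision conditions on the previous tag, decoding searches over whole tag sequences rather than choosing each tag independently. The suffix and prefix features give the model some evidence for inflected or unseen word forms. A realistic system would need regularization and a far larger annotated corpus.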

Keywords: Bureau of Indian Standard Tagset, Maximum Entropy Markov Model, Morphologically Rich Languages, Natural Language Processing, Part of Speech Tagging

Short URL: https://sciup.org/15020434

IDR: 15020434   |   DOI: 10.5815/ijitcs.2026.03.02