Boosting Afaan Oromo Named Entity Recognition with Multiple Methods

Author: Abdo Ababor Abafogi

Journal: International Journal of Information Engineering and Electronic Business @ijieeb

Article in issue: No. 5, Vol. 13, 2021.

Open access

Named Entity Recognition (NER) is a widely used information extraction (IE) task in natural language processing (NLP) and information retrieval (IR). It aims to identify words in a given text and categorize them into predefined classes of named entities (NEs) such as person, date/time, organization, and location. This paper boosts NER for Afaan Oromo by combining multiple methods. Combining machine learning, stored rules, and pattern matching makes the system more efficient and more accurate at recognizing candidate NEs. The system takes the strongest points of each method and boosts performance by voting on each candidate NE. A candidate that is detected in more than one entity category, or that appears out of context because of word ambiguity, is penalized by word sense disambiguation. Consecutive NEs with identical tags are merged into a single tag before the final output. The evaluation shows that the combined system achieves better performance. Finally, as a future direction, the paper proposes a hybrid of the rule-based approach with unsupervised zero-resource cross-lingual NER for further improvement.
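The voting and merging steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `vote` and `merge_adjacent`, the tag names (`PER`, `LOC`, `ORG`, `O`), the majority-vote rule, and the example sentence are all assumptions made for illustration. In particular, the sketch simply falls back to the outside tag `O` when two categories tie, whereas the paper resolves such ambiguity with word sense disambiguation.

```python
from collections import Counter


def vote(predictions, outside="O"):
    """Combine per-token tags from several methods by majority vote.

    predictions: list of tag sequences, one per method, all the same length.
    Assumption: votes for the outside tag are ignored, and a tie between
    entity categories falls back to the outside tag (a placeholder for the
    word sense disambiguation step mentioned in the abstract).
    """
    voted = []
    for tags in zip(*predictions):
        counts = Counter(t for t in tags if t != outside)
        if not counts:
            voted.append(outside)
            continue
        (top, n), *rest = counts.most_common()
        if rest and rest[0][1] == n:
            voted.append(outside)  # ambiguous candidate, left unresolved here
        else:
            voted.append(top)
    return voted


def merge_adjacent(tokens, tags, outside="O"):
    """Merge consecutive tokens that carry the same entity tag."""
    entities = []
    prev = outside
    for token, tag in zip(tokens, tags):
        if tag != outside:
            if tag == prev:
                text, _ = entities[-1]
                entities[-1] = (text + " " + token, tag)
            else:
                entities.append((token, tag))
        prev = tag
    return entities


# Hypothetical example: three methods tag the same six tokens.
tokens = ["Abdo", "Ababor", "Abafogi", "visited", "Addis", "Ababa"]
ml_tags = ["PER", "PER", "PER", "O", "LOC", "LOC"]
rule_tags = ["PER", "PER", "O", "O", "LOC", "ORG"]
pattern_tags = ["PER", "O", "PER", "O", "LOC", "LOC"]

final_tags = vote([ml_tags, rule_tags, pattern_tags])
entities = merge_adjacent(tokens, final_tags)
print(entities)
```

Keeping the vote and the merge as separate passes means the merge only ever sees one tag per token, so a different tie-breaking strategy could replace the fallback in `vote` without touching the merging logic.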


Keywords: Afaan Oromo, Named Entity Recognition, Word Sense Disambiguation, NLP, Information Extraction

Short URL: https://sciup.org/15017798

IDR: 15017798   |   DOI: 10.5815/ijieeb.2021.05.05

Research article