Ambiguity in question paper translation

Authors: Shweta Vikram, Sanjay K. Dwivedi

Journal: International Journal of Modern Education and Computer Science (IJMECS)

Issue: Vol. 10, No. 1, 2018

Free access

Word sense ambiguity is a prevalent problem in machine translation (MT) for various language pairs, including English-Hindi. For example, the word "paper" has several senses: it may refer to a question paper, a research paper, a newspaper, plain paper or a white paper. The intended sense is determined by the context in which an instance of the ambiguous word appears, and the task of identifying that sense from the context is known as Word Sense Disambiguation (WSD). Translation of question papers is a specific application of MT in which any ambiguity in a question may affect its overall meaning. This paper discusses the types of ambiguity that arise in question paper translation (English to Hindi) and their impact on translation, by analyzing a set of questions taken from the National Council of Educational Research and Training (NCERT) and some other resources.


Question paper, Word Sense Disambiguation, Hindi, English, Translation

Short URL: https://sciup.org/15016726

IDR: 15016726   |   DOI: 10.5815/ijmecs.2018.01.02

Full text of the article: Ambiguity in question paper translation

Published Online January 2018 in MECS DOI: 10.5815/ijmecs.2018.01.02

India is a multilingual country in which Hindi is a major spoken language. Most people in India work in Hindi, and it is the mother tongue in most states [26]. Hindi speakers nevertheless face problems with machine translation, because machines often fail to carry over the intended meaning of a sentence. Sentences sometimes contain ambiguous words, and MT tools then usually fail to translate the sentence correctly into the target language. Many WSD approaches rely on techniques such as tagging, chunking, parsing, named entity recognition and place name recognition [9-13]. WSD is a vast research area and a hard problem that is often described as AI-complete. The past few decades have witnessed extensive research on word sense disambiguation [7]. Machine translation and WSD complement each other: whenever a machine translates from one language to another, it needs to know which sense of each ambiguous word is intended so that the sentence can be translated correctly, and this is the job of an appropriate WSD algorithm. Manual translation is cumbersome because it takes too much time [3]. Related work on text analysis orders features as lexical, character, syntactic and semantic [34]. A complete rule-based expert system has been evaluated, with results reported in the good to very good range [35]. In machine translation of queries, the query terms are automatically translated from the source language to the desired language by using context [36].

Many approaches have been proposed since the 1950s for assigning senses to words in context, although early attempts only served as models for toy systems [15]. WSD approaches can be categorized as supervised, unsupervised, semi-supervised, knowledge-based, bootstrapped, hybrid and dictionary-based [2, 16, 19, 21-24]. The dictionary-based approach is the oldest; a similarity-based variant was proposed by Karov and Edelman [28]. The supervised approach learns from training data, and its major problem is that it requires a large sense-tagged training set. It is also widely used in the medical field to obtain better results [38]. The unsupervised approach requires neither training data nor a sense-tagged corpus; it was developed mainly because creating a marked corpus and the other necessary resources is complex. The hybrid approach combines two or more approaches. Corpus-based machine translation systems have gained much interest in recent years. They are fully automatic and require less human labor than other approaches, but they need sentence-aligned parallel text for each language pair. Corpus-based machine translation is classified into Statistical Machine Translation (SMT) and Example-Based Machine Translation (EBMT) [14].
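To make the dictionary-based idea concrete, the following is a minimal sketch of a simplified Lesk-style gloss overlap. This well-known technique is swapped in purely for illustration and is not the method of any work cited above. The sense inventory `SENSES`, its glosses and the stopword list are hypothetical; a real system would take its glosses from a resource such as WordNet.

```python
import re

# Hypothetical mini sense inventory for "paper"; glosses are illustrative only.
SENSES = {
    "question paper": "set of questions students answer in an examination or test",
    "research paper": "scholarly article reporting research results in a journal or conference",
    "newspaper": "daily publication printed with news reports and advertisements",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "in", "or", "and", "to", "is", "for", "with", "on"}


def tokens(text):
    """Return the set of lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most words with the context."""
    ctx = tokens(context)
    scores = {sense: len(ctx & tokens(gloss)) for sense, gloss in senses.items()}
    return max(scores, key=scores.get), scores


best, scores = simplified_lesk(
    "The students could not answer two questions in the examination paper", SENSES
)
print(best, scores)
```

The sketch ignores word order, morphology and sense frequency, which is why dictionary-based methods of this kind are usually combined with other evidence in practice.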

  • II.    Related Work

This section reports some significant contributions to the translation of questions and to question answering systems.

In 2001, Dave and Bhattacharyya [10] identified interrogative sentences by detecting wh-words such as what, where, why, whom and how, together with the question mark at the end of the sentence. They divided interrogative sentences into two categories, ‘wh-questions’ and ‘yes-no’ questions. They also showed that when a Hindi question is written in more than one way by changing the order of words, its meaning remains the same.
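A detection rule of this kind can be sketched in a few lines. The word lists `WH_WORDS` and `AUX_VERBS`, the function name `classify_question` and the fallback label "other" are assumptions made for illustration and are not taken from [10]; the rule covers English questions only.

```python
import re

# Illustrative word lists (assumptions, not from the cited work).
WH_WORDS = {"what", "where", "why", "whom", "who", "when", "which", "whose", "how"}
AUX_VERBS = {"is", "are", "was", "were", "do", "does", "did", "can", "could",
             "will", "would", "shall", "should", "has", "have", "had", "may", "might"}


def classify_question(sentence):
    """Label a sentence as 'wh-question', 'yes-no' or 'other'."""
    text = sentence.strip()
    words = re.findall(r"[a-z']+", text.lower())
    # A question must end with a question mark in this simple rule.
    if not text.endswith("?") or not words:
        return "other"
    # Any wh-word in the sentence marks it as a wh-question.
    if any(w in WH_WORDS for w in words):
        return "wh-question"
    # A leading auxiliary verb marks a yes-no question.
    if words[0] in AUX_VERBS:
        return "yes-no"
    return "other"


for s in ["What is the Master-Slave flip flop?",
          "Is Hindi the mother tongue in most states?",
          "Explain master method."]:
    print(s, "->", classify_question(s))
```

Imperative exam items such as "Explain master method." carry no question mark, so a rule based only on wh-words and punctuation leaves them unclassified.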

In 2005, Kumar et al. [8] developed a question answering system for Hindi documents. Their paper also outlines question classification, question parsing, question formulation and query expansion. In the same year, Metzler and Croft [30] analyzed statistical classification of fact-based questions, which fall into different question types.

In 2007, Singh et al. [25] introduced the concept of the Tense, Aspect and Modality (TAM) marker. They pointed out that many errors in MT are due to wrong translation of TAM markers.

In 2011, Silva et al. [31] worked on question classification for question answering, moving from symbolic to sub-symbolic information. The authors also reviewed the work done in the preceding few years on supervised machine learning approaches to question classification.

In 2014, Dwivedi and Goyal [26] examined the status of machine translation in India through an experimental analysis of question paper translation. They used BLEU (Bilingual Evaluation Understudy) to evaluate the translations. In the same year, Dwivedi and Singh [29] focused on integrated question classification in the higher education domain, based on rules and pattern matching; wh-questions were considered for classification.
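BLEU scores a machine translation by its n-gram overlap with a reference translation. The sketch below is a simplified, single-reference, sentence-level variant with n-grams up to length 2 and no smoothing; it is not the evaluation setup of [26]. The candidate sentence is invented, and English tokens are used only to keep the example readable (in the setting above the compared strings would be Hindi).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the modified n-gram precisions.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "explain the causes of the great depression"
candidate = "explain the reasons of the great depression"  # hypothetical MT output
score = bleu(candidate, reference)
print(round(score, 4))
```

Here six of seven unigrams and four of six bigrams match, so the score is the square root of 4/7. A single wrong word choice, as happens with an ambiguous word, lowers both the unigram and the bigram precision.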

In 2015, Graesser et al. [33] discussed question generation mechanisms, question categorization and the assumptions behind questions as the basis of a question classification scheme.

In 2015, Kamdi and Agrawal [37] presented a keyword-based closed-domain question answering system for Indian Penal Code sections and Indian amendment laws. The authors also divided question processing into three steps: determining the type of question, determining the type of answer, and extracting keywords from the question to formulate a query.

In 2017, Dwivedi and Vikram [32] introduced some external resources for machine translation and question classification, and discussed question sentences affected by ambiguity.

  • III.    Types of Ambiguity

Word sense disambiguation techniques determine the correct sense of an ambiguous word with respect to its context. Machine translation is the automated translation of one natural language into another, with or without human assistance. Sense ambiguity may be of different types, which are summarized below.

  • A.    Lexical ambiguity

In lexical ambiguity, a word or phrase has more than one meaning [20]. For example, English WordNet [41] lists more than one sense of the word “master”; Table 1 shows these senses, and Table 2 shows the senses in Hindi WordNet [4, 42]. The word “master” takes different senses depending on context; in our example, “मालिक” (malik) has been identified as the sense in the question sentences.

  • 1.    Explain master method.

  • 2.    What is the Master-Slave flip flop?

MT (Google): [Hindi output garbled in the source]

MT (Google): [Hindi output garbled in the source]

It is interesting to see that the same MT tool translates the word “Master” differently in the two examples above.

WordNet and its Hindi version (Hindi WordNet) provide various senses of the word “master”, as shown in Table 1 and Table 2. WordNet is a lexical resource that has been developed at Princeton University since the 1980s. It has a hierarchical structure in which a node is a synset and a link is a relationship between two synsets. Hindi WordNet is a repository of Hindi words connected by lexical and semantic relations, along with a browsing interface and associated software. Both WordNets are considered machine-readable dictionaries. English WordNet collects the senses of a large number of English words. We collected all the senses of the word “master” from English WordNet, as shown in Table 1, but it does not contain the “मालिक” (malik) sense of the word.

Neither WordNet (Hindi or English) has the मालिक (malik) meaning of the word ‘master’, yet in example 1 malik is the correct sense in context.

Table 1. Senses of the word “master” in WordNet

The noun “master” has 10 senses (first 6 from tagged texts):

  • 1.    maestro
  • 2.    overlord, lord
  • 3.    victor, superior
  • 4.    headmaster, schoolmaster
  • 5.    master copy, original
  • 6.    captain, sea captain, skipper
  • 7.    master's degree
  • 8.    professional
  • 9.    passkey, passe-partout, master key
  • 10.    directs the work of others

The verb “master” has 4 senses (first 3 from tagged texts):

  • 1.    get the hang
  • 2.    overcome, get over, subdue, surmount
  • 3.    dominate
  • 4.    control

Table 2. Senses of the Hindi word for “master” in Hindi WordNet

[Table with two columns, “Gloss” and “Senses of the word”, containing three rows; the Hindi entries are garbled in the source]

  • B.    Syntactical ambiguity

A sentence can be interpreted in more than one way. Sentences often have more than one meaning because of their structure, for instance when appropriate punctuation is not placed [5].

For example, “Panda eats, shoots and leaves” and “Panda eats shoots and leaves” differ only in the comma, which gives rise to the ambiguity [27].

The ambiguity in the above sentence arises from the comma: two meanings are inferred from the same words, although no individual word is ambiguous. With the comma, the panda eats, then shoots and leaves; without it, the panda eats the shoots and leaves of plants. The human translation into Hindi likewise gives two different sentences.

[Hindi translations garbled in the source]

  • C.    Semantic ambiguity

A sentence that can be read in more than one way exhibits semantic ambiguity [1]. The example below shows semantic ambiguity.

Example: He saw a man on the hill with a telescope. (The phrase “with a telescope” gives rise to the ambiguity.)

Different interpretations are possible from the above sentence. For example, he may have used a telescope to see the man, the man may have been holding a telescope, or the telescope may have been standing on the hill.

All of these interpretations are possible. This type of ambiguity is very difficult to handle, and it requires prior contextual knowledge to find the most appropriate meaning in the context.
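How quickly such readings multiply can be sketched by counting the parse trees of the telescope sentence under a toy grammar. The grammar, the lexicon and the function name `count_parses` below are hypothetical and chosen only for illustration; the counting uses the standard CYK chart technique for grammars in Chomsky normal form, and it counts attachment structures rather than the informal interpretations a human reader would list.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # prepositional phrase attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # prepositional phrase attaches to a noun phrase
    ("PP", "P", "NP"),
]

# Toy lexicon: word -> possible categories.
LEXICON = {
    "he": ["NP"], "saw": ["V"], "a": ["Det"], "the": ["Det"],
    "man": ["N"], "hill": ["N"], "telescope": ["N"],
    "on": ["P"], "with": ["P"],
}


def count_parses(words):
    """Count the parse trees rooted in S with a CYK-style chart."""
    n = len(words)
    # chart[i, j, symbol] = number of trees of that symbol over words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for symbol in LEXICON[word]:
            chart[i, i + 1, symbol] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, "S"]


sentence = "he saw a man on the hill with a telescope".split()
print(count_parses(sentence))
```

Under this grammar the sentence has five parse trees, and dropping "with a telescope" still leaves two, so each additional prepositional phrase increases the number of structures an MT system has to choose between.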

  • D.    Lack of information

This problem arises when one language version of a text carries less information than the other. For example, as reported by an English newspaper [40], a question was asked in an examination in which the Tamil version asked for “three environmental impacts of solar energy”, while the English version of the same question asked only for “three impacts of solar energy”.

Paper (English language): Write the three impacts of solar energy.

Paper (Tamil language): Write the three ‘environmental’ impacts of solar energy.

The Tamil version is unambiguous, but the English sentence is missing one word, and this gives rise to ambiguity: the English question does not restrict the answer to the ‘environmental’ impact of solar energy, so any impact of solar energy would qualify.
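A gap of this kind can be flagged automatically once both versions are available in the same language. The sketch below assumes, purely for illustration, that the Tamil question has already been glossed in English; the helper names `content_words` and `missing_words` are hypothetical, and a plain set difference of words is a deliberately crude comparison.

```python
import re


def content_words(text):
    """Return the set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))


def missing_words(fuller_version, shorter_version):
    """List words present in the fuller version but absent from the shorter one."""
    return sorted(content_words(fuller_version) - content_words(shorter_version))


# English gloss of the Tamil question (an assumption for this sketch).
tamil_gloss = "Write the three environmental impacts of solar energy"
english = "Write the three impacts of solar energy"

dropped = missing_words(tamil_gloss, english)
print(dropped)
```

For this pair the only word reported is "environmental", which is exactly the restriction that the English version lost.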

  • IV.    Challenges for Question Paper Translation

In the previous section, we discussed the various forms of ambiguity. In this section, we explain how ambiguity and some other issues can change the translation and meaning of question sentences.

We now discuss the challenges in question paper translation. One of the major problems is ambiguity, which may change the sense of a question. Along with ambiguity, we also discuss some related issues that may affect the translation of question papers.

  • A.    Ambiguity Issue

Ambiguity is one of the major challenges for machine translation of question papers. Question sentences may be affected by ambiguity arising from individual words or from the syntax of the question, and hence the translated meaning may change. For example, consider a question from the NCERT online book [39, 44], which has been translated from English into Hindi by some popular MT tools.

NCERT (English): Explain the causes of the Great Depression.

References

  • T. Hao, D. Hu, L. Wenyin, and Q. Zeng, “Semantic patterns for user interactive question answering”, Concurrency and Computation: Practice and Experience, 20(7), pp.783-799, 2008
  • R. Navigli, “Meaningful clustering of senses helps boost word sense disambiguation performance”, In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pp. 105-112, 2006
  • P. Dungarwal, R. Chatterjee, A. Mishra, A. Kunchukuttan, R. M. Shah, and P. Bhattacharyya, “The IIT Bombay Hindi-English Translation System at WMT 2014”. In WMT@ ACL, pp. 90-96, 2014
  • D. S. Chaplot, S. Bhingardive, and P. Bhattacharyya, “IndoWordnet visualizer: A graphical user interface for browsing and exploring wordnets of Indian languages”, In Global WordNet Conference (GWC 2014), 2014
  • D. Chakrabarti and P. Bhattacharya, “Syntactic Alternations of Hindi Verbs with Reference to the Morphological Paradigm”, Language Engineering Conference (LEC 2002), Hyderabad, India 2002.
  • M. Sinha, M. Kumar, P. Pande, L. Kashyap, and P. Bhattacharyya, “Hindi word sense disambiguation”, In International Symposium on Machine Translation, Natural Language Processing and Translation Support Systems, Delhi, India, 2004
  • R. V. Bhala, and S. Abirami, “Trends in word sense disambiguation”, Artificial Intelligence Review, 42(2), pp 159-171, 2014
  • P. Kumar, S. Kashyap, A. Mittal, and S. Gupta, “A Hindi question answering system for E-learning documents”, In Intelligent Sensing and Information Processing, 2005. ICISIP 2005. Third International Conference on (pp. 80-85). IEEE, 2005
  • R. Navigli, S. Faralli, A. Soroa, O. De Lacalle, and E. Agirre, “Two birds with one stone: learning semantic models for text categorization and word sense disambiguation”, In Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 2317-2320, ACM. 2011
  • S. Dave, and P. Bhattacharyya, “Knowledge extraction from Hindi text”. IETE Technical Review, 18(4), pp 323-331, 2001
  • R. Navigli, and P. Velardi, “Structural semantic interconnections: a knowledge-based approach to word sense disambiguation”. IEEE Transactions on pattern analysis and machine intelligence, 27(7), pp 1075-1086, 2005
  • L. Li, B. Roth, and C. Sporleder, “Topic models for word sense disambiguation and token-based idiom detection”, In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1138-1147, 2010
  • F. Mandreoli, and R. Martoglia, “Knowledge-based sense disambiguation (almost) for all structures”. Information Systems, 36(2), pp 406-430, 2011
  • L. R. Nair, and S. David Peter, “Machine translation systems for indian languages”, International Journal of Computer Applications pp 0975–8887, 2012
  • A. Montoyo, A. Suárez, G. Rigau, and M. Palomar, “Combining Knowledge-and Corpus-based Word-Sense-Disambiguation Methods”. J. Artif. Intell. Res.(JAIR), pp 299-330, 2005
  • H. C. Seo, H. Chung, H. C. Rim, S. H. Myaeng, and S. H. Kim, “Unsupervised word sense disambiguation using WordNet relatives”, Computer Speech & Language, pp 253-273, 2004
  • R. Tromble, and J. Eisner, “Learning linear ordering problems for better translation”. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2, Association for Computational Linguistics, pp. 1007-1016, 2009
  • S. Chaudhury, A. Rao, and D. M. Sharma, “Anusaaraka: An expert system based machine translation system” In Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on IEEE. pp. 1-6, 2010
  • R. Navigli, “A quick tour of word sense disambiguation, induction and related approaches”, SOFSEM 2012: Theory and practice of computer science, pp 115-129, 2012
  • R. Navigli, “Word sense disambiguation: A survey”. ACM Computing Surveys (CSUR), 2009
  • F. Mandreoli, and R. Martoglia, “Knowledge-based sense disambiguation (almost) for all structures”. Information Systems, 36(2), pp 406-430, 2011
  • H. Li, and C. Li, “Word translation disambiguation using bilingual bootstrapping”. Computational Linguistics, pp 1-22, 2004
  • B. Broda, and M. Piasecki, “Semi-supervised word sense disambiguation based on weakly controlled sense induction”, In Computer Science and Information Technology, 2009. IMCSIT'09. International Multiconference on IEEE. pp. 17-24, 2009
  • C. Y. Yang, and J. C. Hung, “Word sense determination using wordnet and sense co-occurrence”, In Advanced Information Networking and Applications, 2006. AINA 2006. 20th International Conference on Vol. 1, IEEE. pp. 779-784, 2006
  • A. K. Singh, S. Husain, H. Surana, J. Gorla, D. M. Sharma, and C. Guggilla, “Disambiguating tense, aspect and modality markers for correcting machine translation errors”. In Proceedings of RANLP, 2007
  • S. Dwivedi, and A. Goyal, “Machine Translation status in India”, Proceedings of the 2014 International Conference on Information and Communication Technology for Competitive Strategies - ICTCS '14, 2014
  • M. Bryant, “Eats, Shoots and Leaves: The Zero Tolerance Approach to Punctuation”. Law Now, 29, pp. 94. 2004.
  • Karov, Yael, and Shimon Edelman. "Similarity-based word sense disambiguation." Computational linguistics 24, no. 1, pp 41-59. 1998
  • S. K. Dwivedi and V. Singh, “Integrated question classification based on rules and pattern matching”. In Proceedings of the 2014 International Conference on Information and Communication Technology for Competitive Strategies pp. 39. ACM. 2014
  • D. Metzler and B. W. Croft, “Analysis of statistical question classification for fact-based questions”. Information Retrieval, 8(3), pp. 481-504. 2005
  • J. Silva et al., “From symbolic to sub-symbolic information in question classification”. Artificial Intelligence Review, pp. 137-154. 2011
  • S. K. Dwivedi and S. Vikram, “Word Sense Ambiguity in Question Sentence Translation: A Review”, In International Conference on Information and Communication Technology for Intelligent Systems pp. 64-71. Springer, Cham, 2017
  • A. Graesser et al., “Question classification schemes”. In Proc. of the Workshop on Question Generation. 2008
  • I. S. Abuhaiba and M. F. Eltibi, “ Author Attribution of Arabic Texts Using Extended Probabilistic Context Free Grammar Language Model”, International Journal of Intelligent Systems and Applications, 8(6), 27. 2016
  • M. A. Kadhim et al., “A Multi-intelligent Agent System for Automatic Construction of Rule-based Expert System”, International Journal of Intelligent Systems and Applications, 8(9), 62. 2016
  • G. Chandra and S. K. Dwivedi, “Assessing Query Translation Quality Using Back Translation in Hindi-English CLIR”, International Journal of Intelligent Systems and Applications, 9(3), 51. 2017
  • P. R. Kamdi et al., “Keywords based closed domain question answering system for indian penal code sections and indian amendment laws”, International Journal of Intelligent Systems and Applications, 7(12), 54. 2014
  • A. S. Medjahed et al., “Urinary System Diseases Diagnosis Using Machine Learning Techniques”, International Journal of Intelligent Systems and Applications, 7(5), 1. 2015
  • http://epathshala.nic.in/e-pathshala-4/flipbook/
  • http://timesofindia.indiatimes.com/city/chennai/Class-12-girl-cites-ambiguity-in-biology-paper-seeks-full-marks/articleshow /33146287.cms
  • English WordNet http://wordnetweb.princeton.edu/perl/webwn
  • Hindi WordNet: http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
  • https://drive.google.com/file/d/0B3HSpNixd2_YVUM5U3d2ZjlldHc/view
  • NCERT: http://ncert.nic.in/NCERTS/textbook/textbook.htm
  • English Parser: http://nlp.stanford.edu:8080/parser/
  • Hindi Tagging: http://text-processing.com/demo/tag/
  • English-HindiDictionary: http://www.shabdkosh.com/
Research article