Lunch at 12:30pm, talk at 1pm, in 148 Fitzpatrick

Title: Integrating Dictionaries into Neural Machine Translation

Abstract: Human translators often resort to external knowledge sources when translating a text. If a word or phrase has a one-to-many mapping from the source language to the target language, then additional context must be either provided in the text itself or retrieved from an external knowledge source, such as a dictionary, thesaurus, or encyclopedia. At present, neural machine translation (NMT) systems cannot retrieve any information that is not already included in the training data. Moreover, rare words, i.e., those that occur with low frequency in the training corpus or are absent from it altogether, present a challenge for current NMT systems because they are not seen often enough in context. In the past, attempts have been made to append dictionaries directly to training corpora, but such attempts have proved ineffective in the current neural era because the appended entries lack the surrounding context of a source sentence. We propose a potential solution to help alleviate the rare-word problem: integrate mono- and bilingual dictionaries into NMT systems by using attention masking to associate rare words in the input sequence with their corresponding mono- and bilingual definitions, giving the model the additional context necessary to produce an accurate translation.

Bio: Ken Sible is a second-year PhD student in Dr. David Chiang’s NLP Lab at the University of Notre Dame. His primary research focus is improving neural machine translation by incorporating external knowledge sources. His broader research interests include machine translation and computational linguistics.