I am giving a talk at Columbia University on evaluating Grammatical Error Correction systems across domains.
Attended EMNLP in Brussels.
PhD graduation ceremony in Edinburgh.
I started a new job as a Research Scientist at Grammarly.
At Grammarly, I am working on tools that improve how people communicate in writing. Give it a try! Recently, I have been applying Neural Machine Translation techniques to scale grammatical error correction to different domains.
I worked on adapting Google Translate to new domains. I also had my most viewed YouTube appearance so far.
As part of the Machine Translation Group at the University of Edinburgh, I was at the forefront of research on neural machine translation. For example, I developed a novel neural translation model that learns a better syntactic representation of sentences and improves translation quality for several language pairs.
Nădejde, M., Reddy, S., Sennrich, R., Dwojak, T., Junczys-Dowmunt, M., Koehn, P., Birch, A. (2017), Proceedings of the Second Conference on Machine Translation (WMT17)
Abstract: Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags in the decoder, by interleaving the target supertags with the word sequence. Our results on WMT data show that explicitly modeling target syntax improves machine translation quality for German→English, a high-resource pair, and for Romanian→English, a low-resource pair, as well as for several syntactic phenomena including prepositional phrase attachment. Furthermore, a tight coupling of words and syntax improves translation quality more than multitask training. By combining target-syntax with adding source-side dependency labels in the embedding layer, we obtain a total improvement of 0.9 BLEU for German→English and 1.2 BLEU for Romanian→English.
[ PDF ]
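The interleaving of target supertags with the word sequence described in the abstract above can be sketched minimally. The example tags and the tag-before-word ordering are illustrative assumptions, not gold CCG annotations:

```python
def interleave_supertags(words, supertags):
    """Interleave CCG supertags with the target word sequence:
    each word is preceded by its supertag, so the decoder predicts
    one mixed sequence of tags and words.
    (Illustrative sketch; ordering is an assumption.)"""
    assert len(words) == len(supertags)
    out = []
    for tag, word in zip(supertags, words):
        out.append(tag)   # supertag token
        out.append(word)  # word token
    return out

# Hypothetical toy sentence with made-up supertags:
words = ["Obama", "receives", "Netanyahu"]
tags = ["NP", "(S[dcl]\\NP)/NP", "NP"]
mixed = interleave_supertags(words, tags)
# mixed is twice as long as the original word sequence
```

The resulting mixed sequence is what the decoder is trained to generate, so syntax and lexical choice are predicted jointly rather than in separate multitask output layers.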
Nădejde, M., Birch, A., Koehn, P. (2016), Proceedings of the International Workshop on Spoken Language Translation (IWSLT16)
Abstract: String-to-tree MT systems translate verbs without lexical or syntactic context on the source side and with limited target-side context. The lack of context is one reason why verb translation recall is as low as 45.5%. We propose a verb lexicon model trained with a feedforward neural network that predicts the target verb conditioned on a wide source-side context. We show that a syntactic context extracted from the dependency parse of the source sentence improves the model’s accuracy by 1.5% over a baseline trained on a window context. When used as an extra feature for re-ranking the n-best list produced by the string-to-tree MT system, the verb lexicon model improves verb translation recall by more than 7%.
[ PDF ]
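A feedforward verb lexicon model of the kind described above can be sketched as a one-hidden-layer network that maps a fixed-length source-side context to a distribution over target verbs. All sizes and weights here are toy, randomly initialized assumptions, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; a real model uses large vocabularies and trained embeddings.
CTX_VOCAB, VERB_VOCAB, EMB, HIDDEN, CTX_LEN = 20, 10, 8, 16, 5

E  = rng.normal(0, 0.1, (CTX_VOCAB, EMB))        # context-word embeddings
W1 = rng.normal(0, 0.1, (CTX_LEN * EMB, HIDDEN)) # input -> hidden
W2 = rng.normal(0, 0.1, (HIDDEN, VERB_VOCAB))    # hidden -> verb scores

def predict_verb_distribution(context_ids):
    """Score every target verb given a fixed-length source-side context
    (e.g. window or dependency-based context tokens); softmax over verbs."""
    h = np.tanh(E[context_ids].reshape(-1) @ W1)  # concatenate embeddings
    logits = h @ W2
    exp = np.exp(logits - logits.max())           # stable softmax
    return exp / exp.sum()

probs = predict_verb_distribution([1, 4, 2, 7, 3])
```

At re-ranking time, the probability the model assigns to the verb in each n-best hypothesis can be used as an additional feature score, which is how the recall gain in the abstract is obtained.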
Nădejde, M., Birch, A., Koehn, P. (2016), Proceedings of the First Conference on Machine Translation (WMT16)
Abstract: We address the problem of mistranslated predicate-argument structures in syntax-based machine translation. This paper explores whether knowledge about semantic affinities between the target predicates and their argument fillers is useful for translating ambiguous predicates and arguments. We propose a selectional preference feature based on the selectional association measure of Resnik (1996) and integrate it in a string-to-tree decoder. The feature models selectional preferences of verbs for their core and prepositional arguments as well as selectional preferences of nouns for their prepositional arguments. We compare our features with a variant of the neural relational dependency language model (RDLM) (Sennrich, 2015) and find that neither of the features improves automatic evaluation metrics. We conclude that mistranslated verbs, errors in the target syntactic trees produced by the decoder and underspecified syntactic relations negatively impact these features.
[ PDF ]
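Resnik's (1996) selectional association, the measure underlying the feature above, can be sketched on toy distributions. The argument classes and probabilities here are made up for illustration; the paper works over real corpus statistics rather than hand-set dictionaries:

```python
import math

def selectional_association(p_c_given_pred, p_c):
    """Resnik (1996) selectional association.
    p_c_given_pred: dict mapping argument class c -> P(c | predicate)
    p_c:            dict mapping argument class c -> prior P(c)
    Returns a dict c -> A(predicate, c)."""
    # Selectional preference strength: KL divergence between the
    # predicate's argument distribution and the prior over classes.
    strength = sum(pc * math.log(pc / p_c[c])
                   for c, pc in p_c_given_pred.items() if pc > 0)
    # Each class's association is its contribution, normalized by strength.
    return {c: pc * math.log(pc / p_c[c]) / strength
            for c, pc in p_c_given_pred.items() if pc > 0}

# Toy example: a verb like "eat" prefers food-like arguments.
assoc = selectional_association(
    {"food": 0.8, "tool": 0.2},   # P(class | predicate), assumed
    {"food": 0.5, "tool": 0.5})   # prior P(class), assumed
```

By construction the associations over a predicate's classes sum to one, and preferred classes (here "food") receive the larger share, which is what lets the decoder feature reward hypotheses whose argument fillers fit the predicate.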
Nădejde, M., Williams, P., Koehn, P. (2013), Proceedings of the Eighth Workshop on Statistical Machine Translation (WMT13)
Abstract: We present the syntax-based string-to-tree statistical machine translation systems built for the WMT 2013 shared translation task. Systems were developed for four language pairs. We report on adapting parameters, targeted reduction of the tuning set, and post-evaluation experiments on rule binarization and preventing dropping of verbs.
[ PDF ]