Machine Translation

11-731 | Spring 2013

At a Glance


Welcome to Machine Translation (11-731). This 12-credit graduate course will provide a comprehensive overview of current techniques in statistical machine translation, such as those used by Google Translate and Bing Translator.


Schedule (date, topic, assigned SMT textbook chapters in parentheses, and further readings):

 1. Jan 15: Introduction (SMT Ch. 1)
      • Knight, Automating Knowledge Acquisition for Machine Translation
      • Weaver, Translation
 2. Jan 17: Probability Review (SMT Ch. 3)
      • Johnson, Joint and conditional estimation of tagging and parsing models
      • Borman, The Expectation Maximization Algorithm
      • Klein and Manning, Maxent Models, Conditional Estimation, and Optimization, without the Magic
      • Lafferty et al., Conditional Random Fields
 3. Jan 22: Language Models (SMT Ch. 4.3, 7)
      • Kneser and Ney, Improved Backing-Off for m-Gram Language Modeling
      • Pauls and Klein, Large-Scale Syntactic Language Modeling with Treelets
      • Mnih and Hinton, Three New Graphical Models for Statistical Language Modelling
      • Teh, A Hierarchical Bayesian Language Model based on Pitman-Yor Processes
 4. Jan 24: Lexical Translation Models I (SMT Ch. 4.1-4.2, 4.5)
      • Brown et al., A Statistical Approach to Machine Translation
      • Collins, Statistical Machine Translation: IBM Models 1 and 2
      • Dyer et al., A Simple, Fast, and Effective Reparameterization of IBM Model 2
 5. Jan 29: Lexical Translation Models II (SMT Ch. 4.4, 4.6) [L-in-10: Latin]
      • Brown et al., The Mathematics of Statistical Machine Translation: Parameter Estimation
      • Vogel and Ney, HMM-based word alignment in statistical translation
      • Dyer et al., Unsupervised Word Alignment with Arbitrary Features
      • Yamada and Knight, A Syntax-based Statistical Translation Model
 6. Jan 31: Noisy Channel Translation [L-in-10: Mandarin]
      • Norvig, How to Write a Spelling Corrector
      • Kernighan et al., A Spelling Correction Program Based on a Noisy Channel Model
      • Yuret and Yatbaz, The Noisy Channel Model for Unsupervised Word Sense Disambiguation
 7. Feb 05: Phrase-based MT I (SMT Ch. 5) [L-in-10: Russian]
      • Marcu and Wong, A Phrase-Based, Joint Probability Model for Statistical Machine Translation
      • DeNero et al., Sampling Alignment Structure under a Bayesian Translation Model
 8. Feb 07: Phrase-based MT II (SMT Ch. 6)
      • Koehn et al., Statistical Phrase-Based Translation
 9. Feb 12: Evaluation I
      • Callison-Burch et al., Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation
      • Snover et al., Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric
10. Feb 14: Evaluation II
      • Lavie and Denkowski, The METEOR Metric for Automatic Evaluation of Machine Translation
      • Papineni et al., BLEU: a Method for Automatic Evaluation of Machine Translation
      • Snover et al., A Study of Translation Edit Rate with Targeted Human Annotation
      • Dreyer and Marcu, HyTER: Meaning-Equivalent Semantics for Translation Evaluation
11. Feb 19: Discriminative Training I
      • Och and Ney, Discriminative Training and Maximum Entropy Models for Statistical Machine Translation
      • Macherey et al., Lattice-based Minimum Error Rate Training for Statistical Machine Translation
      • Smith and Eisner, Minimum Risk Annealing for Training Log-Linear Models
12. Feb 21: Discriminative Training II
      • Liang et al., An end-to-end discriminative approach to machine translation
      • Gimpel and Smith, Structured Ramp-Loss Minimization for Machine Translation
13. Feb 26: Syntax in MT I
      • Wu, Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
      • Chiang, Hierarchical Phrase-Based Translation
      • Chiang, An Introduction to Synchronous Grammars
14. Feb 28: Syntax in MT II
      • Galley et al., What’s in a translation rule?
      • Galley et al., Scalable inference and training of context-rich syntactic translation models
      • Zollmann and Venugopal, Syntax augmented machine translation via chart parsing
      • Ambati et al., Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
      • Hanneman et al., A General-Purpose Rule Extractor for SCFG-Based Machine Translation
15. Mar 05: Syntax Decoding
      • Huang and Chiang, Forest rescoring: faster decoding with integrated language models
16. Mar 07: Decoding
      • Germann et al., Fast Decoding and Optimal Decoding for Machine Translation
      • Knight, Decoding Complexity in Word-Replacement Translation Models
      • Rush and Collins, Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
17. Mar 19: Synchronous Parsing
      • Dyer, Two monolingual parses are better than one (synchronous parse)
18. Mar 21: Large Scale Language Modeling (lecturer: Kenneth Heafield)
      • Talbot and Osborne, Smoothed Bloom filter language models: Tera-Scale LMs on the Cheap
19. Mar 26: System Combination
      • Rosti et al., Combining outputs from multiple machine translation systems
      • Heafield and Lavie, Voting on N-grams for Machine Translation System Combination
      • Fiscus, A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
20. Mar 28: Morphology in MT
      • Koehn and Hoang, Factored Translation Models
      • Yeniterzi and Oflazer, Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
      • Fraser et al., Modeling Inflection and Word-Formation in SMT
      • Dyer et al., Generalizing Word Lattice Translation
21. Apr 02: Topics in Modeling
      • Xiong et al., Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation
      • Mauser et al., Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models
      • Blunsom et al., A Discriminative Latent Variable Model for Statistical Machine Translation
      • Levenberg et al., A Bayesian Model for Learning SCFGs with Discontiguous Rules
22. Apr 04: Nizar Habash talk (GHC 6105)
      • Salloum and Habash, Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation
      • El Kholy and Habash, Orthographic and morphological processing for English–Arabic statistical machine translation
      • Habash and Sadat, Arabic Preprocessing Schemes for Statistical Machine Translation
23. Apr 09: Example-Based MT (lecturer: Ralf Brown)
      • Somers, Review Article: Example-based Machine Translation
      • Brown, The CMU-EBMT machine translation system
      • Phillips, Cunei: Open-Source Machine Translation with Relevance-Based Models of Each Translation Instance
24. Apr 11: Mining Parallel Data (lecturer: Wang Ling)
      • Resnik and Smith, The Web as a Parallel Corpus
      • Uszkoreit et al., Large Scale Parallel Document Mining for Machine Translation
25. Apr 16: Quality Estimation
      • Specia et al., Predicting Machine Translation Adequacy
      • Specia and Giménez, Combining Confidence Estimation and Reference-based Metrics for Segment-level MT Evaluation
      • Soricut et al., The SDL Language Weaver Systems in the WMT12 Quality Estimation Shared Task
      • Callison-Burch et al., Findings of the 2012 Workshop on Statistical Machine Translation
26. Apr 18: NO CLASS (Carnival)
27. Apr 23: Commercial MT
28. Apr 25: Bitext++
      • Hwa et al., Bootstrapping parsers via syntactic projection across parallel texts
      • Bannard and Callison-Burch, Paraphrasing with bilingual parallel corpora
      • Schneider et al., Supersense tagging for Arabic: the MT-in-the-middle attack
29. May 02: Course Wrap-up
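The lexical translation lectures center on the IBM models and EM training. As a rough flavor of what the readings derive in full, here is a minimal, self-contained sketch of IBM Model 1 EM on a toy corpus; the corpus and all names are illustrative only, not taken from any paper or toolkit mentioned on this page.

```python
from collections import defaultdict

def train_model1(bitext, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(f|e).

    bitext: list of (source_words, target_words) sentence pairs.
    Returns a dict mapping (f, e) -> t(f|e).
    """
    # Uniform initialization over the source vocabulary.
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: collect expected counts under posterior alignments.
        for fs, es in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalizer
                for e in es:
                    p = t[(f, e)] / z           # posterior P(a | f, e)
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize expected counts.
        for (f, e) in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return dict(t)

# Toy German-English bitext (the classic das/haus/buch illustration).
bitext = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_model1(bitext)
# EM concentrates the mass of "haus" on "house" rather than "the".
```

Even on three sentence pairs, the cooccurrence statistics are enough for EM to disambiguate which English word each German word translates; this is the same computation Giza++ performs at scale.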


This semester we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, Wang, Alon, and me. Rather than emailing questions to the instructors, we encourage you to post your questions on Piazza. It supports LaTeX for equations and syntax highlighting for code, and it keeps all materials related to the course in one place.

Find our class page here.

Software
  • cdec is a machine translation research platform developed at CMU (C++)
  • Moses is a widely-used machine translation toolkit that includes phrase-based and syntactic model support (C++)
  • Joshua is a translation toolkit designed for syntax-based models (Java)
  • Cubit is a very simple phrase-based decoder (Python)
  • SRILM is SRI’s language modeling toolkit (C++)
  • KenLM is a highly optimized library for representing and querying n-gram language models (C++)
  • BerkeleyLM is another n-gram language model library (Java)
  • Giza++ implements EM training for the IBM translation models and is widely used for word alignment (C++)
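The language-modeling toolkits above (SRILM, KenLM, BerkeleyLM) all answer the same basic query: the probability of a word given its n-gram history, smoothed so unseen n-grams get nonzero mass. The sketch below shows that query for a toy interpolated bigram model in pure Python; it mirrors none of these toolkits' actual APIs, and the corpus and smoothing weight are invented for illustration.

```python
import math
from collections import Counter

def train_bigram_lm(corpus, lam=0.7):
    """Interpolated bigram LM: p(w|h) = lam * p_ML(w|h) + (1-lam) * p_ML(w)."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    total = sum(unigrams.values())

    def logprob(history, word):
        # Maximum-likelihood estimates, interpolated for smoothing.
        p_uni = unigrams[word] / total
        p_bi = bigrams[(history, word)] / unigrams[history] if unigrams[history] else 0.0
        return math.log(lam * p_bi + (1 - lam) * p_uni)

    return logprob

lp = train_bigram_lm(["the house is small", "the house is big"])
# A seen bigram ("the house") scores higher than an unseen one ("big house"),
# but the unseen bigram still gets probability mass from the unigram term.
```

Real toolkits replace the interpolation weight with principled smoothing (e.g. modified Kneser-Ney) and store the counts in compressed data structures, but the query interface is essentially this function.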

Freely Available (Parallel) Corpora

  • WIT3 is a transcribed corpus of TED talks in many languages (small/medium).
  • The 2012 Workshop on Machine Translation Shared Task distributes European Parliament (large) and News Commentary (medium) parallel data as well as standard development and test sets.
  • OPUS is a growing collection of parallel data in many domains and languages.

Other MT Courses

Creative Commons License Unless otherwise indicated, this content has been adapted from this course by Chris Dyer. Both the original and new content are licensed under a Creative Commons Attribution 3.0 Unported License.