Machine Translation

11-731 | Spring 2014

At a Glance


Welcome to Machine Translation (11-731). This 12-credit graduate course will provide a comprehensive overview of current techniques in statistical machine translation, such as those used by Google Translate and Bing Translator.


Lecture Date Topic Book Further Readings
1 Jan 14 Introduction
SMT Ch. 1 Knight, Automating Knowledge Acquisition for Machine Translation
Weaver, Translation
2 Jan 16 Probability Review
SMT Ch. 3 Johnson, Joint and conditional estimation of tagging and parsing models
Borman, The Expectation Maximization Algorithm
Klein and Manning, Maxent Models, Conditional Estimation, and Optimization, without the Magic
Lafferty et al., Conditional Random Fields
3 Jan 21 Language Models
SMT Ch. 4.3, 7 Kneser and Ney, Improved Backing-Off for m-Gram Language Modeling
Pauls and Klein, Large-Scale Syntactic Language Modeling with Treelets
Mnih and Hinton, Three New Graphical Models for Statistical Language Modelling
Teh, A Hierarchical Bayesian Language Model based on Pitman-Yor Processes
4 Jan 23 Lexical Translation Models I
SMT Ch. 4.1-4.2, 4.5 Brown et al., A Statistical Approach to Machine Translation
Collins, Statistical Machine Translation: IBM Models 1 and 2
Dyer et al., A Simple, Fast, and Effective Reparameterization of IBM Model 2
5 Jan 28 Lexical Translation Models II
SMT Ch. 4.4, 4.6 Brown et al., The Mathematics of Statistical Machine Translation: Parameter Estimation
Vogel and Ney, HMM-based word alignment in statistical translation
Dyer et al., Unsupervised Word Alignment with Arbitrary Features
Yamada and Knight, A Syntax-based Statistical Translation Model
6 Jan 30 Noisy Channel Translation
none Norvig, How to Write a Spelling Corrector
Kernighan et al., A Spelling Correction Program Based on a Noisy Channel Model
Yuret and Yatbaz, The Noisy Channel Model for Unsupervised Word Sense Disambiguation
7 Feb 04 Phrase-based MT I
SMT Ch. 5 Marcu and Wong, A Phrase-Based, Joint Probability Model for Statistical Machine Translation
DeNero et al., Sampling Alignment Structure under a Bayesian Translation Model
8 Feb 06 MT tools
9 Feb 11 Lexical decoding
SMT Ch. 6 Koehn et al., Statistical Phrase-Based Translation
10 Feb 13 Phrase-based MT II
SMT Ch. 6 Koehn et al., Statistical Phrase-Based Translation
11 Feb 18 Evaluation I
L-in-10: Dialectal Arabic
none Callison-Burch et al., Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation
Snover et al., Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric
12 Feb 20 Evaluation II
none Lavie and Denkowski, The METEOR Metric for Automatic Evaluation of Machine Translation
Papineni et al., BLEU: a Method for Automatic Evaluation of Machine Translation
Snover et al., A Study of Translation Edit Rate with Targeted Human Annotation
Dreyer and Marcu, HyTER: Meaning-Equivalent Semantics for Translation Evaluation
13 Feb 25 HW1 and HW2 review
L-in-10: Esperanto
14 Feb 27 Mining Parallel Data
L-in-10: Portuguese
none Resnik and Smith, The Web as a Parallel Corpus
Uszkoreit et al., Large Scale Parallel Document Mining for Machine Translation
15 Mar 04 Hierarchical phrase-based models
none Wu, Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
Chiang, Hierarchical Phrase-Based Translation
Chiang, An Introduction to Synchronous Grammars
16 Mar 06 Syntax in MT I
none Galley et al., What’s in a translation rule?
Galley et al., Scalable inference and training of context-rich syntactic translation models
Zollmann and Venugopal, Syntax augmented machine translation via chart parsing
Ambati et al., Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
Hanneman et al., A General-Purpose Rule Extractor for SCFG-Based Machine Translation
Mar 11 NO CLASS (Spring Break)
Mar 13 NO CLASS (Spring Break)
17 Mar 18 Syntax decoding
none Germann et al., Fast Decoding and Optimal Decoding for Machine Translation
Knight, Decoding Complexity in Word-Replacement Translation Models
Rush and Collins, Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
18 Mar 20 Synchronous Parsing
L-in-10: Hindi
none Dyer, Two monolingual parses are better than one (synchronous parse)
19 Mar 25 Cube pruning
none Huang and Chiang, Better k-best Parsing
Huang and Chiang, Forest Rescoring: Faster Decoding with Integrated Language Models
20 Mar 27 Topics in Modeling
none Xiong et al., Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation
Mauser et al., Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models
Blunsom et al., A Discriminative Latent Variable Model for Statistical Machine Translation
Levenberg et al., A Bayesian Model for Learning SCFGs with Discontiguous Rules
22 Apr 03 Discriminative Training I
none Och and Ney, Discriminative Training and Maximum Entropy Models for Statistical Machine Translation
Macherey et al., Lattice-based Minimum Error Rate Training for Statistical Machine Translation
Smith and Eisner, Minimum Risk Annealing for Training Log-Linear Models
23 Apr 08 Discriminative Training II
L-in-10: Esperanto
none Liang et al., An End-to-End Discriminative Approach to Machine Translation
Gimpel and Smith, Structured Ramp-Loss Minimization for Machine Translation
Apr 10 NO CLASS (Carnival)
24 Apr 15 System Combination
none Rosti et al., Combining outputs from multiple machine translation systems
Heafield and Lavie, Voting on N-grams for Machine Translation System Combination
Fiscus, A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
25 Apr 17 Quality estimation; Commercial MT
L-in-10: Portuguese
none Specia et al., Predicting Machine Translation Adequacy
Specia and Giménez, Combining Confidence Estimation and Reference-based Metrics for Segment-level MT Evaluation
Soricut et al., The SDL Language Weaver Systems in the WMT12 Quality Estimation Shared Task
Callison-Burch et al., Findings of the 2012 Workshop on Statistical Machine Translation
Many, AMTA Commercial MT Proceedings
26 Apr 22 Bitext++
L-in-10: Sanskrit
none Hwa et al., Bootstrapping parsers via syntactic projection across parallel texts
Bannard and Callison-Burch, Paraphrasing with bilingual parallel corpora
Schneider et al., Supersense tagging for Arabic: the MT-in-the-middle attack
27 Apr 24 Neural MT
L-in-10: Russian, French
none Devlin et al., Fast and Robust Neural Network Joint Models for Statistical Machine Translation
Kalchbrenner et al., A Convolutional Neural Network for Modelling Sentences
29 May 01 Course Wrap-up


This semester we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, Wang, Alon, and me. Rather than emailing questions to the instructors, we encourage you to post them on Piazza. It supports LaTeX for equations and syntax highlighting for code, and it keeps all course materials in one place.

Find our class page here.


  • cdec is a machine translation research platform developed at CMU (C++)
  • Moses is a widely-used machine translation toolkit that includes phrase-based and syntactic model support (C++)
  • Joshua is a translation toolkit designed for syntax-based models (Java)
  • Cubit is a very simple phrase-based decoder (Python)
  • SRILM is SRI’s language modeling toolkit (C++)
  • KenLM is a highly optimized library for representing and querying $n$-gram language models (C++)
  • BerkeleyLM is another $n$-gram language model library (Java)
  • Giza++ implements EM training for the IBM translation models and is widely used for word alignment (C++)
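
For readers new to the idea behind the alignment tools listed above, here is a minimal sketch of EM training for IBM Model 1, the simplest of the IBM translation models. It is an illustration only and not how Giza++ itself is implemented; the function name `train_ibm_model1`, the `<null>` token, and the three-sentence toy bitext are all assumptions made for this example.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical token for the empty target word


def train_ibm_model1(bitext, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    bitext is a list of (f_sent, e_sent) pairs of token lists.
    Returns a dict-like object keyed by (f, e).
    """
    # Start from a uniform distribution over the f-side vocabulary.
    f_vocab = {f for f_sent, _ in bitext for f in f_sent}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        # E-step: distribute each f token over the e tokens it may align to.
        for f_sent, e_sent in bitext:
            e_with_null = [NULL] + list(e_sent)
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_with_null)
                for e in e_with_null:
                    p = t[(f, e)] / z  # posterior that f aligns to e
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


# Toy data, invented for illustration.
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(bitext)
for f in ["das", "Haus", "Buch", "ein"]:
    print(f, round(t[(f, "the")], 3))
```

On this toy data the probability mass for "the" should concentrate on "das" as the iterations proceed, because "das" is the only word that co-occurs with "the" in both sentences containing it. Real toolkits add the higher IBM models, HMM alignment, and many efficiency tricks on top of this basic loop.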

Freely Available (Parallel) Corpora

  • WIT3 is a transcribed corpus of TED talks in many languages (small/medium).
  • The 2012 Workshop on Machine Translation Shared Task distributes European Parliament (large) and News Commentary (medium) parallel data as well as standard development and test sets.
  • OPUS is a growing collection of parallel data in many domains and languages.

Other MT Courses

Unless otherwise indicated, this content has been adapted from this course by Chris Dyer. Both the original and new content are licensed under a Creative Commons Attribution 3.0 Unported License.