Machine Translation

11-731 | Spring 2015




Overview

Welcome to Machine Translation (11-731). This 12-credit graduate course provides a comprehensive overview of current techniques in statistical machine translation, such as those behind Google Translate, Bing Translator, and Baidu Online Translate.

Syllabus

Date | Topic | Book | Further Readings
1 Jan 13 Introduction
[pdf]
L-in-10: Finnish
SMT Ch. 1 Knight, Automating Knowledge Acquisition for Machine Translation
Weaver, Translation
2 Jan 15 Probability Review
[pdf]
SMT Ch. 3 Johnson, Joint and conditional estimation of tagging and parsing models
Klein and Manning, Maxent Models, Conditional Estimation, and Optimization, without the Magic
3 Jan 20 Language Models
[pdf]
SMT Ch. 4.3, 7 Kneser and Ney, Improved Backing-Off for m-Gram Language Modeling
Pauls and Klein, Large-Scale Syntactic Language Modeling with Treelets
4 Jan 22 Lexical Translation Models I
[pdf]
SMT Ch. 4.1-4.2, 4.5 Collins, Statistical Machine Translation: IBM Models 1 and 2
Dyer et al., A Simple, Fast, and Effective Reparameterization of IBM Model 2
5 Jan 27 Lexical Translation Models II
[pdf]
L-in-10: Turkmen
SMT Ch. 4.4, 4.6 Brown et al., The Mathematics of Statistical Machine Translation: Parameter Estimation
Vogel and Ney, HMM-based word alignment in statistical translation
6 Jan 29 Microblogs and Translation
L-in-10: Mongolian
none Ling et al., Microblogs as Parallel Corpora
Ling et al., Paraphrasing 4 Microblog Normalization
Eisenstein, What to do about bad language on the internet
7 Feb 03 Discriminative Alignment Models
[pdf]
L-in-10: Bengali
none Blunsom and Cohn, Discriminative Word Alignment with Conditional Random Fields
Ammar et al., Conditional Random Field Autoencoders for Unsupervised Structured Prediction
8 Feb 05 Evaluation I
[pdf]
SMT Ch. 8 Callison-Burch et al., Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation
Snover et al., Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric
9 Feb 10 Evaluation II
[pdf]
SMT Ch. 8 Lavie and Denkowski, The METEOR Metric for Automatic Evaluation of Machine Translation
Papineni et al., BLEU: a Method for Automatic Evaluation of Machine Translation
Snover et al., A Study of Translation Edit Rate with Targeted Human Annotation
10 Feb 12 Noisy Channel Translation
none Norvig, How to Write a Spelling Corrector
Yuret and Yatbaz, The Noisy Channel Model for Unsupervised Word Sense Disambiguation
11 Feb 17 Phrase-based Machine Translation I
L-in-10: Tamil
SMT Ch. 5 Marcu and Wong, A Phrase-Based, Joint Probability Model for Statistical Machine Translation
DeNero et al., Sampling Alignment Structure under a Bayesian Translation Model
12 Feb 19 Decoding: Dynamic Programming and Beam Search
[pdf]
L-in-10: Hungarian
SMT Ch. 5 Koehn et al., Statistical Phrase-Based Translation
Mariño et al., N-gram-based Machine Translation
13 Feb 24 Discriminative Training I
[pdf]
L-in-10: Siraya
none Och and Ney, Discriminative Training and Maximum Entropy Models for Statistical Machine Translation
Macherey et al., Lattice-based Minimum Error Rate Training for Statistical Machine Translation
Smith and Eisner, Minimum Risk Annealing for Training Log-Linear Models
14 Feb 26 Discriminative Training II
[pdf]
none Liang et al., An End-to-End Discriminative Approach to Machine Translation
Gimpel and Smith, Structured Ramp-Loss Minimization for Machine Translation
15 Mar 03 Hierarchical phrase-based models
none Wu, Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
Chiang, Hierarchical Phrase-Based Translation
Chiang, An Introduction to Synchronous Grammars
Mar 05 LECTURE BY KYUNGHYUN CHO (GHC 6115)
none
Mar 10 SPRING BREAK
none
Mar 12 SPRING BREAK
none
16 Mar 17 Decoding with Hierarchical Models
none Germann et al., Fast Decoding and Optimal Decoding for Machine Translation
Knight, Decoding Complexity in Word-Replacement Translation Models
Rush and Collins, Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
17 Mar 19 Synchronous parsing
none Dyer, Two monolingual parses are better than one (synchronous parse)
18 Mar 24 Syntax in MT
[pdf]
L-in-10: German
none Galley et al., What’s in a translation rule?
Galley et al., Scalable inference and training of context-rich syntactic translation models
Zollmann and Venugopal, Syntax augmented machine translation via chart parsing
Ambati et al., Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
Hanneman et al., A General-Purpose Rule Extractor for SCFG-Based Machine Translation
19 Mar 26 Morphology in MT
[pdf]
none Koehn and Hoang, Factored Translation Models
Toutanova et al., Applying Morphology Generation Models to Machine Translation
Chahuneau et al., Translating into Morphologically Rich Languages with Synthetic Phrases
20 Mar 31 Semantics in MT
none
21 Apr 02 Quality estimation
[pdf]
none Specia et al., Predicting Machine Translation Adequacy
Specia and Giménez, Combining Confidence Estimation and Reference-based Metrics for Segment-level MT Evaluation
Soricut et al., The SDL Language Weaver Systems in the WMT12 Quality Estimation Shared Task
Callison-Burch et al., Findings of the 2012 Workshop on Statistical Machine Translation
Many, AMTA Commercial MT Proceedings
Apr 07 LECTURE BY CHRIS CALLISON-BURCH (GHC 6501)
none
22 Apr 09 System combination
[pdf]
none Rosti et al., Combining outputs from multiple machine translation systems
Heafield and Lavie, Voting on N-grams for Machine Translation System Combination
Fiscus, A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)
23 Apr 14 Neural MT I
none Devlin et al., Fast and Robust Neural Network Joint Models for Statistical Machine Translation
Kalchbrenner et al., A Convolutional Neural Network for Modelling Sentences
Apr 16 NO CLASS (Carnival)
none
24 Apr 21 Neural MT II
none
26 Apr 23 Bitext++
none Hwa et al., Bootstrapping parsers via syntactic projection across parallel texts
Bannard and Callison-Burch, Paraphrasing with bilingual parallel corpora
Schneider et al., Supersense tagging for Arabic: the MT-in-the-middle attack
28 Apr 28 POSTERS
none
29 Apr 30 Course wrap-up
none

Piazza

This semester we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, Austin, Alon, and me. Rather than emailing questions to the instructors, we encourage you to post them on Piazza. It supports LaTeX for equations and syntax highlighting for code, and it keeps all materials related to the course in one place.

Find our class page here.

Software

  • cdec is a machine translation research platform developed at CMU (C++)
  • Moses is a widely used machine translation toolkit that supports phrase-based and syntactic models (C++)
  • Joshua is a translation toolkit designed for syntax-based models (Java)
  • Phrasal is the Stanford phrase-based translation engine (Java)
  • Cubit is a very simple phrase-based decoder (Python)
  • GroundHog is a recurrent neural network toolkit based on Theano (Python)
  • KenLM is a highly optimized library for representing and querying $n$-gram language models (C++)
  • SRILM is SRI’s language modeling toolkit (C++)
  • BerkeleyLM is another $n$-gram language model library (Java)
  • fast_align is a fast word aligner based on a simple reparameterization of IBM Model 2 (see Dyer et al. in the syllabus) that substantially improves alignment quality over IBM Model 1 (C++)
  • Giza++ implements EM training for the IBM translation models (C++)
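
As a rough illustration of what "EM training for the IBM translation models" involves, here is a minimal sketch of EM for IBM Model 1 on a toy bitext. It is not how Giza++ or fast_align are implemented; the function name `train_ibm_model1`, the `<null>` token, and the three-sentence corpus are assumptions made for this example only.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical placeholder for the empty source word


def train_ibm_model1(bitext, iterations=10):
    """Estimate p(target word | source word) with EM (IBM Model 1).

    bitext: list of (source_tokens, target_tokens) pairs.
    Returns a dict-like table prob[(t, s)] = p(t | s).
    """
    # Prepend NULL so target words may align to no real source word.
    bitext = [([NULL] + list(src), list(tgt)) for src, tgt in bitext]
    tgt_vocab = {t for _, tgt in bitext for t in tgt}
    uniform = 1.0 / len(tgt_vocab)
    prob = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (t, s) links
        total = defaultdict(float)  # expected count of s being linked
        # E-step: distribute each target word over the source words.
        for src, tgt in bitext:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    posterior = prob[(t, s)] / norm
                    count[(t, s)] += posterior
                    total[s] += posterior
        # M-step: renormalize expected counts per source word.
        prob = defaultdict(
            float, {(t, s): c / total[s] for (t, s), c in count.items()}
        )
    return prob


toy_bitext = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
table = train_ibm_model1(toy_bitext)
# Most probable translation of each source word under the learned table.
for s in ["das", "Haus", "Buch", "ein"]:
    best = max((t for (t, s2) in table if s2 == s), key=lambda t: table[(t, s)])
    print(s, "->", best)
```

Model 1 ignores word order entirely, which is one reason the toolkits listed above layer position-dependent alignment models on top of it.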

Freely Available (Parallel) Corpora

  • WIT3 is a transcribed corpus of TED talks in many languages (small/medium).
  • The 2015 Workshop on Machine Translation Shared Task distributes European Parliament (large) and News Commentary (medium) parallel data in several European languages, as well as standard development and test sets. Large amounts of monolingual data are also available here.
  • OPUS is a growing collection of parallel data in many domains and languages.

Other MT Courses

Unless otherwise indicated, this content has been adapted from this course by Chris Dyer. Both the original and new content are licensed under a Creative Commons Attribution 3.0 Unported License.