Natural Language Processing

11-411 for undergrads | 11-611 for grads

Course Description

This course is about a variety of ways to represent human languages (like English and Chinese) as computational systems, and how to exploit those representations to write programs that do neat stuff with text and speech data, like
  • translation,
  • summarization,
  • extracting information,
  • question answering,
  • natural interfaces to databases, and
  • conversational agents.

This field is called Natural Language Processing or Computational Linguistics, and it is extremely multidisciplinary. This course will therefore include some ideas central to Machine Learning and to Linguistics.

We'll cover computational treatments of words, sounds, sentences, meanings, and conversations. We'll see how probabilities and real-world text data can help. We'll see how different levels interact in state-of-the-art approaches to applications like translation and information extraction.

From a software engineering perspective, there will be an emphasis on rapid prototyping, a useful skill in many other areas of Computer Science.

Course Prerequisites

CS courses on data structures and algorithms, and strong programming skills.

Lectures

| # | Date | Topic | Materials | Readings | Assignments |
|---|------|-------|-----------|----------|-------------|
| 1 | Aug 30 | Course overview; What does it mean to know language? | Slides | Chap 1 | |
| 2 | Sep 1 | Information extraction, question answering, and NLP in IR | Slides | Chap 22.0-2, 23.0-2 | |
| 3 | Sep 6 | Project | Slides, Notes, Example video | | |
| 4 | Sep 8 | Words, morphology, and lexicons | Slides | Chap 3.1-3.9 | |
| 5 | Sep 13 | Language models and smoothing | Slides | Chap 4.3-8 | Assignment 1 due |
| 6 | Sep 15 | Noisy channel models and edit distance | Slides | Chap 3.10, 3.11, 5.9 | |
| 7 | Sep 20 | Classification | Slides | | Assignment 2 due |
| 8 | Sep 22 | Part of speech tags | Slides | Chap 5.0-3 | Project Initial Report due |
| 9 | Sep 27 | Hidden Markov models | Slides | Chap 6.0-4 | Assignment 3 due |
| 10 | Sep 29 | Syntactic representations of natural language | Slides | Chap 12.0-3 | |
| 11 | Oct 4 | Chomsky hierarchy and natural language | Slides | Chap 16 | Assignment 4 due |
| 12 | Oct 6 | Context-free recognition, CKY | Slides | | |
| 13 | Oct 11 | Parsing algorithms | Slides | Chap 13 | Assignment 5 due |
| 14 | Oct 13 | Parsing algorithms contd. | Slides | Chap 12.7, Chap 14-14.2 | Project Progress Report due |
| 15 | Oct 18 | Revision | | | |
| | Oct 20 | Midterm | Practice Problems, Practice Solutions | | |
| 16 | Oct 25 | Treebanks and PCFGs | Slides | Chap 12.4, 14.7 | |
| 17 | Oct 27 | Lexical semantics | Slides | Chap 17.0-2, 19.0-3 | Project Progress Report II due |
| 18 | Nov 1 | Word embeddings/vector semantics | Slides | JM v3 Chap 19 | Assignment 6 due |
| 19 | Nov 3 | Verb/sentence semantics | Slides A, Slides B | Chap 17.2-4, Chap 19.4-6 | |
| 20 | Nov 8 | Compositional semantics, semantic parsing | Slides | Chap 18.1-3 | Assignment 7 due |
| 20 | Nov 10 | Discourse, entity linking, pragmatics | Slides | | |
| 21 | Nov 15 | Word Sense Disambiguation and Semantic Role Labelling | Slides | Chap 20.0-6, 20.8-11 | Assignment 8 due |
| 22 | Nov 17 | Speech 1 | Slides | | |
| 23 | Nov 22 | Speech 2 | Slides | | |
| 24 | Nov 24 | No lecture | | | |
| 25 | Nov 29 | Machine translation | Slides | Chap 25.0-1, 25.9 | Final project submission due (on project server) |
| 26 | Dec 1 | Deep Learning | Slides | | Final project report due (via YouTube) |
| | Dec 5 | | | | Question evaluations due |
| 27 | Dec 6 | Non-English NLP | Slides, Slides | | |
| 28 | Dec 8 | Conclusion | Slides I, Slides II, Slides III | | Answer evaluations due |
| | TBD | Final exam | Practice problems, Practice problem solutions | | |

Competitive Project

A major component will be the project: build a program that takes a web page P as input and outputs a set of questions about the content of P (questions a human could answer after reading P), and that can also, given a question Q about the content of P, answer it intelligently. Projects will be pitted against each other in a competition at the end of the course.
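To make the two halves of the project concrete, here is a toy sketch of the interface in Python. It is not the course's reference solution or required design: the function names `generate_questions` and `answer_question` are hypothetical, the input is assumed to be plain text already extracted from the web page P, and the heuristics (a single "X is Y" pattern and word-overlap matching) are deliberately naive placeholders for real NLP components.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Set of lowercased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def generate_questions(text):
    """Toy heuristic: turn each sentence of the form 'X is Y.' into 'What is X?'."""
    questions = []
    for sentence in split_sentences(text):
        match = re.match(r"^(.+?) is (.+?)[.!?]?$", sentence)
        if match:
            questions.append(f"What is {match.group(1)}?")
    return questions


def answer_question(text, question):
    """Toy heuristic: return the sentence sharing the most tokens with the question."""
    question_tokens = tokens(question)
    return max(split_sentences(text),
               key=lambda s: len(tokens(s) & question_tokens))


if __name__ == "__main__":
    page_text = "Pittsburgh is a city in Pennsylvania. It has many bridges."
    print(generate_questions(page_text))
    print(answer_question(page_text, "How many bridges does it have?"))
```

A competitive system would replace both heuristics with the techniques covered in lecture (parsing, semantic analysis, information extraction); the sketch only fixes the shape of the input and output.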

Evaluation

Students will be evaluated by exam (midterm and final, totaling 40%), regular short quizzes and weekly pencil-and-paper or small programming homework problems (30% together), and the group project (30%).
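As a rough illustration of how the weights above combine, the sketch below computes an overall score from the three component groups. The component names and the example scores are made up for illustration, and the split within each group (for instance, midterm versus final) is not specified here, so each group is treated as a single percentage.

```python
# Weights taken from the evaluation breakdown above; key names are hypothetical.
WEIGHTS = {
    "exams": 0.40,                 # midterm and final together
    "quizzes_and_homework": 0.30,  # quizzes and weekly homework together
    "project": 0.30,               # group project
}


def course_score(scores):
    """Weighted sum of component scores, each given as a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"exams": 80, "quizzes_and_homework": 90, "project": 70}
    print(round(course_score(example), 2))
```

For the example scores, the contributions are 0.40 × 80 = 32, 0.30 × 90 = 27, and 0.30 × 70 = 21, for an overall score of 80.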

FAQ

Should I take this course?

Yes, if:

  • you're a CS student interested in languages, language technology, or information processing
  • you're a CS student who needs an "applications" credit
  • you're a language technology minor (this course is an elective option)
  • you're a linguistics student who can write computer programs (this course is an elective option)
  • you always suspected natural language was kind of like Lisp (or Java or ...)
  • you want computers to take over the world
  • you don't want computers to take over the world, but if they do, you want to negotiate your release
  • you like AI, machine learning, and/or theoretical computer science, and want to apply them to a hard real-world problem

Related courses elsewhere (not exhaustive!)

University of California, Berkeley, Brown University, University of Colorado, Columbia University, Cornell University, University of Illinois at Urbana-Champaign, Johns Hopkins University, University of Maryland, New York University, University of Pennsylvania, Stanford University, University of Utah, University of Wisconsin-Madison