This course is about a variety of ways to represent human languages (like English and Chinese) as computational systems, and how to exploit those representations to write programs that do neat stuff with text and speech data.
This field is called Natural Language Processing or Computational Linguistics, and it is extremely multidisciplinary. This course will therefore include some ideas central to Machine Learning and to Linguistics.
We'll cover computational treatments of words, sounds, sentences, meanings, and conversations. We'll see how probabilities and real-world text data can help. We'll see how different levels interact in state-of-the-art approaches to applications like translation and information extraction.
From a software engineering perspective, there will be an emphasis on rapid prototyping, a useful skill in many other areas of Computer Science.
Prerequisites: CS courses on data structures and algorithms, and strong programming skills.
Course overview; What does it mean to know language?
Information extraction, question answering, and NLP in IR
Chap 22.0-2, 23.0-2
Assignment 1 out
Words, morphology, and lexicons
Assignment 1 due
Assignment 2 out
Language models and smoothing
Noisy channel models and edit distance
Chap 3.10, 3.11, 5.9
Assignment 2 due
Assignment 3 out
Part of speech tags
Hidden Markov models
Assignment 3 due
Assignment 4 out
Project Initial Plan due
Feb 13 (lecture 10): Classification 2
Assignment 4 due
Assignment 5 out
Syntactic representations of natural language
Chomsky hierarchy and natural language
Assignment 5 due
Assignment 6 out
Context-free recognition, CKY
Project Progress Report due 11:59 PM
Chap 12.7, Chap 13, Chap 14-14.2
Treebanks and PCFGs
Chap 12.4, 14.7
Mar 5: Midterm exam
Mar 9-13: Spring Break
Chap 17.0-2, 19.0-3
Word embeddings/vector semantics
SLP3 Chap 6
Chap 17.2-4, Chap 19.4-6
Compositional semantics, semantic parsing
Assignment 6 due
Assignment 7 out
Mar 31 (lecture 20): Discourse, entity linking, pragmatics
Chap 20.0-6, 20.8-11
Apr 2 (lecture 21): Sentiment Analysis and Computational Argumentation
Project dry run code due 11:59 PM
Assignment 7 due
Interpreting Social Media
Chap 25.0-1, 25.9
Final Project code due 11:59 PM
Final Project report due May 5, 11:59 PM
A major component will be the project: build a program that takes a web page P as input and outputs a set of questions about the content of P (questions a human could answer after reading P), and that can also, given a question Q about the content of P, answer it intelligently. Projects will be pitted against each other in a competition at the end of the course.
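To make the two halves of the project concrete, here is a minimal Python sketch of one possible interface. It is not the required design or an official starter kit: the function names `ask` and `answer`, the `max_questions` parameter, and the assumption that the web page P has already been reduced to plain text are all hypothetical choices made for illustration. The heuristics are deliberately naive (copula inversion for question generation, word overlap for answering) and stand in for whatever methods a real project would use.

```python
import re

# Hypothetical helper data: sentence-initial words that are safe to lowercase.
DETERMINERS = {"The", "A", "An", "This", "That", "These", "Those"}


def split_sentences(text):
    """Split plain text into sentences at whitespace that follows ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Return the set of lowercased alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ask(page_text, max_questions=3):
    """Toy question generator: turn 'X is Y.' into the yes/no question 'Is X Y?'."""
    questions = []
    for sentence in split_sentences(page_text):
        match = re.match(r"(.+?) (is|was|are|were) (.+?)[.!?]*$", sentence)
        if not match:
            continue  # no copula found, so this heuristic skips the sentence
        subject, verb, rest = match.groups()
        if subject.split()[0] in DETERMINERS:
            subject = subject[0].lower() + subject[1:]
        questions.append(f"{verb.capitalize()} {subject} {rest}?")
        if len(questions) == max_questions:
            break
    return questions


def answer(page_text, question):
    """Toy answerer: return the sentence sharing the most words with the question."""
    sentences = split_sentences(page_text)
    if not sentences:
        return ""
    question_words = tokenize(question)
    return max(sentences, key=lambda s: len(tokenize(s) & question_words))


if __name__ == "__main__":
    page = (
        "Pittsburgh is a city in Pennsylvania. "
        "The Monongahela River flows through it. "
        "The bridges were built long ago."
    )
    print(ask(page))
    print(answer(page, "Which river flows through the city?"))
```

Keeping question generation and question answering behind two separate functions means either half can be swapped for a stronger method (for example, one based on parsing or information extraction) without touching the other.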
Students will be evaluated by exam (midterm and final, totaling 40%), regular short quizzes and weekly pencil-and-paper or small programming homework problems (30% together), and the group project (30%).
Should I take this course?
University of California, Berkeley, Brown University, University of Colorado, Columbia University, Cornell University, University of Illinois at Urbana-Champaign, Johns Hopkins University, University of Maryland, New York University, University of Pennsylvania, Stanford University, University of Utah, University of Wisconsin-Madison