This field is called Natural Language Processing or Computational Linguistics, and it is extremely multidisciplinary. This course will therefore include some ideas central to Machine Learning and to Linguistics.
We'll cover computational treatments of words, sounds, sentences, meanings, and conversations. We'll see how probabilities and real-world text data can help. We'll see how different levels interact in state-of-the-art approaches to applications like translation and information extraction.
From a software engineering perspective, there will be an emphasis on rapid prototyping, a useful skill in many other areas of Computer Science.
Prerequisites: CS courses on data structures and algorithms, and strong programming skills.
|Date||Topic||Readings|| Assignments and |
Course overview; What does it mean to know language?
Information extraction, question answering, and NLP
|Chap 22.0-2, 23.0-2|
Words, morphology, and lexicons
Language models and smoothing
Noisy channel models and edit distance
|Chap 3.10, 3.11, 5.9||Assignment 1 due|
Part of speech tags
|Chap 5.0-3||Assignment 2 due|
Hidden Markov models
Syntactic representations of natural language
|Chap 12.0-3||Assignment 3 due|
Chomsky hierarchy and natural language
Context-free recognition, CKY
|Assignment 4 due|
|Chap 13||Preliminary Project Report due|
Parsing algorithms contd.
|Chap 12.7, Chap 14-14.2||Assignment 5 due|
Treebanks and PCFGs
|Chap 12.4, 14.7|
|—||Mar 12-16||Spring Break|
|Chap 17.0-2, 19.0-3|
Word embeddings/vector semantics
|JM v3 Chap 15 and Chap 16||Project Progress Report due|
||Chap 17.2-4, Chap 19.4-6|
Compositional semantics, semantic parsing
||Chap 18.1-3||Assignment 6 due|
Discourse, entity linking, pragmatics
Word Sense Disambiguation and Semantic Role Labelling
||Chap 20.0-6, 20.8-11||Assignment 7 due|
||Project dry run code due|
Interpreting Social Media
||Final Project code due|
|26||Apr 26||Machine Translation ||Chap 25.0-1, 25.9|
||Final Project report due|
A major component of the course is the project: build a program that takes a web page P as input and outputs a set of questions about the content of P (questions that a human could answer after reading P). Given a question Q about the content of P, the program must also answer it intelligently. Projects will be pitted against each other in a competition at the end of the course.
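To make the two required behaviors concrete, here is a minimal sketch of what the project's entry points might look like. Everything in it is an assumption for illustration, not a course-provided interface: the function names `ask` and `answer`, the "X is Y" question heuristic, and the word-overlap answer selection are hypothetical toy baselines, and the page is assumed to have already been reduced to plain text.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    """Set of lowercased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ask(page_text, n=3):
    """Hypothetical question generator (toy heuristic).

    Turns sentences of the form 'X is Y.' into 'What is X?'.
    Returns at most n questions.
    """
    questions = []
    for sentence in split_sentences(page_text):
        match = re.match(r"^(.+?) is (.+?)[.!?]?$", sentence)
        if match:
            questions.append(f"What is {match.group(1)}?")
        if len(questions) == n:
            break
    return questions


def answer(page_text, question):
    """Hypothetical answerer (toy baseline).

    Returns the sentence of the page that shares the most word
    tokens with the question, or '' if the page has no sentences.
    """
    sentences = split_sentences(page_text)
    if not sentences:
        return ""
    question_tokens = tokens(question)
    return max(sentences, key=lambda s: len(tokens(s) & question_tokens))


if __name__ == "__main__":
    page = (
        "Pittsburgh is a city in Pennsylvania. "
        "It has many bridges. "
        "The Steelers play football there."
    )
    print(ask(page))
    print(answer(page, "Who plays football?"))
```

A real system would replace both heuristics with the techniques covered in the course (parsing, information extraction, semantic analysis); the sketch only fixes the shape of the input and output.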
Students will be evaluated by exams (a midterm and a final, totaling 40%), by regular short quizzes together with weekly pencil-and-paper or small programming homework problems (30% combined), and by the group project (30%).
Should I take this course?