This field is called Natural Language Processing or Computational Linguistics, and it is extremely multidisciplinary. This course will therefore include some ideas central to Machine Learning and to Linguistics.
We'll cover computational treatments of words, sounds, sentences, meanings, and conversations. We'll see how probabilities and real-world text data can help. We'll see how different levels interact in state-of-the-art approaches to applications like translation and information extraction.
From a software engineering perspective, there will be an emphasis on rapid prototyping, a useful skill in many other areas of Computer Science.
Prerequisites: CS courses on data structures and algorithms, and strong programming skills.
|#||Date||Topic||Readings||Assignments and|
Course overview; What does it mean to know language?
Information extraction, question answering, and NLP in IR
|Chap 22.0-2, 23.0-2|
|4||Jan 24||Words, morphology, and lexicons||Chap 3.1-3.9||Assignment 1 out|
|5||Jan 29||Language models and smoothing||Chap 4.3-8|
|6||Jan 31||Noisy channel models and edit distance||Chap 3.10, 3.11, 5.9||Assignment 1 due; Assignment 2 out|
|8||Feb 7||Part-of-speech tags||Chap 5.0-3||Assignment 2 due; Assignment 3 out|
|9||Feb 12||Hidden Markov models||Chap 6.0-4|
|10||Feb 14||Syntactic representations of natural language||Chap 12.0-3||Assignment 3 due; Assignment 4 out|
|11||Feb 19||Chomsky hierarchy and natural language||Chap 15|
|12||Feb 21||Context-free recognition, CKY|
|13||Feb 26||Parsing algorithms||Chap 13|
|14||Feb 28||Parsing algorithms contd.||Chap 12.7, Chap 14-14.2||Assignment 4 due; Assignment 5 out; Project Progress Report due 11:59 PM|
Treebanks and PCFGs
||Chap 12.4, 14.7|
||Chap 17.0-2, 19.0-3|
Word embeddings/vector semantics
||SLP3 Chap 6||Assignment 5 due; Assignment 6 out|
||Chap 17.2-4, Chap 19.4-6|
Compositional semantics, semantic parsing
Word Sense Disambiguation and Semantic Role Labelling
Discourse, entity linking, pragmatics
||Chap 20.0-6, 20.8-11||Assignment 6 due; Assignment 7 out|
||Project dry run code due 11:59 PM|
|24||Apr 18||Non-English NLP|| ||Assignment 7 due|
Interpreting Social Media
|26||Apr 25||Machine Translation||Chap 25.0-1, 25.9||Final Project code due 11:59 PM|
||Final Project report due 11:59 PM|
A major component of the course is the project: build a program that takes a web page P as input and outputs a set of questions about the content of P that a human could answer after reading P. Given a question Q about the content of P, the program must also answer Q intelligently. Projects will be pitted against each other in a competition at the end of the course.
Students will be evaluated by exams (midterm and final, 40% combined), regular short quizzes together with weekly pencil-and-paper or small programming homework problems (30% combined), and the group project (30%).
Should I take this course?