CMU CS 11711, Fall 2018
T/Th 1:30-2:50pm, GHC 4307
Yulia Tsvetkov (office hours: Thu 3-4pm, GHC 6405), ytsvetko@cs.cmu.edu
Robert Frederking (office hours: TBD), ref@cs.cmu.edu
Teaching Assistants:
Aldrian Obaja Muis (office hours: Thu 3-4pm, GHC 6404), amuis@cs.cmu.edu
Maria Ryskina (office hours: Mon 4-5pm, GHC 5417), mryskina@cs.cmu.edu
Sachin Kumar (office hours: Wed 3-4pm, GHC 5417), sachink@cs.cmu.edu
Forum: Piazza
This course will explore current statistical techniques for the automatic analysis of natural (human) language data. The dominant modeling paradigm is corpus-driven statistical learning, with a split focus between supervised and unsupervised methods. This term we are making Algorithms for NLP a lab-based course. Instead of homeworks and exams, you will complete four hands-on coding projects. This course assumes a good background in basic probability and a strong ability to program in Java. Prior experience with linguistics or natural languages is helpful, but not required. There will be a lot of statistics, algorithms, and coding in this class.
Slides, materials, and projects for this iteration of Algorithms for NLP are borrowed from Dan Jurafsky at Stanford, Dan Klein and David Bamman at UC Berkeley, and Nathan Schneider at Georgetown.
The lecture plan is subject to change.
Week | Date | Topics | Readings | Homeworks
---|---|---|---|---
1 | Aug 28 | Course Introduction [slides] | J+M II 1, M+S 1-3 |
1 | Aug 30 | Language Modeling I [slides] | J+M II 4, M+S 6, Chen & Goodman, Interpreting KN | P1: Language Modeling
2 | Sep 4 | Language Modeling II [slides] | Massive Data, Bloom, Perfect, Efficient LMs |
2 | Sep 6 | Language Modeling III [slides] | |
3 | Sep 11 | Vector Semantics and Word Embeddings I [slides] | J+M III 6, Turney and Pantel'10, Brown |
3 | Sep 13 | Word Embeddings II [slides] | FastText, ELMo |
4 | Sep 18 | Speech Recognition I [slides] | J+M II 7 |
4 | Sep 20 | Speech Recognition II, HMMs [slides] | J+M II 9, J+M III Appendix A |
5 | Sep 25 | POS Tagging, NER, CRFs [slides] | J+M 5, Brants, Toutanova & Manning |
5 | Sep 27 | Formal Grammar [slides] | M+S 3.2, 12.1, J+M II 13, J+M III 10 |
6 | Oct 2 | Parsing I [slides] | M+S 3.2, 11, 12.1, J+M II 13, 14, Unlexicalized | P2: Parsing
6 | Oct 4 | Parsing II [slides] | Coarse-to-fine |
7 | Oct 9 | Structured Classification I [slides] | Pegasos, Cutting Plane |
7 | Oct 11 | Structured Classification II [slides] | J+M II 16, 18, 19, Adagrad, Subgradient SVM |
8 | Oct 16 | Parsing III [slides] | Split, Lexicalized, K-Best A* |
8 | Oct 18 | Parsing IV: Dependency Parsing [slides] | |
9 | Oct 23 | Machine Translation: Alignment I [slides] | J+M II 25, IBM Models, HMM, Agreement, Discriminative |
9 | Oct 25 | Machine Translation: Alignment II [slides] | IBM Models I and II, fastalign, EM Algorithm |
10 | Oct 30 | Machine Translation: Phrase-Based [slides] | Decoding |
10 | Nov 1 | Morphology; Features and Unification [slides_1], [slides_2] | J+M II 3, J+M II 15 (Note: errors in textbook) | P3: Discriminative Reranking
11 | Nov 6 | Representing Meaning [slides] | J+M II 17, J+M II 18 (Note: errors in textbook) |
11 | Nov 8 | CCG [slides] | J+M II 12.7.2 |
12 | Nov 13 | Lexical Semantics and Frame Semantic Parsing [slides] | J+M II 19, 20.6-20.9 |
12 | Nov 15 | Computational Discourse [slides] | J+M II 21 | P4: Machine Translation
13 | Nov 20 | Computational Social Science (Guest Lecture by Anjalie Field) [slides] | LDA, Comp Social Science, Comp Sociolinguistics |
13 | Nov 22 | Thanksgiving Day | |
14 | Nov 27 | Sentiment Analysis [slides] | J+M III 4, 9 |
14 | Nov 29 | Neural Machine Translation [slides] | NMT, NMT with Attention |
15 | Dec 4 | Ethics [slides] | |
The primary recommended texts for this course are:
Make sure you get the purple 2nd edition of J+M, not the white 1st edition.
This is a project-based course. Grading is based on four project assignments, each contributing 25% of your final grade.
Project Submission
Submit projects using the class Canvas site.
Prepare a directory named ‘project’ containing no more than 3 files: (a) a jar named ‘submit.jar’, (b) a pdf named ‘writeup.pdf’, and (c) an optional jar named ‘best.jar’. The jar named ‘submit.jar’ should contain your implementation of the core project that passes the basic requirements. For example, for project 1, the jar named ‘assign1-submit.jar’ is all that you would need to turn in, after renaming it to ‘submit.jar’. The pdf ‘writeup.pdf’ should contain your writeup for the project. Finally, the file ‘best.jar’ is an optional additional jar that implements the core project, but need not pass spot-checks. Include this last jar if you wish to demonstrate an improvement over the basic project, possibly using approximations or alternative models.
Compress the ‘project’ directory you created in the last step using the command ‘tar cvfz project.tgz project’.
Click on the assignments tab of the main Canvas course site and select the assignment corresponding to the project (e.g. Assignment 1 corresponds to Project 1). Click the ‘Submit assignment’ button to open the submission portal, then click ‘Choose file’ and select the compressed project directory ‘project.tgz’ created in the previous step. Finally, click the ‘Submit assignment’ button below.
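The packaging steps above could be scripted roughly as follows. This is only a sketch, not an official course tool: the function name `package_project` and its argument order are hypothetical, and it assumes your jar and writeup already exist in the current directory. The file names inside ‘project’ and the `tar` command are the ones given in the instructions.

```shell
#!/bin/sh
# Hypothetical helper (not provided by the course):
#   package_project SUBMIT_JAR WRITEUP_PDF [BEST_JAR]
# Builds the 'project' directory with the required file names and
# compresses it into project.tgz.
package_project() {
    submit_jar=$1
    writeup_pdf=$2
    best_jar=${3:-}

    # Check every input before creating anything.
    [ -f "$submit_jar" ] || { echo "missing jar: $submit_jar" >&2; return 1; }
    [ -f "$writeup_pdf" ] || { echo "missing writeup: $writeup_pdf" >&2; return 1; }
    if [ -n "$best_jar" ] && [ ! -f "$best_jar" ]; then
        echo "missing jar: $best_jar" >&2
        return 1
    fi

    # Refuse to reuse an existing directory so stale files cannot
    # push the submission over the 3-file limit.
    if [ -e project ]; then
        echo "'project' already exists; remove it first" >&2
        return 1
    fi

    mkdir project
    cp "$submit_jar" project/submit.jar      # required core implementation
    cp "$writeup_pdf" project/writeup.pdf    # required writeup
    if [ -n "$best_jar" ]; then
        cp "$best_jar" project/best.jar      # optional improved version
    fi

    # Same command as in the submission instructions.
    tar cvfz project.tgz project
}
```

For project 1, for instance, this might be called as `package_project assign1-submit.jar writeup.pdf`, after which ‘project.tgz’ is the file to upload to Canvas.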
Project Grading
Projects out of 10 points total:
Late policy. Each student will be granted 5 late days to use over the duration of the semester. There are no restrictions on how the late days can be used (e.g., all 5 could be used on one project). Using late days will not affect your grade. However, projects submitted late after all late days have been used will receive no credit. Be careful!
Academic honesty. Homework assignments are to be completed individually. Verbal collaboration on homework assignments is acceptable, as well as re-implementation of relevant algorithms from research papers, but everything you turn in must be your own work, and you must note the names of anyone you collaborated with on each problem and cite resources that you used to learn about the problem. Suspected violations of academic integrity rules will be handled in accordance with the CMU guidelines on collaboration and cheating.
Take care of yourself! As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of having a healthy life is learning how to ask for help. Asking for support sooner rather than later is almost always helpful. CMU services are available, and treatment does work. You can learn more about confidential mental health services available on campus at: http://www.cmu.edu/counseling/. Support is always available (24/7) from Counseling and Psychological Services: 412-268-2922.
Accommodations for Students with Disabilities:
If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.