This course will explore foundational statistical techniques for the automatic analysis of natural (human) language text. Toward this end, the course will introduce pragmatic formalisms for representing structure in natural language, and algorithms for annotating raw text with those structures. The dominant modeling paradigm is corpus-driven statistical learning, covering both supervised and unsupervised methods. Algorithms for NLP is a lab-based course: instead of homework and exams, you will be graded mainly on four hands-on coding projects.
Slides, materials, and projects for this iteration of Algorithms for NLP are borrowed from courses by Jacob Eisenstein at Georgia Tech, Dan Jurafsky at Stanford, Dan Klein and David Bamman at UC Berkeley, and Nathan Schneider at Georgetown University.
This course assumes a good background in basic probability and a strong ability to program in Python. Experience with numerical libraries such as NumPy and neural network libraries such as PyTorch is a plus. Prior experience with machine learning, linguistics, or natural languages is helpful but not required. There will be a lot of statistics, algorithms, and coding in this class.
The lecture plan is subject to change.
| Week | Date | Topic | Readings | Assignments |
|---|---|---|---|---|
| 1 | Sep 1 | Course Introduction [slides] | Eis 1 | |
| | Sep 3 | Linear classification [slides] | Eis 2; J&M III 4; Pang et al. 2002 | |
| 2 | Sep 8 | Nonlinear classification [slides] | Eis 3, 4.4; J&M III 5; Goldberg 4 | P1: Classification (Due Sep 25) |
| | Sep 10 | Lexical semantics & word embeddings [slides] | J&M III 6; Eis 14; Baroni et al. 2014 | |
| 3 | Sep 15 | Language modeling [slides] | J&M III 3; Eis 6.1-6.2, 6.4; Teh 2006 | |
| | Sep 17 | Neural language modeling and contextualized word embeddings [slides] | Eis 6.3, 6.5; J&M III 7.5, 9; Goldberg 10; Understanding LSTM Networks; Peters et al. 2018 | |
| 4 | Sep 22 | Sequence labeling I: POS tagging, HMMs, Viterbi [slides] | J&M III 8, Appendix A; Eis 7.1-7.4, 8.1; Collins notes | |
| | Sep 24 | Sequence labeling II: NER, CRFs [slides] | Eis 7.5, 7.7, 8.3; Sutton & McCallum 2.1-2.5; Andor et al. 2016, section 3 | P1 due tomorrow (11:59pm) |
| 5 | Sep 29 | Neural sequence labeling [slides] | Eis 7.6; Collobert et al. 2011 | P2: Sequence labeling (Due Oct 16) |
| | Oct 1 | Contextualized word embeddings II, Transformers [slides] | The Annotated Transformer; Devlin et al. 2019; Rogers et al. 2020; Schrimpf et al. 2020 | |
| 6 | Oct 6 | Syntactic parsing, PCFGs [slides] | J&M III 12, 13, 14.1; Eis 10.1-10.2 | |
| | Oct 8 | PCFGs, CKY [slides] | J&M III 14; Eis 10.3-10.4 | |
| 7 | Oct 13 | Dependency parsing I [slides] | J&M III 15.1-15.4; Eis 11.1, 11.3; Chen and Manning 2014 | |
| | Oct 15 | Dependency parsing II [slides] + Guest lecture by Andrew Drozdov | J&M III 15.5, 15.6; Eis 11.2; Drozdov et al. 2019 | P2 due tomorrow (11:59pm) |
| 8 | Oct 20 | Formal languages & grammar [slides] | Eis 9.0-9.1.1, 9.2-9.2.2, 9.3-9.3.1 | P3: Parsing (Due Nov 6) |
| | Oct 22 | Morphology [slides] | J&M II 3 (Note: errors in textbook) | |
| 9 | Oct 27 | Features and Unification [slides] | J&M II 15 | |
| | Oct 29 | Semantics and first-order logic [slides] | J&M II 17, 18 (Note: errors in textbook) | |
| 10 | Nov 3 | Pred-Arg Structure and Semantic Case Frames [slides] | J&M II 19.3-19.5 | |
| | Nov 5 | CCG [slides] | J&M III 12.6.1; Artzi et al. 2013 | P3 due tomorrow (11:59pm) |
| 11 | Nov 10 | Discourse and Coreference [slides] | Eis 15 | |
| | Nov 12 | Natural Language Inference and Interpretable NLP [slides] | LIME; Han et al. 2020 | |
| 12 | Nov 17 | No class (EMNLP) | | P4: Coreference (Due Dec 4) |
| | Nov 19 | No class (EMNLP) | | |
| | Nov 20 | EMNLP mini-presentations | | |
| 13 | Nov 24 | EMNLP mini-presentations | | EMNLP review due today (11:59pm) |
| | Nov 26 | No class (Thanksgiving Break) | | |
| 14 | Dec 1 | Machine Translation [slides] | J&M II 25; IBM Models; The Annotated Transformer | |
| | Dec 3 | Summarization [slides] | Ch. 1 in Nenkova & McKeown 2011; See et al. 2017 | |
| 15 | Dec 8 | Sentiment Analysis [slides] | Pang & Lee 2008; Sentiment tutorial by Chris Potts | |
| | Dec 10 | Computational Ethics + NLP | | |
| | Dec 11 | Algorithms for NLP Q&A | | P4 due tomorrow (11:59pm) |
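The primary recommended texts for this course, abbreviated in the Readings column of the schedule, are:

- Eis: Jacob Eisenstein, *Introduction to Natural Language Processing* (MIT Press, 2019)
- J&M II / J&M III: Daniel Jurafsky and James H. Martin, *Speech and Language Processing*, 2nd edition and 3rd edition draft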
Project Submission

Projects will be submitted on the class Canvas page, and writeups will also be shared with your peers on Piazza after the due date.
1. When you are ready to submit, make sure your writeup is in the psetX directory as a PDF named psetX-writeup.pdf.
2. Run the script make-submission.sh, which is included in each assignment folder in the GitHub repo. It generates a tarball psetX-submission.tgz containing your code and your writeup.
3. On the main Canvas course site, open the Assignments tab and select the assignment corresponding to the project (e.g. Assignment 1 corresponds to Project 1). Click the Submit assignment button to open the submission portal, click Choose file, and select the tarball psetX-submission.tgz created in the previous step. Finally, click the Submit assignment button.
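Before uploading, it can help to confirm that the tarball really contains your writeup. The sketch below is a hypothetical helper, not part of the course tooling: it assumes the tarball is gzip-compressed (as the .tgz extension suggests) and that the X in psetX is the project number. Because the directory layout produced by make-submission.sh is not specified here, it only checks that a file named psetX-writeup.pdf appears somewhere in the archive.

```python
import tarfile
from pathlib import PurePosixPath


def writeup_in_tarball(tgz_path: str, pset: int) -> bool:
    """Hypothetical check: return True if pset<N>-writeup.pdf is anywhere in the .tgz."""
    expected = f"pset{pset}-writeup.pdf"
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(tgz_path, "r:gz") as tar:
        # Compare only the final path component, so any directory prefix is accepted.
        return any(PurePosixPath(name).name == expected for name in tar.getnames())
```

For example, calling `writeup_in_tarball("pset1-submission.tgz", 1)` from a Python shell before uploading Project 1 should return True if the writeup was packaged.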
Once we have received all submissions, we will create a post on Piazza for you to share your writeup with the rest of the class.
Late policy. Each student has 5 late days to use over the semester, with a maximum of 3 late days on any one project. Weekends and holidays also count as late days. Late submissions automatically consume late days, and using late days does not affect your grade. However, projects submitted late after all late days have been used will receive no credit. Be careful!
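The late-day arithmetic can be made concrete with a small sketch. It is hypothetical and rests on assumptions the policy does not spell out: lateness is counted in whole calendar days by date (ignoring the 11:59pm cutoff), a submission more than 3 days late or later than the remaining budget allows receives no credit, and a submission that receives no credit does not consume late days.

```python
from datetime import date

TOTAL_LATE_DAYS = 5  # semester-wide budget from the late policy
MAX_PER_PROJECT = 3  # at most 3 late days on any one project


def days_late(due: date, submitted: date) -> int:
    """Calendar days past the due date; weekends and holidays count too."""
    return max(0, (submitted - due).days)


def apply_late_policy(
    submissions: list[tuple[date, date]],
) -> tuple[list[tuple[int, bool]], int]:
    """Walk (due, submitted) pairs in chronological order.

    Returns one (days_late, receives_credit) pair per project, plus the
    late days left over. Assumption: a project that gets no credit does
    not consume any late days.
    """
    remaining = TOTAL_LATE_DAYS
    results = []
    for due, submitted in submissions:
        late = days_late(due, submitted)
        ok = late <= MAX_PER_PROJECT and late <= remaining
        if ok:
            remaining -= late
        results.append((late, ok))
    return results, remaining
```

For example, under these assumptions, submitting P1 two days late and P2 three days late uses all 5 late days, so even a one-day-late P3 would receive no credit.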
Academic honesty. Projects are to be completed individually. Verbal collaboration on projects is acceptable, as is re-implementation of relevant algorithms from research papers, but everything you turn in must be your own work. You must note the names of anyone you collaborated with on each problem and cite the resources you used to learn about the problem. Suspected violations of academic integrity rules will be handled in accordance with the CMU guidelines on collaboration and cheating.
Take care of yourself! As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating, and/or lack of motivation. All of us benefit from support during times of struggle. There are many helpful resources available on campus, and an important part of having a healthy life is learning how to ask for help. Asking for support sooner rather than later is almost always helpful. CMU services are available free to students, and treatment does work. You can learn more about confidential mental health services available on campus through Counseling and Psychological Services. Support is always available (24/7) at 412-268-2922.
In this class, every individual will and must be treated with respect. The ways we are diverse are many and are fundamental to building and maintaining an equitable and inclusive campus community. These include but are not limited to: race, color, national origin, caste, sex, disability (visible or invisible), age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status, or genetic information.
Research shows that greater diversity across individuals leads to greater creativity in the group. We at CMU work to promote diversity, equity, and inclusion not only because it is necessary for excellence and innovation, but because it is just. Therefore, while we are imperfect, we ask you all to commit fully to working, both inside and outside our classrooms, to build and sustain a campus community that embraces these core values. It is the responsibility of each of us to create a safer and more inclusive environment. Incidents of bias or discrimination, whether intentional or unintentional, contribute to creating an unwelcoming environment for individuals and groups at the university. If you experience or observe unfair or hostile treatment on the basis of identity, we encourage you to speak out for justice and offer support in the moment and/or share your experience using the following resources:
Accommodations for Students with Disabilities:
If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them.