Algorithms for NLP

CMU CS 11711, Fall 2020

Tuesday/Thursday 1:30-2:50pm EDT, on Zoom

Emma Strubell (office hours: By appointment), strubell@cmu.edu
Yulia Tsvetkov (office hours: By appointment), ytsvetko@cs.cmu.edu
Robert Frederking (office hours: By appointment), ref@cs.cmu.edu

Teaching Assistants:
Jiateng Xie (office hours: Thursday 10:00-11:00pm EDT, Zoom), jiatengx@cs.cmu.edu
Sanket Vaibhav Mehta (office hours: Friday 9:00-10:00am EDT, Zoom), svmehta@cs.cmu.edu
Xiaochuang Han (office hours: Monday 10:30-11:30am EDT, Zoom), xiaochuh@cs.cmu.edu

Forum: Piazza
Note: Sensitive information related to the class (e.g., Zoom links) will be available on Piazza

Learning Goals

This course will explore foundational statistical techniques for the automatic analysis of natural (human) language text. Towards this end, the course will introduce pragmatic formalisms for representing structure in natural language, and algorithms for annotating raw text with those structures. The dominant modeling paradigm is corpus-driven statistical learning, covering both supervised and unsupervised methods. Algorithms for NLP is a lab-based course: instead of homework and exams, you will be graded mainly on four hands-on coding projects.

Slides, materials, and projects for this iteration of Algorithms for NLP are borrowed from Jacob Eisenstein’s course at Georgia Tech, Dan Jurafsky at Stanford, Dan Klein and David Bamman at UC Berkeley, and Nathan Schneider at Georgetown University.


Prerequisites

This course assumes a good background in basic probability and a strong ability to program in Python. Experience using numerical libraries such as NumPy and neural network libraries such as PyTorch is a plus. Prior experience with machine learning, linguistics, or natural languages is helpful but not required. There will be a lot of statistics, algorithms, and coding in this class.


Announcements


Format


Class Schedule

The lecture plan is subject to change.

Week | Date | Topics | Readings | Homeworks
1 | Sep 1 | Course Introduction [slides] | Eis 1 |
  | Sep 3 | Linear classification [slides] | Eis 2; J&M III 4; Pang et al. 2002 |
2 | Sep 8 | Nonlinear classification [slides] | Eis 3, 4.4; J&M III 5; Goldberg 4 | P1: Classification (Due Sep 25)
  | Sep 10 | Lexical semantics & word embeddings [slides] | J&M III 6; Eis 14; Baroni et al. 2014 |
3 | Sep 15 | Language modeling [slides] | J&M III 3; Eis 6.1-6.2, 6.4; Teh 2006 |
  | Sep 17 | Neural language modeling and contextualized word embeddings [slides] | Eis 6.3, 6.5; J&M III 7.5, 9; Goldberg 10; Understanding LSTM Networks; Peters et al. 2018 |
4 | Sep 22 | Sequence labeling I: POS tagging, HMMs, Viterbi [slides] | J&M III 8, Appendix A; Eis 7.1-7.4, 8.1; Collins notes |
  | Sep 24 | Sequence labeling II: NER, CRFs [slides] | Eis 7.5, 7.7, 8.3; Sutton & McCallum 2.1-2.5; Andor et al. 2016, section 3 | P1 Due Tomorrow (11:59pm)
5 | Sep 29 | Neural sequence labeling [slides] | Eis 7.6; Collobert et al. 2011 | P2: Sequence labeling (Due Oct 16)
  | Oct 1 | Contextualized word embeddings II, Transformers [slides] | The Annotated Transformer; Devlin et al. 2019; Rogers et al. 2020; Schrimpf et al. 2020 |
6 | Oct 6 | Syntactic parsing, PCFGs [slides] | J&M III 12, 13, 14.1; Eis 10.1-10.2 |
  | Oct 8 | PCFGs, CKY [slides] | J&M III 14; Eis 10.3-10.4 |
7 | Oct 13 | Dependency parsing I [slides] | J&M III 15.1-15.4; Eis 11.1, 11.3; Chen and Manning 2014 |
  | Oct 15 | Dependency parsing II [slides] + Guest lecture by Andrew Drozdov | J&M III 15.5, 15.6; Eis 11.2; Drozdov et al. 2019 | P2 Due Tomorrow (11:59pm)
8 | Oct 20 | Formal languages & grammar [slides] | Eis 9.0-9.1.1, 9.2-9.2.2, 9.3-9.3.1 | P3: Parsing (Due Nov 6)
  | Oct 22 | Morphology [slides] | J&M II 3 (Note: errors in textbook) |
9 | Oct 27 | Features and Unification [slides] | J&M II 15 |
  | Oct 29 | Semantics and first-order logic [slides] | J&M II 17, 18 (Note: errors in textbook) |
10 | Nov 3 | Pred-Arg Structure and Semantic Case Frames [slides] | J&M II 19.3-19.5 |
  | Nov 5 | CCG [slides] | J&M III 12.6.1; Artzi et al. 2013 | P3 Due Tomorrow (11:59pm)
11 | Nov 10 | Discourse and Coreference [slides] | Eis 15 |
  | Nov 12 | Natural Language Inference and Interpretable NLP [slides] | LIME; Han et al. (2020) |
12 | Nov 17 | No class - EMNLP | | P4: Coreference (Due Dec 4)
  | Nov 19 | No class - EMNLP | |
  | Nov 20 | EMNLP mini-presentations | |
13 | Nov 24 | EMNLP mini-presentations | | EMNLP Review Due Today (11:59pm)
  | Nov 26 | No class - Thanksgiving Break | |
14 | Dec 1 | Machine Translation [slides] | J&M II 25; IBM Models; The Annotated Transformer |
  | Dec 3 | Summarization [slides] | Ch.1 in Nenkova & McKeown (2011); See et al. 2017 |
15 | Dec 8 | Sentiment Analysis [slides] | Pang & Lee (2008); Sentiment tutorial by Chris Potts |
  | Dec 10 | Computational Ethics + NLP | |
  | Dec 11 | Algorithms for NLP Q&A | | P4 Due Tomorrow (11:59pm)

Readings

The primary recommended texts for this course are:


Assignments

Grading

Project Submission

Assignments will be submitted on the class Canvas page, and written assignments will also be shared with your peers on Piazza after the homework due date.
  1. When you are ready to submit, make sure your writeup is included in the psetX directory in PDF format with the name psetX-writeup.pdf.

  2. Run the script make-submission.sh that is included in each homework assignment folder in the GitHub repo. This will generate a tarball psetX-submission.tgz that contains the code for your submission and your writeup.

  3. Click on the assignments tab of the main Canvas course site and select the assignment corresponding to the project (e.g. Assignment 1 corresponds to Project 1). Click the Submit assignment button to open the submission portal, then click Choose file and select your compressed project directory psetX-submission.tgz created in the previous step. Finally, click the Submit assignment button.

  4. Once we have received all submissions, we will create a post on Piazza for you to share your writeup with the rest of the class.
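
Before uploading in step 3, it can help to confirm that the tarball produced in step 2 actually contains your writeup. The following is a minimal sketch of such a check, not part of the course tooling: the function name writeup_in_tarball is hypothetical, and it assumes only the file names given above (psetX-submission.tgz and psetX-writeup.pdf), without assuming any particular directory layout inside the archive.

```python
import tarfile
from pathlib import PurePosixPath


def writeup_in_tarball(tarball_path: str, pset_number: int) -> bool:
    """Return True if the archive contains a file named psetX-writeup.pdf.

    Hypothetical helper: the internal layout of the tarball is not
    specified in the syllabus, so only the file name is checked.
    """
    expected = f"pset{pset_number}-writeup.pdf"
    with tarfile.open(tarball_path, "r:gz") as tar:
        # Member names inside a tar archive use forward slashes.
        return any(PurePosixPath(name).name == expected
                   for name in tar.getnames())
```

For example, writeup_in_tarball("pset1-submission.tgz", 1) should return True for a correctly packaged Project 1 submission.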


Policies

Late policy. Each student will be granted 5 late days to use over the duration of the semester. You can use a maximum of 3 late days on any one project. Weekends and holidays are also counted as late days. Late submissions are automatically considered as using late days. Using late days will not affect your grade. However, projects submitted late after all late days have been used will receive no credit. Be careful!
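
The late-day arithmetic above can be sketched in a few lines of Python. This is an illustration only, not the official grading procedure. It makes two assumptions that the policy does not state: any partial day late is rounded up to a full late day, and a submission that receives no credit does not consume late days. The function names are hypothetical.

```python
from datetime import datetime
from math import ceil

MAX_LATE_DAYS_TOTAL = 5        # per student, per semester (from the policy)
MAX_LATE_DAYS_PER_PROJECT = 3  # per project (from the policy)


def late_days_needed(deadline: datetime, submitted: datetime) -> int:
    """Calendar days late, weekends and holidays included.

    Assumption: a partial day counts as a full late day.
    """
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return 0
    return ceil(seconds_late / 86400)


def apply_late_policy(deadline, submitted, late_days_remaining):
    """Return (receives_credit, late_days_remaining_after_submission)."""
    needed = late_days_needed(deadline, submitted)
    if needed > MAX_LATE_DAYS_PER_PROJECT or needed > late_days_remaining:
        # Assumption: no late days are deducted when no credit is given.
        return False, late_days_remaining
    return True, late_days_remaining - needed


# P1 is due Sep 25 at 11:59pm; submitting on Sep 27 at 10:00am uses 2 late days.
p1_deadline = datetime(2020, 9, 25, 23, 59)
print(apply_late_policy(p1_deadline, datetime(2020, 9, 27, 10, 0),
                        MAX_LATE_DAYS_TOTAL))  # (True, 3)
```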

Academic honesty. Homework assignments are to be completed individually. Verbal collaboration on homework assignments is acceptable, as well as re-implementation of relevant algorithms from research papers, but everything you turn in must be your own work, and you must note the names of anyone you collaborated with on each problem and cite resources that you used to learn about the problem. Suspected violations of academic integrity rules will be handled in accordance with the CMU guidelines on collaboration and cheating.


Note to Students

Take care of yourself! As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of having a healthy life is learning how to ask for help. Asking for support sooner rather than later is almost always helpful. CMU services are available free to students, and treatment does work. You can learn more about confidential mental health services available on campus through Counseling and Psychological Services. Support is always available (24/7) at: 412-268-2922.

In this class, every individual will and must be treated with respect. The ways we are diverse are many and are fundamental to building and maintaining an equitable and an inclusive campus community. These include but are not limited to: race, color, national origin, caste, sex, disability (visible or invisible), age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status, or genetic information.

Research shows that greater diversity across individuals leads to greater creativity in the group. We at CMU work to promote diversity, equity and inclusion not only because it is necessary for excellence and innovation, but because it is just. Therefore, while we are imperfect, we ask you all to fully commit to work, both inside and outside of our classrooms, to increase our commitment to build and sustain a campus community that embraces these core values. It is the responsibility of each of us to create a safer and more inclusive environment. Incidents of bias or discrimination, whether intentional or unintentional in their occurrence, contribute to creating an unwelcoming environment for individuals and groups at the university. If you experience or observe unfair or hostile treatment on the basis of identity, we encourage you to speak out for justice and offer support in the moment and/or share your experience using the following resources:

All reports will be acknowledged and documented, and a determination will be made regarding a course of action. All experiences shared will be used to transform the campus climate to be more equitable and just.

Accommodations for Students with Disabilities:

If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.