Computational Ethics for NLP

CMU CS 11830, Spring 2018

T/Th 10:30-11:50am, GHC 4215

Yulia Tsvetkov (office hours: Tue noon-1pm, GHC 6405), ytsvetko@cs.cmu.edu
Alan W Black (office hours: TBD, GHC 5701), awb@cs.cmu.edu
TA: Shrimai Prabhumoye (office hours: Tue 2-3pm, GHC 5511), sprabhum@cs.cmu.edu

Summary

As language technologies have become increasingly prevalent, there is a growing awareness that the decisions we make about our data, methods, and tools are often tied up with their impact on people and societies. This course introduces students to real-world applications of language technologies and the potential ethical implications associated with them. We discuss philosophical foundations of ethical research along with advanced state-of-the-art techniques. Discussion topics are listed in the syllabus below.

Syllabus

The lecture plan is subject to change.

Week Date Theme Topics Readings Work due
1 1/16 Introduction Motivation, requirements, and overview. [slides] Hovy & Spruit (2016); Barocas & Selbst (2016)
1/18 Foundations Philosophical foundations. [slides]
2 1/23 Foundations History: medical, psychological experiments, IRB and human subjects. [slides]
1/25 Objectivity and Bias Invited lecture by Geoff Kaufman. Psychological Foundations of Implicit Bias: Mechanisms and Mitigators. [slides] HW1 out
3 1/30 Objectivity and Bias Stereotypes, prejudice, and discrimination. Debiasing. [slides] Dwork et al. (2011); Bolukbasi et al. (2016)
2/1 Objectivity and Bias Quantifying stereotypes, prejudice, and discrimination. Debiasing. [slides] Zhao et al. (2017); Voigt et al. (2017); Sap et al. (2017)
4 2/6 Objectivity and Bias Invited lecture by Diyi Yang: racial discrimination.
2/8 No Class Project proposal preparation. HW1 due; HW2 out
5 2/13 Project Proposals Student presentations.
2/15 Civility in Communication Trolling, hate speech, abusive language, toxic comments. [slides] Nockleby (2000); Warner & Hirschberg (2012); Cheng et al. (2017)
6 2/20 Civility in Communication Hate speech and bias in conversational agents. [slides] Koustuv & Henderson (2017); Schmidt & Wiegand (2017)
2/22 Privacy and Profiling Privacy and security. Demographic inference techniques. [slides] Jurgens et al. (2017); Sec. 3 in Nguyen et al. (2016)
7 2/27 Privacy and Profiling Differential privacy, author anonymization and language obfuscation. [slides] HW2 due
3/1 The Language of Manipulation Computational propaganda. [slides] HW3 out
8 3/6 The Language of Manipulation Targeted ads, fake news, US elections, etc. [slides]
3/8 The Language of Manipulation Invited lecture by John Oddo: propaganda. [slides]
9 Spring Break!
10 3/20 Mid-Way Project Reports Student presentations.
3/22 Ethical Decision Making Invited lecture by Silvia Saccardo. Ethical decision making: a perspective from behavioral science.[slides]
11 3/27 Mid-Way Project Reports Student presentations. HW3 due
3/29 Case Study Cambridge Analytica Discussion HW4 out
12 4/3 Ethical Research Practices. NLP for Social Good. Pitfalls in statistical analysis. [slides] Low-resource NLP. [slides] Lorelei [slides]
4/5 NLP for Social Good Endangered languages. [slides]
13 4/10 Intellectual Property Plagiarism and plagiarism detection. Patents. [slides]
4/12 Design for Social Good Invited lecture by Anhong Guo: designing interfaces for accessibility. [slides]
14 4/17 No Class Project preparation. HW4 due
4/19 NLP for Social Good Invited lecture by LP Morency: medical applications, psychological counseling, etc. [slides]
15 4/24 Class Discussion Code of ethics; summary. [slides]
4/26 No Class Project Preparation
16 5/01 Student Presentations
5/03 Student Presentations

Readings

Grading

Homework assignments (4 assignments; 15% each): annotation and coding assignments focusing on important subproblems from the topics discussed in class.

For example:

  1. annotation of bias in directed speech;
  2. classification of different types of bias or hate speech;
  3. generation of paraphrases that anonymize or obfuscate demographic properties of a writer;
  4. computational analysis of fake news or propaganda.

In assignments, students will be given training and development datasets, a baseline algorithm to implement, and code for automatic evaluation. Submissions will be evaluated on held-out test data. A baseline solution will earn a passing grade (B); additional credit will be given for creative solutions that surpass the baseline.
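As an illustration only (not part of any actual assignment), the workflow described above — implement a baseline, run it on the dev set, score it with an automatic metric — might look like the following sketch. The task, labels, and data here are all invented: a majority-class baseline for a hypothetical hate-speech labeling task, scored with a simple accuracy function of the kind an automatic evaluation script might provide.

```python
from collections import Counter

def train_majority_baseline(labels):
    """Return the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]

def predict(baseline_label, texts):
    """Majority-class baseline: predict the same label for every input."""
    return [baseline_label] * len(texts)

def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Toy data standing in for the provided train/dev sets (invented for illustration).
train_labels = ["neutral", "neutral", "abusive", "neutral"]
dev_texts = ["example one", "example two"]
dev_gold = ["neutral", "abusive"]

label = train_majority_baseline(train_labels)
preds = predict(label, dev_texts)
print(accuracy(dev_gold, preds))  # → 0.5
```

A real submission would replace the majority-class rule with a learned classifier; the point of the sketch is only the shape of the train/dev/evaluate loop.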

Project (30%): a semester-long team project, normally carried out in groups of three (see below).

Participation in class (10%): classes will include discussions of reading assignments; students are expected to read the papers, participate in discussions, and lead one discussion.

Projects

A major component of the course is the team project: a substantial research effort carried out individually or in groups of students (expected group size: 3).

The project milestones are the project proposals, the mid-way project reports, and the final presentations, as scheduled in the syllabus above.

Policies

Late policy. A penalty of 10% will be applied to homework assignments submitted up to 24 hours late; no credit will be given for homework submitted more than 24 hours after it is due.

Academic honesty. Homework assignments are to be completed individually. Verbal collaboration on homework assignments is acceptable, as is re-implementation of relevant algorithms from research papers, but everything you turn in must be your own work. You must note the names of anyone you collaborated with on each problem and cite resources that you used to learn about the problem. The project is to be completed by a team. You are encouraged to use existing NLP components in your project; you must acknowledge these appropriately in the documentation. Suspected violations of academic integrity rules will be handled in accordance with the CMU guidelines on collaboration and cheating.