Computational Ethics for NLP

CMU CS 11830, Spring 2019

T/Th 10:30-11:50am, POS 146

Yulia Tsvetkov (office hours: Tuesday 12-1pm, GHC 6405)
Alan W Black (office hours: Wednesday 12-1pm, GHC 5701)
TA: Anjalie Field (office hours: Thursday 3-4pm, GHC 6609)

HW 4: Privacy and Obfuscation

Due 11:59pm, Thursday 4/18

Submission: Email your assignment to ethicalnlpcmu[at]. Attach your write-up (titled FirstName_LastName_hw4.pdf) and a zip/tar folder containing your code. Code will not be graded.


Online data has become an essential source of training data for natural language processing and machine learning tools; however, the use of this type of data has raised concerns about privacy. Furthermore, the detection of demographic characteristics is a common component of microtargeting. In this assignment, you will explore how to obfuscate demographic traits, specifically gender. The primary goals are to (1) develop a method for obfuscating an author’s gender and (2) explore the trade-off between obfuscating an author’s identity and preserving useful information in the data.


The data for this assignment is available here. Your primary dataset consists of posts from Reddit. Each post is annotated with the gender of the post’s author (“op_gender”) and the subreddit where the post was made (“subreddit”). The main text of the post is in the column “post_text”. The contents of the provided data include:

The provided classifier achieves an accuracy of 65.65% at identifying the gender of the poster and an accuracy of 83.85% at identifying a post’s subreddit when tested on test.csv. Your goal in this assignment is to obfuscate the data in test.csv so that the provided classifier is unable to determine the gender of authors, while still being able to determine the subreddit of the post. Note that in this setup, we treat the provided classifier as an adversary. We assume you have access to the test data and to a background corpus, but you do not know the details of the classifier or the data that the classifier was trained on. In other words, you cannot use train.csv to inform your obfuscation model. This assignment was largely inspired by “Obfuscating Gender in Social Media Writing”, which may be a useful reference.
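The two numbers you will be trading off can be computed with a few lines of Python. The sketch below is only an illustration: the function names `accuracy` and `obfuscation_report` are hypothetical, and it assumes you have already collected the classifier's predictions on your obfuscated test.csv as plain lists of labels. It does not reflect how the provided classifier is actually invoked.

```python
from typing import Dict, Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("cannot compute accuracy on an empty list")
    matches = sum(1 for g, p in zip(gold, predicted) if g == p)
    return matches / len(gold)


def obfuscation_report(
    gold_gender: Sequence[str],
    pred_gender: Sequence[str],
    gold_subreddit: Sequence[str],
    pred_subreddit: Sequence[str],
) -> Dict[str, float]:
    """Summarize both sides of the trade-off for one obfuscated test set.

    Assumption: the predictions come from running the provided classifier
    on the obfuscated posts; how to obtain them is not shown here.
    """
    return {
        # Goal: push this toward chance level.
        "gender_accuracy": accuracy(gold_gender, pred_gender),
        # Goal: keep this close to the un-obfuscated score.
        "subreddit_accuracy": accuracy(gold_subreddit, pred_subreddit),
    }
```

Reporting both values together for each model variant makes the trade-off visible: a method that lowers gender accuracy but also sharply lowers subreddit accuracy has removed useful information along with the gender signal.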

Basic Requirements

Completing the basic requirements will earn a passing (B-range) grade.

First, build a baseline obfuscation model:

Second, improve your obfuscation model:

Third, experiment with some basic modifications to your obfuscation models. For example, what if you randomly decide whether or not to replace each lexicon word instead of replacing every one? What if you only replace words whose counterparts are sufficiently similar in meaning?
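The two example modifications can be expressed as two knobs on a single replacement function. The following is a minimal sketch under stated assumptions, not a reference solution: the `obfuscate` function, the `replacements` mapping, the toy word pairs, and the toy similarity scores are all hypothetical placeholders. In practice the similarity function might come from word embeddings, and the mapping from whatever lexicons your baseline uses.

```python
import random
from typing import Callable, Dict, List, Optional


def obfuscate(
    tokens: List[str],
    replacements: Dict[str, str],
    similarity: Callable[[str, str], float],
    replace_prob: float = 1.0,
    min_similarity: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace lexicon words, subject to a coin flip and a similarity threshold.

    replace_prob=1.0 and min_similarity=0.0 replaces every lexicon word.
    """
    if rng is None:
        rng = random.Random()
    output = []
    for token in tokens:
        key = token.lower()
        candidate = replacements.get(key)
        if (
            candidate is not None
            and similarity(key, candidate) >= min_similarity
            and rng.random() < replace_prob
        ):
            output.append(candidate)
        else:
            output.append(token)
    return output


# Hypothetical toy data, for illustration only.
TOY_REPLACEMENTS = {"husband": "wife", "boyfriend": "girlfriend"}
TOY_SCORES = {("husband", "wife"): 0.8, ("boyfriend", "girlfriend"): 0.4}


def toy_similarity(a: str, b: str) -> float:
    """Look up a made-up similarity score; unknown pairs score 0.0."""
    return TOY_SCORES.get((a, b), 0.0)


if __name__ == "__main__":
    post = "my husband and my boyfriend".split()
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(obfuscate(post, TOY_REPLACEMENTS, toy_similarity, replace_prob=0.5, rng=rng))
    print(obfuscate(post, TOY_REPLACEMENTS, toy_similarity, min_similarity=0.5))
```

Sweeping `replace_prob` and `min_similarity` over a few values and recording gender and subreddit accuracy at each setting gives a simple way to chart how aggressive replacement affects both tasks.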


Submit a 2-4 page report (ACL format) titled FirstName_LastName_hw4.pdf to ethicalnlpcmu[at]. The report should include:

Advanced Analysis

Develop your own obfuscation model. We provide background.csv, a large data set of Reddit posts tagged with gender and subreddit information that you may use to train your obfuscation model. A larger version of the background corpus is available here. You may not train your obfuscation model on train.csv. Your ultimate goal should be to obfuscate text so that the classifier is unable to determine the gender of an author (no better than random guessing) without compromising the accuracy of the subreddit classification task. However, creative or thorough approaches will receive full credit, even if they do not significantly improve results. Some ideas you may consider:

In your report, include a description of your model and results.
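As one possible starting point for using the background corpus, you could rank words by how strongly they are associated with each gender label. The sketch below uses a smoothed log-ratio of word frequencies, which is a technique chosen here for illustration and not one prescribed by the assignment. The function names, the tokenizer, and the label values "A" and "B" are hypothetical; it also assumes background.csv uses the same "post_text" and "op_gender" columns as the primary dataset, which you should verify.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

TOKEN_RE = re.compile(r"[a-z']+")  # crude tokenizer, an assumption for this sketch


def gender_log_ratio(
    rows: Iterable[Tuple[str, str]],
    label_a: str,
    label_b: str,
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Score each word by log(P(word | label_a) / P(word | label_b)).

    rows yields (post_text, op_gender) pairs. Positive scores lean toward
    label_a, negative scores toward label_b. Add-k smoothing avoids log(0).
    """
    counts = {label_a: Counter(), label_b: Counter()}
    for text, label in rows:
        if label in counts:
            counts[label].update(TOKEN_RE.findall(text.lower()))
    vocab = set(counts[label_a]) | set(counts[label_b])
    total_a = sum(counts[label_a].values()) + smoothing * len(vocab)
    total_b = sum(counts[label_b].values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_a = (counts[label_a][word] + smoothing) / total_a
        p_b = (counts[label_b][word] + smoothing) / total_b
        scores[word] = math.log(p_a / p_b)
    return scores


def top_words(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k words most associated with label_a (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Reading the real file might look like this (column names are assumed):
#   import csv
#   with open("background.csv", newline="", encoding="utf-8") as f:
#       rows = [(r["post_text"], r["op_gender"]) for r in csv.DictReader(f)]
```

Words at either extreme of the ranking are candidates for replacement or removal. Words that also score highly for a particular subreddit may be worth leaving alone, since changing them is likely to hurt the subreddit classification task.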

Grading (100 points)