HW 4: Privacy and Obfuscation

Due 11:59pm, Thursday 4/18

Submission: Email your assignment to ethicalnlpcmu@gmail.com. Attach your write-up (titled FirstName_LastName_hw4.pdf) and a zip or tar archive containing your code. Code will not be graded.

Goals

Online data has become an essential source of training data for natural language processing and machine learning tools; however, the use of this type of data has raised concerns about privacy. Furthermore, the detection of demographic characteristics is a common component of microtargeting. In this assignment, you will explore how to obfuscate demographic traits, specifically gender. The primary goals are to (1) develop a method for obfuscating an author’s gender and (2) explore the trade-off between obfuscating an author’s identity and preserving useful information in the data.

Overview

The data for this assignment is available here. Your primary dataset consists of posts from Reddit. Each post is annotated with the gender of the post’s author (“op_gender”) and the subreddit where the post was made (“subreddit”). The main text of the post is in the column “post_text”. The provided data includes:

classifier.py: a simple word-based classifier that predicts the author’s gender and the subreddit for a post (example run: python classifier.py --test_file test.csv --train_file train.csv)
train.csv: training data for the classifier
test.csv: your primary test data
background.csv: additional Reddit posts that you may optionally use for training an obfuscation model. A larger version is available here.
female.txt: a list of words commonly used by women
male.txt: a list of words commonly used by men

The provided classifier achieves an accuracy of 65.65% at identifying the gender of the poster and an accuracy of 83.85% at identifying a post’s subreddit when tested on test.csv. Your goal in this assignment is to obfuscate the data in test.csv so that the provided classifier is unable to determine the gender of authors, while still being able to determine the subreddit of the post. Note that in this set-up, we treat the provided classifier as an adversary. We assume you have access to the test data and to a background corpus, but you do not know the details of the classifier or the data that the classifier was trained on. In other words, you cannot use train.csv to inform your obfuscation model. This assignment was largely inspired by “Obfuscating Gender in Social Media Writing”, which may be a useful reference.

Basic Requirements

Completing the basic requirements will earn a passing (B-range) grade.

First, build a baseline obfuscation model:

For each post in test.csv, if the post was written by a man (“M”) and it contains words from male.txt, replace each of these words with a random word from female.txt
Obfuscate posts written by women (“W”) in the same way (i.e. by replacing words from female.txt with random words from male.txt)
Test classifier.py on your obfuscated data and analyze the results
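
The baseline steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper names (load_lexicon, obfuscate_random, obfuscate_post), the one-word-per-line lexicon layout, and the simple regular-expression tokenizer are all assumptions, and the tokenization used inside classifier.py may differ.

```python
import random
import re

def load_lexicon(path):
    """Read a lexicon into a lowercase set (assumes one word per line)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def obfuscate_random(text, source_words, target_words, rng):
    """Replace every token found in source_words with a random target word."""
    targets = sorted(target_words)  # sorted so that a seeded rng is reproducible

    def swap(match):
        token = match.group(0)
        if token.lower() in source_words:
            return rng.choice(targets)
        return token  # leave non-lexicon tokens untouched

    # Assumed tokenizer: runs of letters and apostrophes; punctuation is kept as-is.
    return re.sub(r"[A-Za-z']+", swap, text)

def obfuscate_post(text, gender, male_words, female_words, rng):
    """Apply the baseline rule using the op_gender labels "M" and "W"."""
    if gender == "M":
        return obfuscate_random(text, male_words, female_words, rng)
    if gender == "W":
        return obfuscate_random(text, female_words, male_words, rng)
    return text  # unknown label: leave the post unchanged
```

Passing a seeded random.Random instance as rng makes runs repeatable, which helps when comparing classifier accuracy across experiments.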

Second, improve your obfuscation model:

Instead of replacing words from male.txt with randomly chosen words from female.txt, choose a semantically similar word from female.txt (use the same metric for replacing words from female.txt with words from male.txt). You may use any metric you choose for identifying semantically similar words. We recommend using cosine distance between pre-trained word embeddings (available here).
Test classifier.py on data obfuscated using your improved model and analyze the results. The classifier should do no better than random guessing at identifying gender and should obtain at least 79% accuracy on classifying the subreddit
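
One possible way to pick a semantically similar replacement is to take the lexicon word whose embedding has the highest cosine similarity (equivalently, the lowest cosine distance) to the original word. The sketch below assumes the embeddings have already been loaded into a plain dictionary mapping each word to a list of floats; the function names and the toy vectors in the usage note are hypothetical and stand in for real pre-trained embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_in_lexicon(word, target_words, embeddings):
    """Return (best_word, similarity) from target_words, or (None, 0.0) if none can be scored."""
    if word not in embeddings:
        return None, 0.0
    best_word, best_sim = None, -1.0
    for candidate in sorted(target_words):  # sorted for deterministic tie-breaking
        if candidate == word or candidate not in embeddings:
            continue
        sim = cosine_similarity(embeddings[word], embeddings[candidate])
        if sim > best_sim:
            best_word, best_sim = candidate, sim
    if best_word is None:
        return None, 0.0
    return best_word, best_sim
```

For example, with toy two-dimensional vectors {"husband": [1.0, 0.1], "wife": [0.9, 0.2], "makeup": [0.0, 1.0]}, calling nearest_in_lexicon("husband", {"wife", "makeup"}, ...) selects "wife". Words without an embedding return (None, 0.0), so the caller has to decide whether to leave them unchanged or fall back to a random replacement.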

Third, experiment with some basic modifications to your obfuscation models. For example, what if you randomly decide whether or not to replace words instead of replacing every lexicon word? What if you only replace words that have semantically similar enough counterparts?
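
Both modifications mentioned above (replacing words only some of the time, and replacing only when the counterpart is similar enough) can be expressed as two parameters on a single function. The sketch below is one hypothetical arrangement: it assumes a precomputed dictionary, here called candidates, that maps each lexicon word to a (replacement, similarity) pair, and the parameter names replace_prob and min_sim are illustrative.

```python
import random
import re

def obfuscate_selective(text, candidates, replace_prob=1.0, min_sim=0.0, rng=None):
    """Replace lexicon words subject to a similarity threshold and a coin flip.

    candidates: dict mapping a lowercase word to (replacement, similarity).
    replace_prob: probability of replacing an eligible word.
    min_sim: minimum similarity required before a replacement is considered.
    """
    rng = rng or random.Random()

    def swap(match):
        token = match.group(0)
        entry = candidates.get(token.lower())
        if entry is None:
            return token  # not a lexicon word
        replacement, sim = entry
        if sim < min_sim:
            return token  # counterpart is not similar enough
        if rng.random() >= replace_prob:
            return token  # coin flip says keep the original
        return replacement

    # Assumed tokenizer: runs of letters and apostrophes.
    return re.sub(r"[A-Za-z']+", swap, text)
```

Sweeping replace_prob and min_sim over a small grid and recording both gender and subreddit accuracy for each setting is one way to make the privacy/utility trade-off visible in the write-up.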

Write-up

Submit a 2-4 page report (ACL format) titled FirstName_LastName_hw4.pdf to ethicalnlpcmu[at]gmail.com. The report should include:

Description of baseline, improved and advanced (if completed) obfuscation models
Description of the experiments you tried with your improved obfuscation model
Results for each of your models, obtained by obfuscating test.csv with the model and running classifier.py over the obfuscated test data
Qualitative examples of text obfuscated with your models
A brief discussion of the ethical implications of obfuscation and privacy that draws from concepts covered during lecture

Advanced Analysis

Develop your own obfuscation model. We provide background.csv, a large data set of Reddit posts tagged with gender and subreddit information that you may use to train your obfuscation model. A larger version of the background corpus is available here. You may not train your obfuscation model on train.csv. Your ultimate goal should be to obfuscate text so that the classifier is unable to determine the gender of an author (no better than random guessing) without compromising the accuracy of the subreddit classification task. However, creative or thorough approaches will receive full credit, even if they do not significantly improve results. Some ideas you may consider:

Develop your own lexicons using pointwise mutual information scores or log odds with a Dirichlet prior
Follow the procedure described in "Obfuscating Gender in Social Media Writing"
Use an adversarial objective as described in "Predicting Sales from the Language of Product Descriptions" to train a model that is good at predicting the subreddit but bad at predicting gender. The key idea in this approach is to design a model that does not encode information about protected attributes (in this case, gender).
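
As an illustration of the first idea in the list above, the sketch below scores words with a log-odds ratio smoothed by a Dirichlet prior and normalized by its approximate standard deviation. It follows one common formulation and is offered under stated assumptions: the function name is hypothetical, the prior is taken to be a dictionary of positive pseudo-counts (for example, word counts from a background corpus), and the vocabulary is assumed to contain at least two distinct words.

```python
import math
from collections import Counter

def log_odds_dirichlet(counts_a, counts_b, prior):
    """Z-scored log-odds of each prior word in corpus A versus corpus B.

    counts_a, counts_b: word -> count in each corpus.
    prior: word -> positive pseudo-count (the Dirichlet prior).
    Positive scores indicate words associated with A, negative with B.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a_0 = sum(prior.values())
    scores = {}
    for word, a_w in prior.items():
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Smoothed log-odds of the word within each corpus.
        logit_a = math.log((y_a + a_w) / (n_a + a_0 - y_a - a_w))
        logit_b = math.log((y_b + a_w) / (n_b + a_0 - y_b - a_w))
        # Approximate variance of the difference of the two log-odds.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        scores[word] = (logit_a - logit_b) / math.sqrt(variance)
    return scores
```

A lexicon could then be built by keeping the words whose score exceeds some cutoff in either direction. The counts would need to come from background.csv rather than train.csv, in keeping with the restriction stated above.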

In your report, include a description of your model and results.

Grading (100 points)

20 points - Submitting assignment
40 points - Completing basic requirements
20 points - Write up is well-written, presents meaningful analysis, and contains all requested information
15 points - Advanced analysis
5 points - Thoughtful or well-researched discussion of ethical implications