Computational Ethics for NLP

CMU CS 11830, Spring 2019

T/Th 10:30-11:50am, POS 146

Yulia Tsvetkov (office hours: Tuesday 12-1pm, GHC 6405), ytsvetko@cs.cmu.edu
Alan W Black (office hours: Wednesday 12-1pm, GHC 5701), awb@cs.cmu.edu
TA: Anjalie Field (office hours: Thursday 3-4pm, GHC 6609), anjalief@cs.cmu.edu

HW 2: Crowdsourced Annotations

Due 11:59pm, Thursday 2/21

Submission: Email your assignment to ethicalnlpcmu@gmail.com. Attach 2 separate files: your write-up (titled Lastname_Firstname.pdf) and a zip/tar archive containing your code. Code will not be graded.


Goals

Crowdsourcing annotations has become a fundamental aspect of NLP research, but there are many ethical concerns around this type of data collection. The goal of this assignment is to explore: (1) the challenges behind creating an annotation scheme and (2) the ethical implications of soliciting crowdsourced data and reporting results.


Overview

The data for this assignment is available here

In this homework, we provide a data set of comments written in response to TED talks. Our goal is to look for gender bias in this data set: how do comments on videos with male speakers differ from comments on videos with female speakers? We provide an annotation interface with a preliminary annotation scheme. You will first annotate the data set using the provided scheme and then analyze your annotations in order to improve the scheme. This assignment requires you to compare annotations with 1 or 2 other students.
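When comparing annotations with another student, a chance-corrected agreement score is one common starting point. The sketch below computes Cohen's kappa for two annotators over the same items. The assignment does not prescribe a particular agreement metric, so the choice of kappa, the function name `cohen_kappa`, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    labels_a and labels_b are equal-length sequences of categorical
    labels, aligned so that position i refers to the same comment.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Both annotators used a single identical label everywhere:
    # kappa is undefined (0/0); report full agreement in that case.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical labels for four comments (not taken from the assignment data).
annotator_1 = ["positive", "positive", "negative", "negative"]
annotator_2 = ["positive", "negative", "negative", "negative"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa treats labels as unordered categories. For ordinal scales, a weighted variant or a correlation measure may be more appropriate, and with three annotators you would either report pairwise values or switch to a multi-annotator statistic.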


Basic Requirements

Completing the basic requirements will earn a passing (B-range) grade.

Round 1 Annotations: First, follow the instructions below (under “Technical Details”) to run the annotation interface and annotate the data in data/common1.csv. Then, collaborate with 1-2 other students in the class in order to:

Round 2 Annotations: Based on your results from Round 1:

Write-up: Each student should submit their own 2-4 page report (ACL format). The report should include:


Advanced Analysis

We have provided annotations over a larger portion of data from the same data set (aggregate.csv). This data set includes annotator ratings for EncouragingDiscouraging, ExpertiseScale, PosterTone, and RespectfulDisrespectful, as well as the actual gender of the TED talk speaker.

Using these annotations, conduct an analysis that contrasts traits of comments addressed to female and male speakers. You may analyze any one of the annotation traits (e.g., Respect), but justify your choice of trait.
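One possible starting point for such an analysis is to compare the mean rating of a single trait across the two speaker groups and check the difference with a permutation test. The sketch below uses only the Python standard library. The gender column name `speaker_gender`, the group values `"female"` and `"male"`, the assumption that ratings are numeric, and all function names are hypothetical; check the header of aggregate.csv and adjust accordingly.

```python
import csv
import random
from collections import defaultdict
from statistics import mean


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def ratings_by_gender(rows, trait, gender_col="speaker_gender"):
    """Group numeric ratings for one trait by speaker gender.

    gender_col is a hypothetical column name; rows with a missing
    rating for the trait are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row.get(trait)
        if value is None or value == "":
            continue
        groups[row[gender_col]].append(float(value))
    return dict(groups)


def permutation_p_value(x, y, n_iter=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_iter):
        # Shuffle group membership and recompute the mean difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)


# Hypothetical rows standing in for aggregate.csv.
example_rows = [
    {"speaker_gender": "female", "RespectfulDisrespectful": "2"},
    {"speaker_gender": "female", "RespectfulDisrespectful": "3"},
    {"speaker_gender": "male", "RespectfulDisrespectful": "4"},
    {"speaker_gender": "male", "RespectfulDisrespectful": ""},
]
example_groups = ratings_by_gender(example_rows, "RespectfulDisrespectful")
```

With the real file, `load_rows("aggregate.csv")` would replace `example_rows`. A difference in means is only one view of the data; if a rating is categorical rather than numeric, comparing label distributions per group would be a better fit.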


Grading (100 points)


Technical Details

The annotation interface is located inside the folder “annotation”. NOTE: The interface cannot be paused and resumed later, so you will need to complete each set of annotations in a single session. To run the annotation interface:


Acknowledgements

Thank you to Rob Voigt for creating the annotation interface.