HW 3: Abusive Language Online
Due 11:59pm, Thursday 4/4
Submission: Email your assignment to ethicalnlpcmu@gmail.com. Attach 3-4 separate files: your write-up (titled FirstName_LastName_hw3.pdf), your predictions over test.tsv (titled FirstName_LastName_test.tsv), your improved predictions over test.tsv (titled FirstName_LastName_advanced.tsv, if completed), and a zip/tar folder containing your code. Code will not be graded.
Goals
As we have discussed in class, abusive language on online platforms has become a major concern in the past few years. The goals of this assignment are to (1) create a classifier for identifying offensive language and (2) explore some of the challenges and ethical issues surrounding automated approaches to identifying abusive language.
Overview
The data for this assignment is available here. Please note that the data contains offensive or sensitive content, including profanity and racial slurs.
In this homework, we provide a data set of tweets annotated for offensiveness. This data was taken from the 2019 SemEval task on offensive language detection. The file "offenseval-annotation.txt" provides an overview of the annotation scheme. The files "train.tsv" and "dev.tsv" both have the same format. The first column (text) contains the text of a tweet, the second column (label) contains an offensiveness label:
- (NOT) Not Offensive - This post does not contain offense or profanity.
- (OFF) Offensive - This post contains offensive language or a targeted (veiled or direct) offense.
For tweets marked as OFF, the third column (category) specifies the type of offensive speech:
- (TIN) Targeted Insult and Threats - A post containing an insult or threat to an individual, a group, or others.
- (UNT) Untargeted - A post containing non-targeted profanity and swearing.
For tweets marked as NOT, the third column is nan.
Finally, the file "test.tsv" contains only a set of tweets (i.e., only the text column).
Basic Requirements
Completing the basic requirements will earn a passing (B-range) grade.
Offensive language identification
- Build a simple classifier to distinguish offensive (OFF) tweets from non-offensive (NOT) tweets using surface-level features. Your classifier should obtain an accuracy of at least 70% and an F1 score of at least 50% over dev.tsv.
- Conduct an error analysis of your classifier. At minimum, your error analysis should include a confusion matrix and examples of misclassified tweets. Your analysis should also discuss the challenges of identifying abusive language more broadly, including how to define and categorize abusive language (do you agree with the annotations in this data set?) [Note: Because this data set contains offensive content, if you prefer not to read tweets containing offensive language, you may optionally provide only examples of tweets annotated as NOT that your classifier labeled as OFF]
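One common surface-level baseline that can meet the targets above is a TF-IDF bag-of-words model with logistic regression. The sketch below is illustrative, not required: it assumes train.tsv and dev.tsv have already been loaded into pandas DataFrames with "text" and "label" columns, and the function name is our own.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import make_pipeline

def train_and_evaluate(train_df, dev_df):
    # Word unigram/bigram TF-IDF features + logistic regression baseline
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(train_df["text"], train_df["label"])
    preds = clf.predict(dev_df["text"])
    acc = accuracy_score(dev_df["label"], preds)
    f1 = f1_score(dev_df["label"], preds, pos_label="OFF")
    # Rows are gold labels, columns are predictions, in the order [NOT, OFF]
    cm = confusion_matrix(dev_df["label"], preds, labels=["NOT", "OFF"])
    return acc, f1, cm
```

The confusion matrix returned here can go directly into your error analysis; pair it with the dev.tsv rows where the predictions disagree with the gold labels to find example misclassifications.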
Categorization of offense types
- Restrict your data set to the tweets labeled OFF (3,520 train and 440 dev) and build a simple classifier for distinguishing between TIN and UNT tweets. You may use the same classifier created for offensive language identification. Your classifier should obtain an accuracy score of at least 85% and an F1 score of at least 10% (where UNT is considered the positive label) over dev.tsv.
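When scoring this subtask, remember that UNT, not TIN, is the positive label for F1. A minimal sketch (the function name is our own, and gold/preds stand in for your gold and predicted category sequences):

```python
from sklearn.metrics import accuracy_score, f1_score

def category_scores(gold, preds):
    # gold and preds are parallel sequences of "TIN"/"UNT" strings;
    # pos_label="UNT" makes UNT the positive class for precision/recall/F1
    acc = accuracy_score(gold, preds)
    f1 = f1_score(gold, preds, pos_label="UNT")
    return acc, f1
```

Because UNT is the minority class here, accuracy alone can look high while UNT F1 stays near zero, which is why both thresholds are specified.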
Predictions
- Use your classifiers to make label and category predictions for the test.tsv samples. Place these predictions in a separate file titled FirstName_LastName_test.tsv. Offensiveness labels (OFF/NOT) should be in a column with the heading "label" and categories (UNT/TIN) should be in a column with the heading "category". Please use tabs ("\t") to separate columns. You should generate a label and category prediction for every tweet in test.tsv, but we will only evaluate category predictions on the tweets that were actually marked OFF. Every value in the label column should be OFF or NOT and every value in the category column should be UNT or TIN (no value should be nan).
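The required output format can be produced with pandas. A minimal sketch, assuming texts, labels, and categories are parallel lists of your predictions (the helper name and argument names are our own):

```python
import pandas as pd

def write_predictions(texts, labels, categories, out_path):
    # Tab-separated output with the required "label" and "category" headings;
    # every row must carry both an OFF/NOT label and a UNT/TIN category (no nan)
    out = pd.DataFrame({"text": texts, "label": labels, "category": categories})
    out.to_csv(out_path, sep="\t", index=False)
```

Passing index=False keeps pandas from adding an extra unnamed index column to the file.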
Write-up
Submit a 2-4 page report (ACL format) titled FirstName_LastName_hw3.pdf to ethicalnlpcmu[at]gmail.com. The report should include:
- Description of your offensive language identification classifier
- Error analysis of your offensive language classifier, including F1 score, accuracy score, and confusion matrix over dev.tsv
- F1 score, accuracy score, and confusion matrix over dev.tsv for your categorization of offensive types classifier (and description of classifier, if different)
- Description of your advanced analysis model and results over dev.tsv (if completed)
- A brief discussion of the ethical implications of using machine learning to combat abusive language. This discussion should refer to your observations from this assignment as well as refer to issues discussed in class or drawn from additional references. Questions you might consider include:
- What is the cost of misclassification? Is the cost greater for some demographic groups than others?
- What are concerns around collecting and annotating training data?
- Who should be responsible for monitoring abusive language on online platforms?
Advanced Analysis
Improve your preliminary classifier for either offensive language identification or categorization of offensive types (specifically aim to improve F1 score). Creative model architectures or feature crafting will receive full credit, even if they do not improve results. Some sources for inspiration include:
In your report, include a description of your model and results over dev.tsv. Additionally, use your improved classifier to predict results over test.tsv and place these predictions in a file titled FirstName_LastName_advanced.tsv using the appropriate column heading ("label" or "category").
Grading (100 points)
- 20 points - Submitting assignment
- 40 points - Completing basic requirements
- 20 points - Write up is well-written, presents meaningful analysis, and contains all requested information
- 15 points - Advanced analysis
- 5 points - Discussion of ethical implications is particularly thoughtful or well-researched
Implementation Tips
- You are welcome to use existing packages. Consider tools like nltk and gensim for text processing and tools like sklearn for constructing baseline classifiers.
- You can restrict your data set to tweets classified as OFF through something like:
import pandas

# args.data_file: path to your train.tsv
train = pandas.read_csv(args.data_file, sep="\t")
is_OFF = train["label"] == "OFF"
train = train[is_OFF]