Abstract

The proliferation of social media, despite its multitude of benefits, has led to the increased spread of abusive language. Such language, being typically hurtful, toxic, or prejudiced against individuals or groups, requires timely detection and moderation by online platforms. Deep learning models for detecting abusive language have displayed high in-corpus performance but underperform substantially outside the training distribution. Moreover, they require a considerable amount of expensive labeled data for training. This strongly motivates the effective transfer of knowledge from existing annotated abusive language resources, which may have different distributions, to low-resource corpora. This thesis studies the problem of transfer learning for abusive language detection and explores various solutions to improve knowledge transfer in cross-corpus scenarios.

First, we analyze the cross-corpus generalizability of abusive language detection models without access to the target corpus during training. We investigate whether combining topic model representations with contextual representations can improve generalizability. The association of unseen target comments with abusive language topics in the training corpus is shown to provide complementary information for better cross-corpus transfer.

Second, we explore Unsupervised Domain Adaptation (UDA), a type of transductive transfer learning with access to the unlabeled target corpus. We analyze several popular UDA approaches from sentiment classification for cross-corpus abusive language detection. We further adapt a BERT model variant to the unlabeled target corpus using the Masked Language Model (MLM) objective. While the latter improves cross-corpus performance, the other UDA methods perform sub-optimally. Our analysis reveals their limitations and emphasizes the need for effective adaptation methods suited to this task.

As our third contribution, we propose two domain adaptation (DA) approaches based on feature attributions, which are post-hoc model explanations. In particular, the problem ...
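To make the MLM-based adaptation step concrete, the standard BERT masking scheme (15% of positions selected; of those, 80% replaced by [MASK], 10% by a random vocabulary token, 10% left unchanged) can be sketched as follows. This is a minimal, dependency-free illustration over string tokens, not the thesis's actual implementation; the tokenizer and vocabulary are assumed placeholders.

```python
import random

MASK = "[MASK]"
IGNORE = -100  # conventional ignore index: no MLM loss at this position


def mlm_mask(tokens, vocab, mask_prob=0.15, seed=0):
    """Apply BERT-style MLM masking to a token sequence.

    Returns (inputs, labels): inputs is the corrupted sequence the model
    sees; labels holds the original token at masked positions and IGNORE
    elsewhere, so the loss is computed only on masked positions.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)          # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)     # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)      # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(IGNORE)       # position excluded from the loss
    return inputs, labels
```

Continued pre-training with this objective on unlabeled target-corpus comments lets the encoder adjust to the target distribution before the labeled source data is used for classification.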