Abstract
Recent advances in machine learning have produced techniques that are effective in complex scenarios, such as those with many rare classes or with multimodal data; in particular, low-shot learning (LSL) is a challenging task for which multiple strong approaches have been developed. We hypothesize that these techniques’ effectiveness against the data scarcity inherent in LSL may translate to the data scarcity found in more “traditional” supervised, imbalanced, binary classification tasks such as fraud detection; however, relatively little research has applied them in these contexts. In this paper, we aim to fill this gap by selecting two LSL papers from the prior literature (representing two major approaches to LSL, optimization-based and contrastive) and reevaluating their models on two highly imbalanced tabular fraud detection datasets, including a “big-data” Medicare dataset. To the best of our knowledge, our work is the first to directly compare optimization-based and contrastive approaches in any setting, and the first to examine either approach on a tabular big-data task. We find that the contrastive method we test, Siamese-RNN, performs on par with state-of-the-art non-LSL baseline learners on especially large and severely imbalanced data, and significantly outperforms them on smaller and less severely imbalanced data.