Abstract: This study investigates authorship attribution in Arabic poetry, using the entire Classic Arabic Poetry corpus for the first time. The authorship attribution problem in Arabic poetry dates back to the 6th century, during the pre-Islamic period, when oral recitation was the primary means of preserving and disseminating poems. Written documentation was limited, mainly to treaties. As a result, much pre-Islamic poetry was lost, and post-Islamic poems were misattributed to pre-Islamic poets. Previous studies have explored this issue qualitatively; this research is the first to address it quantitatively. The data were collected and augmented with metadata to ensure accurate temporal separation. Because style and topic can be confounded, topic modeling experiments were conducted. They identified five prominent topics and revealed patterns in topic distribution across centuries and poetic meters. Randomly selected poems from each century were analyzed qualitatively to validate the topic modeling. A classification model was then applied to examine authorship attribution in more depth. An ensemble model was developed and tested on the applicable data, which excluded the pre-Islamic era. Its performance was evaluated by topic, number of poets, and number of examples. Separating poems by topic slightly improved performance, and the best results were observed when one poet was included in the opposite class. The best performance was obtained with 60 examples on average. With the most effective parameters selected, the model achieved accuracies of 0.97 to 1.0, with corresponding F1 scores. Misclassifications mostly occurred at predicted probabilities below 90%, whereas correct classifications had probabilities approaching 100%. These findings demonstrate the model’s robustness and its potential for resolving real cases of misattribution in Arabic poetry.
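The abstract does not specify the ensemble members or how their outputs are combined. The following is a minimal sketch of one common option, soft voting (averaging each model's probability for the target poet). It then shows how predictions can be split into correct and incorrect, above and below the 90% confidence level mentioned above. Soft voting, the 0.5 decision threshold, the input format, and the function names `soft_vote` and `confidence_breakdown` are illustrative assumptions, not the authors' method. The probabilities are made-up toy values, not results from the study.

```python
from statistics import mean


def soft_vote(model_probs):
    """Average each sample's target-poet probability across ensemble members.

    model_probs: one list per model (hypothetical format), each holding
    P(target poet) for every sample, in the same sample order.
    """
    return [mean(sample) for sample in zip(*model_probs)]


def confidence_breakdown(probs, labels, threshold=0.5, high=0.9):
    """Count correct/wrong predictions at high (>= `high`) and low confidence.

    Labels: 1 = target poet, 0 = opposite class (assumed encoding).
    Confidence is the probability of the *predicted* class, so p = 0.03
    means a 0.97-confident prediction for the opposite class.
    """
    counts = {"correct_high": 0, "correct_low": 0, "wrong_high": 0, "wrong_low": 0}
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        conf = p if pred == 1 else 1 - p
        outcome = "correct" if pred == y else "wrong"
        band = "high" if conf >= high else "low"
        counts[f"{outcome}_{band}"] += 1
    return counts


# Toy example: three hypothetical models scoring four poems.
model_probs = [
    [0.98, 0.10, 0.60, 0.02],
    [0.99, 0.20, 0.40, 0.04],
    [0.97, 0.15, 0.70, 0.03],
]
labels = [1, 0, 0, 0]

ensemble_probs = soft_vote(model_probs)
breakdown = confidence_breakdown(ensemble_probs, labels)
print(breakdown)
```

In this toy run the single error (the third poem, averaged probability about 0.57) falls in the low-confidence band, which mirrors the kind of pattern the abstract reports. Only a real evaluation on the corpus can establish that pattern.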