Word Distinctivity - quantifying improvement of topic modeling results from n-gramming

Authors: Chai, Christine P.
Subject:
Latent Dirichlet allocation; text mining; topic modeling; n-gramming; data cleaning; quantification
Record type:
Journal Article
Language:
English

Additional information
- Publication details:
  published
- Publication details:
  Statistics Portugal, 2022.
- Subject:
  2022
- Abstract:
  Text data cleaning is an important but often overlooked step in text mining, because its contribution is difficult to quantify. We therefore propose word distinctivity to measure the improvement in topic modeling results from n-gramming, which preserves special phrases in a corpus. Word distinctivity evaluates the signal strength of a word's topic assignments: a high distinctivity means a high posterior probability that the word comes from a certain topic. We implemented latent Dirichlet allocation for topic modeling and found that some special phrases show an increase in word distinctivity, reducing uncertainty in topic identification.
- File Description:
  application/pdf
- Relation:
  https://revstat.ine.pt/index.php/REVSTAT/article/view/370/527
- Online access:
  https://revstat.ine.pt/index.php/REVSTAT/article/view/370
- Rights:
  Open Access
  URL: https://creativecommons.org/licenses/by-nc/4.0
  URL: http://purl.org/coar/access_right/c_abf2
- Identifier:
  rcaap.com.REVSTAT.Statistical Journal.revstat.article.370
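The abstract describes word distinctivity only qualitatively: a high value means a high posterior probability that a word comes from one topic. The sketch below is one plausible reading of that description, not the article's definition. It assumes distinctivity is the largest posterior topic probability for a word, max_k P(topic k | word), computed by Bayes' rule from an LDA topic-word matrix `phi` (rows are topics, columns are vocabulary terms) and a topic prior that defaults to uniform. The function name `word_distinctivity` and the toy matrix are hypothetical; in practice `phi` could come from a fitted model, for example the row-normalized `components_` of scikit-learn's `LatentDirichletAllocation`. Consult the article for the exact formula.

```python
import numpy as np


def word_distinctivity(phi, topic_weights=None):
    """Hypothetical distinctivity score: max_k P(topic k | word).

    Assumption (not taken from the article): phi has shape (K, V) and
    row k holds P(word | topic k); topic_weights holds the prior
    P(topic k) and defaults to uniform. Every column of phi is assumed
    to have positive mass, as with Dirichlet-smoothed LDA estimates.
    """
    phi = np.asarray(phi, dtype=float)
    n_topics = phi.shape[0]
    if topic_weights is None:
        topic_weights = np.full(n_topics, 1.0 / n_topics)
    topic_weights = np.asarray(topic_weights, dtype=float)

    # Joint probability P(word, topic k) = P(word | topic k) * P(topic k).
    joint = phi * topic_weights[:, None]
    # Bayes' rule: normalize each column so P(topic | word) sums to 1 over topics.
    posterior = joint / joint.sum(axis=0, keepdims=True)
    # A word is distinctive when one topic dominates its posterior.
    return posterior.max(axis=0)


# Toy example with hypothetical numbers: 2 topics, 3 vocabulary terms.
phi = np.array([
    [0.50, 0.25, 0.25],  # P(word | topic 0)
    [0.10, 0.30, 0.60],  # P(word | topic 1)
])
print(word_distinctivity(phi).round(3))
```

Under this reading, scores range from 1/K (the posterior is spread evenly over the K topics) to 1 (a single topic explains the word). In the toy matrix, the first term scores about 0.83 because topic 0 accounts for most of its probability mass. Comparing a merged phrase token against its component unigrams in this way would be one route to the kind of increase the abstract reports.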
