Micro-blog topic detection method based on BTM topic model and K-means clustering algorithm
- Authors: Li W.1, Feng Y.1, Li D.2, Yu Z.1
- Affiliations:
- Department of Information Engineering and Automation
- R&D Department Jinan Qingqi Peugeot Motorcycle Co. Ltd.
- Issue: Vol 50, No 4 (2016)
- Pages: 271-277
- Section: Article
- URL: https://journals.rcsi.science/0146-4116/article/view/174421
- DOI: https://doi.org/10.3103/S0146411616040040
- ID: 174421
Abstract
The development of micro-blogging, which generates large volumes of short texts, has given people a convenient means of communication. At the same time, discovering topics in short texts has become a difficult problem. Traditional topic models such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA) model short texts poorly because they suffer from severe data sparsity when applied to them. Moreover, the K-means clustering algorithm can make topics discriminative when the dataset is dense and the differences among topic documents are distinct. In this paper, the Biterm Topic Model (BTM) is employed to process short texts (micro-blog data) in order to alleviate the sparsity problem. We then integrate the K-means clustering algorithm into BTM for further topic discovery. Experimental results on Sina micro-blog short-text collections demonstrate that our method can discover topics effectively.
About the authors
Weijiang Li
Department of Information Engineering and Automation
Corresponding author.
Email: hrbrichard@126.com
China, Kunming, 650500
Yanming Feng
Department of Information Engineering and Automation
Email: hrbrichard@126.com
China, Kunming, 650500
Dongjun Li
R&D Department Jinan Qingqi Peugeot Motorcycle Co. Ltd.
Email: hrbrichard@126.com
China, Jinan, Shandong, 250104
Zhengtao Yu
Department of Information Engineering and Automation
Email: hrbrichard@126.com
China, Kunming, 650500
