Our paper titled "Embedding Semantic Anchors to guide Topic Models on Short Text Corpora" is out now in the Journal of Big Data Research!
Out now: The paper on guiding topic models on short text corpora!
Download the paper
Use this link: https://authors.elsevier.com/c/1e5EH7tYN-elUI to download the paper until 5 January 2022, before it disappears behind a paywall forever!
Summary
The paper studies how semantic anchors can be used to guide a topic model on short texts. We showcase the applicability of the approach using a large corpus of roughly 100 million tweets posted between January and February 2020. In these tweets, we find hashtags to be suitable semantic anchors. We pre-train a word embedding of hashtags to derive preliminary topics. These preliminary topics serve as supervising information, so-called seed topics, for Archetypal LDA.
Our approach creates additional analytical opportunities: presenting topics and seed topics next to each other enables a more detailed understanding of how LDA topics emerge.
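To give a rough feel for the idea of deriving seed topics from hashtags, here is a minimal sketch in Python. It is not the code from the paper. As a stand-in for the pre-trained hashtag embedding, it represents each hashtag by the counts of the words it co-occurs with, and it groups hashtags with a simple greedy cosine-similarity rule. The function names (`tokenize`, `hashtag_vectors`, `seed_topics`), the threshold, and the toy tweets are all assumptions made for illustration, and the Archetypal LDA step is not covered at all.

```python
import math
from collections import Counter, defaultdict


def tokenize(tweet):
    """Split a tweet into (hashtags, plain words), lower-cased."""
    tags, words = [], []
    for token in tweet.lower().split():
        token = token.strip(".,!?:;\"'()")
        if token.startswith("#") and len(token) > 1:
            tags.append(token[1:])
        elif token.isalpha():
            words.append(token)
    return tags, words


def hashtag_vectors(tweets):
    """Sparse co-occurrence vector per hashtag.

    Assumption: this crude count vector stands in for a learned embedding.
    """
    vectors = defaultdict(Counter)
    for tweet in tweets:
        tags, words = tokenize(tweet)
        for tag in set(tags):
            vectors[tag].update(words)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def seed_topics(tweets, threshold=0.5):
    """Greedily group similar hashtags into hypothetical seed topics."""
    vectors = hashtag_vectors(tweets)
    groups = []
    for tag in sorted(vectors):
        for group in groups:
            # Compare against the first member, used as group representative.
            if cosine(vectors[tag], vectors[group[0]]) >= threshold:
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups


# Toy data, invented for this example.
tweets = [
    "Wash your hands and stay home #covid19",
    "Stay home and wash hands often #coronavirus",
    "Great goal in the final match #football",
    "Final match tonight, great goal expected #soccer",
]
print(seed_topics(tweets))
```

On this toy input, the hashtags used in similar contexts end up in the same group. In a real setting, a properly trained embedding and a proper clustering step would replace the count vectors and the greedy rule.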