mircoschoenfeld

Paper alert: Guiding Topic Models on Short Text Corpora

Our paper titled "Embedding Semantic Anchors to guide Topic Models on Short Text Corpora" is out now in Journal of Big Data Research!

Out now: The paper on guiding topic models on short text corpora!

Download the paper

Use this link to download the paper until 5 January 2022, before it disappears behind a paywall forever: https://authors.elsevier.com/c/1e5EH7tYN-elUI

Summary

The paper studies how semantic anchors can be used to guide a topic model on short texts. We demonstrate the approach on a large corpus of roughly 100 million tweets posted on Twitter between January and February 2020. In these tweets, hashtags turn out to be suitable semantic anchors. We pre-train a word embedding of hashtags to derive preliminary topics, which then serve as supervising information, so-called seed topics, for Archetypal LDA.

Our approach also opens up additional analytical opportunities: presenting topics and seed topics next to each other enables a more detailed understanding of how the LDA topics emerge.
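To give a feeling for the first half of that pipeline, here is a minimal sketch of how hashtags could be turned into preliminary seed topics. It is not the implementation from the paper. As a stand-in for the pre-trained word embedding, it uses plain hashtag co-occurrence counts, and it groups hashtags with a simple greedy cosine-similarity rule. All function names (`extract_hashtags`, `cooccurrence_vectors`, `seed_topics`) and the `threshold` parameter are hypothetical, and the final step of handing the seed topics to Archetypal LDA is left out.

```python
import re
from collections import Counter, defaultdict
from math import sqrt

HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(tweet):
    """Return the lower-cased hashtags of one tweet, in order of appearance."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]


def cooccurrence_vectors(tweets):
    """Map each hashtag to a sparse count vector of the hashtags it appears with.

    Assumption: this is a toy replacement for a trained word embedding.
    """
    vectors = defaultdict(Counter)
    for tweet in tweets:
        tags = set(extract_hashtags(tweet))
        for tag in tags:
            # A tag also counts itself, so tags sharing a tweet get overlapping vectors.
            for other in tags:
                vectors[tag][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def seed_topics(tweets, threshold=0.5):
    """Greedily group hashtags that are similar to the first hashtag of a group."""
    vectors = cooccurrence_vectors(tweets)
    topics = []
    for tag in sorted(vectors):
        for topic in topics:
            if cosine(vectors[tag], vectors[topic[0]]) >= threshold:
                topic.append(tag)
                break
        else:
            # No existing group is similar enough, so this tag starts a new one.
            topics.append([tag])
    return topics


example_tweets = [
    "Stay home #covid19 #lockdown",
    "New cases reported #covid19 #lockdown",
    "What a match #football #goal",
    "Late winner #goal #football",
]
print(seed_topics(example_tweets))
```

On a real corpus, the count vectors would be replaced by a proper embedding model and the greedy grouping by a more robust clustering step. The sketch only illustrates the idea of letting hashtags suggest topics before the topic model runs.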



Published: 17 Nov 2021

Last updated: 17 Nov 2021

Tags

  • big data
  • data modeling
  • topic models
