mircoschoenfeld

Seminar: Introduction to computer-based text analysis (latest iteration)

Contents

  • News
  • Syllabus
  • Summer 2025
  • R-Basics
  • Schedule
    • Legend
  • References

This is the main course website for the seminar Introduction to Computer-based Text Analysis, taught in the summer term 2025 at the University of Bayreuth.

To view the results of past participants' projects, go here: https://mircoschoenfeld.de/results-and-posters-of-the-seminar-introduction-to-computer-based-text-analysis.html

News

The schedule below lists all tutorials available for this course. The dates in the following list indicate which videos will be discussed in the session on each date.

  • 21. July 2025: The final poster presentation will take place today! Looking forward to great contributions!
  • 14. July 2025: Before the final presentation, we will have a critical review of your projects.
  • 07. July 2025: This is the day of your preliminary poster presentations! Make sure to check out the tutorials on Information Visualization and Poster Preparation as well as good example posters from previous semesters.
  • 30. June 2025: Of everything we have done so far, what can we do with AI nowadays?
  • 23. June 2025: In this session, we will peer review your project ideas.
  • 16. June 2025: Please give a short description of your research project, which we will discuss today. For next week, prepare a written description of your project idea; we will peer review the ideas in that session.
  • 02. June 2025: To enable a critical understanding of what we do, we will devote this session to discourse analysis. Please prepare this article for today’s class: https://arxiv.org/abs/1906.10969
  • 26. May 2025: Today, we will get an introduction to the United Nations and the UN Security Council.
  • 19. May 2025: We will discuss the context of words today.
  • 12. May 2025: The videos on Metrics and Statistics will be discussed in this session! You will probably need the videos on document-feature matrices (DFMs) as well.
  • 05. May 2025: For today’s session, please prepare the videos on Working with corpora! If you like, you can already experiment with an example corpus: https://doi.org/10.7910/DVN/KGVSYH
  • 28. April 2025: Welcome to the Summer Term 2025! Today, we start with a gentle introduction. Next week, we will discuss how to build a corpus!

Recordings of the lecture are available online. Please see the schedule for a selection of relevant videos.

Syllabus

A central challenge of our time is processing a constantly growing volume of text. Every day, collections emerge that a single person could hardly work through in a reasonable amount of time: newspaper articles, statements, minutes, communiqués, blog articles, or social media posts. To help us understand large amounts of text, we turn to computational methods, and this course explores such methods. We will learn methods for the quantitative analysis of text collections, methods for information extraction, and statistical methods for analyzing large corpora. These methods will also be demonstrated hands-on in R and evaluated together. Another important part of the seminar is taking a critical look at the results of automated analyses.

Based on the newly learned methods, participants develop their own scientific questions and work on them in small groups during the semester.

In this course, students learn the main theoretical and methodological principles of computer-assisted text analysis and how to apply these methods to their own research projects. After successfully participating in this seminar, students will be able to connect a scientific research question with methods of computer-based text analysis in a project of their own.

Check out the results of research projects from previous iterations of this course.

Summer 2025

This summer semester, we recommend that students work with the corpus The UN Security Council Debates, which can be obtained here: https://doi.org/10.7910/DVN/KGVSYH.

To prepare for working with this large collection of text, first familiarize yourself with how the corpus was created: https://arxiv.org/abs/1906.10969. Then, to provide context, Prof. Joël Glasman has prepared a collection of videos introducing the United Nations and its Security Council:

  • What is the United Nations?
  • Presentation of the UNSC Corpus
  • Why the UN Security Council matters
  • Some concepts of discourse analysis

R-Basics

If you want to (re-)build basic R skills, feel free to check out my other tutorials on R. Students of the University of Bayreuth can also enroll in an e-learning course that offers exercises with automated evaluation.

Schedule

In this section, you will find a list of tutorial videos that help you get started with analyzing text data in R.

  • Getting Started
    • Introduction to the seminar
    • Working with RStudio
    • Creating a corpus
  • Working with corpora
    • Assigning document variables
    • Saving Time
  • Setting the Basis
    • Tokenization
    • Tokenization and Preprocessing
    • Document Feature Matrices
  • Metrics and Statistics
    • Simple Text Statistics
    • Obtaining Metrics
    • Multi-word expressions
  • Concepts & Context
    • Keywords in context
    • Differentiating context and the rest of the document
  • Working with the UNSC corpus
    • What is the United Nations? (by Prof. Joël Glasman)
    • Presentation of the Corpus (by Prof. Joël Glasman)
    • Why the UN Security Council matters (by Prof. Joël Glasman)
  • Manual Classification
    • Using a dictionary for manual classification
  • More about context
    • Feature Co-Occurrences
  • Qualitative Text Analysis
    • Some Concepts of Discourse Analysis
  • Clustering
    • Clustering Documents
  • Topic Modeling
    • Modeling topics
    • Identify parameter k
    • Seeded topic models
  • Stemming & Lemmatization
    • Stemming
    • Lemmatization
  • Computing with Semantics
    • Word Embedding
  • Visualization
    • Information Visualization
    • Poster Preparation

Legend

Find the video here
Find code material here
Find external material here

References

  • Mirco Schoenfeld, Steffen Eckhard, Ronny Patz, Hilde van Meegdenburg, and Antonio Pires. The UN Security Council debates 1995-2020. 2021. doi:10.7910/DVN/KGVSYH.
  • Kenneth Benoit. Text as data: an overview. In: The SAGE Handbook of Research Methods in Political Science and International Relations. SAGE Publications Ltd, London, Apr 2020. doi:10.4135/9781526486387.
  • Daniel Jurafsky and James H. Martin. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. 3rd edition. Pearson/Prentice Hall, Upper Saddle River, 2020. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf.
  • Henry E. Brady. The challenge of big data and data science. Annual Review of Political Science, 22(1), 2019. doi:10.1146/annurev-polisci-090216-023229.
  • Kenneth Benoit and Adam Obeng. readtext: import and handling for plain and formatted text files. 2018. https://readtext.quanteda.io/.
  • Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. quanteda: an R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30):774, 2018. https://quanteda.io, doi:10.21105/joss.00774.
  • Jenny Bryan. Happy Git and GitHub for the useR. 2018. https://happygitwithr.com/.
  • Kieran Healy. Data visualization: a practical introduction. Princeton University Press, 2018. http://socviz.co/.
  • Yihui Xie, Joseph J. Allaire, and Garrett Grolemund. R Markdown: the definitive guide. CRC Press, 2018. https://bookdown.org/yihui/rmarkdown/.
  • David Lazer and Jason Radford. Data ex machina: introduction to big data. Annual Review of Sociology, 43(1):19–39, 2017. doi:10.1146/annurev-soc-060116-053457.
  • John Wilkerson and Andreu Casas. Large-scale computerized text analysis in political science: opportunities and challenges. Annual Review of Political Science, 20(1):529–544, 2017. doi:10.1146/annurev-polisci-052615-025542.
  • Paul DiMaggio. Adapting computational text analysis to social science (and vice versa). Big Data & Society, 2(2):2053951715602908, 2015. doi:10.1177/2053951715602908.
  • Justin Grimmer and Brandon M. Stewart. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3):267–297, 2013. doi:10.1093/pan/mps028.


Published: 15. April 2025

Last updated: 05. May 2025
