Seminar: Introduction to computer-based text analysis (latest iteration)

Contents

  • News
  • Syllabus
  • Summer 2026
  • R-Basics
  • Schedule
    • Legend
  • References

This is the main course website for the seminar Introduction to Computer-based Text Analysis given in summer term 2026 at University of Bayreuth.

To view the results of past participants' projects, go here: https://mircoschoenfeld.de/results-and-posters-of-the-seminar-introduction-to-computer-based-text-analysis.html

News

The schedule lists all tutorials available for this course. The dates below indicate which videos will be discussed in the session on that date.

  • 16. July 2026: The final poster presentation will take place today from 10am to 1pm! Looking forward to great contributions!
  • 13. July 2026: No session today, but submit your posters today via elearning in case you want to have them printed by us. Without a submission today, you need to take care of printing yourself!
  • 06. July 2026: Before the final presentation, we will have a critical review of your projects.
  • 29. June 2026: This is the day of your preliminary poster presentations! Make sure to check out the tutorials on Information Visualization and Poster Preparation as well as good examples of posters from previous semesters.
  • 15. June 2026: Of everything we have done so far, what can we do with AI nowadays?
  • 08. June 2026: In this session, we will conduct a peer review of your project ideas.
  • 01. June 2026: You are asked to give a short description of your research project which we will discuss today. For next week, prepare a written description of your project ideas. In the session, we will peer review your ideas.
  • 18. May 2026: To enable critical understanding of what we do, we will devote this session to qualitative text analysis.
  • 11. May 2026: For today’s session, please prepare manual classification.
  • 04. May 2026: We will discuss the context of words today.
  • 27. April 2026: These videos on metrics and statistics will be discussed in this session! You will probably need these videos on DFMs as well.
  • 20. April 2026: For today’s session, please prepare the topics on Working with corpora! If you like, you can already experiment with an example corpus: https://doi.org/10.7910/DVN/KGVSYH
  • 13. April 2026: Welcome to the Summer Term 2026! Today, we start with a gentle introduction. Next week, we will discuss how to build a corpus, so make sure to prepare the corresponding videos!

Recordings of the lecture are available online. Please see the schedule for a selection of relevant videos.

Syllabus

A central challenge of our time is processing a constantly growing volume of text. Every day, collections are created that a single person can hardly work through in a reasonable amount of time: be it newspaper articles, statements, minutes, communiqués, blog articles or posts on social media. To help us understand large amounts of text, we turn to computational methods. In this course, we will explore such methods. We will learn methods for the quantitative analysis of text collections, methods for extracting information, and statistical methods for analyzing large corpora. We will also apply these methods hands-on in R and evaluate the results together. An important part of the seminar is a critical look at the results of automated analyses.

Based on the newly learned methods, participants develop their own scientific questions and work on them in small groups during the semester.

In this course, students learn the main theoretical and methodological principles of computer-assisted text analysis and how to apply these methods to their own research projects. After successful participation in this seminar, students will be able to translate a scientific research question into methods of computer-based text analysis, as demonstrated in a project of their own.

Check out results of research projects from previous iterations of this course.

Summer 2026

In this summer semester, the course takes the subtitle African Studies and AI.

We are going to discuss how African Studies benefit from Computer-Assisted Text Analysis in general, leading to considerations of what contextual knowledge we need to interpret our data accurately.

From these broader considerations, we will specifically examine Artificial Intelligence in Africa as a hyper-contested topic, taking up these questions:

  • How is AI in Africa talked about by different stakeholders?
  • How do LLM-based AI systems talk about Africa?
  • Where and how is suspicion about the authenticity of a given image or information voiced?
  • and more.

R-Basics

In case you want to (re-)build basic R skills, please feel free to check out my other tutorials on R. Students of University of Bayreuth can also enroll in an elearning-course which offers tasks and automated evaluation of tasks.

Schedule

In this section, you will find a list of tutorial videos helping you to get started with analyzing text data in R.

Section                   | Title
Getting Started           | Introduction to the seminar
Getting Started           | Working with RStudio
Getting Started           | Creating a corpus
Working with corpora      | Assigning document variables
Working with corpora      | Saving Time
Setting the Basis         | Tokenization
Setting the Basis         | Tokenization and Preprocessing
Setting the Basis         | Document Feature Matrices
Metrics and Statistics    | Simple Text Statistics
Metrics and Statistics    | Obtaining Metrics
Metrics and Statistics    | Multi-word expressions
Concepts & Context        | Keywords in context
Concepts & Context        | Differentiating context and the rest of the document
Manual Classification     | Using a dictionary for manual classification
More about context        | Feature Co-Occurrences
Qualitative Text Analysis | Some Concepts of Discourse Analysis
Clustering                | Clustering Documents
Topic Modeling            | Modeling topics
Topic Modeling            | Identify parameter k
Topic Modeling            | Seeded topic models
Stemming & Lemmatization  | Stemming
Stemming & Lemmatization  | Lemmatization
Computing with Semantics  | Word Embedding
Visualization             | Information Visualization
Visualization             | Poster Preparation

Legend

Find the video here
Find code material here
Find external material here

References

  • Mirco Schoenfeld, Steffen Eckhard, Ronny Patz, Hilde van Meegdenburg, and Antonio Pires. The UN Security Council debates 1995-2020. 2021. doi:10.7910/DVN/KGVSYH.
  • Ken Benoit. Text as data: an overview. In: The SAGE Handbook of Research Methods in Political Science and International Relations. SAGE Publications Ltd, 55 City Road, London, Apr 2020. doi:10.4135/9781526486387.
  • James H Martin and Daniel Jurafsky. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall Upper Saddle River, 3rd edition, 2020. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf.
  • Henry E. Brady. The challenge of big data and data science. Annual Review of Political Science, 22(1), 2019. doi:10.1146/annurev-polisci-090216-023229.
  • Kenneth Benoit and Adam Obeng. Readtext: import and handling for plain and formatted text files. 2018. https://readtext.quanteda.io/.
  • Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. Quanteda: an R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30):774, 2018. https://quanteda.io, doi:10.21105/joss.00774.
  • Jenny Bryan. Happy Git and GitHub for the useR. 2018. https://happygitwithr.com/.
  • Kieran Healy. Data visualization: a practical introduction. Princeton University Press, 2018. http://socviz.co/.
  • Yihui Xie, Joseph J Allaire, and Garrett Grolemund. R markdown: The definitive guide. CRC Press, 2018. https://bookdown.org/yihui/rmarkdown/.
  • David Lazer and Jason Radford. Data ex machina: introduction to big data. Annual Review of Sociology, 43(1):19–39, 2017. doi:10.1146/annurev-soc-060116-053457.
  • John Wilkerson and Andreu Casas. Large-scale computerized text analysis in political science: opportunities and challenges. Annual Review of Political Science, 20(1):529–544, 2017. doi:10.1146/annurev-polisci-052615-025542.
  • Paul DiMaggio. Adapting computational text analysis to social science (and vice versa). Big Data & Society, 2(2):2053951715602908, 2015. doi:10.1177/2053951715602908.
  • Justin Grimmer and Brandon M. Stewart. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3):267–297, 2013. doi:10.1093/pan/mps028.


Published

13. Apr, 2026

Last Updated

May 4, 2026
