This is the main course website for the seminar Introduction to Computer-based Text Analysis, given in the summer term 2026 at the University of Bayreuth.
To view results of past participants' projects, go here: https://mircoschoenfeld.de/results-and-posters-of-the-seminar-introduction-to-computer-based-text-analysis.html
News
The schedule lists all tutorials available for this course. The dates indicate which videos will be discussed in the session on that date.
- 16. July 2026: The final poster presentation takes place today, from 10 am to 1 pm. We are looking forward to great contributions!
- 13. July 2026: No session today. If you want us to print your poster, submit it via eLearning today. Without a submission today, you need to take care of printing yourself!
- 06. July 2026: Before the final presentation, we will have a critical review of your projects.
- 29. June 2026: Today are your preliminary poster presentations! Make sure to check out the tutorials on Information Visualization and Poster Preparation, as well as good examples of posters from previous semesters.
- 15. June 2026: Of everything we have done so far, what can we do with AI nowadays?
- 08. June 2026: In this session, we will conduct a peer review of your project ideas.
- 01. June 2026: Today, you will give a short description of your research project, which we will discuss together. For next week, prepare a written description of your project idea; we will peer review the descriptions in that session.
- 18. May 2026: To enable a critical understanding of what we do, we will devote this session to qualitative text analysis.
- 11. May 2026: For today’s session, please prepare the topic of manual classification.
- 04. May 2026: We will discuss the context of words today.
- 27. April 2026: In this session, we will discuss the videos on metrics and statistics! You will probably need the videos on document-feature matrices (DFMs) as well.
- 20. April 2026: For today’s session, please prepare the topics on Working with corpora! If you like, you can already experiment with an example corpus, the UN Security Council debates 1995-2020: https://doi.org/10.7910/DVN/KGVSYH
- 13. April 2026: Welcome to the Summer Term 2026! Today, we start with a gentle introduction. Next week, we will discuss how to build a corpus, so make sure to prepare the corresponding videos!
Recordings of the lecture are available online. Please see the schedule for a selection of relevant videos.
Syllabus
A central challenge of our time is processing an ever-growing volume of text. Every day, collections are created that a single person can hardly work through in a reasonable amount of time, be it newspaper articles, statements, minutes, communiqués, blog articles, or social media posts. To help us understand large amounts of text, we turn to computational methods. In this course, we will explore such methods: methods for the quantitative analysis of text collections, methods for extracting information, and statistical methods for analyzing large corpora. These methods will also be demonstrated in practice using R and evaluated together. An important part of the seminar is a critical look at the results of automated analyses.
Based on the newly learned methods, participants develop their own scientific questions and work on them in small groups during the semester.
In this course, students learn the main theoretical and methodological principles of computer-assisted text analysis and how to apply these methods to their own research projects. After successfully completing this seminar, students will be able to translate a scientific research question into methods of computer-based text analysis, demonstrated on a project of their own.
Check out results of research projects from previous iterations of this course.
Summer 2026
In this summer semester, the course takes the subtitle African Studies and AI.
We will discuss how African Studies can benefit from computer-assisted text analysis in general. This leads to the question of what contextual knowledge we need to interpret our data accurately.
From these broader considerations, we will specifically examine Artificial Intelligence in Africa as a hyper-contested topic, taking up the following questions:
- How is AI in Africa talked about by different stakeholders?
- How do LLM-based AI systems talk about Africa?
- Where and how is suspicion about the authenticity of a given image or information voiced?
- and more.
R-Basics
In case you want to (re-)build basic R skills, feel free to check out my other tutorials on R. Students of the University of Bayreuth can also enroll in an eLearning course that offers exercises with automated evaluation.
Schedule
In this section, you will find a list of tutorial videos helping you to get started with analyzing text data in R.
Legend
- Find the video here
- Find code material here
- Find external material here
References
- Mirco Schoenfeld, Steffen Eckhard, Ronny Patz, Hilde van Meegdenburg, and Antonio Pires. The UN Security Council debates 1995-2020. 2021. doi:10.7910/DVN/KGVSYH.
- Ken Benoit. Text as data: an overview. In: The SAGE Handbook of Research Methods in Political Science and International Relations. SAGE Publications Ltd, London, Apr 2020. doi:10.4135/9781526486387.
- Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 3rd edition, 2020. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf.
- Henry E. Brady. The challenge of big data and data science. Annual Review of Political Science, 22(1), 2019. doi:10.1146/annurev-polisci-090216-023229.
- Kenneth Benoit and Adam Obeng. readtext: Import and handling for plain and formatted text files. 2018. https://readtext.quanteda.io/.
- Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30):774, 2018. https://quanteda.io, doi:10.21105/joss.00774.
- Jenny Bryan. Happy Git and GitHub for the useR. 2018. https://happygitwithr.com/.
- Kieran Healy. Data visualization: a practical introduction. Princeton University Press, 2018. http://socviz.co/.
- Yihui Xie, J. J. Allaire, and Garrett Grolemund. R Markdown: The definitive guide. CRC Press, 2018. https://bookdown.org/yihui/rmarkdown/.
- David Lazer and Jason Radford. Data ex machina: introduction to big data. Annual Review of Sociology, 43(1):19–39, 2017. doi:10.1146/annurev-soc-060116-053457.
- John Wilkerson and Andreu Casas. Large-scale computerized text analysis in political science: opportunities and challenges. Annual Review of Political Science, 20(1):529–544, 2017. doi:10.1146/annurev-polisci-052615-025542.
- Paul DiMaggio. Adapting computational text analysis to social science (and vice versa). Big Data & Society, 2(2):2053951715602908, 2015. doi:10.1177/2053951715602908.
- Justin Grimmer and Brandon M. Stewart. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3):267–297, 2013. doi:10.1093/pan/mps028.