Seminar: Introduction to computer-based text analysis (latest iteration)

This is the main course website for the seminar Introduction to Computer-based Text Analysis, taught in the summer term 2025 at the University of Bayreuth.

To view the results of past participants' projects, go here: https://mircoschoenfeld.de/results-and-posters-of-the-seminar-introduction-to-computer-based-text-analysis.html

News

The schedule lists all tutorials available for this course. The dates below indicate which videos will be discussed in the session on that date.

21. July 2025: The final poster presentation takes place today! Looking forward to great contributions!
14. July 2025: Before the final presentation, we will have a critical review of your projects.
07. July 2025: This is the day of your preliminary poster presentations! Make sure to check out the tutorials on Information Visualization and Poster Preparation, as well as good example posters from previous semesters.
30. June 2025: Of everything we have done so far, what can we do with AI nowadays?
23. June 2025: In this session, we will conduct a peer review of your project ideas.
16. June 2025: Please give a short description of your research project, which we will discuss today. For next week, prepare a written description of your project ideas; we will peer review them in that session.
02. June 2025: To foster a critical understanding of what we do, we will devote this session to discourse analysis. Please prepare this article for today's class: https://arxiv.org/abs/1906.10969
26. May 2025: Today, we will see an introduction to the United Nations and the UN Security Council.
19. May 2025: We will discuss the context of words today.
12. May 2025: The videos on Metrics and Statistics will be discussed in this session! You will probably need the videos on document-feature matrices (DFMs) as well.
05. May 2025: For today’s session, please prepare the topics on Working with corpora! If you like, you can already experiment with an example corpus: https://doi.org/10.7910/DVN/KGVSYH
28. April 2025: Welcome to the Summer Term 2025! Today, we start with a gentle introduction. Next week, we will discuss how to build a corpus!

Recordings of the lecture are available online. Please see the schedule for a selection of relevant videos.

Syllabus

A central challenge of our time is processing a constantly growing volume of text. Every day, collections are created that a single person can hardly work through in a reasonable amount of time: newspaper articles, statements, minutes, communiqués, blog articles, or social media posts. To help us understand large amounts of text, we turn to computational methods, which we will explore in this course. We will learn methods for the quantitative analysis of text collections, methods for extracting information, and statistical methods for analyzing large corpora. These methods will also be demonstrated hands-on in R and evaluated together. An important part of the seminar is also a critical look at the results of automated analyses.

Based on the newly learned methods, participants develop their own scientific questions and work on them in small groups during the semester.

In this course, students learn the main theoretical and methodological principles of computer-assisted text analysis and how to apply these methods to their own research projects. After successfully completing this seminar, students will be able to translate a scientific research question into an analysis using methods of computer-based text analysis, based on a project of their own.

Check out results of research projects from previous iterations of this course.

Summer 2025

This summer semester, we recommend that students work with the corpus The UN Security Council Debates, which can be obtained here: https://doi.org/10.7910/DVN/KGVSYH.

To prepare for working with this large collection of text, first familiarize yourself with how the corpus was created: https://arxiv.org/abs/1906.10969. Then, to put things into context, Prof. Joël Glasman has prepared a collection of videos introducing the United Nations and its Security Council (see the section "Working with the UNSC corpus" in the schedule).

R-Basics

If you want to (re-)build basic R skills, feel free to check out my other tutorials on R. Students of the University of Bayreuth can also enroll in an e-learning course that offers exercises with automated evaluation.

Schedule

In this section, you will find a list of tutorial videos to help you get started with analyzing text data in R.

	Section	Title
	Getting Started	Introduction to the seminar
	Getting Started	Working with RStudio
	Getting Started	Creating a corpus
	Working with corpora	Assigning document variables
	Working with corpora	Saving Time
	Setting the Basis	Tokenization
	Setting the Basis	Tokenization and Preprocessing
	Setting the Basis	Document Feature Matrices
	Metrics and Statistics	Simple Text Statistics
	Metrics and Statistics	Obtaining Metrics
	Metrics and Statistics	Multi-word expressions
	Concepts & Context	Keywords in context
	Concepts & Context	Differentiating context and the rest of the document
	Working with the UNSC corpus	What is the United Nations? (by Prof. Joël Glasman)
	Working with the UNSC corpus	Presentation of the Corpus (by Prof. Joël Glasman)
	Working with the UNSC corpus	Why the UN Security Council matters (by Prof. Joël Glasman)
	Manual Classification	Using a dictionary for manual classification
	More about context	Feature Co-Occurrences
	Qualitative Text Analysis	Some Concepts of Discourse Analysis
	Clustering	Clustering Documents
	Topic Modeling	Modeling topics
	Topic Modeling	Identify parameter k
	Topic Modeling	Seeded topic models
	Stemming & Lemmatization	Stemming
	Stemming & Lemmatization	Lemmatization
	Computing with Semantics	Word Embedding
	Visualization	Information Visualization
	Visualization	Poster Preparation

Legend

	Find the video here
	Find code material here
	Find external material here

References

Mirco Schoenfeld, Steffen Eckhard, Ronny Patz, Hilde van Meegdenburg, and Antonio Pires. The UN Security Council debates 1995-2020. 2021. doi:10.7910/DVN/KGVSYH.
Ken Benoit. Text as data: an overview. In: The SAGE Handbook of Research Methods in Political Science and International Relations. SAGE Publications Ltd, 55 City Road, London, Apr 2020. doi:10.4135/9781526486387.
Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 3rd edition, 2020. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf.
Henry E. Brady. The challenge of big data and data science. Annual Review of Political Science, 22(1), 2019. doi:10.1146/annurev-polisci-090216-023229.
Kenneth Benoit and Adam Obeng. readtext: Import and handling for plain and formatted text files. 2018. https://readtext.quanteda.io/.
Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30):774, 2018. https://quanteda.io, doi:10.21105/joss.00774.
Jenny Bryan. Happy Git and GitHub for the useR. 2018. https://happygitwithr.com/.
Kieran Healy. Data visualization: a practical introduction. Princeton University Press, 2018. http://socviz.co/.
Yihui Xie, Joseph J. Allaire, and Garrett Grolemund. R Markdown: The definitive guide. CRC Press, 2018. https://bookdown.org/yihui/rmarkdown/.
David Lazer and Jason Radford. Data ex machina: introduction to big data. Annual Review of Sociology, 43(1):19–39, 2017. doi:10.1146/annurev-soc-060116-053457.
John Wilkerson and Andreu Casas. Large-scale computerized text analysis in political science: opportunities and challenges. Annual Review of Political Science, 20(1):529–544, 2017. doi:10.1146/annurev-polisci-052615-025542.
Paul DiMaggio. Adapting computational text analysis to social science (and vice versa). Big Data & Society, 2(2):2053951715602908, 2015. doi:10.1177/2053951715602908.
Justin Grimmer and Brandon M. Stewart. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3):267–297, 2013. doi:10.1093/pan/mps028.
