mircoschoenfeld

Seminar: Introduction to computer-based text analysis (Summer 2022)

Contents

  • News
  • Syllabus
  • Schedule
    • Getting Started
    • Setting the Basis
    • Metrics and Statistics
    • Concepts & Context
    • Manual Classification
    • Stemming & Lemmatization
    • Clustering
    • Topic Modeling
    • More about context
    • Legend
  • References

This is the main course website for the seminar Introduction to Computer-based Text Analysis, held in the summer term 2022 at the University of Bayreuth.

To view the results of the participants' projects, go here: https://mircoschoenfeld.de/seminar-introduction-to-computer-based-text-analysis.html

News

  • 19. May 2022: A lot of topics ahead to feed your thirst for text-mining knowledge over next week’s holiday!
  • 12. May 2022: Check out the tutorials on how to investigate the framing of concepts
  • 05. May 2022: A lot of new videos ahead!
  • 28. April 2022: First tutorials are online! Jump to the schedule!

Recordings of the lecture are available online. Please see the schedule for a selection of relevant videos.

Syllabus

A central challenge of our time is processing a constantly growing volume of text. Every day, collections are created that a single person can hardly work through in a reasonable amount of time: newspaper articles, statements, minutes, communiqués, blog articles, or social media posts. To help us understand large amounts of text, we turn to computational methods, which we will explore in this course. We will learn methods for the quantitative analysis of text collections, methods for extracting information, and statistical methods for analyzing large corpora. These methods will be demonstrated in practice using R and evaluated together. An important part of the seminar is a critical look at the results of the automated analyses.

Building on these new methodological and theoretical insights, participants develop their own research questions and work on them in groups of two throughout the semester.

In this course, students learn the main theoretical and methodological principles of computer-assisted text analysis and how to apply these methods to their own research projects. After successfully completing the seminar, students will be able to translate a scientific research question into methods of computer-based text analysis, as demonstrated in a project of their own.

Schedule

In this section, you will find a list of tutorial videos helping you to get started with analyzing text data in R.

Getting Started

  • Working with RStudio
  • Creating a corpus
  • Assigning document variables
  • Saving Time

Setting the Basis

  • Tokenization
  • Tokenization and Preprocessing
  • Document Feature Matrices

Metrics and Statistics

  • Simple Text Statistics
  • Obtaining Metrics
  • Multi-word expressions

Concepts & Context

  • Keywords in context
  • Differentiating context and the rest of the document

Manual Classification

  • Using a dictionary for manual classification

Stemming & Lemmatization

  • Stemming
  • Lemmatization

Clustering

  • Clustering Documents

Topic Modeling

  • Modeling topics
  • Identify parameter k
  • Seeded topic models

More about context

  • Feature Co-Occurrences
  • Word Embedding

Legend

  • Video: link to the tutorial video
  • Source-Code: link to the code material
  • Material: link to external material

References

  • Mirco Schoenfeld, Steffen Eckhard, Ronny Patz, Hilde van Meegdenburg, and Antonio Pires. The UN Security Council debates 1995-2020. 2021. doi:10.7910/DVN/KGVSYH.
  • Ken Benoit. Text as data: an overview. In: The SAGE Handbook of Research Methods in Political Science and International Relations. SAGE Publications Ltd, London, 2020. doi:10.4135/9781526486387.
  • Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 3rd edition, 2020. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf.
  • Henry E. Brady. The challenge of big data and data science. Annual Review of Political Science, 22(1), 2019. doi:10.1146/annurev-polisci-090216-023229.
  • Kenneth Benoit and Adam Obeng. Readtext: import and handling for plain and formatted text files. 2018. https://readtext.quanteda.io/.
  • Kenneth Benoit, Kohei Watanabe, Haiyan Wang, Paul Nulty, Adam Obeng, Stefan Müller, and Akitaka Matsuo. Quanteda: an R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30):774, 2018. https://quanteda.io, doi:10.21105/joss.00774.
  • Jenny Bryan. Happy Git and GitHub for the useR. 2018. https://happygitwithr.com/.
  • Kieran Healy. Data visualization: a practical introduction. Princeton University Press, 2018. http://socviz.co/.
  • Yihui Xie, Joseph J Allaire, and Garrett Grolemund. R markdown: The definitive guide. CRC Press, 2018. https://bookdown.org/yihui/rmarkdown/.
  • David Lazer and Jason Radford. Data ex machina: introduction to big data. Annual Review of Sociology, 43(1):19–39, 2017. doi:10.1146/annurev-soc-060116-053457.
  • John Wilkerson and Andreu Casas. Large-scale computerized text analysis in political science: opportunities and challenges. Annual Review of Political Science, 20(1):529–544, 2017. doi:10.1146/annurev-polisci-052615-025542.
  • Paul DiMaggio. Adapting computational text analysis to social science (and vice versa). Big Data & Society, 2(2):2053951715602908, 2015. doi:10.1177/2053951715602908.
  • Justin Grimmer and Brandon M. Stewart. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3):267–297, 2013. doi:10.1093/pan/mps028.


Published

28. Apr, 2022

Last Updated

Oct 13, 2022
