Lecture: Data Modeling and Knowledge Generation (Winter 2024/25)

This is the main course website for the lecture Data Modeling and Knowledge Generation in the winter term 2024/25 at the University of Bayreuth.

News

Jump to the schedule to access explanations and materials!

04. February 2025: Exam is scheduled for today.
28. January 2025: In the last session before the exam, we will discuss questions about the science of knowledge as well as general questions. Make sure to register for the exam before the lecture! The deadline is today!
21. January 2025: Information visualization is an important area that we will discuss today. Please take note of the deadline for exam registration on 28. January 2025!
14. January 2025: Another fantastic area of methods: social network analysis
07. January 2025: In this session, we dive into the topic of text mining
17. December 2024: Wrapping up the topic of machine learning with a lecture about artificial intelligence and ethical considerations
10. December 2024: Attention! This session will not take place in person! Questions about supervised learning will be discussed next week!
03. December 2024: An introduction to unsupervised learning will be the topic for today.
29. November 2024: Today, we have a practical to get to know git
26. November 2024: We will discuss questions from last week and about data and knowledge management
19. November 2024: Attention! This session will not take place in person! Questions about the metadata lecture will be answered next week!
12. November 2024: We will discuss relational databases in this session. Please refer to the lecture about databases.
05. November 2024: In this session, we will explore the term “data modeling”.
15. November 2024: The first practical will take place on this date! We will introduce plain text tools
29. October 2024: We will discuss models as the topic of the week.
22. October 2024: Digitalization will be the topic of this session!
16. October 2024: The lecture starts and the introductory slides are online! See the schedule to download the slides. Make sure to prepare next week's session before class!

Recordings of the lecture are available online but require a UBT account.

Syllabus

Data models represent the real world in the analysis process; they act as its placeholder, so to speak. As such, they create their own reality for the analyses. The formulation of data models is always subject to conscious and unconscious selection and transformation decisions. These decisions implicitly influence the way algorithms and analysts understand and process the real world. At the same time, data models act as blueprints for a real world that comes after analysis. Finally, analysis results are produced and evaluated with the help of data models and communicated as new knowledge. The decisions mentioned above therefore have far-reaching implications for the expected results and the knowledge that can be gained from these results. This dual role of description and prescription creates a field of tension for the analysis process in interdisciplinary research as well as in numerous business areas that rely on, for example, "data-driven decision making". Only when the data model, algorithm, and results are viewed as a holistic unit of an analysis process can reliable knowledge be gained from data.

In this course, different methods for data analysis and knowledge generation will be presented, including methods from the fields of machine learning, data mining, text mining, social network analysis, and information visualization. These methods, which are currently widely used in science, business, and beyond, impose different requirements on the modeling of data. The course examines these requirements critically and makes explicit their implications for the expected results and the knowledge derived from them.

Students will learn different methods for data analysis and knowledge generation, including methods from the fields of machine learning, data mining, text mining, social network analysis, and information visualization. Students become aware of the requirements that the different analysis methods place on data models. Students know how to critically question data analyses, how to specify the implicit modeling decisions, and how to evaluate analysis results against the background of these decisions.

Examination

To complete this lecture, you will write an exam. The exam is a 90-minute multiple-choice exam, conducted in accordance with the examination and study regulations of the MA Philosophy & Computer Science.

The exam will take place on 04. February 2025! You need to register for the exam in order to participate! The registration deadline is 28. January 2025, 14:00, which is before the last lecture prior to the exam!

Schedule

In this section, you will find the lectures together with materials, recordings, self-tests, and downloadable slides.

	Section	Title
	Introduction	Definition of data, datafication, and usage of data. Motivation for the lecture.
	The shape of data	From Analog to Digital, Digital Signals, Images, Optical Character Recognition
	The shape of data	Data structures, important file types, tables
	The shape of data	Motivation for Plain Text Tools, Introduction to RStudio and R Markdown
	The shape of data	Models in General, Formal Modeling, Conceptual Modeling
	The shape of data	Introduction to R: General introduction, read in data, access table columns, filter data
	Data management	Logical modeling, relational databases, RDF, Graph databases
	Data management	Metadata, Ontologies, Knowledge Graphs
	Data management	Data Management, Knowledge Management, Long-term archiving
	Data management	Working with git
	Algorithms	Unsupervised Learning: Clustering, Silhouettes, Curse of Dimensionality
	Algorithms	How-To: Clustering in R
	Algorithms	Supervised Learning: Classification, Overfitting, Imbalanced Data, Embedding Models, Interpretability of Models
	Algorithms	How-To: Classification in R
	Algorithms	Artificial Intelligence and Ethical Considerations: Reinforcement Learning, Deep Reinforcement Learning, Ethics in Machine Learning
	Methods	Modeling Texts: Bag-of-words, n-grams, word2vec embeddings
	Methods	Hands-on Text Mining: Tokenization, Document-Feature-Matrix, TF-IDF, Cosine similarity, Document Recommendation
	Methods	Social Network Analysis: What are networks? Graph traversal, centrality metrics, community detection
	Information Visualization	Information Visualization: History of Information Visualization, graphical elements, graphical integrity
	Information Visualization	Hands-on Information Visualization: ggplot2, long and wide format of tables
	Science of Knowledge	Knowledge, Epistemology, Philosophy of Science


References

Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning. Volume 112. Springer, 2 edition, 2021. https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf, doi:10.1007/978-1-0716-1418-1.
Dan Jurafsky. Speech & language processing. Pearson Education India, 3 edition, 2021. https://web.stanford.edu/~jurafsky/slp3/.
Bruce Nielson and Daniel C. Elton. Induction, Popper, and machine learning. 2021. arXiv:2110.00840.
Markus Putnings, Heike Neuroth, and Janna Neumann, editors. Praxishandbuch Forschungsdatenmanagement. De Gruyter Saur, 2021. ISBN 9783110657807. doi:10.1515/9783110657807.
Ethem Alpaydin. Introduction to machine learning. MIT press, 2020.
Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive data sets. Cambridge university press, 3 edition, 2020. http://www.mmds.org/.
Kieran Healy. The Plain Person’s Guide to Plain Text Social Science. The Internet, 2019. https://plain-text.co/.
Colin Ware. Information visualization: perception for design. Morgan Kaufmann, 4 edition, 2019.
Julia Flanders and Fotis Jannidis. The shape of data in digital humanities: modeling texts and text-based resources. Routledge, 2018. doi:10.4324/9781315552941.
Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2 edition, 2018. http://incompleteideas.net/book/RLbook2020.pdf.
Fotis Jannidis, Hubertus Kohle, and Malte Rehbein. Digital Humanities: Eine Einführung. Springer, 2017. doi:10.1007/978-3-476-05446-3.
Jo Bates, Yu-Wei Lin, and Paula Goodale. Data journeys: capturing the socio-material constitution of data objects and flows. Big Data & Society, 3(2):1–12, 2016. doi:10.1177/2053951716654502.
Michel Marie Deza and Elena Deza. Encyclopedia of distances. Springer, 4 edition, 2016.
Richard Gartner. Metadata. Springer International Publishing, 2016. doi:10.1007/978-3-319-40893-4.
Alfons Kemper and Andre Eickler. Datenbanksysteme: Eine Einführung. Oldenbourg Verlag, München, 2015. ISBN 978-3-11-044375-2.
Walter Krämer. So lügt man mit Statistik. Campus Verlag, 2015.
Isabel Meirelles. Design for information: an introduction to the histories, theories, and best practices behind effective information visualizations. Rockport publishers, 2013.
Christopher Healey and James Enns. Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7):1170–1188, July 2012. https://www.csc2.ncsu.edu/faculty/healey/PP/index.html, doi:10.1109/TVCG.2011.127.
Jacques Bertin. Semiology of graphics: diagrams networks maps. Esri Press, Redlands, California, 2011.
Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, and Paolo Buono. Research directions in data wrangling: visualizations and transformations for usable and credible data. Information Visualization, 10(4):271–288, 2011. doi:10.1177/1473871611415994.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer series in statistics New York, 2 edition, 2009. https://hastie.su.domains/ElemStatLearn/.
Susan Schreibman, Ray Siemens, and John Unsworth. A companion to digital humanities. John Wiley & Sons, 2008. http://www.digitalhumanities.org/companion/.
Hinrich Schütze, Christopher D Manning, and Prabhakar Raghavan. Introduction to information retrieval. Cambridge University Press Cambridge, 2008. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf.
Willard McCarty. Humanities Computing. Palgrave Macmillan UK, 2005. ISBN 978-1-4039-3504-5.
Leo Breiman. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199 – 231, 2001. doi:10.1214/ss/1009213726.
Edward Rolf Tufte. The visual display of quantitative information. Volume 2. Graphics press, Cheshire, CT, 2001.
Hans-Jörg Bullinger, Kai Wörner, and Juan Prieto. Wissensmanagement — modelle und strategien für die praxis. In: Wissensmanagement: Schritte zum intelligenten Unternehmen, pages 21–39. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998. doi:10.1007/978-3-642-71995-0_2.
Stephen Palmer and Irvin Rock. Rethinking perceptual organization: the role of uniform connectedness. Psychonomic bulletin & review, 1(1):29–55, 1994.
