This is the main course website for the lecture Data Modeling and Knowledge Generation in the winter term 2022/23 at the University of Bayreuth.
News¶
- 31. January 2023: Karl Popper, Bayesian Statistics, and Machine Learning will close this term.
- 27. January 2023: The last practical is about visualizing data - Download the material here!
- 24. January 2023: Important topic for the upcoming week: visualizing your findings!
- 17. January 2023: Very exciting method ahead: Social Network Analysis!
- 13. January 2023: Today’s practical deals with handling text data in R.
- 10. January 2023: This week we’ll start diving into methods of data modeling. First up: modeling text data!
- 20. December 2022: From Machine Learning to Artificial Intelligence and ethical considerations.
- 16. December 2022: A practical covering machine learning in R. Jump right in!
- 13. December 2022: Supervised Learning will be this week’s topic!
- 06. December 2022: The first lecture on algorithms will be about clustering!
- 29. November 2022: Now to Data and Knowledge Management!
- 22. November 2022: Jump to the schedule to download the latest slide deck!
- 18. November 2022: Third practical is online. Today we will be working with git!
- 15. November 2022: This week’s lecture is about databases.
- 08. November 2022: This week’s topic: Data Modeling. See the schedule for the recording of the lecture and accompanying materials.
- 04. November 2022: Second practical is online. It covers basic data inspection and cleaning!
- 25. October 2022: Lecture about the shape of data is online!
- 21. October 2022: First practical finished! Yay! Download slides and material from the schedule!
- 18. October 2022: Lecture starts and introductory slides are online! See the schedule to download the slides. Make sure to prepare next week’s session before class!
Recordings of the lecture are available online but require a UBT account.
Syllabus¶
Data models represent the real world in the analysis process; they act as its placeholder, so to speak. As such, they create their own reality for the analyses. The formulation of data models is always subject to conscious and unconscious selection and transformation decisions. These decisions implicitly influence the way algorithms and analysts understand and process the real world. At the same time, data models act as blueprints for a real world that comes after analysis. Finally, analysis results are produced and evaluated with the help of data models and communicated as new knowledge. The decisions mentioned above therefore have far-reaching implications for the expected results and the knowledge that can be gained from these results. This dual role of description and prescription creates a field of tension for the analysis process in interdisciplinary research as well as in the many business areas that rely on "data-driven decision making". Only when the data model, algorithm, and results are viewed as a holistic unit of an analysis process can reliable knowledge be gained from data.
In this course, different methods for data analysis and knowledge generation will be presented, including methods from the fields of machine learning, data mining, text mining, social network analysis, and information visualization. These methods, which are currently widely used in science, business, and beyond, impose different requirements on the modelling of data. The course examines these requirements critically and makes explicit their implications for the expected results and the knowledge derived from them.
Students will learn different methods for data analysis and knowledge generation, including methods from the fields of machine learning, data mining, text mining, social network analysis, and information visualization. They will become aware of the requirements that the different analysis methods place on data models. They will know how to critically question data analyses, how to make the implicit modelling decisions explicit, and how to evaluate analysis results against the background of these decisions.
Schedule¶
In this section, you will find the lectures together with materials, recordings, self-tests, and downloadable slides.
Introduction¶
Title | Video | Material | Slides | Test | |
---|---|---|---|---|---|
Definition of data, datafication, and usage of data. Motivation for the lecture. |
The shape of data¶
Data management¶
Title | Video | Slides | Test | |||
---|---|---|---|---|---|---|
Logical modeling, relational databases, RDF, Graph databases | ||||||
Working with git | ||||||
Metadata, Ontologies, Knowledge Graphs | ||||||
Data Management, Knowledge Management, Long-term archiving | ||||||
Algorithms¶
Methods¶
Information Visualization¶
Science of Knowledge¶
Title | Video | Gap Text | Slides | Test | |
---|---|---|---|---|---|
Knowledge, Epistemology, Philosophy of Science |
Legend:
Session was a lecture | |
Session was a practical | |
Find a video here | |
Find slides here | |
Find code material here | |
Find external material here | |
This session has a self-assessment quiz attached | |
This session has a script with gap text attached |
References¶
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning. Volume 112. Springer, 2 edition, 2021. https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf, doi:10.1007/978-1-0716-1418-1.
- Dan Jurafsky. Speech & language processing. Pearson Education India, 3 edition, 2021. https://web.stanford.edu/~jurafsky/slp3/.
- Bruce Nielson and Daniel C. Elton. Induction, Popper, and machine learning. 2021. arXiv:2110.00840.
- Markus Putnings, Heike Neuroth, and Janna Neumann, editors. Praxishandbuch Forschungsdatenmanagement. De Gruyter Saur, 2021. ISBN 9783110657807. doi:10.1515/9783110657807.
- Ethem Alpaydin. Introduction to machine learning. MIT press, 2020.
- Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive data sets. Cambridge university press, 3 edition, 2020. http://www.mmds.org/.
- Kieran Healy. The Plain Person’s Guide to Plain Text Social Science. The Internet, 2019. https://plain-text.co/.
- Colin Ware. Information visualization: perception for design. Morgan Kaufmann, 4 edition, 2019.
- Julia Flanders and Fotis Jannidis. The shape of data in digital humanities: modeling texts and text-based resources. Routledge, 2018. doi:10.4324/9781315552941.
- Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2 edition, 2018. http://incompleteideas.net/book/RLbook2020.pdf.
- Fotis Jannidis, Hubertus Kohle, and Malte Rehbein. Digital Humanities: Eine Einführung. Springer, 2017. doi:10.1007/978-3-476-05446-3.
- Jo Bates, Yu-Wei Lin, and Paula Goodale. Data journeys: capturing the socio-material constitution of data objects and flows. Big Data & Society, 3(2):1–12, 2016. doi:10.1177/2053951716654502.
- Michel Marie Deza and Elena Deza. Encyclopedia of distances. Springer, 4 edition, 2016.
- Richard Gartner. Metadata. Springer International Publishing, 2016. doi:10.1007/978-3-319-40893-4.
- Alfons Kemper and Andre Eickler. Datenbanksysteme: Eine Einführung. Oldenbourg Verlag, München, 2015. ISBN 978-3-11-044375-2.
- Walter Krämer. So lügt man mit Statistik. Campus Verlag, 2015.
- Isabel Meirelles. Design for information: an introduction to the histories, theories, and best practices behind effective information visualizations. Rockport publishers, 2013.
- Christopher Healey and James Enns. Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7):1170–1188, July 2012. https://www.csc2.ncsu.edu/faculty/healey/PP/index.html, doi:10.1109/TVCG.2011.127.
- Jacques Bertin. Semiology of graphics: diagrams networks maps. Esri Press, Redlands, California, 2011.
- Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, and Paolo Buono. Research directions in data wrangling: visualizations and transformations for usable and credible data. Information Visualization, 10(4):271–288, 2011. doi:10.1177/1473871611415994.
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer series in statistics New York, 2 edition, 2009. https://hastie.su.domains/ElemStatLearn/.
- Susan Schreibman, Ray Siemens, and John Unsworth. A companion to digital humanities. John Wiley & Sons, 2008. http://www.digitalhumanities.org/companion/.
- Hinrich Schütze, Christopher D Manning, and Prabhakar Raghavan. Introduction to information retrieval. Cambridge University Press Cambridge, 2008. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf.
- Willard McCarty. Humanities Computing. Palgrave Macmillan UK, 2005. ISBN 978-1-4039-3504-5.
- Leo Breiman. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231, 2001. doi:10.1214/ss/1009213726.
- Edward Rolf Tufte. The visual display of quantitative information. Volume 2. Graphics press, Cheshire, CT, 2001.
- Hans-Jörg Bullinger, Kai Wörner, and Juan Prieto. Wissensmanagement – Modelle und Strategien für die Praxis. In: Wissensmanagement: Schritte zum intelligenten Unternehmen, pages 21–39. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998. doi:10.1007/978-3-642-71995-0_2.
- Stephen Palmer and Irvin Rock. Rethinking perceptual organization: the role of uniform connectedness. Psychonomic bulletin & review, 1(1):29–55, 1994.