This is the main course website for the lecture Data Modeling and Knowledge Generation in the winter term 2021/22 at the University of Bayreuth.
News
- 04. February 2022: The last practical is about visualizing data - Download the material here!
- 01. February 2022: Karl Popper, Bayesian Statistics, and Machine Learning.
- 25. January 2022: Important: visualizing your findings! This will be today’s topic.
- 21. January 2022: Fifth practical deals with handling text data in R. Download the material here!
- 18. January 2022: Today we talk about modeling text data.
- 14. January 2022: A practical covering machine learning in R. Jump right in!
- 11. January 2022: Last (theoretical) session about machine learning today.
- 21. December 2021: Supervised Learning will be today’s topic!
- 14. December 2021: Today will be about clustering!
- 07. December 2021: With lecture number 8 we’ll start learning about methods.
- 30. November 2021: Jump to the schedule to download the latest slide deck!
- 26. November 2021: Third practical is online. Today we will be working with git!
- 23. November 2021: Lecture number six is online!
- 16. November 2021: Lecture number five is about databases.
- 12. November 2021: Second practical done. That’s the first simple data inspection and cleaning!
- 09. November 2021: Lecture number four is online!
- 02. November 2021: Slides for third lecture are online.
- 29. October 2021: First practical finished! Yay! Download slides and material from the schedule!
- 26. October 2021: Slides for second lecture are online.
- 19. October 2021: Lecture starts and introductory slides are online! See the schedule to download the slides.
Recordings of the lecture are available online but require a UBT account.
Syllabus
Data models represent the real world in the analysis process; they act as its placeholder, so to speak. As such, they create their own reality for the analyses. The formulation of data models is always subject to conscious and unconscious selection and transformation decisions. These decisions implicitly influence the way algorithms and analysts understand and process the real world. At the same time, data models act as blueprints for a real world that comes after analysis. Finally, analysis results are produced and evaluated with the help of data models and communicated as new knowledge. The decisions mentioned above therefore have far-reaching implications for the expected results and the knowledge that can be gained from these results. This dual role of description and prescription opens up a field of tension for the analysis process in interdisciplinary research as well as in numerous business areas that make use of, for example, "data-driven decision making". Only when the data model, algorithm and results are viewed as a holistic unit of an analysis process can reliable knowledge be gained from data.
In this course, different methods for data analysis and knowledge generation will be presented, including methods from the fields of machine learning, data mining, text mining, social network analysis and information visualization. These methods, which are currently widely used in science, business and beyond, place different requirements on the modelling of data. These requirements are viewed critically, and their implications for the expected results and the knowledge derived from them are explicitly stated.
Students will learn different methods for data analysis and knowledge generation, including methods from the fields of machine learning, data mining, text mining, social network analysis and information visualization. They become aware of the requirements that the different analysis methods place on data models. They learn how to critically question data analyses, how to specify the implicit modelling decisions, and how to always evaluate analysis results against the background of these decisions.
Schedule
Legend:
- Session was a lecture
- Session was a practical
- Find slides here
- Find code material here
- Find external material here
- This session has a self-assessment quiz attached
References
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning. Volume 112. Springer, 2 edition, 2021. https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf, doi:10.1007/978-1-0716-1418-1.
- Dan Jurafsky. Speech & language processing. Pearson Education India, 3 edition, 2021. https://web.stanford.edu/~jurafsky/slp3/.
- Bruce Nielson and Daniel C. Elton. Induction, Popper, and machine learning. 2021. arXiv:2110.00840.
- Markus Putnings, Heike Neuroth, and Janna Neumann, editors. Praxishandbuch Forschungsdatenmanagement. De Gruyter Saur, 2021. ISBN 9783110657807. doi:10.1515/9783110657807.
- Ethem Alpaydin. Introduction to machine learning. MIT press, 2020.
- Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive data sets. Cambridge university press, 3 edition, 2020. http://www.mmds.org/.
- Kieran Healy. The Plain Person’s Guide to Plain Text Social Science. The Internet, 2019. https://plain-text.co/.
- Colin Ware. Information visualization: perception for design. Morgan Kaufmann, 4 edition, 2019.
- Julia Flanders and Fotis Jannidis. The shape of data in digital humanities: modeling texts and text-based resources. Routledge, 2018. doi:10.4324/9781315552941.
- Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2 edition, 2018. http://incompleteideas.net/book/RLbook2020.pdf.
- Fotis Jannidis, Hubertus Kohle, and Malte Rehbein. Digital Humanities: Eine Einführung. Springer, 2017. doi:10.1007/978-3-476-05446-3.
- Jo Bates, Yu-Wei Lin, and Paula Goodale. Data journeys: capturing the socio-material constitution of data objects and flows. Big Data & Society, 3(2):1–12, 2016. doi:10.1177/2053951716654502.
- Michel Marie Deza and Elena Deza. Encyclopedia of distances. Springer, 4 edition, 2016.
- Richard Gartner. Metadata. Springer International Publishing, 2016. https://doi.org/10.1007/978-3-319-40893-4, doi:10.1007/978-3-319-40893-4.
- Alfons Kemper and Andre Eickler. Datenbanksysteme: Eine Einführung. Oldenbourg Verlag, München, 2015. ISBN 978-3-11-044375-2.
- Walter Krämer. So lügt man mit Statistik. Campus Verlag, 2015.
- Isabel Meirelles. Design for information: an introduction to the histories, theories, and best practices behind effective information visualizations. Rockport publishers, 2013.
- Christopher Healey and James Enns. Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7):1170–1188, July 2012. https://www.csc2.ncsu.edu/faculty/healey/PP/index.html, doi:10.1109/TVCG.2011.127.
- Jacques Bertin. Semiology of graphics: diagrams networks maps. Esri Press, Redlands, California, 2011.
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer series in statistics New York, 2 edition, 2009. https://hastie.su.domains/ElemStatLearn/.
- Susan Schreibman, Ray Siemens, and John Unsworth. A companion to digital humanities. John Wiley & Sons, 2008. http://www.digitalhumanities.org/companion/.
- Hinrich Schütze, Christopher D Manning, and Prabhakar Raghavan. Introduction to information retrieval. Cambridge University Press Cambridge, 2008. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf.
- Willard McCarty. Humanities Computing. Palgrave Macmillan UK, 2005. ISBN 978-1-4039-3504-5.
- Leo Breiman. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231, 2001. doi:10.1214/ss/1009213726.
- Edward Rolf Tufte. The visual display of quantitative information. Volume 2. Graphics press, Cheshire, CT, 2001.
- Hans-Jörg Bullinger, Kai Wörner, and Juan Prieto. Wissensmanagement – Modelle und Strategien für die Praxis. In: Wissensmanagement: Schritte zum intelligenten Unternehmen, pages 21–39. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998. doi:10.1007/978-3-642-71995-0_2.
- Stephen Palmer and Irvin Rock. Rethinking perceptual organization: the role of uniform connectedness. Psychonomic bulletin & review, 1(1):29–55, 1994.