mircoschoenfeld

Lecture: Data Modeling and Knowledge Generation (Winter 2022/23)

Contents

  • News
  • Syllabus
  • Schedule
    • Introduction
    • The shape of data
    • Data management
    • Algorithms
    • Methods
    • Information Visualization
    • Science of Knowledge
  • References

This is the main course website for the lecture Data Modeling and Knowledge Generation in the winter term 2022/23 at the University of Bayreuth.

News

  • 31. January 2023: Karl Popper, Bayesian Statistics, and Machine Learning will close this term.
  • 27. January 2023: The last practical is about visualizing data. Download the material from the schedule!
  • 24. January 2023: Important topic for the upcoming week: visualizing your findings!
  • 17. January 2023: Very exciting method ahead: Social Network Analysis!
  • 13. January 2023: Today’s practical deals with handling text data in R.
  • 10. January 2023: This week we’ll start diving into methods of data modeling. First up: modeling text data!
  • 20. December 2022: From Machine Learning to Artificial Intelligence and ethical considerations.
  • 16. December 2022: A practical covering machine learning in R. Jump right in!
  • 13. December 2022: Supervised Learning will be this week’s topic!
  • 06. December 2022: First lecture regarding algorithms will be about clustering!
  • 29. November 2022: Next up: Data and Knowledge Management!
  • 22. November 2022: Jump to the schedule to download the latest slide deck!
  • 18. November 2022: The third practical is online. Today we will be working with git!
  • 15. November 2022: This week’s lecture is about databases.
  • 08. November 2022: This week’s topic: Data Modeling. See the schedule for the recording of the lecture and accompanying materials.
  • 04. November 2022: The second practical is online. It covers basic data inspection and cleaning!
  • 25. October 2022: Lecture about the shape of data is online!
  • 21. October 2022: First practical finished! Yay! Download slides and material from the schedule!
  • 18. October 2022: The lecture starts and the introductory slides are online! See the schedule to download the slides. Make sure to prepare next week's session before class!

Recordings of the lecture are available online but require a UBT account.

Syllabus

Data models represent the real world in the analysis process; they act as its placeholder, so to speak. As such, they create their own reality for the analyses. The formulation of data models is always subject to conscious and unconscious decisions about selection and transformation. These decisions implicitly influence how algorithms and analysts understand and process the real world. At the same time, data models act as blueprints for the real world that follows the analysis. Finally, analysis results are produced and evaluated with the help of data models and communicated as new knowledge. The decisions mentioned above therefore have far-reaching implications for the expected results and for the knowledge that can be gained from them. This dual role of description and prescription creates tension in the analysis process, both in interdisciplinary research and in the many business areas that rely on "data-driven decision making", for example. Only when data model, algorithm, and results are viewed as a holistic unit of the analysis process can reliable knowledge be gained from data.

In this course, different methods for data analysis and knowledge generation will be presented, including methods from the fields of machine learning, data mining, text mining, social network analysis, and information visualization. These methods, which are currently widely used in science, business, and beyond, impose different requirements on how data are modeled. These requirements are examined critically, and their implications for the expected results and the knowledge derived from them are stated explicitly.

Students will learn different methods for data analysis and knowledge generation, including methods from the fields of machine learning, data mining, text mining, social network analysis, and information visualization. Students become aware of the requirements that the different analysis methods impose on data models. Students know how to critically question data analyses, how to make implicit modeling decisions explicit, and how to always evaluate analysis results in light of these decisions.

Schedule

In this section, you will find the lectures together with materials, recordings, self-tests, and downloadable slides.

Introduction

Title Video Material Slides Test
Definition of data, datafication, and usage of data. Motivation for the lecture.

The shape of data

Title Video Material Slides Test
From Analog to Digital, Digital Signals, Images, Optical Character Recognition
Motivation for Plain-Text Tools, Introduction to RStudio and R Markdown
Data structures, important file types, tables
Introduction to R: General introduction, reading in data, accessing table columns, filtering data
Models in General, Formal Modeling, Conceptual Modeling

Data management

Title Video Slides Test
Logical modeling, relational databases, RDF, graph databases
Working with git
Metadata, Ontologies, Knowledge Graphs
Data Management, Knowledge Management, Long-term archiving

Algorithms

Title Video Material Gap Text Slides Test
Unsupervised Learning: Clustering, Silhouettes, Curse of Dimensionality
Supervised Learning: Classification, Overfitting, Imbalanced Data, Embedding Models, Interpretability of Models
How-To: Clustering and Classification in R
Artificial Intelligence and Ethical Considerations: Reinforcement Learning, Deep Reinforcement Learning, Ethics in Machine Learning

Methods

Title Video Gap Text Slides Test
Modeling Texts: Bag-of-words, n-grams, word2vec embeddings
Hands on Text Mining: Tokenization, Document-Feature-Matrix, TF-IDF, Cosine similarity, Document Recommendation
Social Network Analysis: What are networks? Graph traversal, centrality metrics, community detection

Information Visualization

Title Video Material Gap Text Slides Test
Information Visualization: History of Information Visualization, graphical elements, graphical integrity
Hands-on Information Visualization: ggplot2, long and wide table formats

Science of Knowledge

Title Video Gap Text Slides Test
Knowledge, Epistemology, Philosophy of Science


References

  • Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning. Volume 112. Springer, 2 edition, 2021. https://web.stanford.edu/~hastie/ISLR2/ISLRv2_website.pdf, doi:10.1007/978-1-0716-1418-1.
  • Dan Jurafsky and James H. Martin. Speech & language processing. Pearson Education India, 3 edition, 2021. https://web.stanford.edu/~jurafsky/slp3/.
  • Bruce Nielson and Daniel C. Elton. Induction, Popper, and machine learning. 2021. arXiv:2110.00840.
  • Markus Putnings, Heike Neuroth, and Janna Neumann, editors. Praxishandbuch Forschungsdatenmanagement. De Gruyter Saur, 2021. ISBN 9783110657807. doi:10.1515/9783110657807.
  • Ethem Alpaydin. Introduction to machine learning. MIT press, 2020.
  • Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive data sets. Cambridge University Press, 3 edition, 2020. http://www.mmds.org/.
  • Kieran Healy. The Plain Person’s Guide to Plain Text Social Science. The Internet, 2019. https://plain-text.co/.
  • Colin Ware. Information visualization: perception for design. Morgan Kaufmann, 4 edition, 2019.
  • Julia Flanders and Fotis Jannidis. The shape of data in digital humanities: modeling texts and text-based resources. Routledge, 2018. doi:10.4324/9781315552941.
  • Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2 edition, 2018. http://incompleteideas.net/book/RLbook2020.pdf.
  • Fotis Jannidis, Hubertus Kohle, and Malte Rehbein. Digital Humanities: Eine Einführung. Springer, 2017. doi:10.1007/978-3-476-05446-3.
  • Jo Bates, Yu-Wei Lin, and Paula Goodale. Data journeys: capturing the socio-material constitution of data objects and flows. Big Data & Society, 3(2):1–12, 2016. doi:10.1177/2053951716654502.
  • Michel Marie Deza and Elena Deza. Encyclopedia of distances. Springer, 4 edition, 2016.
  • Richard Gartner. Metadata. Springer International Publishing, 2016. doi:10.1007/978-3-319-40893-4.
  • Alfons Kemper and Andre Eickler. Datenbanksysteme: Eine Einführung. Oldenbourg Verlag, München, 2015. ISBN 978-3-11-044375-2.
  • Walter Krämer. So lügt man mit Statistik. Campus Verlag, 2015.
  • Isabel Meirelles. Design for information: an introduction to the histories, theories, and best practices behind effective information visualizations. Rockport Publishers, 2013.
  • Christopher Healey and James Enns. Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7):1170–1188, July 2012. https://www.csc2.ncsu.edu/faculty/healey/PP/index.html, doi:10.1109/TVCG.2011.127.
  • Jacques Bertin. Semiology of graphics: diagrams networks maps. Esri Press, Redlands, California, 2011.
  • Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, and Paolo Buono. Research directions in data wrangling: visualizations and transformations for usable and credible data. Information Visualization, 10(4):271–288, 2011. doi:10.1177/1473871611415994.
  • Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer Series in Statistics. Springer, New York, 2 edition, 2009. https://hastie.su.domains/ElemStatLearn/.
  • Susan Schreibman, Ray Siemens, and John Unsworth. A companion to digital humanities. John Wiley & Sons, 2008. http://www.digitalhumanities.org/companion/.
  • Hinrich Schütze, Christopher D Manning, and Prabhakar Raghavan. Introduction to information retrieval. Cambridge University Press, Cambridge, 2008. https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf.
  • Willard McCarty. Humanities Computing. Palgrave Macmillan UK, 2005. ISBN 978-1-4039-3504-5.
  • Leo Breiman. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231, 2001. doi:10.1214/ss/1009213726.
  • Edward Rolf Tufte. The visual display of quantitative information. Graphics Press, Cheshire, CT, 2 edition, 2001.
  • Hans-Jörg Bullinger, Kai Wörner, and Juan Prieto. Wissensmanagement – Modelle und Strategien für die Praxis. In: Wissensmanagement: Schritte zum intelligenten Unternehmen, pages 21–39. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998. doi:10.1007/978-3-642-71995-0_2.
  • Stephen Palmer and Irvin Rock. Rethinking perceptual organization: the role of uniform connectedness. Psychonomic Bulletin & Review, 1(1):29–55, 1994.


Published

18. October 2022

Last Updated

31. January 2023
