In interdisciplinary collaborations on text mining and topic modeling, we often needed to discuss the outcomes of the mining and modeling procedures in order to evaluate the models both quantitatively and qualitatively. To provide a common "view" on a model and to allow non-computer-scientists to evaluate it in depth, I put together the tminspector, which you can see in action here.
The tminspector displays information about the topic model and provides access to the raw data.
how to inspect a model
The first view the tminspector provides on a topic model is the development of its topics over time. You can see how many documents of one year have their highest assignment towards one topic. By clicking a colored area (within the dashed lines), you can see the assignments of documents of that particular year towards the chosen topic.
On the left, the top words of that topic are displayed. This is often a first, quick way of assessing the context of a topic.
Below the timeline, a list of documents appears that states each document's assignment probability to the topic as well as the origin of the document. The example instance displays a topic model of the speeches given at United Nations Security Council meetings on "The situation in Afghanistan", so the origin of a document is the country or organisation a speaker represents. (If you want to learn more about the data, see our UNSC dataset and its description.)
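To make the counting behind the timeline concrete, here is a minimal R sketch. It assumes a document-topic matrix with one row per document and a vector holding each document's year. The names `doc_topic` and `doc_year` and all values are invented for illustration; this is not the tminspector's actual data format or implementation.

```r
# Hypothetical document-topic matrix: rows are documents, columns are topics.
doc_topic <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.6, 0.3, 0.1,
    0.2, 0.2, 0.6),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("topic", 1:3))
)
# Hypothetical year of each document, in row order.
doc_year <- c(2001, 2001, 2002, 2002)

# For each document, find the topic with the highest assignment.
dominant <- factor(colnames(doc_topic)[apply(doc_topic, 1, which.max)],
                   levels = colnames(doc_topic))

# Count documents per year and dominant topic (the quantity the timeline shows).
timeline <- table(year = doc_year, topic = dominant)
print(timeline)

# Documents of 2001 whose dominant topic is topic1, with their assignment
# probability (comparable to the list shown after clicking a colored area).
sel <- doc_year == 2001 & dominant == "topic1"
print(doc_topic[sel, "topic1", drop = FALSE])
```

Using a factor with all topic levels keeps topics that are never dominant in the table with a count of zero, so every topic still gets a column.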
The tminspector provides access to the topic assignments of all documents.
If you want to dig deeper, you can also access the document-topic matrix by clicking the tab "Document Topic Assignments" in the header line. This allows you to evaluate the topic assignments of individual documents and, by applying the corresponding filters, to look closely at individual "categories" of documents (in this case, the country/organisation of the speaker) and at individual years.
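The kind of filtering described above can be sketched in a few lines of R. The data frame `assignments`, its column names, and its values are hypothetical and only illustrate the idea; they do not reflect how the tminspector stores its data.

```r
# Hypothetical per-document metadata and topic assignments in one data frame.
assignments <- data.frame(
  doc      = c("doc1", "doc2", "doc3", "doc4"),
  year     = c(2001, 2001, 2002, 2002),
  category = c("Germany", "France", "Germany", "NATO"),
  topic1   = c(0.7, 0.1, 0.6, 0.2),
  topic2   = c(0.3, 0.9, 0.4, 0.8)
)

# Keep only documents of one category and one year.
filtered <- subset(assignments, category == "Germany" & year == 2002)
print(filtered)
```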
how to use it
To inspect a topic model using the tminspector, the model needs to be prepared in a certain way. This will be discussed in a later article.
Assuming you have a URL to a tminspector dataset, enter and execute the following lines in an R session:
library(shiny)
TMINSPCT_SOURCE_URL <- URL
runGitHub("tminspector","TwlyY29")
Setting TMINSPCT_SOURCE_URL preconfigures the tminspector to download the dataset. The runGitHub call then fetches the current working snapshot from my GitHub and runs it. For this to work, you need to make sure the following requirements are met: