Synthetic Dataset Generation for Online Topic Modeling
Belford, Mark; MacNamee, Brian; Greene, Derek
AICS 2017: 25th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, 7-8 December 2017
Online topic modeling allows for the discovery of the underlying latent structure in a real-time stream of data. When evaluating such approaches, it is common to choose a static value for the number of topics. However, we would expect the number of topics to vary over time due to changes in the underlying structure of the data, known as concept drift and concept shift. We propose a semi-synthetic dataset generator, which can introduce concept drift and concept shift into existing annotated non-temporal datasets via user-controlled parameterization. This allows for the creation of multiple different artificial streams of data, where the "correct" number and composition of the topics are known at each point in time. We demonstrate how these generated datasets can be used as an evaluation strategy for online topic modeling approaches.
Science Foundation Ireland Insight Research Centre
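The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' generator: the function names `generate_stream` and `true_topic_counts`, the per-window `schedule` format, and the toy documents are all hypothetical, chosen only for illustration. The sketch also assumes that concept drift means a gradual change in topic proportions and that concept shift means an abrupt appearance or disappearance of a topic, and it samples documents with replacement.

```python
import random
from collections import defaultdict


def generate_stream(docs, schedule, window_size, seed=0):
    """Turn labelled, non-temporal documents into a list of time windows.

    docs: list of (text, label) pairs from an annotated dataset.
    schedule: one dict per time window mapping label -> relative weight
        (hypothetical format); labels missing or weighted 0 are inactive.
        Each window is assumed to have at least one active label.
    window_size: number of documents sampled (with replacement) per window.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in docs:
        by_label[label].append(text)

    stream = []
    for weights in schedule:
        active = [label for label, w in weights.items() if w > 0]
        probs = [weights[label] for label in active]
        # Pick a label for each slot in the window, then a document for it.
        picked = rng.choices(active, weights=probs, k=window_size)
        stream.append([(rng.choice(by_label[label]), label) for label in picked])
    return stream


def true_topic_counts(schedule):
    """Ground-truth number of active topics in each window."""
    return [sum(1 for w in weights.values() if w > 0) for weights in schedule]


# Toy annotated dataset (illustrative only).
docs = [(f"sport doc {i}", "sport") for i in range(50)]
docs += [(f"politics doc {i}", "politics") for i in range(50)]

schedule = [
    {"sport": 1.0},                     # a single topic
    {"sport": 0.75, "politics": 0.25},  # drift: a second topic fades in
    {"sport": 0.5, "politics": 0.5},
    {"politics": 1.0},                  # shift: the first topic vanishes abruptly
]

stream = generate_stream(docs, schedule, window_size=20)
print(true_topic_counts(schedule))
```

Because the schedule is set by the user, the number and identity of the active topics are known for every window, which is what makes such a stream usable as ground truth when scoring an online topic model.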
Keyword(s): Machine Learning & Statistics; Online topic modeling; Semi-synthetic dataset generator; Parameterization
Publication Date: 2019
Type: Other
Peer-Reviewed: Unknown
Language(s): English
Institution: University College Dublin
Publisher(s): CEUR-WS.org
First Indexed: 2019-07-05 06:16:00
Last Updated: 2019-07-05 06:16:00