Finding Niche Topics using Semi-Supervised Topic Modeling via Word Embeddings
Conheady, Gerald; Greene, Derek
AICS 2017: 25th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, 7-8 December 2017
Topic modeling techniques generally focus on the discovery of the predominant thematic structures in text corpora. In contrast, a niche topic is made up of a small number of documents related to a common theme. Such a topic may have so few documents relative to the overall corpus size that it fails to be identified when using standard techniques. This paper proposes a new process, called Niche+, for finding these kinds of niche topics. It assumes interactions with a user who can provide a strictly limited level of supervision, which is subsequently employed in semi-supervised matrix factorization. Furthermore, word embeddings are used to provide additional weakly-labeled data. Experimental results show that documents in niche topics can be successfully identified using Niche+. These results are further supported via a use case that explores a real-world company email database.
Science Foundation Ireland Insight Research Centre
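The general idea in the abstract (a few user-supplied seed words, expanded through word embeddings and then used to steer a matrix factorization toward a small topic) can be illustrated with a minimal sketch. This is not the authors' Niche+ implementation: the function names `expand_seeds` and `seeded_nmf`, the toy vocabulary, the two-dimensional embeddings, the `boost` parameter, and the use of seeded initialization as the form of supervision are all assumptions made for illustration.

```python
import numpy as np

def expand_seeds(seed_words, embeddings, top_k=2):
    """Add each seed's top_k nearest neighbours (cosine similarity) as weak labels."""
    vocab = list(embeddings)
    vecs = np.array([embeddings[w] for w in vocab], dtype=float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    expanded = set(seed_words)
    for seed in seed_words:
        sims = vecs @ vecs[vocab.index(seed)]
        neighbours = [vocab[i] for i in np.argsort(-sims) if vocab[i] != seed]
        expanded.update(neighbours[:top_k])
    return expanded

def seeded_nmf(X, vocab, seed_terms, n_topics=2, n_iter=200, boost=5.0, rng_seed=0):
    """NMF with multiplicative updates; topic 0 starts biased toward the seed terms.

    Seeded initialization is a stand-in (assumption) for the paper's
    semi-supervised factorization, which is not specified in this record.
    """
    rng = np.random.default_rng(rng_seed)
    W = rng.random((X.shape[0], n_topics)) + 0.1
    H = rng.random((n_topics, X.shape[1])) + 0.1
    for j, term in enumerate(vocab):
        if term in seed_terms:
            H[0, j] += boost  # reserve topic 0 for the niche theme
    eps = 1e-9
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H

# Toy data (hypothetical): five "majority" documents and one niche document.
vocab = ["budget", "meeting", "report", "fraud", "audit", "leak"]
embeddings = {
    "budget": [1.0, 0.1], "meeting": [1.0, 0.0], "report": [0.9, 0.2],
    "fraud": [0.1, 1.0], "audit": [0.0, 1.0], "leak": [0.2, 0.9],
}
X = np.array([
    [3, 2, 1, 0, 0, 0],
    [3, 2, 1, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [3, 1, 1, 0, 0, 0],
    [3, 2, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],   # the niche document
], dtype=float)

seed_terms = expand_seeds(["fraud"], embeddings, top_k=2)
W, H = seeded_nmf(X, vocab, seed_terms)
niche_doc = int(np.argmax(W[:, 0]))  # document loading most heavily on topic 0
print(sorted(seed_terms), niche_doc)
```

In this sketch the user supplies a single seed word; the embedding neighbours act as the additional weakly-labeled data, and the document with the largest weight on the reserved topic is the candidate niche document.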
Keyword(s): Modeling techniques; Niche+; Word embeddings; Text corpus exploration; Topic modeling
Publication Date: 2019
Type: Other
Peer-Reviewed: Unknown
Language(s): English
Institution: University College Dublin
Publisher(s): CEUR-WS.org
First Indexed: 2019-07-10 06:15:34
Last Updated: 2019-07-10 06:15:34