Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream
Amr Ahmed, Eric Xing
Topic models have proven to be a useful tool for discovering latent structures in document collections. However, document collections often come as temporal streams, and thus several aspects of the latent structure, such as the number of topics and the topics' distribution and popularity, are time-evolving. Several models exist that capture the evolution of some but not all of these aspects. In this paper we introduce infinite dynamic topic models, iDTM, that can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the documents within each epoch are exchangeable but the order between the documents is maintained across epochs. iDTM allows for an unbounded number of topics: topics can die or be born at any epoch, and the representation of each topic can evolve according to Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community and evaluate the efficacy of our model on both simulated and real datasets, with favorable results.
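To make the epoch structure described above concrete, the following is a minimal toy sketch, not the iDTM inference procedure itself. It only simulates topic popularity: documents in an epoch are assigned to topics in a Chinese-restaurant-process style, a topic's prior weight is inherited from decayed counts in recent epochs, a new topic can be born at any document, and a topic that goes unused for `window` epochs loses all prior mass and effectively dies. The function name `simulate_topic_stream`, the parameters `alpha`, `window`, and `lam`, and the exponential decay kernel are all assumptions made for illustration. The evolution of each topic's word distribution is not modeled here.

```python
import math
import random
from collections import Counter


def simulate_topic_stream(docs_per_epoch, alpha=1.0, window=2, lam=1.0, seed=0):
    """Toy simulation of topic birth/death across epochs (illustrative only).

    docs_per_epoch: list with the number of documents in each epoch.
    alpha:  weight for starting a brand-new topic (hypothetical parameter).
    window: how many past epochs contribute prior popularity (hypothetical).
    lam:    width of the exponential decay kernel (hypothetical).
    Returns a list of Counters mapping topic id -> documents in that epoch.
    """
    rng = random.Random(seed)
    history = []      # one Counter per epoch
    next_topic = 0    # id that the next newly born topic will receive

    for n_docs in docs_per_epoch:
        # Prior popularity: decayed counts from the last `window` epochs only.
        prior = Counter()
        recent = history[max(0, len(history) - window):]
        for d, past in enumerate(reversed(recent), start=1):
            for k, n in past.items():
                prior[k] += math.exp(-d / lam) * n

        current = Counter()
        for _ in range(n_docs):
            # Existing topics are weighted by inherited plus within-epoch usage.
            topics = sorted(set(prior) | set(current))
            weights = [prior[k] + current[k] for k in topics]
            # The last option is a new topic, chosen with weight alpha.
            choice = rng.choices(topics + [next_topic], weights=weights + [alpha])[0]
            if choice == next_topic:
                next_topic += 1  # a topic is born
            current[choice] += 1
        history.append(current)

    return history


if __name__ == "__main__":
    for epoch, counts in enumerate(simulate_topic_stream([20, 20, 0, 0, 20])):
        print(epoch, dict(counts))
```

In this sketch, the two empty epochs in the usage example exceed the default `window` of 2, so every topic seen earlier has zero prior weight by the final epoch and all of its topics are newly born.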
PDF Link: /papers/10/p20-ahmed.pdf
AUTHOR = "Amr Ahmed and Eric Xing",
TITLE = "Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream",
BOOKTITLE = "Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI-10)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2010",
PAGES = "20--29"