Uncertainty in Artificial Intelligence
Multilingual Topic Models for Unaligned Text
Jordan Boyd-Graber, David Blei
We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual corpora, successfully pairing related documents across languages. MuTo provides a new framework for creating multilingual topic models without needing carefully curated parallel corpora and allows applications built using the topic model formalism to be applied to a much wider class of corpora.
Pages: 75-82
PDF Link: /papers/09/p75-boyd-graber.pdf
AUTHOR = "Jordan Boyd-Graber and David Blei",
TITLE = "Multilingual Topic Models for Unaligned Text",
BOOKTITLE = "Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI-09)",
ADDRESS = "Corvallis, Oregon",
YEAR = "2009",
PAGES = "75--82"
