Uncertainty in Artificial Intelligence
A Spectral Algorithm for Learning Class-Based n-gram Models of Natural Language
Karl Stratos, Do-kyum Kim, Michael Collins, Daniel Hsu
The Brown clustering algorithm (Brown et al., 1992) is widely used in natural language processing (NLP) to derive lexical representations that are then used to improve performance on various NLP problems. The algorithm assumes an underlying model that is essentially an HMM, with the restriction that each word in the vocabulary is emitted from a single state. A greedy, bottom-up method is then used to find the clustering; this method does not have a guarantee of finding the correct underlying clustering. In this paper we describe a new algorithm for clustering under the Brown et al. model. The method relies on two steps: first, the use of canonical correlation analysis to derive a low-dimensional representation of words; second, a bottom-up hierarchical clustering over these representations. We show that given a sufficient number of training examples sampled from the Brown et al. model, the method is guaranteed to recover the correct clustering. Experiments show that the method recovers clusters of comparable quality to the algorithm of Brown et al. (1992), but is an order of magnitude more efficient.
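The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes NumPy, uses left/right bigram counts as the two views, approximates the canonical correlation analysis step by an SVD of the count matrix scaled by inverse square roots of the marginal counts, and substitutes plain average-linkage clustering for the paper's hierarchical procedure. The function names (`bigram_counts`, `word_embeddings`, `agglomerative`, `spectral_word_clusters`) are hypothetical.

```python
import numpy as np

def bigram_counts(tokens, vocab):
    """Count adjacent word pairs; rows index the left word, columns the right word."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for left, right in zip(tokens, tokens[1:]):
        counts[index[left], index[right]] += 1.0
    return counts

def word_embeddings(counts, m):
    """Step 1 (assumed CCA-style form): scale counts, keep top-m left singular vectors."""
    row = counts.sum(axis=1, keepdims=True)  # occurrences of each word as the left word
    col = counts.sum(axis=0, keepdims=True)  # occurrences of each word as the right word
    # The small floor avoids division by zero for words unseen in one position.
    scaled = counts / np.sqrt(np.maximum(row, 1e-12) * np.maximum(col, 1e-12))
    u, _, _ = np.linalg.svd(scaled, full_matrices=False)
    emb = u[:, :m]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)    # unit-length row per word

def agglomerative(points, m):
    """Step 2 (simplified): average-linkage bottom-up merging until m clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > m:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean Euclidean distance between members of the two clusters.
                diffs = points[clusters[a]][:, None, :] - points[clusters[b]][None, :, :]
                dist = np.linalg.norm(diffs, axis=2).mean()
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                      # b > a, so index a stays valid
    return clusters

def spectral_word_clusters(counts, vocab, m):
    """Run both steps and return the clusters as sorted lists of words."""
    emb = word_embeddings(counts, m)
    return [sorted(vocab[i] for i in c) for c in agglomerative(emb, m)]
```

Given a token list `tokens` and a vocabulary list `vocab`, a call such as `spectral_word_clusters(bigram_counts(tokens, vocab), vocab, m)` returns `m` word clusters. When the counts are exactly proportional to the bigram probabilities of a model in which each word is emitted by a single state, the unit-length rows for words in the same state coincide, which is the intuition behind clustering these representations.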
Pages: 762-771
PDF Link: /papers/14/p762-stratos.pdf
AUTHOR = "Karl Stratos and Do-kyum Kim and Michael Collins and Daniel Hsu",
TITLE = "A Spectral Algorithm for Learning Class-Based n-gram Models of Natural Language",
BOOKTITLE = "Proceedings of the Thirtieth Annual Conference on Uncertainty in Artificial Intelligence (UAI-14)",
ADDRESS = "Corvallis, Oregon",
YEAR = "2014",
PAGES = "762--771"
