Uncertainty in Artificial Intelligence
First Name   Last Name   Password   Forgot Password   Log in!
    Proceedings         Authors   Author's Info   Article details         Search    
Improved Densification of One Permutation Hashing
Anshumali Shrivastava, Ping Li
The existing work on densification of one permu- tation hashing [24] reduces the query processing cost of the (K, L)-parameterized Locality Sen- sitive Hashing (LSH) algorithm with minwise hashing, from O(dKL) to merely O(d + KL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. While that is a substantial improvement, our analy- sis reveals that the existing densification scheme in [24] is sub-optimal. In particular, there is no enough randomness in that procedure, which af- fects its accuracy on very sparse datasets. In this paper, we provide a new densification pro- cedure which is provably better than the existing scheme [24]. This improvement is more signifi- cant for very sparse datasets which are common over the web. The improved technique has the same cost of O(d + KL) for query processing, thereby making it strictly preferable over the ex- isting procedure. Experimental evaluations on public datasets, in the task of hashing based near neighbor search, support our theoretical findings.
Pages: 732-741
PS Link:
PDF Link: /papers/14/p732-shrivastava.pdf
AUTHOR = "Anshumali Shrivastava and Ping Li",
TITLE = "Improved Densification of One Permutation Hashing",
BOOKTITLE = "Proceedings of the Thirtieth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-14)",
ADDRESS = "Corvallis, Oregon",
YEAR = "2014",
PAGES = "732--741"

hosted by DSL   •   site info   •   help