信号与信息处理

一种基于CUDA的局部敏感哈希算法

展开
  • 1. 上海大学通信与工程学院, 上海 200444;
    2. 上海大学智慧城市研究院, 上海 200444

收稿日期: 2015-02-01

  修回日期: 2015-03-29

  网络出版日期: 2015-09-30

基金资助

国家"863"高技术研究发展计划基金(No.2013AA01A603);上海市教育委员会科研创新项目基金(No.14YZ011)资助

A Locality Sensitive Hashing Algorithm Based on CUDA

Expand
  • 1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China;
    2. Institute of Smart City, Shanghai University, Shanghai 200444, China

Received date: 2015-02-01

  Revised date: 2015-03-29

  Online published: 2015-09-30

摘要

传统的局部敏感哈希算法建立哈希表时往往需要较大的内存空间以及较长的建立时间. 在查询阶段,查询样本K个最近邻数据项的所需时间超过整个运行时间的95%. 针对这些问题,运用计算设备架构将局部敏感哈希算法移植至图形处理器,并用多线程并行计算数据项的哈希值来建立哈希表. 查询阶段在全局内存中引入基于工作队列的多样本查询,以提高算法的运行效率. 实验结果表明,所提出的算法与传统的局部敏感哈希算法相比,能在不降低运算精度的情况下将运算速度提高近12倍.

本文引用格式

张一凡, 余小清, 安炫东, 万旺根 . 一种基于CUDA的局部敏感哈希算法[J]. 应用科学学报, 2015 , 33(5) : 550 -558 . DOI: 10.3969/j.issn.0255-8297.2015.05.009

Abstract

The traditional locality sensitive hashing (LSH) algorithm often takes a large memory space and long settling time to build a hash table. In the query phase, it takes more than 95% of the overall running time to search K nearest neighbor data items of samples. To solve these problems, this paper uses the compute unified device architecture (CUDA) to transplant LSH algorithm into a GPU, using parallel multi-threads to calculate the values of data items to build the hash table. In the query phase, we introduce multisample query based on work queue to global memory to improve efficiency of the algorithm. Experimental results show that computing speed of the proposed LSH algorithm is 12 times faster than the traditional LSH algorithm.

参考文献

[1] 凌康. 基于局部敏感哈希的相似性搜索技术研究[D]. 南京:南京大学,2012.

[2] Donkin D. Multidimensional search problem [J]. Siam Journal on Computing, 1976, 5(2): 181-186.

[3] Yeh T, Lee J, Darrell T. Adaptive vocabulary forests br dynamic indexing and category learning [C]//Proceedings of the 11th IEEE International Conference on Computer Vision Rio de Janeiro, 2007. Washington, DC, USA: IEEE Computer Society, 2007: 1-8.

[4] Indyk P, Motwani R. Approximate nearest neighbors: towards removing the curse of dimensionality[C]//Proceedings of 30th ACM Symposium on Theory of Computing. New York, USA, 1998: 604-613.

[5] Gionis A, Indyk P. Similarity search in high dimensions via hashing [C]//Proceedings of the 25th International Conference on Very Large Data Bases, Edinburgh, UK, 1999: 518-529.

[6] Wu Y, Shou L D, Chen G. CB-LSH: an efficient LSH indexing algorithm based on compressed bitmap [J]. Journal of Zhejiang University, 2012, 3(46): 376-385.

[7] Yin S Y, Badr M, Vodislav D. Dynamic multi-probe LSH: an I/O efficient index structure for approximate nearest neighbor search [J]. Lecture Notes in Computer Science, 2013, 1(8055): 48-62.

[8] Gu X G, Zhang Y D, Zhang L. An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features [J]. Signal Processing, 2013, 93(8): 2244-2255.

[9] Jiang K, Que Q C, Kulis B. Revisiting kernelized locality-sensitive hashing for improved largescale image retrieval [C]//IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[10] 雷德川. 基于CUDA的GPU加速迭代重建算法研究[D]. 中国工程物理研究院,2012.

[11] O'donnell R, Wu Y, Zhou Y. Optimal lower bounds for locality-sensitive hashing [C]// Proceedings of Innovations in Computer Science, 2011: 275-283.

[12] Andoni A, Indyk P, Nguyen H L. Beyond locality-sensitive hashing [C]//Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 2014: 1018-1028.

[13] 李淼,孙荣坤. 基于p稳定分布局部敏感哈希地址的鲁棒音频检索方法[J]. 信号处理,2012, 28(3): 367-375. Li M, Sun R K. Robust audio retrieval method using local-sensitive hashing address based on p-stable distribution [J]. Signal Processing, 2012, 28(3): 367-375. (in Chinese)

[14] Pagh R, Rodler F F. Cuckoo hashing [J]. Journal of Algorithms, 2004, 51(18): 122-144.

[15] Alcantara D A, Sharf A. Real-time parallel hashing on the GPU [J]. ACM Transactions on Graphics, 2009, 28(154): 1-9.
文章导航

/