Faster file matching using GPGPUs

Date
2010
Publisher
University of Delaware
Abstract
File matching is an important topic in the field of forensics and information security. With the increasing popularity of GPU computing for scientific research and commercial purposes, there is a desire to solve problems faster and more cost-effectively. This calls for fast, effective, parallelizable algorithms that exploit the parallelism offered by multi-core GPUs. One particular application that could potentially benefit from the massive parallelism offered by GPUs is file matching. File matching involves identifying similar or partially similar files using file signatures (i.e., hashes). It is a computationally expensive task, but it offers ample scope for parallelism and is therefore well suited to the GPU. We address the problem of faster file matching by identifying the parallel algorithm best suited to take advantage of GPU computing. MD6 is a tree-based, highly parallelizable cryptographic hash function that can be used to construct the hashes used in file matching applications. Different portions of the message M to be hashed can be processed independently, starting at different points, and the partial results aggregated in a final step. We implemented a parallel version of MD6 on GPUs using CUDA by effectively partitioning and parallelizing the algorithm. To demonstrate the performance of CUDA MD6, we performed various experiments with inputs of different sizes and varying input parameters. We believe that CUDA MD6 is among the fast and effective solutions currently available for identifying similar files.
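The tree-based aggregation described in the abstract can be illustrated with a minimal sketch. This is not the CUDA MD6 implementation: SHA-256 from Python's standard library stands in for the MD6 compression function, a thread pool stands in for GPU threads, and the names `CHUNK_SIZE`, `FANOUT`, `_node_hash`, and `tree_hash` are hypothetical choices made for illustration only.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 512  # bytes per leaf; illustrative value, not MD6's block size
FANOUT = 4        # children per internal node; assumed here for illustration


def _node_hash(data: bytes, level: int) -> bytes:
    # SHA-256 is a stand-in for the MD6 compression function (assumption).
    # The level prefix keeps leaf nodes distinct from internal nodes.
    return hashlib.sha256(bytes([level]) + data).digest()


def tree_hash(message: bytes, workers: int = 4) -> str:
    # Split the message into independent leaves; an empty message gets one leaf.
    chunks = [message[i:i + CHUNK_SIZE]
              for i in range(0, len(message), CHUNK_SIZE)] or [b""]
    level = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Leaves have no dependencies on each other, so they hash in parallel.
        nodes = list(pool.map(lambda c: _node_hash(c, 0), chunks))
        # Aggregate: combine FANOUT children per parent until one root remains.
        while len(nodes) > 1:
            level += 1
            groups = [b"".join(nodes[i:i + FANOUT])
                      for i in range(0, len(nodes), FANOUT)]
            nodes = list(pool.map(lambda g, lv=level: _node_hash(g, lv), groups))
    return nodes[0].hex()
```

Under these assumptions, two files match exactly when `tree_hash` returns the same digest for both, and the result does not depend on how many workers process the leaves.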