Fast convolutional neural networks on graphics processing units

dc.contributor.author: Zhang, Yulin
dc.date.accessioned: 2020-10-13T12:34:03Z
dc.date.available: 2020-10-13T12:34:03Z
dc.date.issued: 2019
dc.date.updated: 2020-02-06T17:02:36Z
dc.description.abstract: The convolutional neural network (CNN) architecture is one of the most widely used deep learning tools. The execution time of a CNN is dominated by the time spent on its convolution steps. Most CNN implementations adopt a simple yet efficient im2col (image-to-column) + GEMM approach to implement convolution. The im2col + GEMM approach lowers the convolution into a matrix multiplication that can be easily parallelized with highly efficient BLAS libraries. The contribution of this dissertation is the observation of significant but intricately patterned data redundancy in this matrix representation of convolution. We have not been able to identify earlier work that exploits this redundancy to improve the performance of CNNs. In this work, we analyze the origin of the redundancy generated by the im2col process and reveal a new data pattern that describes the matrix representation of convolution more concisely in mathematical terms. Based on this redundancy-minimized matrix representation, we implement an FFT-based convolution with finer FFT granularity. It achieves an average speedup of 23% and a maximum speedup of 50% on the ILSVRC2017 benchmark over the regular FFT convolution from NVIDIA's cuDNN library, one of the most widely used CNN libraries. Moreover, by replacing existing methods with our new convolution method in Caffe, a popular deep-learning programming framework, we observe an average speedup of 74% for multiple synthetic CNNs in closer-to-real-world application scenarios. (en_US)
dc.description.advisor: Li, Xiaoming
dc.description.degree: Ph.D.
dc.description.department: University of Delaware, Department of Electrical and Computer Engineering
dc.identifier.doi: https://doi.org/10.58088/p7nw-tm63
dc.identifier.unique: 1200041049
dc.identifier.uri: https://udspace.udel.edu/handle/19716/27826
dc.language.rfc3066: en
dc.publisher: University of Delaware (en_US)
dc.relation.uri: https://login.udel.idm.oclc.org/login?url=https://www.proquest.com/docview/2384783869?accountid=10457
dc.title: Fast convolutional neural networks on graphics processing units (en_US)
dc.type: Thesis (en_US)
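
The im2col + GEMM lowering mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's code: it assumes a single-channel image, stride 1, and no padding, and the function names (`im2col`, `conv2d_im2col`) are hypothetical. It uses plain Python lists in place of a BLAS call, and shows only the standard lowering and the duplication it creates, not the redundancy-minimized representation or the FFT-based method the thesis proposes.

```python
def im2col(image, kh, kw):
    """Lower a 2-D image (list of rows) into a patch matrix.

    Assumes stride 1 and no padding. Each row of the result is one
    kh x kw patch, flattened in row-major order.
    """
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_im2col(image, kernel):
    """CNN-style convolution (cross-correlation) as a matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    cols = im2col(image, kh, kw)
    flat_kernel = [k for row in kernel for k in row]
    # The "GEMM" step: one dot product per patch row.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in cols]
    out_w = len(image[0]) - kw + 1
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]

cols = im2col(image, 2, 2)
lowered_entries = sum(len(row) for row in cols)  # entries stored after lowering
source_pixels = len(image) * len(image[0])       # distinct input pixels
result = conv2d_im2col(image, kernel)
```

In this toy case the lowered matrix stores 36 entries for only 16 input pixels, because overlapping patches copy the same pixel several times. That duplication is the kind of redundancy the abstract refers to.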

Files

Original bundle

Name: Zhang_udel_0060D_14028.pdf
Size: 1.6 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission