Fast algorithms for selection and multidimensional sparse Fourier transform
Date
2015
Authors
Publisher
University of Delaware
Abstract
In recent years, many applications throughout engineering, science, and finance have been faced with the analysis of massive amounts of data. This dissertation focuses on two algorithms ubiquitous in data analysis, with the goal of making them efficient in “big data” scenarios.
First, the selection problem, also known as finding an order statistic of the data, is addressed. In particular, a fast algorithm for the weighted median (WM), a problem more general than selection, is derived. The new algorithm computes the WM of N samples in linear time and space, as opposed to the O(N log N) time complexity of traditional sorting-based approaches. A popular selection algorithm often used to find the WM in large data sets is Quickselect, whose performance depends heavily on how the pivots are chosen. The new algorithm introduces an optimization-based pivot selection strategy that yields significantly improved performance and a more consistent runtime than traditional approaches. In particular, the selected pivots are order statistics of subsets of the input data. To find the optimal order statistics as well as the optimal subset sizes, a set of cost functions is derived whose minimization leads to the optimal design parameters. The complexity is compared to that of Floyd and Rivest’s algorithm SELECT, which to date has been the fastest median algorithm, and it is shown that the proposed algorithm requires 30% fewer comparisons. It is also shown that the proposed selection algorithm is asymptotically optimal for large N.
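As a rough illustration of this Quickselect-style approach, the following Python sketch computes a weighted median by partitioning around pivots drawn as the median of a small random subset. The fixed subset size used here is a simplified placeholder for the optimized order statistics and subset sizes obtained from the cost functions in the dissertation, not the dissertation's actual design.

    import random

    def weighted_median(x, w):
        # Quickselect-style weighted median: returns a sample whose total
        # weight strictly below it and strictly above it are each at most
        # half of the overall weight.  With all weights equal it reduces
        # to the ordinary median.
        total = float(sum(w))
        target = total / 2.0
        items = list(zip(x, w))
        low = 0.0                      # weight already known to lie below the current items
        while True:
            # Pivot: median of a small random subset (the subset size 9 is a
            # placeholder, not the cost-function-optimized value).
            subset = random.sample(items, min(9, len(items)))
            pivot = sorted(v for v, _ in subset)[len(subset) // 2]

            less = [(v, wt) for v, wt in items if v < pivot]
            greater = [(v, wt) for v, wt in items if v > pivot]
            less_w = sum(wt for _, wt in less)
            equal_w = sum(wt for v, wt in items if v == pivot)

            if low + less_w > target:                 # weighted median lies below the pivot
                items = less
            elif low + less_w + equal_w >= target:    # the pivot itself is a weighted median
                return pivot
            else:                                     # weighted median lies above the pivot
                low += less_w + equal_w
                items = greater

    # Example: equally weighted samples, so the result is the ordinary median (3)
    print(weighted_median([3, 1, 4, 1, 5, 9, 2], [1] * 7))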
The second algorithm developed in this dissertation extends the concepts of the Sparse Fast Fourier Transform (sFFT) algorithm, introduced in 2012, to multidimensional input data. The multidimensional algorithm requires generalizations of several key concepts of the 1D sparse Fourier transform algorithm.
It is shown that the permutation parameter is of key importance and should not be chosen randomly; instead, it can be optimized for the reconstruction of sparse real-world data.
A connection is made between key steps of the algorithm and lattice theory, thus establishing a rigorous understanding of the effect of the permutation parameter on the algorithm's performance. Lattice theory is then used to optimize the set of parameters, yielding a more robust and better-performing algorithm.
The result improves the algorithm without any penalty in sampling complexity. In fact, other algorithms that use pseudorandom spectrum permutation can also benefit from this finding. This dissertation addresses the case of the exact k-sparse Fourier transform, but the underlying concepts can be applied to the general case of finding a k-sparse approximation of the Fourier transform of an arbitrary signal.
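The role of the permutation parameter can be illustrated with a small NumPy sketch of a 2-D pseudorandom spectrum permutation: permuting the time-domain samples by an integer matrix M whose determinant is coprime to the grid size N rearranges a sparse spectrum without changing its values, with the spike at k moving to M^T k (mod N). The matrix, grid size, and spike locations below are illustrative choices only, not the lattice-optimized parameters derived in the dissertation.

    import numpy as np

    N = 16                                  # grid size per dimension
    M = np.array([[3, 1],                   # permutation matrix; det(M) = 13 is coprime
                  [2, 5]])                  # to 16, so n -> M n (mod N) is a bijection

    # A 3-sparse 2-D spectrum
    X = np.zeros((N, N), dtype=complex)
    support = [(1, 2), (7, 11), (12, 5)]
    for k in support:
        X[k] = np.random.randn() + 1j * np.random.randn()
    x = np.fft.ifft2(X)

    # Time-domain permutation y[n] = x[M n mod N]
    idx = np.indices((N, N)).reshape(2, -1)       # all index pairs n
    perm = np.mod(M @ idx, N)                     # M n mod N
    y = np.zeros_like(x)
    y[tuple(idx)] = x[tuple(perm)]

    # The spectrum of y is a permuted copy of X: spike k moves to M^T k (mod N)
    Y = np.fft.fft2(y)
    for k in support:
        k_new = tuple(np.mod(M.T @ np.array(k), N))
        print(k, "->", k_new, np.allclose(Y[k_new], X[k]))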
Simulations illustrate the efficiency and accuracy of the proposed algorithm. The parameter optimizations and the resulting improvements are demonstrated in simulations, in which both the worst-case and average-case PSNR improve by several dB.
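For reference, a common way to compute the PSNR between an original and a reconstructed signal is sketched below; the exact convention used in the dissertation's simulations (for example, the choice of peak value) may differ.

    import numpy as np

    def psnr(x, x_hat):
        # Peak signal-to-noise ratio in dB (one standard definition)
        mse = np.mean(np.abs(x - x_hat) ** 2)
        peak = np.max(np.abs(x)) ** 2
        return 10 * np.log10(peak / mse)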