A Search Optimization in FFTW
Loading...
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
University of Delaware
Abstract
Generating high performance fast Fourier transform(FFT) libraries for different
computer architectures is an important task. Architecture vendors sometimes
have to rely on dedicated experts to tune FFT implementation on each new platform.
Fastest Fourier transform in the West(FFTW) replaces this tedious and repeated
work with an adaptive FFT library. It automatically generates FFT code that are
comparable to libraries provided by vendors. Part of its success is due to its highly
e cient straight-line style code for small DFTs, called codelets. The other part of
its success is the result of a large and carefully chosen search space of FFT algorithms.
FFTW mainly traverses this space by empirical search, otherwise a simple
heuristic is used. However, both methods have their downside. The empirical search
method spends a lot of search time on large DFT problems and the simple heuristic
often delivers implementation that is much worse than optimum. An ideal approach
should nd a reasonably good implementation within the FFT search space in a
small amount of time.
Model-driven optimization is often believed to be inferior to empirical search.
It is very hard to capture all the performance features of an adaptive library on many
modern architectures. No one has implemented an adaptive performance model to
automatically assist the search of FFT algorithms on multiple architectures. This
thesis presents an implicit abstract machine model and a codelet performance model
that can be used in the current FFTW framework. With the performance prediction
given by these models, the empirical search engine of FFTW can be replaced without
serious hurt of performance. This technique also helps to break down the runtime in FFTW and explain the performance of complex combination of DFT algorithms.
As a good trade-o , this optimization achieves 95% of the performance of FFTW
empirical search and uses less than 5% of the search overhead on four test platforms.
