Contents Previous Next


Fastest Fourier Transform in the West.

Version : 2.1.3
Author(s) : M. Frigo and S. G. Johnson,
License : GPL
Website :

Disk space required for installation is 811.00 Kb


FFTW is a comprehensive collection of fast C routines for computing the discrete Fourier
transform (DFT) in one or more dimensions, of both real and complex data, and of arbitrary input size. FFTW also includes parallel transforms for both shared- and distributed-memory
systems. We assume herein that the reader is already familiar with the properties and uses of the DFT that are relevant to her application. Otherwise, see e.g. The Fast Fourier Transform by
E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974). Our web page also has links to FFT-related information online.

FFTW is usually faster (and sometimes much faster) than all other freely-available Fourier transform programs found on the Net. For transforms whose size is a power of two, it compares
favorably with the FFT codes in Sun's Performance Library and IBM's ESSL library, which are targeted at specific machines. Moreover, FFTW's performance is portable. Indeed, FFTW is
unique in that it automatically adapts itself to your machine, your cache, the size of your memory, the number of registers, and all the other factors that normally make it impossible to optimize
a program for more than one machine. An extensive comparison of FFTW's performance with that of other Fourier transform codes has been made. The results are available on the Web at the
benchFFT home page.

In order to use FFTW effectively, you need to understand one basic concept of FFTW's internal structure. FFTW does not used a fixed algorithm for computing the transform, but it can adapt
the DFT algorithm to details of the underlying hardware in order to achieve best performance. Hence, the computation of the transform is split into two phases. First, FFTW's planner is
called, which "learns" the fastest way to compute the transform on your machine. The planner produces a data structure called a plan that contains this information. Subsequently, the plan is
passed to FFTW's executor, along with an array of input data. The executor computes the actual transform, as dictated by the plan. The plan can be reused as many times as needed. In
typical high-performance applications, many transforms of the same size are computed, and consequently a relatively-expensive initialization of this sort is acceptable. On the other hand, if
you need a single transform of a given size, the one-time cost of the planner becomes significant. For this case, FFTW provides fast planners based on heuristics or on previously computed

The pattern of planning/execution applies to all four operation modes of FFTW, that is, I) one-dimensional complex transforms (FFTW), II) multi-dimensional complex transforms
(FFTWND), III) one-dimensional transforms of real data (RFFTW), IV) multi-dimensional transforms of real data (RFFTWND). Each mode comes with its own planner and executor.

Besides the automatic performance adaptation performed by the planner, it is also possible for advanced users to customize FFTW for their special needs. As distributed, FFTW works most
efficiently for arrays whose size can be factored into small primes (2, 3, 5, and 7), and uses a slower general-purpose routine for other factors. FFTW, however, comes with a code generator
that can produce fast C programs for any particular array size you may care about. For example, if you need transforms of size 513 = 19*33, you can customize FFTW to support the factor 19

FFTW can exploit multiple processors if you have them. FFTW comes with a shared-memory implementation on top of POSIX (and similar) threads, as well as a distributed-memory
implementation based on MPI. We also provide an experimental parallel implementation written in Cilk, the superior programming tool of choice for discriminating hackers (Olin Shivers). (See
the Cilk home page.)

Contents Previous Next