by Connor Boyle
tags: mathematics, software
I’m planning to write a series of posts about fast Fourier transform algorithms. This first post covers the Cooley-Tukey algorithm, which is the original and most well-known FFT algorithm.
If \(x\) is a sequence of complex numbers with a length \(\lvert x \rvert\) and a starting index of 0, then the discrete Fourier transform of \(x\), \(\mathcal{F} \{ x \}\), is defined as follows:
\[|\mathcal{F} \{ x \}| = |x|\] \[\mathcal{F} \{ x \}[k] = \sum_{j=0}^{|x|-1} x[j] \cdot e^{-i 2 \pi jk \frac{1}{|x|}}\]Since complex exponentiation is so commonly used in Fourier transforms, we’ll define a helpful term \(W_N\) as follows:
\[W_N \triangleq e^{-i 2 \pi \frac{1}{N}}\]i.e. \(W_N = W_N^1\) is a \(\frac{1}{N}\)-turn rotation in the complex plane (starting at 1). \(W_N^2\) is a \(\frac{2}{N}\)-turn rotation in the complex plane, etc. Substituting into the original discrete Fourier transform definition, we get:
\[\mathcal{F} \{ x \}[k] = \sum_{j=0}^{|x|-1} x[j] \cdot W_{|x|}^{jk}\]Naïvely evaluating this equation for each of the \(\lvert x \rvert\) different output frequency buckets of the DFT (\(k = 0, 1, \ldots, \lvert x \rvert - 2, \lvert x \rvert - 1\)) requires a summation of complex products of the \(\lvert x \rvert\) samples in the signal, thus giving any naïve DFT algorithm a time complexity of \(O(\lvert x \rvert^2)\).
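To make that cost concrete, here is a minimal Python sketch of the naïve evaluation (the function name dft_naive is my own, not taken from any library):

import cmath

def dft_naive(x):
    """Evaluate the DFT definition directly: |x| outputs, each a sum of |x| products."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)  # W_N, a 1/N-turn rotation in the complex plane
    return [sum(x[j] * W ** (j * k) for j in range(N)) for k in range(N)]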
If \(\lvert x \rvert\) is a composite number, we can pick two natural numbers \(r\) and \(d\), such that:
\[\lvert x \rvert = r \cdot d\]This allows us to change the single summation over \(j\) into nested summations:
\[\mathcal{F} \{ x \}[k] = \sum_{j_1=0}^{d-1} \sum_{j_0=0}^{r-1} x[j_1 r + j_0] \cdot W_{\lvert x \rvert}^{(j_1 r + j_0)k}\] \[= \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x[j_1 r + j_0] \cdot W_{\lvert x \rvert}^{(j_1 r + j_0)k}\]Similarly, we can define variables \(k_0\) and \(k_1\) such that \(k = k_1 d + k_0\). Let:
\[k_1 \triangleq \lfloor \frac{k}{d} \rfloor\] \[k_0 \triangleq k - k_1 d\]In other words, \(k_1\) is the quotient and \(k_0\) is the remainder of the Euclidean division1 of \(k\) by \(d\).
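In Python terms, this is exactly what divmod computes; the particular values of k and d below are arbitrary and purely illustrative:

k, d = 11, 4             # arbitrary example values
k1, k0 = divmod(k, d)    # Euclidean division: quotient and remainder
assert k == k1 * d + k0 and 0 <= k0 < d   # 11 == 2 * 4 + 3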
This allows us to again re-formulate the discrete Fourier transform:
\[\mathcal{F} \{ x \}[k] = \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x[j_1 r + j_0] \cdot W_{|x|}^{(j_1 r + j_0) (k_1 d + k_0)}\] \[= \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x[j_1 r + j_0] \cdot W_{|x|}^{j_1 r k_1 d + j_1 r k_0 + j_0 (k_1 d + k_0)}\] \[= \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x[j_1 r + j_0] \cdot W_{|x|}^{j_1 k_1 |x| + j_1 r k_0 + j_0 k}\] \[= \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x[j_1 r + j_0] \cdot W_{|x|}^{j_1 k_1 |x|} \cdot W_{|x|}^{j_1 r k_0} \cdot W_{|x|}^{j_0 k}\]Since \(W_{\lvert x \rvert}^{j_1 k_1 \lvert x \rvert} = (e^{-i \frac{2 \pi \lvert x \rvert}{\lvert x \rvert}})^{j_1 k_1} = 1^{j_1 k_1} = 1\), therefore:
\[\mathcal{F} \{ x \}[k] = \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x[j_1 r + j_0] \cdot W_{\lvert x \rvert}^{j_1 r k_0} \cdot W_{\lvert x \rvert}^{j_0 k}\]At this point, we can split the elements of \(x\) into sub-sequences corresponding to modulo classes. Let \(x_{r}^{j_0}\) be a sequence whose elements are equal to the elements of \(x\) whose indices are equivalent to \(j_0\) modulo \(r\). More formally, these sequences (of which there are \(r\) total) can be defined as follows:
\[x_r^{j_0}[j_1] = x[j_1 r + j_0]\] \[|x_r^{j_0}| = \frac{|x|}{r} = d\]Substituting this sequence definition, we get:2
\[\mathcal{F} \{ x \}[k] = \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x_r^{j_0}[j_1] W_{|x|}^{k_0 j_1 r} W_{|x|}^{j_0 k}\] \[= \sum_{j_0=0}^{r-1} \sum_{j_1=0}^{d-1} x_r^{j_0}[j_1] W_d^{k_0 j_1} W_{|x|}^{j_0 k}\] \[= \sum_{j_0=0}^{r-1} \mathcal{F} \{ x_r^{j_0} \}[k_0] W_{|x|}^{j_0 k}\]Let’s consider how long it will take to evaluate the discrete Fourier transform in this formulation:
Computing the \(r\) inner discrete Fourier transforms \(\mathcal{F} \{ x_r^{j_0} \}\) naïvely, each over a sequence of length \(d\), requires \(O(r \cdot d^2) = O(\lvert x \rvert \cdot d)\) operations. Recombining them, i.e. evaluating the outer sum of \(r\) terms for each of the \(\lvert x \rvert\) output values \(k\), requires \(O(\lvert x \rvert \cdot r)\) operations. Added together, these two sub-routines require \(O(\lvert x \rvert \cdot d + \lvert x \rvert \cdot r) = O(\lvert x \rvert \cdot (d + r))\) operations, a potentially significant improvement over the \(O(\lvert x \rvert^2)\) complexity of the naïve formulation, depending on the values of \(r\) and \(d\). More importantly, this manipulation can be applied recursively. Specifically, each of the \(r\) discrete Fourier transforms of the \(d\)-length sequences \(x_r^{j_0}\) can be broken down into \(r'\) Fourier transforms of length \(d'\), assuming that two natural numbers exist such that \(d = r' \cdot d'\).3 In the ideal4 case where \(\lvert x \rvert = 2^n\), \(n \in \mathbb{N}\), evaluating the DFT with the Cooley-Tukey algorithm requires \(O(\lvert x \rvert \cdot (2 + 2 + \ldots + 2)) = O(\lvert x \rvert \cdot 2 \cdot \log_2(\lvert x \rvert)) = O (\lvert x \rvert \cdot \log(\lvert x \rvert))\) operations.
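To make the recursion concrete, here is a minimal Python sketch of the whole algorithm. The function name and the choice to always recurse on the smallest prime factor of the length are my own; it reuses the dft_naive sketch from above as the base case for prime lengths:

import cmath

def fft_cooley_tukey(x):
    """Recursive Cooley-Tukey DFT: split x into r strided sub-sequences,
    transform each recursively, and recombine with the factors W_{|x|}^{j0*k}."""
    N = len(x)
    if N <= 1:
        return list(x)
    # Smallest factor r of N (r == N means N is prime).
    r = next((f for f in range(2, int(N ** 0.5) + 1) if N % f == 0), N)
    if r == N:
        return dft_naive(x)  # prime length: this decomposition offers no speed-up
    d = N // r
    # DFTs of the r sub-sequences x_r^{j0}[j1] = x[j1*r + j0], each of length d.
    subs = [fft_cooley_tukey(x[j0::r]) for j0 in range(r)]
    # F{x}[k] = sum over j0 of F{x_r^{j0}}[k mod d] * W_{|x|}^{j0*k}
    W = cmath.exp(-2j * cmath.pi / N)
    return [
        sum(subs[j0][k % d] * W ** (j0 * k) for j0 in range(r))
        for k in range(N)
    ]

Recursing on the smallest prime factor keeps the \(O(\lvert x \rvert \cdot r)\) recombination cost of each level as small as possible; other radix choices are common in practice.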
The Cooley-Tukey algorithm can also be used to calculate the inverse discrete Fourier transform with only very slight modification. In fact, the original Cooley-Tukey paper (see “related reading”) specifically described an algorithm to compute the inverse discrete Fourier transform, not the “forward” DFT. I will leave the Cooley-Tukey iDFT algorithm as an exercise for the reader.
However, note that the Cooley-Tukey algorithm gives no speed-up for input sequences of prime length, and provides relatively little speed-up when the factors of the input length contain large primes. To efficiently compute the DFT for sequences of prime or other non-highly-composite lengths, we will need additional algorithms. Ultimately, these other FFT algorithms generally depend on Cooley-Tukey for part of the computation. I plan to cover at least one of these techniques, Bluestein’s algorithm, in a future blog post.
This visualization shows how the discrete Fourier transform of some signal \(x\) is computed using the Cooley-Tukey algorithm. The black boxes at the very bottom are the input signal. While the DFT can be applied to complex signals, I’ve restricted the sample values of the input signal to be real numbers, for simplicity’s sake (this mimics some real-world applications, such as performing a DFT on an audio recording). You can click and drag on the input boxes to change their values.
The grey circles and the sometimes visible white “clock hands” inside of them represent the complex exponent \(W_N^m = e^{-2 i \pi \frac{m}{N}}\) for some \(N\) (e.g. \(\lvert x \rvert\), \(d\), \(d'\), etc.) and some exponent \(m\). These complex exponents, which are equivalent to rotations in the complex plane, are applied to the relevant input value. Unlike the usual convention, I’ve decided to show the real component of the complex plane as vertical (“up” is positive-real) and the imaginary component as horizontal (“right” is positive-imaginary). You can see where the input value is drawn from by hovering the mouse over a given “rotation” box.
Those rotated input values are summed to calculate one element of a discrete Fourier transform. Hover over a white “output” box of a discrete Fourier transform in the visualization to highlight the column of “rotation” boxes that it was summed from.
I’ve noticed an irritating and confusing tendency among many people–including published authors–when talking about discrete Fourier transforms. I find they often use the phrase “fast Fourier transform” (or perhaps more often, the abbreviation “FFT”) when they mean “discrete Fourier transform” (or “DFT”). I think this is wrong and confusing; to understand why, imagine you have a list:
x = [10, 3, 2, 19, -2]
would it make sense to refer to the following list as the “mergesort” of the previous list?
y = [-2, 2, 3, 10, 19]
I think most people would find that very strange. We can say that the second list is the result of sorting the first list, but we don’t know anything about the specific algorithm that was used to sort the list. It could have been sorted using mergesort, quicksort, heapsort, bubblesort, bogosort, or any other sorting algorithm (in reality, I just sorted this one in my head). However, even if I had used mergesort, y still wouldn’t be the “mergesort” of x, it would still just be “the result of sorting x”.
Similarly, the output of an FFT algorithm should not be referred to as “an/the FFT of” anything. In theory, calculating the DFT of a sequence using Cooley-Tukey gives the exact same result as calculating that DFT using a naïvely-implemented DFT algorithm. In practice, Cooley-Tukey will probably give a slightly more accurate result since there are fewer total calculations and therefore fewer opportunities for floating point round-off error.
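As a quick sanity check of that claim (reusing the hypothetical dft_naive and fft_cooley_tukey sketches from earlier), the two approaches agree to within floating-point round-off:

import random

signal = [complex(random.random(), random.random()) for _ in range(64)]
naive = dft_naive(signal)
fast = fft_cooley_tukey(signal)
print(max(abs(a - b) for a, b in zip(naive, fast)))  # tiny, e.g. around 1e-12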
This is not just irritating to me; I think it causes confusion among the public. A friend of mine–an intelligent mathematics major and software engineer–recently asked me “what would I lose by taking the fast Fourier transform instead of the ‘normal’ Fourier transform? I’ve always just taken the normal Fourier transform”. He probably inferred from the way many people throw around the phrase “FFT” that the “fast Fourier transform” is some kind of approximation of, or concept related to yet distinct from, the discrete Fourier transform (or Fourier transforms in general), rather than an algorithm for computing it.
Thank you to my friend Andre Archer, who helped to proofread an earlier version of this post. Any mistakes are my own.
Footnotes:
As far as I am aware, there is no widely-standardized notation for Euclidean division in mathematics. Personally, I find this quite silly. ↩
If \(r\) is chosen to be equal to \(\lvert x \rvert\), and therefore \(d = 1\), then calculating \(\mathcal{F} \{ x \}\) using Cooley-Tukey is equivalent to calculating \(\mathcal{F} \{ x \}\) naïvely, taking \(O(\lvert x \rvert^2)\) steps. ↩
By “ideal” I do not necessarily mean optimal, in any sense. \(|x| = 2^n\) is “ideal” in that it is very simple to reason about. ↩