by Connor Boyle
tags: mathematics, software
This post is part 3 in my ongoing series on fast Fourier transform algorithms. You may wish to read part 1 and part 2 before reading this article.
The Cooley-Tukey algorithm works very well when the input sequence’s length is a product of (mostly or entirely) small prime factors, which it can use to “break down” the problem. However, it provides little speedup once those small prime factors (if any exist) are exhausted. In particular, for prime input lengths, Cooley-Tukey provides no speedup at all, degrading to \(O(\lvert x \rvert^2)\) time complexity. In order to guarantee \(O(\lvert x \rvert \log \lvert x \rvert )\) time complexity regardless of what prime factors the length has, we will need another algorithm, such as Bluestein’s algorithm.
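As a baseline, the quadratic-time behavior described above corresponds to evaluating the DFT definition directly. A minimal NumPy sketch (the function name is my own, and I use the convention \(W_n = e^{-2\pi i/n}\)):

```python
import numpy as np

def naive_dft(x):
    """Evaluate the DFT definition directly in O(|x|^2) time."""
    n = len(x)
    j = np.arange(n)
    # Matrix of twiddle factors W_n^{jk}, with W_n = exp(-2*pi*i/n);
    # each output bin is a dot product against a row of this matrix.
    W = np.exp(-2j * np.pi * np.outer(j, j) / n)
    return W @ np.asarray(x, dtype=complex)
```

For a prime input length, this direct evaluation is essentially what a mixed-radix FFT degrades to.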
Recall the definition of the discrete Fourier transform:
\[\mathcal{F} \{ x \}[k] = \sum_{j=0}^{|x| - 1} x[j] \cdot W_{|x|}^{jk}\]Substitute the identity \(jk = \frac{j^2 + k^2 - (k-j)^2}{2}\):
\[= \sum_{j=0}^{|x| - 1} x[j] \cdot W_{|x|}^{\frac{j^2 + k^2 - (k-j)^2}{2}}\] \[= W_{|x|}^{\frac{k^2}{2}} \cdot \sum_{j=0}^{|x| - 1} x[j] \cdot W_{|x|}^{\frac{j^2}{2}} \cdot W_{|x|}^{\frac{-(k-j)^2}{2}}\]At this point, the right-hand side should look very similar to a cyclic convolution, however it still needs a few more manipulations:
\[\mathcal{F} \{ x \}[k + 1 - |x|] = W_{|x|}^{\frac{(k + 1 - |x|)^2}{2}} \cdot \sum_{j=0}^{|x| - 1} x[j] \cdot W_{|x|}^{\frac{j^2}{2}} \cdot W_{|x|}^{\frac{-(k + 1 - |x| - j)^2}{2}}\]Let there be sequences \(y\) and \(z\), both of some (yet-undecided) length \(N\) defined as follows:
\[y[j] = \left\{ \begin{array}{lll} x[j] \cdot W_{|x|}^{\frac{j^2}{2}} & \textrm{if} & 0 \leq j \leq |x| - 1 \\ 0 & \textrm{if} & |x| \leq j \leq N - 1 \\ \end{array} \right.\] \[z[j] = W_{|x|}^{\frac{-(j + 1 - |x|)^2}{2}}\] \[|y| = |z| = N\]Substituting these sequences to our previous equation:
\[\mathcal{F} \{ x \}[k + 1 - |x|] = W_{|x|}^{\frac{(k + 1 - |x|)^2}{2}} \cdot \sum_{j=0}^{|x| - 1} y[j] \cdot z[k-j]\]For the discrete Fourier transform \(\mathcal{F}\{ x \}\), the only valid indices are from 0 to \(\lvert x \rvert - 1\), therefore:
\[0 \leq k + 1 - |x| \leq |x| - 1\] \[|x| - 1 \leq k \leq 2|x| - 2\]Inside the summation, we know that:
\[0 \leq j \leq |x| - 1\] \[1 - |x| \leq -j \leq 0\] \[0 \leq k - j \leq 2|x| - 2\]If we choose an integer value for \(N\) such that \(2 \lvert x \rvert - 2 < N\), then \((k - j) \% N = k-j\):
\[\mathcal{F} \{ x \}[k + 1 - |x|] = W_{|x|}^{\frac{(k + 1 - |x|)^2}{2}} \cdot \sum_{j=0}^{|x| - 1} y[j] \cdot z[(k-j) \% N]\]Since \(y[j] = 0\) for all \(\lvert x \rvert \leq j \leq N - 1\), we can extend the summation’s bounds without changing its value:
\[\mathcal{F} \{ x \}[k + 1 - |x|] = W_{|x|}^{\frac{(k + 1 - |x|)^2}{2}} \cdot \sum_{j=0}^{N - 1} y[j] \cdot z[(k-j) \% N]\] \[= W_{|x|}^{\frac{(k + 1 - |x|)^2}{2}} \cdot ( y \circledast z )[k]\] \[\mathcal{F} \{ x \}[k] = W_{|x|}^{\frac{k^2}{2}} \cdot ( y \circledast z )[k + |x| - 1]\]We have thus far shown that the discrete Fourier transform of any sequence \(x\) can be calculated as the cyclic convolution of two sequences \(y\) and \(z\). \(y\) is a “zero-padded” copy of our input sequence, multiplied elementwise with a chirp signal, while \(z\) is itself a chirp signal. The advantage of this formulation lies in our freedom to choose the length of these sequences; their common length \(N\) can be chosen to be any integer greater than \(2 \lvert x \rvert - 2\) (that is, at least \(2 \lvert x \rvert - 1\)).
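To sanity-check the derivation numerically, here is a sketch that builds \(y\) and \(z\) as defined above (using the convention \(W_{|x|} = e^{-2\pi i/|x|}\)) and evaluates the cyclic convolution directly, without any FFTs yet:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)          # |x| = 5, a prime length
n = len(x)
N = 2 * n - 1                       # any integer N >= 2|x| - 1 works

# y: chirp-modulated, zero-padded input; z: the chirp kernel
j = np.arange(N)
y = np.zeros(N, dtype=complex)
y[:n] = x * np.exp(-1j * np.pi * np.arange(n) ** 2 / n)   # x[j] * W^{j^2/2}
z = np.exp(1j * np.pi * (j + 1 - n) ** 2 / n)             # W^{-(j+1-|x|)^2/2}

# Direct O(N^2) cyclic convolution (y ⊛ z)
conv = np.array([sum(y[m] * z[(k - m) % N] for m in range(N))
                 for k in range(N)])

# F{x}[k] = W^{k^2/2} * (y ⊛ z)[k + |x| - 1]
k = np.arange(n)
dft = np.exp(-1j * np.pi * k ** 2 / n) * conv[k + n - 1]
print(np.allclose(dft, np.fft.fft(x)))   # True
```

Note that the convolution here is still quadratic-time; the point of the next step is to replace it with FFTs.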
If we choose \(N\) to be the smallest power of 2 greater than or equal to \(2 \lvert x \rvert - 1\), then in the worst-case scenario[1] where \(2 \lvert x \rvert - 1 = 2^p + 1\) (\(p \in \mathbb{N}\)), we get \(N = 2^{p+1} = 4 \lvert x \rvert - 4\). If we apply the convolution theorem here, we now see that the discrete Fourier transform \(\mathcal{F} \{ x \}\) can be calculated by taking the DFTs of \(y\) and \(z\), then taking the inverse DFT of their elementwise product, and finally multiplying that result elementwise by another chirp signal:
\[\mathcal{F} \{ x \}[k] = W_{|x|}^{\frac{k^2}{2}} \cdot \mathcal{F}^{-1} \{ \mathcal{F} \{ y \} \odot \mathcal{F} \{ z \} \} [k + |x| - 1]\]All of these operations take linear time complexity with respect to \(N\), except for the inverse and forward discrete Fourier transforms. Since these DFTs are calculated on sequences whose length is a power of 2 (and \(N < 4 \lvert x \rvert\)), we know they can be calculated in \(O(\lvert x \rvert \log \lvert x \rvert)\) time using the Cooley-Tukey algorithm.
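Putting the pieces together, the whole algorithm can be sketched as follows. This is my own illustrative implementation (not how NumPy implements it internally), again assuming \(W_{|x|} = e^{-2\pi i/|x|}\):

```python
import numpy as np

def bluestein_fft(x):
    """DFT of arbitrary-length x via three power-of-2 FFTs (Bluestein's algorithm)."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    # Smallest power of 2 with N > 2|x| - 2, i.e. N >= 2|x| - 1.
    N = 1 << (2 * n - 2).bit_length()
    k = np.arange(n)
    chirp = np.exp(-1j * np.pi * k ** 2 / n)        # W^{k^2/2}
    y = np.zeros(N, dtype=complex)
    y[:n] = x * chirp                               # x[j] * W^{j^2/2}, zero-padded
    j = np.arange(N)
    z = np.exp(1j * np.pi * (j + 1 - n) ** 2 / n)   # W^{-(j+1-|x|)^2/2}
    # Convolution theorem: y ⊛ z = ifft(fft(y) * fft(z)), all of length N (a power of 2)
    conv = np.fft.ifft(np.fft.fft(y) * np.fft.fft(z))
    return chirp * conv[k + n - 1]
```

For example, `bluestein_fft` on a sequence of prime length 17 agrees with `numpy.fft.fft` to within floating-point tolerance, while only ever invoking FFTs of power-of-2 length (here, 64).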
The guaranteed linearithmic time complexity of Bluestein’s algorithm unfortunately does not mean that the performance is just as good as we would get for Cooley-Tukey applied to a sequence of highly-composite length. Bluestein’s algorithm inflates the problem size to at least \(N = 2 \lvert x \rvert - 1\), while also adding an inverse Fourier transform step.
The limitations of Bluestein’s algorithm are apparent when one uses any fast Fourier transform implementation that relies on it, such as NumPy’s. In a Google Colab notebook, I ran numpy.fft.fft() on random real-numbered sequences ranging in length from \(2^{20}\) to \(2^{20}+24\); below is a graph showing the length of time taken to compute the discrete Fourier transform of each of the sequences:
You may notice that a longer sequence can take a shorter time to compute;[2] for example, the DFT for the sequence of length 1,048,580 took 0.155 seconds to compute, while for length 1,048,579 the execution time was 0.496 seconds, more than 3 times as long. This is because the greater length, 1,048,580, is highly composite, having 6 prime factors: 2, 2, 5, 13, 37, and 109. Meanwhile, the lesser length, 1,048,579, has only 3 prime factors: 7, 163, and 919.[3]
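An experiment along these lines is easy to reproduce. The sketch below uses a smaller base length than the \(2^{20}\) from my Colab run so that it finishes quickly; the trial-division factorization helper is my own:

```python
import time
import numpy as np

def prime_factors(n):
    """Trial-division factorization (with multiplicity); fine for small n."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

rng = np.random.default_rng(0)
base = 2 ** 16   # the post's experiment used 2**20; smaller here for a quick run
for n in range(base, base + 5):
    x = rng.standard_normal(n)
    t0 = time.perf_counter()
    np.fft.fft(x)
    elapsed = time.perf_counter() - t0
    print(f"{n}: {elapsed:.4f}s  factors={prime_factors(n)}")
```

Even over this small range, lengths whose factorizations contain a large prime tend to stand out in the timings.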
Rather than being an alternative to Cooley-Tukey, Bluestein’s algorithm is better described as an (apparently high-overhead) “compatibility layer” for Cooley-Tukey. Bluestein’s allows Cooley-Tukey to efficiently process awkwardly-sized inputs, but never remotely as quickly as it processes inputs of highly-composite length (even somewhat longer ones).
The fast Fourier transform makes perennial appearances in internet forum discussions and attention-grabbing listicles as among the most “beautiful” algorithms. If “fast Fourier transform” refers only to Cooley-Tukey applied to inputs of highly-composite lengths, I’m tempted to concur; that algorithm is just complex enough to be both interesting and elegant. But if “fast Fourier transform” refers to a program that uses a combination of Cooley-Tukey and Bluestein’s algorithm to efficiently process inputs of any length, then I’m not sure I agree. Bluestein’s algorithm is, in my opinion, both fascinating and at least a bit unsatisfying. The fact that we can calculate a Fourier transform as a convolution is mind-boggling. The fact that this convolution is calculated using two Fourier transforms at least twice the length of our input is a little disappointing.
Footnotes:
1. Assuming that \(\lvert x \rvert > 1\) ↩
2. Thanks to some quite unrigorous experimentation, I’ve noticed that (within this particular range) the logarithm of the sum of the prime factors of the input length is a fairly good predictor of the running time of the FFT algorithm. You may show this value on the graph by clicking on the legend where it says “log sum”. ↩
3. There are other factors that will be specific to NumPy’s (or rather its dependency, PocketFFT’s) implementation. I haven’t taken the time to research PocketFFT’s implementation, but I’ll speculate that its Cooley-Tukey step only “breaks down” the original DFT problem using small primes (such as 2, 3, 5, and 7), and then, once no small primes remain, applies Bluestein’s algorithm or a naive DFT to the divided input sequences. The final running time is therefore likely a somewhat difficult-to-predict result of the sizes of the prime factors and the proximity of the “divided length” to the nearest highly-composite number at least twice as large. ↩