%-*- mode: Latex; abbrev-mode: true; auto-fill-function: do-auto-fill -*-

%include lhs2TeX.fmt
%include myFormat.fmt

\out{
\begin{code}
-- This code was automatically generated by lhs2tex --code, from the file
-- HSoM/SpectrumAnalysis.lhs.  (See HSoM/MakeCode.bat.)

\end{code}
}

\chapter{Spectrum Analysis}
\label{ch:spectrum-analysis}

\begin{code}
{-# LANGUAGE Arrows #-}

module Euterpea.Music.Signal.SpectrumAnalysis where
import Euterpea
import Euterpea.Experimental (fftA)

import Data.Complex (Complex ((:+)), polar)
import Data.Maybe (listToMaybe, catMaybes)
\end{code}

There are many situations where it is desirable to take an existing
sound signal---in particular one that is recorded by a
microphone---and analyze it for its spectral content.  If one can do
this effectively, it is then possible (at least in theory) to recreate
the original sound, or to create novel variations of it.  The theory
behind this approach is based on \emph{Fourier's Theorem}, which
states that any periodic signal can be decomposed into a weighted sum
of (a potentially infinite number of) sine waves.  In this chapter we
discuss the theory as well as the pragmatics for doing spectrum
analysis in Euterpea.

\section{Fourier's Theorem}
\label{sec:fouriers-theorem}

A \emph{periodic signal} is a signal that repeats itself infinitely
often.  Mathematically, a signal $x$ is periodic if there exists a
real number $T$ such that for all integers $n$:
\[ x(t) = x(t + nT) \]
%% \[ (\exists\ T\! \in\! \mathbb{R}) (\forall n\! \in\! \mathbb{Z}):
%% x(t) = x(t + nT) \]
%% where $\mathbb{R}$ is the set of real numbers and $\mathbb{Z}$ is the
%% set of integers.
$T$ is called the \emph{period}, which may be just a few microseconds,
a few seconds, or perhaps days---the only thing that matters is that
the signal repeats itself.  Usually we want to find the smallest value
of $T$ that satisfies the above property.

For example, a sine wave is surely periodic; indeed, recall from
Section \ref{sec:frequency} that:
\[ \sin (2\pi k + \theta) = \sin \theta \]
for any integer $k$.  In this case, $T = 2\pi$, and it is the smallest
value that satisfies this property.

But in what sense is, for example, a single musical note periodic?
Indeed it is not, unless it is repeated infinitely often, which would
not be very interesting musically.  Yet something we would like to
know is the spectral content of that single note, or even of a small
portion of that note, within an entire composition.  This is one of
the practical problems that we will address later in the chapter.

%% In the case of audio, since humans cannot hear sound lower than
%% about 20 Hz, we need only concern ourselves with periodic signals
%% whose repetition period is less than 50 milliseconds
%% (\nicefrac{1}{20} of a second).

Recall from Section \ref{sec:frequency} that a sine wave can be
represented by: $x(t) = A\sin(\omega t + \phi)$, where $A$ is the
amplitude, $\omega$ is the radian frequency, and $\phi$ is the phase
angle.  Joseph Fourier, a French mathematician and physicist, showed
the following result.  Any periodic signal $x(t)$ with period $T$ can
be represented as:
\begin{equation}\label{eq:fourier-series}
x(t) = C_0 + \sum_{n=1}^{\infty} C_n \cos(\omega_0 nt + \phi_n)
\end{equation}
This is called \emph{Fourier's Theorem}.  $\omega_0 =
\nicefrac{2\pi}{T}$ is called the \emph{fundamental frequency}.  Note
that the frequency of each cosine wave in the series is an integer
multiple of the fundamental frequency.
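To make the theorem concrete, here is a minimal Haskell sketch (the
function |fourierSeries| is our own, not part of Euterpea) that
evaluates a finitely truncated version of
Equation~\ref{eq:fourier-series}, given the fundamental radian
frequency $\omega_0$, the constant offset $C_0$, and a finite list of
coefficient/phase pairs $(C_n, \phi_n)$:
\begin{spec}
-- Evaluate a truncated Fourier series at time t.  The nth element
-- of cps holds the weight and phase (Cn, phin) of the nth harmonic
-- of the fundamental radian frequency w0.
fourierSeries :: Double -> Double -> [(Double, Double)] -> Double -> Double
fourierSeries w0 c0 cps t =
  c0 + sum [ c * cos (w0 * fromIntegral n * t + phi)
           | (n, (c, phi)) <- zip [1..] cps ]
\end{spec}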
Equation~\ref{eq:fourier-series} is also called the \emph{Fourier
series} or \emph{harmonic series} (related, but not to be confused
with, the mathematical definition of harmonic series, which has the
precise form $1 + \nicefrac{1}{2} + \nicefrac{1}{3} + \nicefrac{1}{4}
+ \cdots$).  The trick, of course, is determining what the
coefficients $C_0$, $...$, $C_n$ and phase angles $\phi_1$, $...$,
$\phi_n$ are.  Determining the above equation for a particular
periodic signal is called \emph{Fourier analysis}, and synthesizing a
sound based on the above equation is called \emph{Fourier synthesis}.

Theoretically, at least, we should be able to use Fourier analysis to
decompose a sound of interest into its composite sine waves, and then
regenerate it by artificially generating those composite sine waves
and adding them together (i.e.\ additive synthesis, to be described in
Chapter~\ref{ch:additive}).  Of course, we also have to deal with the
fact that the representation may involve an \emph{infinite} number of
composite signals.

As discussed somewhat in Chapter~\ref{ch:signals}, many naturally
occurring vibrations in nature---including the resonances of most
musical instruments---are characterized as having a fundamental
frequency (the perceived pitch) and some combination of multiples of
that frequency, which are often called \emph{harmonics},
\emph{overtones} or \emph{partials}.  So Fourier's Theorem seems to be
a good match for this musical application.

\subsection{The Fourier Transform}

When studying Fourier analysis, it is more convenient, mathematically,
to use \emph{complex exponentials}.  We can relate working with
complex exponentials back to sines and cosines using \emph{Euler's
Formula}:
\[\begin{array}{lcl}
e^{j\theta} &=& \cos(\theta) + j \sin(\theta) \\[.05in]
\cos(\theta) &=& \dfrac{1}{2} (e^{j\theta} + e^{-j\theta}) \\[.1in]
\sin(\theta) &=& \dfrac{1}{2j} (e^{j\theta} - e^{-j\theta})
\end{array}\]

For a periodic signal $x(t)$, which we consider to be a function of
time, we denote its \emph{Fourier transform} by $\hat{x}(f)$, which is
a function of frequency.  Each point in $\hat{x}$ is a complex number
that represents the magnitude and phase of the frequency $f$'s
presence in $x(t)$.  Using complex exponentials, the formula for
$\hat{x}(f)$ in terms of $x(t)$ is:
\[ \hat{x}(f) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt \]
where $\omega = 2\pi f$, and $j$ is the same as the imaginary unit $i$
used in mathematics.\footnote{Historically, engineers prefer to use
the symbol $j$ rather than $i$, because $i$ is generally used to
represent current in an electrical circuit.}  Intuitively, the Fourier
transform at a particular frequency $f$ is the integral of the product
of the original signal and a pure sinusoidal wave $e^{-j\omega t}$.
This latter process is related to the \emph{convolution} of the two
signals, and intuitively the result will be non-zero only when the
signal has some content of that pure signal in it.

The above equation describes $\hat{x}$ in terms of $x$.  We can also
go the other way around---defining $x$ in terms of $\hat{x}$:
\[ x(t) = \int_{-\infty}^{\infty} \hat{x}(f) e^{j\hat{\omega} f} df \]
where $\hat{\omega} = 2\pi t$.  This is called the \emph{inverse}
Fourier transform.
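Before moving on, we can use Haskell itself to gain confidence in
Euler's Formula, which underlies both of these equations.  The
following sketch is merely a numerical sanity check of our own (it
relies on the |polar| function imported from |Data.Complex| at the
start of this chapter, and on the fact that |exp| is defined for
complex numbers); the result should be zero, up to round-off error,
for any |theta|:
\begin{spec}
-- The magnitude of the difference between e^(j theta) and
-- cos theta + j sin theta, which should be (nearly) zero.
eulerCheck :: Double -> Double
eulerCheck theta =
  fst (polar (exp (0 :+ theta) - (cos theta :+ sin theta)))
\end{spec}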
%% -2 pi i x e  =>  -2 pi j t f  =>  -j w t
%%  2 pi i x e  =>   2 pi j t f  =>  -j what f

If we expand the definitions of $\omega$ and $\hat{\omega}$ we can see
how similar these two equations are:
%% \[\begin{array}{lcl}
\begin{equation}\label{eq:ft}
\hat{x}(f) = \int_{-\infty}^{\infty} x(t) e^{-j2\pi f t} dt
\end{equation}
\begin{equation}\label{eq:ift}
x(t) = \int_{-\infty}^{\infty} \hat{x}(f) e^{ j2\pi f t} df
\end{equation}
%% \end{array}\]
These two equations, for the Fourier transform and its inverse, are
remarkable in their simplicity and power.  They are also remarkable in
the following sense: \emph{no information is lost when converting from
one to the other}.  In other words, a signal can be represented in
terms of its time-varying behavior or its spectral content---they are
equivalent!

A function that has the property that $f(x) = f(-x)$ is called an
\emph{even} function; if $f(x) = - f(-x)$ it is said to be \emph{odd}.
It turns out that, perhaps surprisingly, \emph{any} function can be
expressed as the sum of a single even function and a single odd
function.  This may help provide some intuition about the equations
for the Fourier transform, because the complex exponential
$e^{j2\pi f t}$ separates the waveform by which it is being multiplied
into its even and odd parts (recall Euler's formula).  The real
(cosine) part affects only the even part of the input, and the
imaginary (sine) part affects only the odd part of the input.

\subsection{Examples}

Let's consider some examples, which are illustrated in
Figure~\ref{fig:fourier-transforms}:
\begin{itemize}
\item
Intuitively, the Fourier transform of a pure cosine wave should be an
impulse function---that is, the spectral content of a cosine wave
should be concentrated completely at the frequency of the cosine wave.
The only catch is that, when working in the complex domain, the
Fourier transform also yields the mirror image of the spectral
content, at a frequency that is the negation of the cosine wave's
frequency, as shown in Figure~\ref{fig:fourier-transforms}a.  In other
words, in this case, $\hat{x}(f) = \hat{x}(-f)$, i.e.\ $\hat{x}$ is
even.  So the spectral content is the \emph{real} part of the complex
number returned from the Fourier transform (recall Euler's formula).
\item
In the case of a pure sine wave, we should expect a similar result.
The only catch now is that the spectral content is contained in the
\emph{imaginary} part of the complex number returned from the Fourier
transform (recall Euler's formula), and the mirror image is negated.
That is, $\hat{x}(f) = - \hat{x}(-f)$, i.e.\ $\hat{x}$ is odd.  This
is illustrated in Figure~\ref{fig:fourier-transforms}b.
\item
Conversely, consider what the spectral content of an impulse function
should be.  Because an impulse function is infinitely ``sharp,'' it
would seem that its spectrum should contain energy at every point in
the frequency domain.  Indeed, the Fourier transform of an impulse
function centered at zero is a constant, as shown in
Figure~\ref{fig:fourier-transforms}c.
\item
Consider now the spectral content of a square wave.  It can be shown
that the Fourier series representation of a square wave is the sum of
the square wave's fundamental frequency plus its harmonically
decreasing (in magnitude) odd harmonics.  Specifically:
\begin{equation}\label{eq:square-wave-series}
sq(t) = \sum_{k=1}^{\infty} \frac{1}{k} \sin k\omega t,\quad
           {\rm for\ odd\ } k
\end{equation}
The spectral content of this signal is shown in
Figure~\ref{fig:fourier-transforms}d.
Figure~\ref{fig:square-wave-series} also shows partial reconstruction
of the square wave from a finite number of its composite signals.
\end{itemize}

It is worth noting that the diagrams in
Figure~\ref{fig:fourier-transforms} make no assumptions about time or
frequency.  Therefore, because the Fourier transform and its inverse
are true mathematical inverses, we can read the diagrams as time
domain / frequency domain pairs, or the other way around; i.e.\ as
frequency domain / time domain pairs.  For example, to interpret the
diagram on the left of Figure~\ref{fig:fourier-transforms}a in the
frequency domain is to say that it is the Fourier transform of the
signal on the right (interpreted in the time domain).

\begin{figure}[hbtp]
\centering
\includegraphics[height=2.4in,angle=270]{pics/cosine1.eps}
\includegraphics[height=2.4in,angle=270]{pics/cosine2.eps}
\begin{center}
(a) Cosine wave
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/sine1.eps}
\includegraphics[height=2.4in,angle=270]{pics/sine2.eps}
\begin{center}
(b) Sine wave
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/pulse1.eps}
\includegraphics[height=2.4in,angle=270]{pics/pulse2.eps}
\begin{center}
(c) Impulse function
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/square1.eps}
\includegraphics[height=2.4in,angle=270]{pics/square2.eps}
\begin{center}
(d) Square wave
\end{center}
\caption{Examples of Fourier Transforms}
\label{fig:fourier-transforms}
\end{figure}

\begin{figure}[hbtp]
\centering
\includegraphics[height=2.4in,angle=270]{pics/19_2a.eps}
\begin{center}
(a) Sine wave
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/19_2b.eps}
\begin{center}
(b) Sine wave + third harmonic
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/19_2c.eps}
\begin{center}
(c) Sine wave + third and fifth harmonics
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/19_2d.eps}
\begin{center}
(d) Sum of first eight terms of the Fourier series of a square wave
\end{center}
\caption{Generating a Square Wave from Odd Harmonics}
\label{fig:square-wave-series}
\end{figure}

\section{The Discrete Fourier Transform}
\label{sec:DFT}

Recall from Section \ref{sec:discrete} that we can move from the
continuous signal domain to the discrete domain by replacing the time
$t$ with the quantity $\nicefrac{n}{r}$, where $n$ is the integer
index into the sequence of discrete samples, and $r$ is the sampling
rate.  Let us assume that we have done this for $x$, and we will use
square brackets to denote the difference.  That is, $x[n]$ denotes the
$n^{\rm th}$ sample of the continuous signal $x(t)$, corresponding to
the value $x(\nicefrac{n}{r})$.

We would now like to compute the \emph{Discrete Fourier Transform}
(DFT) of our discrete signal.  But instead of being concerned about
the sampling rate (which can introduce aliasing, for example), our
concern turns to the \emph{number of samples} that we use in computing
the DFT---let's call this $N$.  Intuitively, the integrals used in our
equations for the Fourier transform and its inverse should become sums
over the range $0 ... N-1$.  This leads to a reformulation of our two
equations (\ref{eq:ft} and \ref{eq:ift}) as follows:\footnote{The
purpose of the factor $\nicefrac{1}{N}$ in Equation~\ref{eq:dft} is to
ensure that the DFT and the inverse DFT are in fact inverses of each
other.  But it is just by convention that one equation has this factor
and the other does not---it would be sufficient if it were done the
other way around.
In fact, all that matters is that the product of the two coefficients
be $\nicefrac{1}{N}$, and thus it would also be sufficient for each
equation to have the same coefficient, namely
$\nicefrac{1}{\sqrt{N}}$.  Similarly, the negative exponent in one
equation and positive in the other is also by convention---it would be
sufficient to do it the other way around.}
\begin{equation}\label{eq:dft}
\hat{x}[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n] e^{-j\frac{2\pi k n}{N}},\ \
   k = 0, 1, ..., N-1
\end{equation}
%% Analogously, the equation for the inverse DFT is:
\begin{equation}\label{eq:idft}
x[n] = \sum_{k=0}^{N-1} \hat{x}[k] e^{j\frac{2\pi k n}{N}},\ \
   n = 0, 1, ..., N-1
\end{equation}

Despite all of the mathematics up to this point, the reader may now
realize that the discrete Fourier transform as expressed above is
amenable to implementation---for example it should not be difficult to
write Haskell functions that realize each of the above equations.  But
before addressing implementation issues, let's discuss a bit more what
the results actually \emph{mean}.

\subsection{Interpreting the Frequency Spectrum}
\label{sec:freq-spectrum}

Just as $x[n]$ represents a sampled version of the continuous input
signal, $\hat{x}[k]$ represents a sampled version of the continuous
frequency spectrum.  Care must be taken when interpreting either of
these results, keeping in mind the Nyquist-Shannon Sampling Theorem
(recall Section~\ref{sec:digital-audio}) and aliasing
(Section~\ref{sec:aliasing}).

Also recall that the result of a Fourier transform of a periodic
signal is a Fourier series (see Section~\ref{sec:fouriers-theorem}),
in which the signal being analyzed is expressed as multiples of a
fundamental frequency.  In Equation~\ref{eq:dft} above, that
fundamental frequency is the inverse of the duration of the $N$
samples, i.e.\ the inverse of $\nicefrac{N}{r}$, or $\nicefrac{r}{N}$.
For example, if the sampling rate is 44.1 kHz (the CD standard), then:
\begin{itemize}
%% \item If we take $N=44$ samples, then the fundamental frequency will
%%   be (about) $r/N = 1000$ Hz.
\item If we take $N=441$ samples, then the fundamental frequency will
  be $\nicefrac{r}{N} = 100$ Hz.
\item If we take $N=4410$ samples, then the fundamental frequency will
  be $\nicefrac{r}{N} = 10$ Hz.
\item If we take $N=44100$ samples, then the fundamental frequency
  will be $\nicefrac{r}{N} = 1$ Hz.
\end{itemize}
Thus, as would be expected, taking more samples yields a \emph{finer}
resolution of the frequency spectrum.  On the other hand, note that if
we increase the sampling rate and keep the number of samples fixed, we
get a \emph{coarser} resolution of the spectrum---this also should be
expected, because if we increase the sampling rate we would expect to
have to look at more samples to get the same accuracy.

Analogous to the Nyquist-Shannon Sampling Theorem, the representable
points in the resulting frequency spectrum lie in the range
$\pm\nicefrac{r}{2}$, i.e.\ between plus and minus one-half of the
sampling rate.
For the above three cases, respectively, that means the points are:
\begin{itemize}
%% \item -22 kHz, -21 kHz, ..., -1 kHz, 0, 1 kHz, ..., 21 kHz, 22kHz
%% \item
%% -22.05 kHz, -21.95 kHz, ..., -0.05 kHz, 0.05 kHz, ..., 21.95 kHz, 22.05 kHz
\item
-22.0 kHz, -21.9 kHz, ..., -0.1 kHz, 0, 0.1 kHz, ..., 21.9 kHz, 22.0 kHz
\item
-22.05 kHz, -22.04 kHz, ..., -10 Hz, 0, 10 Hz, ..., 22.04 kHz, 22.05 kHz
\item
-22.05 kHz, -22.049 kHz, ..., -1 Hz, 0, 1 Hz, ..., 22.049 kHz, 22.05 kHz
\end{itemize}
For practical purposes, the first of these is usually too coarse, the
third is too fine, and the middle one is useful for many applications.

Note that the first range of frequencies above does not quite cover
the range $\pm\nicefrac{r}{2}$: because $N$ is odd in that case, no
point falls at exactly $+\nicefrac{r}{2}$ or $-\nicefrac{r}{2}$.  Also
note that the second and third ranges as listed contain $N+1$ points,
not $N$; the reason for this apparent extra point is explained below.
Indeed, the more general question is, how do these points in the
frequency spectrum correspond to the indices $i = 0, 1, ..., N-1$ in
$\hat{x}[i]$?  If we denote each of these frequencies as $f$, the
answer is that:
\begin{equation}\label{eq:f1}
f = \frac{ir}{N},\ \ \ \ i = 0, 1, ..., N-1
\end{equation}
But note that this range of frequencies extends from $0$ to
$(N-1)(\nicefrac{r}{N})$, which exceeds the Nyquist-Shannon sampling
limit of $\nicefrac{r}{2}$.  The way out of this dilemma is to realize
that the DFT assumes that the input signal is periodic in time, and
therefore the DFT is periodic in frequency.  In other words, values of
$f$ for indices $i$ greater than $\nicefrac{N}{2}$ can be interpreted
as frequencies that are the \emph{negation} of the frequency given by
the formula above.  Assuming even $N$, we can revise
Equation~\ref{eq:f1} as follows:
\begin{equation}\label{eq:f2}
f = \left\{
\begin{array}{ll}
i\dfrac{r}{N}, & \quad i = 0, 1, ..., \dfrac{N}{2} \\[0.1in]
(i-N)\dfrac{r}{N}, & \quad i = \dfrac{N}{2}, \dfrac{N}{2}+1, ..., N-1 \\
\end{array}
\right.
\end{equation}
Note that when $i = \nicefrac{N}{2}$, both equations apply, yielding
$f = \nicefrac{r}{2}$ in the first case, and $f = -\nicefrac{r}{2}$ in
the second.  Indeed, the magnitude of the DFT for each of these
frequencies is the same (see discussion in the next section),
reflecting the periodicity of the DFT, and thus is simply a form of
redundancy: the $N+1$ frequencies listed above correspond to only $N$
distinct DFT samples.

%% \begin{itemize}
%% \item
%% $\hat{x}[i],\ i = 0, 1, 2, ..., \frac{N}{2}$ correspond to the
%% frequencies $0, \frac{r}{N}, \frac{2r}{N}, ..., \frac{r}{2}$.
%% \item
%% $\hat{x}[i],\ i = \frac{N}{2}, \frac{N}{2}+1, \frac{N}{2}+2, ..., N-1$
%% correspond to the frequencies
%% $-\frac{r}{2}, -\frac{r}{2}+\frac{r}{N}, -\frac{r}{2}+\frac{2r}{N},
%% ..., -\frac{r}{N}$.
%% \end{itemize}
%% ...  This can be viewed as a kind of aliasing in the frequency domain.

The above discussion has assumed a periodic signal whose fundamental
frequency is known, thus allowing us to parameterize the DFT with the
same fundamental frequency.  In practice this rarely happens.  That
is, the fundamental frequency of the DFT typically has no integral
relationship to the period of the periodic signal.  This raises the
question, what happens to the frequencies that ``fall in the gaps''
between the frequencies discussed above?
The answer is that the energy of that frequency component will be
distributed amongst neighboring points in a way that makes sense
mathematically, although the result may look a little funny compared
to the ideal result (where every frequency component is an integer
multiple of the fundamental).  The important thing to remember is that
these are digital representations of the exact spectra, just as a
digitized signal is representative of an exact signal.  Two digitized
signals can look very different (depending on sample rate, phase
angle, and so on), yet represent the same underlying signal---the same
is true of a digitized spectrum.

In practice, for reasons of computational efficiency, $N$ is usually
chosen to be a power of two.  We will return to this issue when we
discuss implementing the DFT.

\subsection{Amplitude and Power of Spectrum}
\label{sec:amp-spectrum}

We discussed above how each sample in the result of a DFT relates to a
point in the frequency spectrum of the input signal.  But how do we
determine the amplitude and phase angle of each of those frequency
components?

In general each sample in the result of a DFT is a complex number,
thus having both a real and imaginary part, of the form $a + jb$.  We
can visualize this number as a point in the complex Cartesian plane,
where the abscissa (x-axis) represents the real part, and the ordinate
(y-axis) represents the imaginary part, as shown in
Figure~\ref{fig:complex-plane}.  It is easy to see that the line from
the origin to the point of interest is a vector $A$, whose length is
the \emph{amplitude} of the frequency component in the spectrum:
\begin{equation}\label{eq:amplitude}
A = \sqrt{a^2 + b^2}
\end{equation}
The angle $\theta$ is the \emph{phase}, and it is easily read off of
the figure as:
\begin{equation}\label{eq:phase}
\theta = \tan^{-1} \frac{b}{a}
\end{equation}
(This amplitude / phase pair is often called the \emph{polar}
representation of a complex number.)

\begin{figure}[hbtp]
\centering
\includegraphics[height=2.4in]{pics/ComplexToPolar.eps} % angle=270
\caption{Complex and Polar Coordinates}
\label{fig:complex-plane}
\end{figure}

Recall from Section~\ref{sec:amplitude} that power is proportional to
the square of the amplitude.  Since taking a square root adds
computational expense, the square root is often omitted from
Equation~\ref{eq:amplitude}, thus yielding a \emph{power spectrum}
instead of an \emph{amplitude spectrum}.

One subtle aspect of the resulting DFT is how to interpret
\emph{negative} frequencies.  When the input samples are all real
numbers (i.e.\ they have no imaginary components), which is true for
audio applications, the negative spectrum is a mirror image of the
positive spectrum, and the amplitude/power is distributed evenly
between the two.

\subsection{A Haskell Implementation of the DFT}
\label{sec:haskell-dft}

From Equation \ref{eq:dft}, which defines the DFT mathematically, we
can write a Haskell program that implements the DFT.  The first thing
we need to do is understand how complex numbers are handled in
Haskell.  They are captured in the |Complex| library, which must be
imported into any program that uses them.

The type |Complex T| is the type of complex numbers whose underlying
numeric type is |T|.  We will use, for example, |Complex Double| for
testing our DFT.  A complex number $a + jb$ is represented in Haskell
as |a :+ b|, and since |(:+)| is a constructor, such values can be
pattern matched.
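As a first exercise with these values, here is a small sketch of our
own (the names |binFrequency| and |amplitude| are hypothetical, not
part of any library) that captures two pieces of the interpretation
machinery developed above: the index-to-frequency mapping of
Equation~\ref{eq:f2}, and the amplitude of Equation~\ref{eq:amplitude}
computed by pattern matching on |(:+)|:
\begin{spec}
-- The frequency represented by index i of an n-point DFT taken at
-- sampling rate r (assuming n is even), following Equation (19.10).
binFrequency :: Double -> Int -> Int -> Double
binFrequency r n i
  | i <= n `div` 2  = fromIntegral i * r / fromIntegral n
  | otherwise       = fromIntegral (i - n) * r / fromIntegral n

-- The amplitude of a single DFT sample a :+ b; omitting the sqrt
-- would yield power rather than amplitude.
amplitude :: Complex Double -> Double
amplitude (a :+ b) = sqrt (a * a + b * b)
\end{spec}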
\syn{Complex numbers in Haskell are captured in the |Complex| library,
in which complex numbers are defined as a polymorphic data type:
\begin{spec}
infix 6 :+
data (RealFloat a) => Complex a = !a :+ !a
\end{spec}
The ``|!|'' in front of the type variables declares that the
constructor |(:+)| is strict in its arguments.  For example, the
complex number $a + jb$ is represented by |a :+ b| in Haskell.

One can pattern match on complex number values to extract the real and
imaginary parts, or use one of the predefined selectors defined in the
|Complex| library:
\begin{spec}
realPart, imagPart :: RealFloat a => Complex a -> a
\end{spec}
The |Complex| library also defines the following functions:
\begin{spec}
conjugate          :: RealFloat a => Complex a -> Complex a
mkPolar            :: RealFloat a => a -> a -> Complex a
cis                :: RealFloat a => a -> Complex a
polar              :: RealFloat a => Complex a -> (a,a)
magnitude, phase   :: RealFloat a => Complex a -> a
\end{spec}
The library also declares instances of |Complex| for the type classes
|Num|, |Fractional|, and |Floating|.
}

Although not as efficient as arrays, for simplicity we choose to use
lists to represent the vectors that are the input and output of the
DFT.  Thus if |xs| is the list that represents the signal $x$, then
|xs!!n| is the |n+1|$^{\rm th}$ sample of that signal, and is
equivalent to $x[n]$.  Furthermore, using list comprehensions, we can
make the Haskell code look very much like the mathematical definition
captured in Equation \ref{eq:dft}.  Finally, we adopt the convention
that the length of the input signal is the number of samples that we
will use for the DFT.

Probably the trickiest part of writing a Haskell program for the DFT
is dealing with the types!  In particular, if you look closely at
Equation \ref{eq:dft} you will see that $N$ is used in three different
ways---as an integer (for indexing), as a real number (in the exponent
of $e$), and as a complex number (in the expression
$\nicefrac{1}{N}$).  Here is a Haskell program that implements the
DFT:

\begin{code}
dft :: RealFloat a => [Complex a] -> [Complex a]
dft xs =
  let lenI = length xs           -- N as an integer
      lenR = fromIntegral lenI   -- N as a real number
      lenC = lenR :+ 0           -- N as a complex number
  in  [ let i = -2 * pi * fromIntegral k / lenR
        in (1/lenC) * sum [ (xs!!n) * exp (0 :+ i * fromIntegral n)
                          | n <- [0,1..lenI-1] ]
      | k <- [0,1..lenI-1] ]
\end{code}

Note that |lenI|, |lenR|, and |lenC| are the integer, real, and
complex versions, respectively, of $N$.  Otherwise the code is fairly
straightforward---note in particular how list comprehensions are used
to implement the ranges of $n$ and $k$ in Equation \ref{eq:dft}.

To test our program, let's first create a couple of waveforms.  For
example, recall that Equation \ref{eq:square-wave-series} defines the
Fourier series for a square wave.  We can implement the first, first
two, and first three terms of this series, corresponding respectively
to Figures~\ref{fig:square-wave-series}a,
\ref{fig:square-wave-series}b, and \ref{fig:square-wave-series}c, by
the following Haskell code:

\begin{code}
mkTerm :: Int -> Double -> [Complex Double]
mkTerm num n =
  let f = 2 * pi / fromIntegral num
  in  [ sin (n * f * fromIntegral i) / n :+ 0
      | i <- [0,1..num-1] ]

mkxa, mkxb, mkxc :: Int -> [Complex Double]
mkxa num  = mkTerm num 1
mkxb num  = zipWith (+) (mkxa num) (mkTerm num 3)
mkxc num  = zipWith (+) (mkxb num) (mkTerm num 5)
\end{code}

Thus |mkTerm num n| is the |n|$^{\rm th}$ term in the series, using
|num| samples.
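Before looking at full DFT results, we might try a quick sanity check
of our own at the GHCi prompt: the DFT of a constant (``DC'') signal
should have all of its energy concentrated at index 0, the dual of the
impulse example that we will see shortly.
\begin{spec}
dft (replicate 8 (1 :+ 0))
-- expect (1 :+ 0) at index 0, and (up to round-off) 0 elsewhere
\end{spec}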
Using the helper function |printComplexL| defined in
Figure~\ref{fig:pp-code}, which ``pretty prints'' a list of complex
numbers, we can look at the result of our DFT in a more readable
form.\footnote{``Pretty-printing'' real numbers is a subtle task.  The
code in Figure~\ref{fig:pp-code} rounds the number to 10 decimal
places of accuracy, and inserts spaces before and after to line up the
decimal points and give a consistent string length.  The fractional
part is not padded with zeros, since that would give a false
impression of its accuracy.  (It is not necessary to understand this
code in order to understand the concepts in this chapter.)}

\begin{figure}
\begin{code}
printComplexL :: [Complex Double] -> IO ()
printComplexL xs =
  let f (i, rl :+ im) = do  putStr (spaces (3 - length (show i)))
                            putStr (show i ++ ":  (")
                            putStr (niceNum rl ++ ", ")
                            putStr (niceNum im ++ ")\n")
  in mapM_ f (zip [0 .. length xs - 1] xs)

niceNum :: Double -> String
niceNum d =
  let d'           = fromIntegral (round (1e10 * d)) / 1e10
      (dec, fra)   = break (== '.') (show d')
      (fra', exp)  = break (== 'e') fra
  in  spaces (3 - length dec) ++ dec ++ take 11 fra'
      ++ exp ++ spaces (12 - length fra' - length exp)

spaces :: Int -> String
spaces n = take n (repeat ' ')
\end{code}
\caption{Helper Code for Pretty-Printing DFT Results}
\label{fig:pp-code}
\end{figure}

For example, suppose we want to take the DFT of a 16-sample
representation of the first three terms of the square wave series.
Typing the following at the GHCi prompt:
\begin{spec}
printComplexL (dft (mkxc 16))
\end{spec}
will yield the result of the DFT, pretty-printing each number as a
pair, along with its index:
{\small
\begin{verbatim}
  0:  (   0.0         ,   0.0         )
  1:  (   0.0         ,  -0.5         )
  2:  (   0.0         ,   0.0         )
  3:  (   0.0         ,  -0.1666666667)
  4:  (   0.0         ,   0.0         )
  5:  (   0.0         ,  -0.1         )
  6:  (   0.0         ,   0.0         )
  7:  (   0.0         ,   0.0         )
  8:  (   0.0         ,   0.0         )
  9:  (   0.0         ,   0.0         )
 10:  (   0.0         ,   0.0         )
 11:  (   0.0         ,   0.1         )
 12:  (   0.0         ,   0.0         )
 13:  (   0.0         ,   0.1666666667)
 14:  (   0.0         ,   0.0         )
 15:  (   0.0         ,   0.5         )
\end{verbatim}
}
Let's study this result more closely.  For sake of argument, assume a
sample rate of 1.6 kHz.  Then by construction using |mkxc|, our
square-wave input's fundamental frequency is 100 Hz.  Similarly,
recall that the resolution of the DFT is |r/N|, which is also 100 Hz.

Now compare the overall result to Figure
\ref{fig:fourier-transforms}b.  Recalling also Equation \ref{eq:f2},
we note that the above DFT results are non-zero precisely at 100, 300,
500, -500, -300, and -100 Hz.  This is just what we would expect.
Furthermore, the amplitudes are one-half of the corresponding
harmonically decreasing weights dictated by Equation
\ref{eq:square-wave-series}, namely the values $\nicefrac{1}{2}$,
$\nicefrac{1}{6}$, and $\nicefrac{1}{10}$ (recall the discussion in
Section \ref{sec:amp-spectrum}).
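We can corroborate this reading mechanically.  The following
one-liner, a check of our own using the |polar| function from
|Data.Complex|, extracts the magnitude of the $k=1$ sample of the DFT,
which should be $0.5$ (one-half of $C_1 = 1$), up to round-off:
\begin{spec}
fst (polar (dft (mkxc 16) !! 1))    -- expect: 0.5
\end{spec}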
Let's do another example.  We can create an impulse function as
follows:
\begin{code}
mkPulse :: Int -> [Complex Double]
mkPulse n = 100 : take (n-1) (repeat 0)
\end{code}
and print its DFT with the command:
\begin{spec}
printComplexL (dft (mkPulse 16))
\end{spec}
whose effect is:
{\small
\begin{verbatim}
  0:  (   6.25        ,   0.0         )
  1:  (   6.25        ,   0.0         )
  2:  (   6.25        ,   0.0         )
  3:  (   6.25        ,   0.0         )
  4:  (   6.25        ,   0.0         )
  5:  (   6.25        ,   0.0         )
  6:  (   6.25        ,   0.0         )
  7:  (   6.25        ,   0.0         )
  8:  (   6.25        ,   0.0         )
  9:  (   6.25        ,   0.0         )
 10:  (   6.25        ,   0.0         )
 11:  (   6.25        ,   0.0         )
 12:  (   6.25        ,   0.0         )
 13:  (   6.25        ,   0.0         )
 14:  (   6.25        ,   0.0         )
 15:  (   6.25        ,   0.0         )
\end{verbatim}}

Compare this to Figure \ref{fig:fourier-transforms}c, and note how the
original magnitude of the impulse (100) is distributed evenly among
the 16 points in the DFT ($100/16 = 6.25$).

So far we have considered only input signals whose frequency
components are integral multiples of the DFT's resolution.  This
rarely happens in practice, however, because music is simply too
complex and noisy.  As mentioned in Section~\ref{sec:freq-spectrum},
the energy of the signals that ``fall in the gaps'' is distributed
among neighboring points, although not in as simple a way as you might
think.  To get some perspective on this, let's do one other example.
We define a function to generate a signal whose frequency is $\pi$
times the fundamental frequency:
%% We will modify the function |mkTerm| so that it creates a signal
%% 42\% higher than it used to be:
\begin{code}
x1 :: Int -> [Complex Double]
x1 num =
  let f = pi * 2 * pi / fromIntegral num
  in  map (:+ 0) [ sin (f * fromIntegral i) | i <- [0,1..num-1] ]
\end{code}
$\pi$ is an irrational number, but any number that ``falls in the
gaps'' between indices would do.  We can see the result (again using
16 samples) by typing the command:
\begin{spec}
printComplexL (dft (x1 16))
\end{spec}
which yields:
{\small
\begin{verbatim}
  0:  (  -7.9582433e-3 ,   0.0          )
  1:  (  -5.8639942e-3 ,  -1.56630897e-2)
  2:  (   4.7412105e-3 ,  -4.56112124e-2)
  3:  (   0.1860052232 ,  -0.4318552865 )
  4:  (  -5.72962095e-2,   7.33993364e-2)
  5:  (  -3.95845728e-2,   3.14378088e-2)
  6:  (  -3.47994673e-2,   1.65400768e-2)
  7:  (  -3.29813518e-2,   7.4048103e-3 )
  8:  (  -3.24834325e-2,   0.0          )
  9:  (  -3.29813518e-2,  -7.4048103e-3 )
 10:  (  -3.47994673e-2,  -1.65400768e-2)
 11:  (  -3.95845728e-2,  -3.14378088e-2)
 12:  (  -5.72962095e-2,  -7.33993364e-2)
 13:  (   0.1860052232 ,   0.4318552865 )
 14:  (   4.7412105e-3 ,   4.56112124e-2)
 15:  (  -5.8639942e-3 ,   1.56630897e-2)
\end{verbatim}}

This is much more complicated than the previous examples!  Not only do
the points in the spectrum seem to have varying amounts of energy,
they also have both non-zero real and non-zero imaginary components,
meaning that the magnitude and phase vary at each point.  We can
define a function that converts a list of complex numbers into a list
of their polar representations as follows:
\begin{code}
mkPolars :: [Complex Double] -> [Complex Double]
mkPolars = map ((\(m,p) -> m :+ p) . polar)
\end{code}
which we can then use to reprint our result:
\begin{spec}
printComplexL (mkPolars (dft (x1 16)))
\end{spec}
{\small
\begin{verbatim}
  0:  (   7.9582433e-3 ,   3.1415926536 )
  1:  (   1.67247961e-2,  -1.9290259418 )
  2:  (   4.58569709e-2,  -1.4672199604 )
  3:  (   0.470209455  ,  -1.1640975898 )
  4:  (   9.31145435e-2,   2.2336013741 )
  5:  (   5.05497204e-2,   2.4704023271 )
  6:  (   3.85302097e-2,   2.6979021519 )
  7:  (   3.38023784e-2,   2.9207398294 )
  8:  (   3.24834325e-2,  -3.1415926536 )
  9:  (   3.38023784e-2,  -2.9207398294 )
 10:  (   3.85302097e-2,  -2.6979021519 )
 11:  (   5.05497204e-2,  -2.4704023271 )
 12:  (   9.31145435e-2,  -2.2336013741 )
 13:  (   0.470209455  ,   1.1640975898 )
 14:  (   4.58569709e-2,   1.4672199604 )
 15:  (   1.67247961e-2,   1.9290259418 )
\end{verbatim}
}
If we focus on the magnitude (the first column), we can see that there
is a peak near index 3 (corresponding roughly to the frequency $\pi$),
with small amounts of energy elsewhere.

\vspace{.1in}\hrule

\begin{exercise}{\em
Write a Haskell function |idft| that implements the \emph{inverse} DFT
as captured in Equation~\ref{eq:idft}.  Test your code by applying
|idft| to one of the signals used earlier in this section.  In other
words, show empirically that, up to round-off errors,
|idft (dft xs) == xs|.
}
\end{exercise}

\begin{exercise}{\em
Use |dft| to analyze some of the signals generated using signal
functions defined in Chapter~\ref{ch:sigfuns}.
}
\end{exercise}

\todo{To do the above exercise we need to provide a function that
extracts |N| samples from a sigfun, and somehow keeps it in the sigfun
world.  Perhaps something like:
\begin{spec}
sample :: Rate -> Int -> Signal c a (Event (Table a))
\end{spec}
such that |sample r n| is a sigfun that generates an event every |1/r|
seconds, each event being a table containing |n| samples of the input.
These tables may or may not overlap, depending on the relationship
between |r|, |n|, and the sampling rate.
}

\begin{exercise}{\em
Define a function |mkSqWave :: Int -> Int -> [Complex Double]| such
that |mkSqWave num n| is the sum of the first |n| terms of the Fourier
series of a square wave, having |num| samples in the result.
}
\end{exercise}

\begin{exercise}{\em
Prove mathematically that the DFT and the inverse DFT, as captured in
Equations \ref{eq:dft} and \ref{eq:idft}, are in fact inverses.  Also
prove, using equational reasoning, that |dft| and |idft| are inverses.
(For the latter you may assume that Haskell numeric types obey the
standard axioms of real arithmetic.)
}
\end{exercise}

\vspace{.1in}\hrule
\vspace{.1in}

\section{The Fast Fourier Transform}
\label{sec:fft}

In the last section a DFT program was developed in Haskell that was
easy to understand, being a faithful translation of Equation
\ref{eq:dft}.  For pedagogical purposes, this effort served us well.
However, for practical purposes, the program is inherently
inefficient.  To see why, think of $x[n]$ and $\hat{x}[k]$ as vectors.
Each element of $\hat{x}$ is then the sum of $N$ multiplications of a
sample by a complex exponential (which can be represented as a pair,
the real and imaginary parts).  And this overall process must be
repeated for each value of $k$, also $N$ times.  Therefore the overall
time complexity of the implied algorithm is O($N^2$).  For even
moderate values of $N$, this can be computationally intractable.  (Our
choice of lists for the implementation of vectors makes the complexity
even worse, because of the linear-time complexity of indexing, but the
discussion below makes this a moot point.)
Fortunately, there exists a much faster algorithm called the
\emph{Fast Fourier Transform}, or FFT, that reduces the complexity to
O($N\log N$).  This difference is quite significant for large values
of $N$, and the FFT is the standard algorithm used in most signal
processing applications.  We will not go into the details of the FFT
algorithm, other than to note that it is a divide-and-conquer
algorithm that depends on the vector size being a power of
two.\footnote{The basic FFT algorithm was invented by James Cooley and
John Tukey in 1965.}
%% \footnote{James Cooley and John Tukey are
%% usually credited with inventing the FFT in 1965, although it was
%% later discovered that Carl Friedrich Gauss proposed the same
%% algorithm around 1805.}

Rather than developing our own program for the FFT, we will instead
use the Haskell library |Numeric.FFT| to import a function that will
do the job for us.  Specifically:
\begin{spec}
fft :: ...
\end{spec}

With this function we could explore the use of the FFT on specific
input vectors, as we did earlier with |dft|.  However, our ultimate
goal is to have a version of the FFT that works on \emph{signals}.  We
would like to be able to specify the number of samples as a power of
two (which we can think of as the ``window size''), and how often we
would like to take a snapshot of the current window (and thus
successive windows may or may not overlap).  The resulting signal
function takes a signal as input, and outputs \emph{events} at the
specified rate.  Events are discussed in more detail in
Chapter~\ref{ch:mui}.

Indeed, Euterpea provides this functionality for us in a function
called |fftA|:
\begin{spec}
fftA :: Int -> Int -> SF Double (Event [Double])
\end{spec}
|SF| is a signal function type similar to |SigFun|, except that it is
targeted for use in the Musical User Interface (MUI) discussed in
detail in Chapter~\ref{ch:mui}, and thus, for example, does not have a
clock rate.  |fftA skip size| is a signal function that, every |skip|
samples of the input, takes the most recent |size| samples as a window
(where |size| must be a power of two) and computes the FFT of that
window.  For every such result, it issues an |Event| carrying the
magnitude of each point in the resulting spectrum; the frequency that
each point represents is determined, as described in
Section~\ref{sec:freq-spectrum}, by the window size and the clock rate
of the signal to which |fftA| is applied.

Combining |fftA| with the MUI widgets discussed in
Chapter~\ref{ch:mui}, we can write a simple program that generates a
sine wave whose frequency is controlled by a slider, and whose
real-time graph as well as its FFT are displayed.  The program to do
this is shown in Figure~\ref{fig:fft-mui}.
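In Figure~\ref{fig:fft-mui} the magnitudes in each event arrive simply
as a list.  To label each magnitude with the frequency it represents,
one might use a helper like the following sketch (|labelBins| is our
own, not part of Euterpea), which applies Equation~\ref{eq:f1} given
the clock rate |r| and the window size |n| passed to |fftA|:
\begin{spec}
-- Pair each magnitude with the frequency of its bin, index by index.
labelBins :: Double -> Int -> [Double] -> [(Double, Double)]
labelBins r n mags =
  zip [ fromIntegral i * r / fromIntegral n | i <- [0..] ] mags
\end{spec}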
\begin{figure}
\begin{spec}
fftEx :: UISF () ()
fftEx = proc _ -> do
    f <- hSlider (1, 2000) 440 -< ()
    (d, _) <- convertToUISF 100 simpleSig -< f
    let (s, fft) = unzip d
    _ <- histogram (500, 150) 20 -< listToMaybe (catMaybes fft)
    _ <- realtimeGraph' (500, 150) 200 20 Black -< s
    outA -< ()
  where
    simpleSig :: SigFun CtrRate Double (Double, Event [Double])
    simpleSig = proc f -> do
      s        <- osc (tableSinesN 4096 [1]) 0 -< f
      fftData  <- fftA 100 256 -< s
      outA -< (s, fftData)

t0 = runMUI (500, 600) "fft Test" fftEx
\end{spec}
\caption{A Real-Time Display of FFT Results}
\label{fig:fft-mui}
\end{figure}

\section{Further Pragmatics}

\todo{Discuss windowing.}

\vspace{.1in}\hrule

\begin{exercise}{\em
Modify the program in Figure~\ref{fig:fft-mui} in the following ways:
\begin{enumerate}
\item Add a second slider, and use it to control the frequency of a
  second oscillator.
\item Let |s1| and |s2| be the names of the signals whose frequencies
  are controlled by the first and second sliders, respectively.
  Instead of displaying the FFT of just |s1|, try a variety of
  combinations of |s1| and |s2|, such as |s1 + s2|, |s1 - s2|,
  |s1 * s2|, |1/s1 + 1/s2|, and |s1 / s2|.  Comment on the results.
\item Use |s2| to control the frequency of |s1| (as was done with
  |vibrato| in Chapter~\ref{ch:sigfuns}).  Plot the FFT of |s1| and
  comment on the result.
\item Instead of using |osc| to generate a pure sine wave, try using
  other oscillators and/or table generators to create more complex
  tones, and plot their FFTs.  Comment on the results.
\end{enumerate}
}
\end{exercise}

\vspace{.1in}\hrule

\section{References}

Most of the ideas in this chapter can be found in any good textbook on
signal processing.  The particular arrangement of the material here,
in particular Figure~\ref{fig:fourier-transforms} and the development
and demonstration of a program for the DFT, is borrowed from the
excellent text \emph{Elements of Computer Music} by Moore
\cite{Moore90}.