%-*- mode: Latex; abbrev-mode: true; auto-fill-function: do-auto-fill -*-

%include lhs2TeX.fmt
%include myFormat.fmt

\out{
\begin{code}
-- This code was automatically generated by lhs2tex --code, from the file
-- HSoM/SpectrumAnalysis.lhs.  (See HSoM/MakeCode.bat.)

\end{code}
}

\chapter{Spectrum Analysis}
\label{ch:spectrum-analysis}

\begin{code}
{-# LANGUAGE Arrows #-}

module Euterpea.Music.Signal.SpectrumAnalysis where
import Euterpea
import Euterpea.Experimental (fftA)

import Data.Complex (Complex ((:+)), polar)
import Data.Maybe (listToMaybe, catMaybes)
\end{code}

There are many situations where it is desirable to take an existing
sound signal---in particular one that is recorded by a
microphone---and analyze it for its spectral content.  If one can do
this effectively, it is then possible (at least in theory) to recreate
the original sound, or to create novel variations of it.  The theory
behind this approach is based on \emph{Fourier's Theorem}, which
states that any periodic signal can be decomposed into a weighted sum
of (a potentially infinite number of) sine waves.  In this chapter we
discuss the theory as well as the pragmatics for doing spectrum
analysis in Euterpea.

\section{Fourier's Theorem}
\label{sec:fouriers-theorem}

A \emph{periodic signal} is a signal that repeats itself infinitely
often.  Mathematically, a signal $x$ is periodic if there exists a
real number $T$ such that for all integers $n$:
\[ x(t) = x(t + nT) \]
%% \[ (\exists\ T\! \in\! \mathbb{R}) (\forall n\! \in\! \mathbb{Z}):
%% x(t) = x(t + nT) \]
%% where $\mathbb{R}$ is the set of real numbers and $\mathbb{Z}$ is the
%% set of integers.
$T$ is called the \emph{period}, which may be just a few microseconds,
a few seconds, or perhaps days---the only thing that matters is that
the signal repeats itself.  Usually we want to find the smallest value
of $T$ that satisfies the above property.

For example, a sine wave is surely periodic; indeed, recall from
Section \ref{sec:frequency} that:
\[ \sin (2\pi k + \theta) = \sin \theta \]
for any integer $k$.  In this case, $T = 2\pi$, and it is the smallest
value that satisfies this property.

But in what sense is, for example, a single musical note periodic?
Indeed it is not, unless it is repeated infinitely often, which would
not be very interesting musically.  Yet something we would like to
know is the spectral content of that single note, or even of a small
portion of that note, within an entire composition.  This is one of
the practical problems that we will address later in the chapter.

%% In the case of audio, since humans cannot hear sound lower than
%% about 20 Hz, we need only concern ourselves with periodic signals
%% whose repetition period is less than 50 milliseconds
%% (\nicefrac{1}{20} of a second).

Recall from Section \ref{sec:frequency} that a sine wave can be
represented by: $x(t) = A\sin(\omega t + \phi)$, where $A$ is the
amplitude, $\omega$ is the radian frequency, and $\phi$ is the phase
angle.  Joseph Fourier, a French mathematician and physicist, showed
the following result.  Any periodic signal $x(t)$ with period $T$ can
be represented as:
\begin{equation}\label{eq:fourier-series}
x(t) = C_0 + \sum_{n=1}^{\infty} C_n \cos(\omega_0 nt + \phi_n)
\end{equation}
This is called \emph{Fourier's Theorem}.  $\omega_0 =
\nicefrac{2\pi}{T}$ is called the \emph{fundamental frequency}.  Note
that the frequency of each cosine wave in the series is an integer
multiple of the fundamental frequency.
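To make the theorem concrete, here is a minimal Haskell sketch (the
function |fourierSeries| is our own, not part of Euterpea) that
evaluates a finitely truncated version of
Equation~\ref{eq:fourier-series}, given the fundamental radian
frequency $\omega_0$, the constant offset $C_0$, and a finite list of
coefficient/phase pairs $(C_n, \phi_n)$:
\begin{spec}
-- Evaluate a truncated Fourier series at time t.  The nth element
-- of cps holds the weight and phase (Cn, phin) of the nth harmonic
-- of the fundamental radian frequency w0.
fourierSeries :: Double -> Double -> [(Double, Double)] -> Double -> Double
fourierSeries w0 c0 cps t =
  c0 + sum [ c * cos (w0 * fromIntegral n * t + phi)
           | (n, (c, phi)) <- zip [1..] cps ]
\end{spec}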
Equation~\ref{eq:fourier-series} is also called the \emph{Fourier
series} or \emph{harmonic series} (related, but not to be confused
with, the mathematical definition of harmonic series, which has the
precise form $1 + \nicefrac{1}{2} + \nicefrac{1}{3} + \nicefrac{1}{4}
+ \cdots$).  The trick, of course, is determining what the
coefficients $C_0$, $...$, $C_n$ and phase angles $\phi_1$, $...$,
$\phi_n$ are.  Determining the above equation for a particular
periodic signal is called \emph{Fourier analysis}, and synthesizing a
sound based on the above equation is called \emph{Fourier synthesis}.

Theoretically, at least, we should be able to use Fourier analysis to
decompose a sound of interest into its composite sine waves, and then
regenerate it by artificially generating those composite sine waves
and adding them together (i.e.\ additive synthesis, to be described in
Chapter~\ref{ch:additive}).  Of course, we also have to deal with the
fact that the representation may involve an \emph{infinite} number of
composite signals.

As discussed somewhat in Chapter~\ref{ch:signals}, many naturally
occurring vibrations in nature---including the resonances of most
musical instruments---are characterized as having a fundamental
frequency (the perceived pitch) and some combination of multiples of
that frequency, which are often called \emph{harmonics},
\emph{overtones} or \emph{partials}.  So Fourier's Theorem seems to be
a good match for this musical application.

\subsection{The Fourier Transform}

When studying Fourier analysis, it is more convenient, mathematically,
to use \emph{complex exponentials}.  We can relate working with
complex exponentials back to sines and cosines using \emph{Euler's
Formula}:
\[\begin{array}{lcl}
e^{j\theta} &=& \cos(\theta) + j \sin(\theta) \\[.05in]
\cos(\theta) &=& \dfrac{1}{2} (e^{j\theta} + e^{-j\theta}) \\[.1in]
\sin(\theta) &=& \dfrac{1}{2j} (e^{j\theta} - e^{-j\theta})
\end{array}\]

For a periodic signal $x(t)$, which we consider to be a function of
time, we denote its \emph{Fourier transform} by $\hat{x}(f)$, which is
a function of frequency.  Each point in $\hat{x}$ is a complex number
that represents the magnitude and phase of the frequency $f$'s
presence in $x(t)$.  Using complex exponentials, the formula for
$\hat{x}(f)$ in terms of $x(t)$ is:
\[ \hat{x}(f) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt \]
where $\omega = 2\pi f$, and $j$ is the same as the imaginary unit $i$
used in mathematics.\footnote{Historically, engineers prefer to use
the symbol $j$ rather than $i$, because $i$ is generally used to
represent current in an electrical circuit.}  Intuitively, the Fourier
transform at a particular frequency $f$ is the integral of the product
of the original signal and a pure sinusoidal wave $e^{-j\omega t}$.
This latter process is related to the \emph{convolution} of the two
signals, and intuitively the result will be non-zero only when the
signal has some content of that pure signal in it.

The above equation describes $\hat{x}$ in terms of $x$.  We can also
go the other way around---defining $x$ in terms of $\hat{x}$:
\[ x(t) = \int_{-\infty}^{\infty} \hat{x}(f) e^{j\hat{\omega} f} df \]
where $\hat{\omega} = 2\pi t$.  This is called the \emph{inverse}
Fourier transform.
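Before moving on, we can use Haskell itself to gain confidence in
Euler's Formula, which underlies both of these equations.  The
following sketch is merely a numerical sanity check of our own (it
relies on the |polar| function imported from |Data.Complex| at the
start of this chapter, and on the fact that |exp| is defined for
complex numbers); the result should be zero, up to round-off error,
for any |theta|:
\begin{spec}
-- The magnitude of the difference between e^(j theta) and
-- cos theta + j sin theta, which should be (nearly) zero.
eulerCheck :: Double -> Double
eulerCheck theta =
  fst (polar (exp (0 :+ theta) - (cos theta :+ sin theta)))
\end{spec}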
%% -2 pi i x e  =>  -2 pi j t f  =>  -j w t
%%  2 pi i x e  =>   2 pi j t f  =>  -j what f

If we expand the definitions of $\omega$ and $\hat{\omega}$ we can see
how similar these two equations are:
%% \[\begin{array}{lcl}
\begin{equation}\label{eq:ft}
\hat{x}(f) = \int_{-\infty}^{\infty} x(t) e^{-j2\pi f t} dt
\end{equation}
\begin{equation}\label{eq:ift}
x(t) = \int_{-\infty}^{\infty} \hat{x}(f) e^{ j2\pi f t} df
\end{equation}
%% \end{array}\]
These two equations, for the Fourier transform and its inverse, are
remarkable in their simplicity and power.  They are also remarkable in
the following sense: \emph{no information is lost when converting from
one to the other}.  In other words, a signal can be represented in
terms of its time-varying behavior or its spectral content---they are
equivalent!

A function that has the property that $f(x) = f(-x)$ is called an
\emph{even} function; if $f(x) = - f(-x)$ it is said to be \emph{odd}.
It turns out that, perhaps surprisingly, \emph{any} function can be
expressed as the sum of a single even function and a single odd
function.  This may help provide some intuition about the equations
for the Fourier transform, because the complex exponential
$e^{j2\pi f t}$ separates the waveform by which it is being multiplied
into its even and odd parts (recall Euler's formula).  The real
(cosine) part affects only the even part of the input, and the
imaginary (sine) part affects only the odd part of the input.

\subsection{Examples}

Let's consider some examples, which are illustrated in
Figure~\ref{fig:fourier-transforms}:
\begin{itemize}
\item
Intuitively, the Fourier transform of a pure cosine wave should be an
impulse function---that is, the spectral content of a cosine wave
should be concentrated completely at the frequency of the cosine wave.
The only catch is that, when working in the complex domain, the
Fourier transform also yields the mirror image of the spectral
content, at a frequency that is the negation of the cosine wave's
frequency, as shown in Figure~\ref{fig:fourier-transforms}a.  In other
words, in this case, $\hat{x}(f) = \hat{x}(-f)$, i.e.\ $\hat{x}$ is
even.  So the spectral content is the \emph{real} part of the complex
number returned from the Fourier transform (recall Euler's formula).
\item
In the case of a pure sine wave, we should expect a similar result.
The only catch now is that the spectral content is contained in the
\emph{imaginary} part of the complex number returned from the Fourier
transform (recall Euler's formula), and the mirror image is negated.
That is, $\hat{x}(f) = - \hat{x}(-f)$, i.e.\ $\hat{x}$ is odd.  This
is illustrated in Figure~\ref{fig:fourier-transforms}b.
\item
Conversely, consider what the spectral content of an impulse function
should be.  Because an impulse function is infinitely ``sharp,'' it
would seem that its spectrum should contain energy at every point in
the frequency domain.  Indeed, the Fourier transform of an impulse
function centered at zero is a constant, as shown in
Figure~\ref{fig:fourier-transforms}c.
\item
Consider now the spectral content of a square wave.  It can be shown
that the Fourier series representation of a square wave is the sum of
the square wave's fundamental frequency plus its harmonically
decreasing (in magnitude) odd harmonics.  Specifically:
\begin{equation}\label{eq:square-wave-series}
sq(t) = \sum_{k=1}^{\infty} \frac{1}{k} \sin k\omega t,\quad
           {\rm for\ odd\ } k
\end{equation}
The spectral content of this signal is shown in
Figure~\ref{fig:fourier-transforms}d.
Figure~\ref{fig:square-wave-series} also shows partial reconstruction
of the square wave from a finite number of its composite signals.
\end{itemize}

It is worth noting that the diagrams in
Figure~\ref{fig:fourier-transforms} make no assumptions about time or
frequency.  Therefore, because the Fourier transform and its inverse
are true mathematical inverses, we can read the diagrams as time
domain / frequency domain pairs, or the other way around; i.e.\ as
frequency domain / time domain pairs.  For example, to interpret the
diagram on the left of Figure~\ref{fig:fourier-transforms}a in the
frequency domain is to say that it is the Fourier transform of the
signal on the right (interpreted in the time domain).

\begin{figure}[hbtp]
\centering
\includegraphics[height=2.4in,angle=270]{pics/cosine1.eps}
\includegraphics[height=2.4in,angle=270]{pics/cosine2.eps}
\begin{center}
(a) Cosine wave
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/sine1.eps}
\includegraphics[height=2.4in,angle=270]{pics/sine2.eps}
\begin{center}
(b) Sine wave
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/pulse1.eps}
\includegraphics[height=2.4in,angle=270]{pics/pulse2.eps}
\begin{center}
(c) Impulse function
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/square1.eps}
\includegraphics[height=2.4in,angle=270]{pics/square2.eps}
\begin{center}
(d) Square wave
\end{center}
\caption{Examples of Fourier Transforms}
\label{fig:fourier-transforms}
\end{figure}

\begin{figure}[hbtp]
\centering
\includegraphics[height=2.4in,angle=270]{pics/19_2a.eps}
\begin{center}
(a) Sine wave
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/19_2b.eps}
\begin{center}
(b) Sine wave + third harmonic
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/19_2c.eps}
\begin{center}
(c) Sine wave + third and fifth harmonics
\end{center}
\includegraphics[height=2.4in,angle=270]{pics/19_2d.eps}
\begin{center}
(d) Sum of first eight terms of the Fourier series of a square wave
\end{center}
\caption{Generating a Square Wave from Odd Harmonics}
\label{fig:square-wave-series}
\end{figure}

\section{The Discrete Fourier Transform}
\label{sec:DFT}

Recall from Section \ref{sec:discrete} that we can move from the
continuous signal domain to the discrete domain by replacing the time
$t$ with the quantity $\nicefrac{n}{r}$, where $n$ is the integer
index into the sequence of discrete samples, and $r$ is the sampling
rate.  Let us assume that we have done this for $x$, and we will use
square brackets to denote the difference.  That is, $x[n]$ denotes the
$n^{\rm th}$ sample of the continuous signal $x(t)$, corresponding to
the value $x(\nicefrac{n}{r})$.

We would now like to compute the \emph{Discrete Fourier Transform}
(DFT) of our discrete signal.  But instead of being concerned about
the sampling rate (which can introduce aliasing, for example), our
concern turns to the \emph{number of samples} that we use in computing
the DFT---let's call this $N$.  Intuitively, the integrals used in our
equations for the Fourier transform and its inverse should become sums
over the range $0 ... N-1$.  This leads to a reformulation of our two
equations (\ref{eq:ft} and \ref{eq:ift}) as follows:\footnote{The
purpose of the factor $\nicefrac{1}{N}$ in Equation~\ref{eq:dft} is to
ensure that the DFT and the inverse DFT are in fact inverses of each
other.  But it is just by convention that one equation has this factor
and the other does not---it would be sufficient if it were done the
other way around.
In fact, all that matters is that the product of the two coefficients
be $\nicefrac{1}{N}$, and thus it would also be sufficient for each
equation to have the same coefficient, namely
$\nicefrac{1}{\sqrt{N}}$.  Similarly, the negative exponent in one
equation and positive in the other is also by convention---it would be
sufficient to do it the other way around.}
\begin{equation}\label{eq:dft}
\hat{x}[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n] e^{-j\frac{2\pi k n}{N}},\ \
   k = 0, 1, ..., N-1
\end{equation}
%% Analogously, the equation for the inverse DFT is:
\begin{equation}\label{eq:idft}
x[n] = \sum_{k=0}^{N-1} \hat{x}[k] e^{j\frac{2\pi k n}{N}},\ \
   n = 0, 1, ..., N-1
\end{equation}

Despite all of the mathematics up to this point, the reader may now
realize that the discrete Fourier transform as expressed above is
amenable to implementation---for example it should not be difficult to
write Haskell functions that realize each of the above equations.  But
before addressing implementation issues, let's discuss a bit more what
the results actually \emph{mean}.

\subsection{Interpreting the Frequency Spectrum}
\label{sec:freq-spectrum}

Just as $x[n]$ represents a sampled version of the continuous input
signal, $\hat{x}[k]$ represents a sampled version of the continuous
frequency spectrum.  Care must be taken when interpreting either of
these results, keeping in mind the Nyquist-Shannon Sampling Theorem
(recall Section~\ref{sec:digital-audio}) and aliasing
(Section~\ref{sec:aliasing}).

Also recall that the result of a Fourier transform of a periodic
signal is a Fourier series (see Section~\ref{sec:fouriers-theorem}),
in which the signal being analyzed is expressed as multiples of a
fundamental frequency.  In Equation~\ref{eq:dft} above, that
fundamental frequency is the inverse of the duration of the $N$
samples, i.e.\ the inverse of $\nicefrac{N}{r}$, or $\nicefrac{r}{N}$.
For example, if the sampling rate is 44.1 kHz (the CD standard), then:
\begin{itemize}
%% \item If we take $N=44$ samples, then the fundamental frequency will
%%   be (about) $r/N = 1000$ Hz.
\item If we take $N=441$ samples, then the fundamental frequency will
  be $\nicefrac{r}{N} = 100$ Hz.
\item If we take $N=4410$ samples, then the fundamental frequency will
  be $\nicefrac{r}{N} = 10$ Hz.
\item If we take $N=44100$ samples, then the fundamental frequency
  will be $\nicefrac{r}{N} = 1$ Hz.
\end{itemize}
Thus, as would be expected, taking more samples yields a \emph{finer}
resolution of the frequency spectrum.  On the other hand, note that if
we increase the sampling rate and keep the number of samples fixed, we
get a \emph{coarser} resolution of the spectrum---this also should be
expected, because if we increase the sampling rate we would expect to
have to look at more samples to get the same accuracy.

Analogous to the Nyquist-Shannon Sampling Theorem, the representable
points in the resulting frequency spectrum lie in the range
$\pm\nicefrac{r}{2}$, i.e.\ between plus and minus one-half of the
sampling rate.
For the above three cases, respectively, that means the points are:
\begin{itemize}
%% \item -22 kHz, -21 kHz, ..., -1 kHz, 0, 1 kHz, ..., 21 kHz, 22kHz
%% \item
%% -22.05 kHz, -21.95 kHz, ..., -0.05 kHz, 0.05 kHz, ..., 21.95 kHz, 22.05 kHz
\item
-22.0 kHz, -21.9 kHz, ..., -0.1 kHz, 0, 0.1 kHz, ..., 21.9 kHz, 22.0 kHz
\item
-22.05 kHz, -22.04 kHz, ..., -10 Hz, 0, 10 Hz, ..., 22.04 kHz, 22.05 kHz
\item
-22.05 kHz, -22.049 kHz, ..., -1 Hz, 0, 1 Hz, ..., 22.049 kHz, 22.05 kHz
\end{itemize}
For practical purposes, the first of these is usually too coarse, the
third is too fine, and the middle one is useful for many applications.

Note that the first range of frequencies above does not quite cover
the range $\pm\nicefrac{r}{2}$: because $N$ is odd in that case, no
point falls at exactly $+\nicefrac{r}{2}$ or $-\nicefrac{r}{2}$.  Also
note that the second and third ranges as listed contain $N+1$ points,
not $N$; the reason for this apparent extra point is explained below.
Indeed, the more general question is, how do these points in the
frequency spectrum correspond to the indices $i = 0, 1, ..., N-1$ in
$\hat{x}[i]$?  If we denote each of these frequencies as $f$, the
answer is that:
\begin{equation}\label{eq:f1}
f = \frac{ir}{N},\ \ \ \ i = 0, 1, ..., N-1
\end{equation}
But note that this range of frequencies extends from $0$ to
$(N-1)(\nicefrac{r}{N})$, which exceeds the Nyquist-Shannon sampling
limit of $\nicefrac{r}{2}$.  The way out of this dilemma is to realize
that the DFT assumes that the input signal is periodic in time, and
therefore the DFT is periodic in frequency.  In other words, values of
$f$ for indices $i$ greater than $\nicefrac{N}{2}$ can be interpreted
as frequencies that are the \emph{negation} of the frequency given by
the formula above.  Assuming even $N$, we can revise
Equation~\ref{eq:f1} as follows:
\begin{equation}\label{eq:f2}
f = \left\{
\begin{array}{ll}
i\dfrac{r}{N}, & \quad i = 0, 1, ..., \dfrac{N}{2} \\[0.1in]
(i-N)\dfrac{r}{N}, & \quad i = \dfrac{N}{2}, \dfrac{N}{2}+1, ..., N-1 \\
\end{array}
\right.
\end{equation}
Note that when $i = \nicefrac{N}{2}$, both equations apply, yielding
$f = \nicefrac{r}{2}$ in the first case, and $f = -\nicefrac{r}{2}$ in
the second.  Indeed, the magnitude of the DFT for each of these
frequencies is the same (see discussion in the next section),
reflecting the periodicity of the DFT, and thus is simply a form of
redundancy: the $N+1$ frequencies listed above correspond to only $N$
distinct DFT samples.

%% \begin{itemize}
%% \item
%% $\hat{x}[i],\ i = 0, 1, 2, ..., \frac{N}{2}$ correspond to the
%% frequencies $0, \frac{r}{N}, \frac{2r}{N}, ..., \frac{r}{2}$.
%% \item
%% $\hat{x}[i],\ i = \frac{N}{2}, \frac{N}{2}+1, \frac{N}{2}+2, ..., N-1$
%% correspond to the frequencies
%% $-\frac{r}{2}, -\frac{r}{2}+\frac{r}{N}, -\frac{r}{2}+\frac{2r}{N},
%% ..., -\frac{r}{N}$.
%% \end{itemize}
%% ...  This can be viewed as a kind of aliasing in the frequency domain.

The above discussion has assumed a periodic signal whose fundamental
frequency is known, thus allowing us to parameterize the DFT with the
same fundamental frequency.  In practice this rarely happens.  That
is, the fundamental frequency of the DFT typically has no integral
relationship to the period of the periodic signal.  This raises the
question, what happens to the frequencies that ``fall in the gaps''
between the frequencies discussed above?
The answer is that the energy of that frequency component will be
distributed amongst neighboring points in a way that makes sense
mathematically, although the result may look a little funny compared
to the ideal result (where every frequency component is an integer
multiple of the fundamental).  The important thing to remember is that
these are digital representations of the exact spectra, just as a
digitized signal is representative of an exact signal.  Two digitized
signals can look very different (depending on sample rate, phase
angle, and so on), yet represent the same underlying signal---the same
is true of a digitized spectrum.

In practice, for reasons of computational efficiency, $N$ is usually
chosen to be a power of two.  We will return to this issue when we
discuss implementing the DFT.

\subsection{Amplitude and Power of Spectrum}
\label{sec:amp-spectrum}

We discussed above how each sample in the result of a DFT relates to a
point in the frequency spectrum of the input signal.  But how do we
determine the amplitude and phase angle of each of those frequency
components?

In general each sample in the result of a DFT is a complex number,
thus having both a real and imaginary part, of the form $a + jb$.  We
can visualize this number as a point in the complex Cartesian plane,
where the abscissa (x-axis) represents the real part, and the ordinate
(y-axis) represents the imaginary part, as shown in
Figure~\ref{fig:complex-plane}.  It is easy to see that the line from
the origin to the point of interest is a vector $A$, whose length is
the \emph{amplitude} of the frequency component in the spectrum:
\begin{equation}\label{eq:amplitude}
A = \sqrt{a^2 + b^2}
\end{equation}
The angle $\theta$ is the \emph{phase}, and it is easily read off of
the figure as:
\begin{equation}\label{eq:phase}
\theta = \tan^{-1} \frac{b}{a}
\end{equation}
(This amplitude / phase pair is often called the \emph{polar}
representation of a complex number.)

\begin{figure}[hbtp]
\centering
\includegraphics[height=2.4in]{pics/ComplexToPolar.eps} % angle=270
\caption{Complex and Polar Coordinates}
\label{fig:complex-plane}
\end{figure}

Recall from Section~\ref{sec:amplitude} that power is proportional to
the square of the amplitude.  Since taking a square root adds
computational expense, the square root is often omitted from
Equation~\ref{eq:amplitude}, thus yielding a \emph{power spectrum}
instead of an \emph{amplitude spectrum}.

One subtle aspect of the resulting DFT is how to interpret
\emph{negative} frequencies.  When the input samples are all real
numbers (i.e.\ they have no imaginary components), which is true for
audio applications, the negative spectrum is a mirror image of the
positive spectrum, and the amplitude/power is distributed evenly
between the two.

\subsection{A Haskell Implementation of the DFT}
\label{sec:haskell-dft}

From Equation \ref{eq:dft}, which defines the DFT mathematically, we
can write a Haskell program that implements the DFT.  The first thing
we need to do is understand how complex numbers are handled in
Haskell.  They are captured in the |Complex| library, which must be
imported into any program that uses them.

The type |Complex T| is the type of complex numbers whose underlying
numeric type is |T|.  We will use, for example, |Complex Double| for
testing our DFT.  A complex number $a + jb$ is represented in Haskell
as |a :+ b|, and since |(:+)| is a constructor, such values can be
pattern matched.
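As a first exercise with these values, here is a small sketch of our
own (the names |binFrequency| and |amplitude| are hypothetical, not
part of any library) that captures two pieces of the interpretation
machinery developed above: the index-to-frequency mapping of
Equation~\ref{eq:f2}, and the amplitude of Equation~\ref{eq:amplitude}
computed by pattern matching on |(:+)|:
\begin{spec}
-- The frequency represented by index i of an n-point DFT taken at
-- sampling rate r (assuming n is even), following Equation (19.10).
binFrequency :: Double -> Int -> Int -> Double
binFrequency r n i
  | i <= n `div` 2  = fromIntegral i * r / fromIntegral n
  | otherwise       = fromIntegral (i - n) * r / fromIntegral n

-- The amplitude of a single DFT sample a :+ b; omitting the sqrt
-- would yield power rather than amplitude.
amplitude :: Complex Double -> Double
amplitude (a :+ b) = sqrt (a * a + b * b)
\end{spec}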
\syn{Complex numbers in Haskell are captured in the |Complex| library,
in which complex numbers are defined as a polymorphic data type:
\begin{spec}
infix 6 :+
data (RealFloat a) => Complex a = !a :+ !a
\end{spec}
The ``|!|'' in front of the type variables declares that the
constructor |(:+)| is strict in its arguments.  For example, the
complex number $a + jb$ is represented by |a :+ b| in Haskell.

One can pattern match on complex number values to extract the real and
imaginary parts, or use one of the predefined selectors defined in the
|Complex| library:
\begin{spec}
realPart, imagPart :: RealFloat a => Complex a -> a
\end{spec}
The |Complex| library also defines the following functions:
\begin{spec}
conjugate          :: RealFloat a => Complex a -> Complex a
mkPolar            :: RealFloat a => a -> a -> Complex a
cis                :: RealFloat a => a -> Complex a
polar              :: RealFloat a => Complex a -> (a,a)
magnitude, phase   :: RealFloat a => Complex a -> a
\end{spec}
The library also declares instances of |Complex| for the type classes
|Num|, |Fractional|, and |Floating|.
}

Although not as efficient as arrays, for simplicity we choose to use
lists to represent the vectors that are the input and output of the
DFT.  Thus if |xs| is the list that represents the signal $x$, then
|xs!!n| is the |n+1|$^{\rm th}$ sample of that signal, and is
equivalent to $x[n]$.  Furthermore, using list comprehensions, we can
make the Haskell code look very much like the mathematical definition
captured in Equation \ref{eq:dft}.  Finally, we adopt the convention
that the length of the input signal is the number of samples that we
will use for the DFT.

Probably the trickiest part of writing a Haskell program for the DFT
is dealing with the types!  In particular, if you look closely at
Equation \ref{eq:dft} you will see that $N$ is used in three different
ways---as an integer (for indexing), as a real number (in the exponent
of $e$), and as a complex number (in the expression
$\nicefrac{1}{N}$).  Here is a Haskell program that implements the
DFT:

\begin{code}
dft :: RealFloat a => [Complex a] -> [Complex a]
dft xs =
  let lenI = length xs           -- N as an integer
      lenR = fromIntegral lenI   -- N as a real number
      lenC = lenR :+ 0           -- N as a complex number
  in  [ let i = -2 * pi * fromIntegral k / lenR
        in (1/lenC) * sum [ (xs!!n) * exp (0 :+ i * fromIntegral n)
                          | n <- [0,1..lenI-1] ]
      | k <- [0,1..lenI-1] ]
\end{code}

Note that |lenI|, |lenR|, and |lenC| are the integer, real, and
complex versions, respectively, of $N$.  Otherwise the code is fairly
straightforward---note in particular how list comprehensions are used
to implement the ranges of $n$ and $k$ in Equation \ref{eq:dft}.

To test our program, let's first create a couple of waveforms.  For
example, recall that Equation \ref{eq:square-wave-series} defines the
Fourier series for a square wave.  We can implement the first, first
two, and first three terms of this series, corresponding respectively
to Figures~\ref{fig:square-wave-series}a,
\ref{fig:square-wave-series}b, and \ref{fig:square-wave-series}c, by
the following Haskell code:

\begin{code}
mkTerm :: Int -> Double -> [Complex Double]
mkTerm num n =
  let f = 2 * pi / fromIntegral num
  in  [ sin (n * f * fromIntegral i) / n :+ 0
      | i <- [0,1..num-1] ]

mkxa, mkxb, mkxc :: Int -> [Complex Double]
mkxa num  = mkTerm num 1
mkxb num  = zipWith (+) (mkxa num) (mkTerm num 3)
mkxc num  = zipWith (+) (mkxb num) (mkTerm num 5)
\end{code}

Thus |mkTerm num n| is the |n|$^{\rm th}$ term in the series, using
|num| samples.
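Before looking at full DFT results, we might try a quick sanity check
of our own at the GHCi prompt: the DFT of a constant (``DC'') signal
should have all of its energy concentrated at index 0, the dual of the
impulse example that we will see shortly.
\begin{spec}
dft (replicate 8 (1 :+ 0))
-- expect (1 :+ 0) at index 0, and (up to round-off) 0 elsewhere
\end{spec}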
Using the helper function |printComplexL| defined in
Figure~\ref{fig:pp-code}, which ``pretty prints'' a list of complex
numbers, we can look at the result of our DFT in a more readable
form.\footnote{``Pretty-printing'' real numbers is a subtle task.  The
code in Figure~\ref{fig:pp-code} rounds the number to 10 decimal
places of accuracy, and inserts spaces before and after to line up the
decimal points and give a consistent string length.  The fractional
part is not padded with zeros, since that would give a false
impression of its accuracy.  (It is not necessary to understand this
code in order to understand the concepts in this chapter.)}

\begin{figure}
\begin{code}
printComplexL :: [Complex Double] -> IO ()
printComplexL xs =
  let f (i, rl :+ im) = do  putStr (spaces (3 - length (show i)))
                            putStr (show i ++ ":  (")
                            putStr (niceNum rl ++ ", ")
                            putStr (niceNum im ++ ")\n")
  in mapM_ f (zip [0 .. length xs - 1] xs)

niceNum :: Double -> String
niceNum d =
  let d'           = fromIntegral (round (1e10 * d)) / 1e10
      (dec, fra)   = break (== '.') (show d')
      (fra', exp)  = break (== 'e') fra
  in  spaces (3 - length dec) ++ dec ++ take 11 fra'
      ++ exp ++ spaces (12 - length fra' - length exp)

spaces :: Int -> String
spaces n = take n (repeat ' ')
\end{code}
\caption{Helper Code for Pretty-Printing DFT Results}
\label{fig:pp-code}
\end{figure}

For example, suppose we want to take the DFT of a 16-sample
representation of the first three terms of the square wave series.
Typing the following at the GHCi prompt:
\begin{spec}
printComplexL (dft (mkxc 16))
\end{spec}
will yield the result of the DFT, pretty-printing each number as a
pair, along with its index:
{\small
\begin{verbatim}
  0:  (   0.0         ,   0.0         )
  1:  (   0.0         ,  -0.5         )
  2:  (   0.0         ,   0.0         )
  3:  (   0.0         ,  -0.1666666667)
  4:  (   0.0         ,   0.0         )
  5:  (   0.0         ,  -0.1         )
  6:  (   0.0         ,   0.0         )
  7:  (   0.0         ,   0.0         )
  8:  (   0.0         ,   0.0         )
  9:  (   0.0         ,   0.0         )
 10:  (   0.0         ,   0.0         )
 11:  (   0.0         ,   0.1         )
 12:  (   0.0         ,   0.0         )
 13:  (   0.0         ,   0.1666666667)
 14:  (   0.0         ,   0.0         )
 15:  (   0.0         ,   0.5         )
\end{verbatim}
}
Let's study this result more closely.  For sake of argument, assume a
sample rate of 1.6 kHz.  Then by construction using |mkxc|, our
square-wave input's fundamental frequency is 100 Hz.  Similarly,
recall that the resolution of the DFT is |r/N|, which is also 100 Hz.

Now compare the overall result to Figure
\ref{fig:fourier-transforms}b.  Recalling also Equation \ref{eq:f2},
we note that the above DFT results are non-zero precisely at 100, 300,
500, -500, -300, and -100 Hz.  This is just what we would expect.
Furthermore, the amplitudes are one-half of the corresponding
harmonically decreasing weights dictated by Equation
\ref{eq:square-wave-series}, namely the values $\nicefrac{1}{2}$,
$\nicefrac{1}{6}$, and $\nicefrac{1}{10}$ (recall the discussion in
Section \ref{sec:amp-spectrum}).
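We can corroborate this reading mechanically.  The following
one-liner, a check of our own using the |polar| function from
|Data.Complex|, extracts the magnitude of the $k=1$ sample of the DFT,
which should be $0.5$ (one-half of $C_1 = 1$), up to round-off:
\begin{spec}
fst (polar (dft (mkxc 16) !! 1))    -- expect: 0.5
\end{spec}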
Let's do another example.  We can create an impulse function as
follows:
\begin{code}
mkPulse :: Int -> [Complex Double]
mkPulse n = 100 : take (n-1) (repeat 0)
\end{code}
and print its DFT with the command:
\begin{spec}
printComplexL (dft (mkPulse 16))
\end{spec}
whose effect is:
{\small
\begin{verbatim}
  0:  (   6.25        ,   0.0         )
  1:  (   6.25        ,   0.0         )
  2:  (   6.25        ,   0.0         )
  3:  (   6.25        ,   0.0         )
  4:  (   6.25        ,   0.0         )
  5:  (   6.25        ,   0.0         )
  6:  (   6.25        ,   0.0         )
  7:  (   6.25        ,   0.0         )
  8:  (   6.25        ,   0.0         )
  9:  (   6.25        ,   0.0         )
 10:  (   6.25        ,   0.0         )
 11:  (   6.25        ,   0.0         )
 12:  (   6.25        ,   0.0         )
 13:  (   6.25        ,   0.0         )
 14:  (   6.25        ,   0.0         )
 15:  (   6.25        ,   0.0         )
\end{verbatim}}

Compare this to Figure \ref{fig:fourier-transforms}c, and note how the
original magnitude of the impulse (100) is distributed evenly among
the 16 points in the DFT ($100/16 = 6.25$).

So far we have considered only input signals whose frequency
components are integral multiples of the DFT's resolution.  This
rarely happens in practice, however, because music is simply too
complex and noisy.  As mentioned in Section~\ref{sec:freq-spectrum},
the energy of the signals that ``fall in the gaps'' is distributed
among neighboring points, although not in as simple a way as you might
think.  To get some perspective on this, let's do one other example.
We define a function to generate a signal whose frequency is $\pi$
times the fundamental frequency:
%% We will modify the function |mkTerm| so that it creates a signal
%% 42\% higher than it used to be:
\begin{code}
x1 :: Int -> [Complex Double]
x1 num =
  let f = pi * 2 * pi / fromIntegral num
  in  map (:+ 0) [ sin (f * fromIntegral i) | i <- [0,1..num-1] ]
\end{code}
$\pi$ is an irrational number, but any number that ``falls in the
gaps'' between indices would do.  We can see the result (again using
16 samples) by typing the command:
\begin{spec}
printComplexL (dft (x1 16))
\end{spec}
which yields:
{\small
\begin{verbatim}
  0:  (  -7.9582433e-3 ,   0.0          )
  1:  (  -5.8639942e-3 ,  -1.56630897e-2)
  2:  (   4.7412105e-3 ,  -4.56112124e-2)
  3:  (   0.1860052232 ,  -0.4318552865 )
  4:  (  -5.72962095e-2,   7.33993364e-2)
  5:  (  -3.95845728e-2,   3.14378088e-2)
  6:  (  -3.47994673e-2,   1.65400768e-2)
  7:  (  -3.29813518e-2,   7.4048103e-3 )
  8:  (  -3.24834325e-2,   0.0          )
  9:  (  -3.29813518e-2,  -7.4048103e-3 )
 10:  (  -3.47994673e-2,  -1.65400768e-2)
 11:  (  -3.95845728e-2,  -3.14378088e-2)
 12:  (  -5.72962095e-2,  -7.33993364e-2)
 13:  (   0.1860052232 ,   0.4318552865 )
 14:  (   4.7412105e-3 ,   4.56112124e-2)
 15:  (  -5.8639942e-3 ,   1.56630897e-2)
\end{verbatim}}

This is much more complicated than the previous examples!  Not only do
the points in the spectrum seem to have varying amounts of energy,
they also have both non-zero real and non-zero imaginary components,
meaning that the magnitude and phase vary at each point.  We can
define a function that converts a list of complex numbers into a list
of their polar representations as follows:
\begin{code}
mkPolars :: [Complex Double] -> [Complex Double]
mkPolars = map ((\(m,p) -> m :+ p) . polar)
\end{code}
which we can then use to reprint our result:
\begin{spec}
printComplexL (mkPolars (dft (x1 16)))
\end{spec}
{\small
\begin{verbatim}
  0:  (   7.9582433e-3 ,   3.1415926536 )
  1:  (   1.67247961e-2,  -1.9290259418 )
  2:  (   4.58569709e-2,  -1.4672199604 )
  3:  (   0.470209455  ,  -1.1640975898 )
  4:  (   9.31145435e-2,   2.2336013741 )
  5:  (   5.05497204e-2,   2.4704023271 )
  6:  (   3.85302097e-2,   2.6979021519 )
  7:  (   3.38023784e-2,   2.9207398294 )
  8:  (   3.24834325e-2,  -3.1415926536 )
  9:  (   3.38023784e-2,  -2.9207398294 )
 10:  (   3.85302097e-2,  -2.6979021519 )
 11:  (   5.05497204e-2,  -2.4704023271 )
 12:  (   9.31145435e-2,  -2.2336013741 )
 13:  (   0.470209455  ,   1.1640975898 )
 14:  (   4.58569709e-2,   1.4672199604 )
 15:  (   1.67247961e-2,   1.9290259418 )
\end{verbatim}
}
If we focus on the magnitude (the first column), we can see that there
is a peak near index 3 (corresponding roughly to the frequency $\pi$),
with small amounts of energy elsewhere.

\vspace{.1in}\hrule

\begin{exercise}{\em
Write a Haskell function |idft| that implements the \emph{inverse} DFT
as captured in Equation~\ref{eq:idft}.  Test your code by applying
|idft| to one of the signals used earlier in this section.  In other
words, show empirically that, up to round-off errors,
|idft (dft xs) == xs|.
}
\end{exercise}

\begin{exercise}{\em
Use |dft| to analyze some of the signals generated using signal
functions defined in Chapter~\ref{ch:sigfuns}.
}
\end{exercise}

\todo{To do the above exercise we need to provide a function that
extracts |N| samples from a sigfun, and somehow keeps it in the sigfun
world.  Perhaps something like:
\begin{spec}
sample :: Rate -> Int -> Signal c a (Event (Table a))
\end{spec}
such that |sample r n| is a sigfun that generates an event every |1/r|
seconds, each event being a table containing |n| samples of the input.
These tables may or may not overlap, depending on the relationship
between |r|, |n|, and the sampling rate.
}

\begin{exercise}{\em
Define a function |mkSqWave :: Int -> Int -> [Complex Double]| such
that |mkSqWave num n| is the sum of the first |n| terms of the Fourier
series of a square wave, having |num| samples in the result.
}
\end{exercise}

\begin{exercise}{\em
Prove mathematically that the DFT and the inverse DFT, as captured in
Equations \ref{eq:dft} and \ref{eq:idft}, are in fact inverses.  Also
prove, using equational reasoning, that |dft| and |idft| are inverses.
(For the latter you may assume that Haskell numeric types obey the
standard axioms of real arithmetic.)
}
\end{exercise}

\vspace{.1in}\hrule
\vspace{.1in}

\section{The Fast Fourier Transform}
\label{sec:fft}

In the last section a DFT program was developed in Haskell that was
easy to understand, being a faithful translation of Equation
\ref{eq:dft}.  For pedagogical purposes, this effort served us well.
However, for practical purposes, the program is inherently
inefficient.  To see why, think of $x[n]$ and $\hat{x}[k]$ as vectors.
Each element of $\hat{x}$ is then the sum of $N$ multiplications of a
sample by a complex exponential (which can be represented as a pair,
the real and imaginary parts).  And this overall process must be
repeated for each value of $k$, also $N$ times.  Therefore the overall
time complexity of the implied algorithm is O($N^2$).  For even
moderate values of $N$, this can be computationally intractable.  (Our
choice of lists for the implementation of vectors makes the complexity
even worse, because of the linear-time complexity of indexing, but the
discussion below makes this a moot point.)
Fortunately, there exists a much faster algorithm called the
\emph{Fast Fourier Transform}, or FFT, that reduces the complexity to
O($N\log N$).  This difference is quite significant for large values
of $N$, and the FFT is the standard algorithm used in most signal
processing applications.  We will not go into the details of the FFT
algorithm, other than to note that it is a divide-and-conquer
algorithm that depends on the vector size being a power of
two.\footnote{The basic FFT algorithm was invented by James Cooley and
John Tukey in 1965.}
%% \footnote{James Cooley and John Tukey are
%% usually credited with inventing the FFT in 1965, although it was
%% later discovered that Carl Friedrich Gauss proposed the same
%% algorithm around 1805.}

Rather than developing our own program for the FFT, we will instead
use the Haskell library |Numeric.FFT| to import a function that will
do the job for us.  Specifically:
\begin{spec}
fft :: ...
\end{spec}

With this function we could explore the use of the FFT on specific
input vectors, as we did earlier with |dft|.  However, our ultimate
goal is to have a version of the FFT that works on \emph{signals}.  We
would like to be able to specify the number of samples as a power of
two (which we can think of as the ``window size''), and how often we
would like to take a snapshot of the current window (and thus
successive windows may or may not overlap).  The resulting signal
function takes a signal as input, and outputs \emph{events} at the
specified rate.  Events are discussed in more detail in
Chapter~\ref{ch:mui}.

Indeed, Euterpea provides this functionality for us in a function
called |fftA|:
\begin{spec}
fftA :: Int -> Int -> SF Double (Event [Double])
\end{spec}
|SF| is a signal function type similar to |SigFun|, except that it is
targeted for use in the Musical User Interface (MUI) discussed in
detail in Chapter~\ref{ch:mui}, and thus, for example, does not have a
clock rate.  |fftA skip size| is a signal function that, every |skip|
samples of the input, takes the most recent |size| samples as a window
(where |size| must be a power of two) and computes the FFT of that
window.  For every such result, it issues an |Event| carrying the
magnitude of each point in the resulting spectrum; the frequency that
each point represents is determined, as described in
Section~\ref{sec:freq-spectrum}, by the window size and the clock rate
of the signal to which |fftA| is applied.

Combining |fftA| with the MUI widgets discussed in
Chapter~\ref{ch:mui}, we can write a simple program that generates a
sine wave whose frequency is controlled by a slider, and whose
real-time graph as well as its FFT are displayed.  The program to do
this is shown in Figure~\ref{fig:fft-mui}.
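In Figure~\ref{fig:fft-mui} the magnitudes in each event arrive simply
as a list.  To label each magnitude with the frequency it represents,
one might use a helper like the following sketch (|labelBins| is our
own, not part of Euterpea), which applies Equation~\ref{eq:f1} given
the clock rate |r| and the window size |n| passed to |fftA|:
\begin{spec}
-- Pair each magnitude with the frequency of its bin, index by index.
labelBins :: Double -> Int -> [Double] -> [(Double, Double)]
labelBins r n mags =
  zip [ fromIntegral i * r / fromIntegral n | i <- [0..] ] mags
\end{spec}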
\begin{figure}
\begin{spec}
fftEx :: UISF () ()
fftEx = proc _ -> do
    f <- hSlider (1, 2000) 440 -< ()
    (d, _) <- convertToUISF 100 simpleSig -< f
    let (s, fft) = unzip d
    _ <- histogram (500, 150) 20 -< listToMaybe (catMaybes fft)
    _ <- realtimeGraph' (500, 150) 200 20 Black -< s
    outA -< ()
  where
    simpleSig :: SigFun CtrRate Double (Double, Event [Double])
    simpleSig = proc f -> do
      s        <- osc (tableSinesN 4096 [1]) 0 -< f
      fftData  <- fftA 100 256 -< s
      outA -< (s, fftData)

t0 = runMUI (500, 600) "fft Test" fftEx
\end{spec}
\caption{A Real-Time Display of FFT Results}
\label{fig:fft-mui}
\end{figure}

\section{Further Pragmatics}

\todo{Discuss windowing.}

\vspace{.1in}\hrule

\begin{exercise}{\em
Modify the program in Figure~\ref{fig:fft-mui} in the following ways:
\begin{enumerate}
\item Add a second slider, and use it to control the frequency of a
  second oscillator.
\item Let |s1| and |s2| be the names of the signals whose frequencies
  are controlled by the first and second sliders, respectively.
  Instead of displaying the FFT of just |s1|, try a variety of
  combinations of |s1| and |s2|, such as |s1 + s2|, |s1 - s2|,
  |s1 * s2|, |1/s1 + 1/s2|, and |s1 / s2|.  Comment on the results.
\item Use |s2| to control the frequency of |s1| (as was done with
  |vibrato| in Chapter~\ref{ch:sigfuns}).  Plot the FFT of |s1| and
  comment on the result.
\item Instead of using |osc| to generate a pure sine wave, try using
  other oscillators and/or table generators to create more complex
  tones, and plot their FFTs.  Comment on the results.
\end{enumerate}
}
\end{exercise}

\vspace{.1in}\hrule

\section{References}

Most of the ideas in this chapter can be found in any good textbook on
signal processing.  The particular arrangement of the material here,
in particular Figure~\ref{fig:fourier-transforms} and the development
and demonstration of a program for the DFT, is borrowed from the
excellent text \emph{Elements of Computer Music} by Moore
\cite{Moore90}.