\(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\Z}{\mathbb{Z}}\)
\(\newcommand{\RR}{\mathbb{R}}\)
\(\newcommand{\C}{\mathbb{C}}\)
\(\newcommand{\N}{\mathbb{N}}\)
Notes largely from this excellent book and these more concise but extremely useful notes.
The general idea¶
A function \(\R \to \R\) is a continuous version of an array, in the sense that for an array \(v\), we have a discrete index for elements, e.g. \(v_i\) is a number for \(i \in \mathbb{N}\), while for a function, we have a continuous "index", e.g. \(f(x)\) is a number for \(x \in \mathbb{R}\).
Accordingly, functions form an inner product space, in the sense that for \(f, g : \mathbb{R} \to \mathbb{R}\), \((af + bg)(x) = af(x) + bg(x)\) and \(\langle f, g \rangle = \int f(x)^*g(x)\,dx\) (an analogue of the dot product for finite-dimensional spaces).
Many of the important concepts from the theory of inner product spaces apply. In particular, there is a self-adjoint operator, the second derivative \(\frac{d^2}{dx^2}\), and a notion of changing to an orthonormal eigenbasis (the complex exponential functions \(x \to e^{ikx}\)) with a unitary linear operator \(\mathcal{F}\) known as the Fourier transform.
This already encapsulates the key facts about the Fourier transform: it is unitary (Plancherel's theorem), its inverse is its adjoint (the Fourier inversion theorem), and it expresses a function as a sum/integral of the complex exponential eigenfunctions (the standard definition of the Fourier transform).
(A caveat: one has to be careful about which functions are allowed. This is the side of Fourier analysis that involves analysis.)
This plays out in various different settings:
- \(\mathcal{F}:(\R/n\Z\to\C)\to(\Z\to\C)\) is called the Fourier Series: that is, a (sufficiently nice) periodic function can be expressed as an (infinite) sum of complex exponentials. In other words, \(e^{2\pi ikx/n}\), for \(k\in\Z\), is a basis for functions of period \(n\).
- \(\mathcal{F}:(\R\to\C)\to(\R\to\C)\) is called the Fourier Transform: that is, a (sufficiently nice) function can be expressed as an integral of complex exponentials. This is like a continuous analog of a basis.
- \(\mathcal{F}:(\Z/n\Z\to\C)\to(\Z/n\Z\to\C)\) is called the Discrete Fourier Transform: it is a linear operator between finite-dimensional vector spaces, so it can be expressed as a matrix.
- \(\mathcal{F}:(\Z\to\C)\to(\R/\Z\to\C)\) is called the Discrete Time Fourier Transform: a discrete signal is expressed in terms of a periodic function of a continuous frequency variable.
A range of properties hold, mutatis mutandis, across all the cases. I'll state them by default for the Fourier transform:
- The transform is unitary, e.g. \(||f||^2 = ||\mathcal{F}f||^2\). Unpacking that: \(\int |f(t)|^2 dt = \int |\mathcal{F}f(k)|^2dk\), which is (depending on the community, and the degree of generality) Parseval's theorem, Plancherel's theorem or Rayleigh's identity. Here we're using the inner product of the space we're in to define the norm \(||\cdot ||\). Unitary transforms are invertible with \(U^{*}=U^{-1}\), so we now know the inverse of the Fourier transform too.
- Define shifts and squashes as \(\tau_b f = \lambda x \to f(x-b)\) and \(\sigma_a f = \lambda x \to f(ax)\). Then the shift and stretch theorems:
\(\mathcal{F}(\tau_b f)(k) = e^{-2\pi i bk}\,\mathcal{F}f(k)\)
\(\mathcal{F}(\sigma_a f)(k) = \frac{1}{|a|}\,\mathcal{F}f(k/a)\)
As an extremely useful special case, we have, writing \(f^- = \lambda x \to f(-x)\): \(\mathcal{F}(f^-) = (\mathcal{F}f)^-\).
- Differentiation: \(\mathcal{F}(f')(k) = 2\pi i k\,\mathcal{F}f(k)\), so the transform turns differentiation into multiplication by the frequency variable.
- Convolution: define the convolution operator \(*\) by \((f*g)(x) = \int_{-\infty}^{\infty} f(y)\,g(x-y)\,dy\).
Convolution forms a commutative monoid on functions, which, to unpack that statement, means that it yields a new function, is commutative (as can be shown by a change of variables of the above equation), is associative (shown similarly), and has an identity.
Furthermore, the Fourier transform induces a monoid homomorphism from this monoid to the product monoid of functions (i.e. the binary operation maps \(f\) and \(g\) to \(fg\), and the unit is the constant function \(1\)). In particular, \(\mathcal{F}(f*g) = \mathcal{F}f \cdot \mathcal{F}g\). This is very useful in engineering and physics applications.
Deconvolution is then solving \(f*g=h\) for \(f\) given \(g,h\), which (ignoring division-by-zero issues) is given by \(f = \mathcal{F}^{-1}\left(\frac{\mathcal{F}h}{\mathcal{F}g}\right)\).
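Here's a numerical illustration of the last two points in their discrete form (a numpy sketch; for the DFT, the relevant convolution is circular, and the deconvolution step relies on the random \(g\) having no zero Fourier coefficients):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Circular convolution, directly from the definition.
h = np.zeros(n)
for i in range(n):
    for j in range(n):
        h[i] += f[j] * g[(i - j) % n]

# Convolution theorem: the transform turns convolution into pointwise product.
print(np.allclose(np.fft.fft(h), np.fft.fft(f) * np.fft.fft(g)))  # True

# Deconvolution: recover f from h and g (fft(g) has no zeros here).
f_rec = np.fft.ifft(np.fft.fft(h) / np.fft.fft(g)).real
print(np.allclose(f, f_rec))  # True
```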
Concrete details¶
I now give more concrete details of the points outlined above.
Self-adjoint operators¶
Linear operators on the function space include the operator \(p=\frac{d}{dt}\) (which, up to a constant factor, is the momentum operator in quantum mechanics). This takes a function and gives back a new function. It is linear. It is also skew-self-adjoint, using integration by parts, and noting that the boundary term disappears assuming that all functions in the space vanish at \(\pm\infty\):
\(\langle pf, g \rangle = \int f'(t)^*g(t)\,dt = -\int f(t)^*g'(t)\,dt = -\langle f, pg \rangle\)
But if an operator \(A\) is skew-self-adjoint, then \(iA\) must be self-adjoint (since \(\langle iAx, y \rangle = -i\langle Ax, y \rangle = i\langle x, Ay \rangle = \langle x, iAy \rangle\)).
\(ip\) being self-adjoint means that (a version of) the spectral theorem gives that it has an orthonormal eigenbasis. The functions \(e^{ikt}\), for \(k\) real, are one such eigenbasis. (Note that I'm using "basis" to include this uncountable set of functions - the full story requires more finesse).
Orthogonality of complex exponentials¶
Write \(\overline{x}\) for the complex conjugate of \(x\). Orthogonality is shown as follows. For \(n,m \in \Z, n\neq m\):
\(\int_0^1 e^{2\pi i nx}\,\overline{e^{2\pi i mx}}\,dx = \int_0^1 e^{2\pi i (n-m)x}\,dx = \left[\frac{e^{2\pi i (n-m)x}}{2\pi i (n-m)}\right]_0^1 = 0\)
Even and Odd functions¶
Consider the operator \(M\), such that \(M(f)(x)=f(-x)\). This is Hermitian (and unitary). Since \(M^2 = I\), the eigenvalues are \(1\) and \(-1\). The corresponding eigenspaces are the even and odd functions.
To spell that out more concretely: an even function \(f_e\) is such that \(f_e(-x)=f_e(x)\), and an odd function \(f_o\) is such that \(f_o(-x)=-f_o(x)\).
Even and odd functions form respective subspaces: linear combinations of even functions are even, and likewise for odd.
Every function \(f\) can be written as \(\frac{f(x)+f(-x)}{2} + \frac{f(x)-f(-x)}{2}\), where the first term is even and the second is odd. So the whole space is a direct sum of the two subspaces.
The two subspaces are orthogonal: \(\langle f_e, f_o \rangle = \int_{-\infty}^{\infty} f_e(x)^*f_o(x)\,dx = 0\), since the integrand is odd and:
The integral of an odd function over the whole real line is \(0\).
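A small numerical illustration (a numpy sketch, where "\(f(-x)\)" for an \(n\)-periodic sampled signal means negating the indices mod \(n\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
f = rng.standard_normal(n)

# "f(-x)" for an n-periodic sampled signal: negate the indices mod n.
f_neg = np.roll(f[::-1], 1)

f_even = (f + f_neg) / 2
f_odd = (f - f_neg) / 2

print(np.allclose(f, f_even + f_odd))           # the decomposition is exact
print(np.isclose(np.dot(f_even, f_odd), 0))     # the two parts are orthogonal
print(np.allclose(np.fft.fft(f_even).imag, 0))  # DFT of a real even signal is real
```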
Fourier Series¶
Note how the Fourier series is the projection of a periodic function onto an orthonormal basis of complex exponentials:
\(f(x) = \sum_{k=-\infty}^{\infty} \hat{f}(k)\,e^{2\pi i kx}, \quad \hat{f}(k) = \langle e^{2\pi i kx}, f \rangle\)
Here, the inner product is on \(L^2([0,1])\), so:
\(\hat{f}(k) = \int_0^1 e^{-2\pi i kx} f(x)\,dx\)
Functions which are not smooth (either discontinuous, or possessing a discontinuous \(n\)th derivative for some \(n\)) require an infinite sum of complex exponential basis elements, since a finite sum of smooth functions is smooth.
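To see this concretely, here's a sketch (in numpy, approximating the coefficient integrals by Riemann sums) of partial Fourier sums of a square wave; the error near the jumps doesn't go away as \(K\) grows (the Gibbs phenomenon):

```python
import numpy as np

# A square wave on [0, 1): discontinuous, so its Fourier coefficients
# decay slowly and no finite sum of smooth exponentials reproduces it.
x = np.linspace(0, 1, 1000, endpoint=False)
f = np.where(x < 0.5, 1.0, -1.0)

def coeff(k):
    # hat{f}(k) = <e^{2 pi i k x}, f>, approximated by a Riemann sum
    return np.mean(f * np.exp(-2j * np.pi * k * x))

K = 25  # partial sum over |k| <= K
partial = sum(coeff(k) * np.exp(2j * np.pi * k * x) for k in range(-K, K + 1))
print(np.max(np.abs(partial.real - f)))  # stays O(1) near the jumps (Gibbs)
```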
Fourier Transform¶
What is \(\mathcal{F}f\)? It's the Fourier transform of \(f\). Here is the definition:
\(\mathcal{F}f(k) = \int_{-\infty}^{\infty} e^{-2\pi i kx} f(x)\,dx\)
\(\mathcal{F}f\) is the Fourier transform of \(f\), giving the representation of \(f\) in the eigenbasis of the 2nd derivative operator.
\(f\) is the inverse Fourier transform of \(\mathcal{F}f\). The Fourier transform projects \(f\) onto a basis of complex exponentials, and the inverse Fourier transform builds \(f\) back up as a (continuous) linear combination of those basis elements. This duality between projection and linear combination is at first a little weird looking.
The inverse \(\mathcal{F}^{-1}\):¶
\(\mathcal{F}^{-1}g(x) = \int_{-\infty}^{\infty} e^{2\pi i kx} g(k)\,dk\)
Key properties:
\(\mathcal{F}^{-1}\mathcal{F}f = f = \mathcal{F}\mathcal{F}^{-1}f\) (Fourier inversion)
\(\mathcal{F}\mathcal{F}f = f^-\) (duality)
Note the consequence that \(\mathcal{F}\delta = 1\) and, via the shift theorem, \(\mathcal{F}(\tau_b\delta)(k) = e^{-2\pi i bk}\).
Extending Fourier analysis to linear functionals¶
There's a story about the analysis part of Fourier analysis, namely finding classes of functions for which the Fourier transform is a well-defined unitary isomorphism. One clear problem is that we want things like the \(\delta\) function to be such that \(\mathcal{F}\delta = 1\), but \(\delta\) has none of the properties of a function. Not to mention the Fourier transform of, e.g., a complex exponential, which is not even integrable.
Delta¶
The Dirac delta is a continuous analog to the Kronecker delta \(\delta_{ij}\), which is \(0\) except when \(i=j\) and is then \(1\). We want the same but in a continuous space. In particular, we want that \(\int_{-\infty}^{\infty}\delta(x)dx=1\) and \(\langle \delta_{x_0},f\rangle = f(x_0)\). No function has these properties, and the Dirac \(\delta\) is actually a functional.
In summary, one common approach is to first take some quite restricted set of well-behaved functions (such as the Schwartz functions: functions which, along with all their derivatives, decay very rapidly) and then obtain "distributions" (not to be confused with the probabilistic sense), which are linear functionals on this restricted set.
More concretely, call the restricted set \(\mathcal{S}\), and for a function \(f\) such that \(\forall h\in \mathcal{S}: |\langle f, h \rangle| < \infty\), let \(T_f(g) = \langle f, g \rangle\), where \(T_f \in \mathcal{T}\), the set of distributions, i.e. linear functionals \(\mathcal{S}\to \C\).
Because of the infinite-dimensional setting we're in, \(\lambda f \to T_f\) is not an isomorphism (I believe), which is to say that there are functionals \(T \in \mathcal{T}\) which do not come from dualizing a function. The prominent example is \(\delta\).
\(\delta\) and the like we instead define directly by their action on \(f\in \mathcal{S}\). For example, \(\delta f = f(0)\).
It's then straightforward to define the Fourier transform on \(\mathcal{T}\) in the usual way you define things on dual objects, namely \((\mathcal{F}T)f = T(\mathcal{F}f)\).
What we get is the extremely useful ability to take things which integrate to infinity on the real line, and take the Fourier transform of them. In poor notation, people write \(\langle T, f \rangle\) to mean \(Tf\).
Terminological note: when \(\mathcal{S}\) is the Schwartz functions, \(\mathcal{T}\) is called the set of tempered distributions.
Delta function as a distribution¶
What is the identity of the convolution monoid? It's \(\delta\): the Fourier transform turns convolution into multiplication, and, as we know, \(\mathcal{F}\delta = 1\) is the multiplicative unit. Making this concrete, we have that \(\delta * f = f\).
Actually, extending this to distributions is a little tricky, because you can't take a product of arbitrary distributions, so you have to just define convolution of a distribution with a function in \(\mathcal{S}\).
People often write \(\delta(x-b)\) to mean \(\tau_b\delta\), although this is not well-typed. With that in mind, the two key properties of \(\delta\):
\((\tau_b\delta) * f = \tau_b f\), i.e. \(\delta(x-b)*f(x) = f(x-b)\) (shifting)
\(f \cdot \tau_b\delta = f(b)\,\tau_b\delta\), i.e. \(f(x)\delta(x-b) = f(b)\delta(x-b)\) (sampling)
Fourier transform as unitary operator¶
Unitarity of the Fourier transform is known as Parseval's theorem:
\(\langle f, g \rangle = \langle \mathcal{F}f, \mathcal{F}g \rangle\)
Taking any (appropriate) functions \(f, g\), and writing \(g(x) = \int_{-\infty}^{\infty}\mathcal{F}g(k)e^{2\pi i k x}dk\), we have:
\(\langle f, g \rangle = \int f(x)^* \int \mathcal{F}g(k)\,e^{2\pi i kx}\,dk\,dx = \int \mathcal{F}g(k) \left(\int f(x)^* e^{2\pi i kx}\,dx\right) dk = \int \mathcal{F}g(k)\,\mathcal{F}f(k)^*\,dk = \langle \mathcal{F}f, \mathcal{F}g \rangle\)
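As a numerical sanity check (a sketch using numpy, whose fft is the DFT of the next section; the norm="ortho" option rescales it to be exactly unitary):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(256) + 1j * rng.standard_normal(256)

# With norm="ortho", the DFT matrix is unitary, so the norm is preserved.
F = np.fft.fft(f, norm="ortho")
print(np.allclose(np.linalg.norm(f), np.linalg.norm(F)))  # True: Parseval
```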
Discrete Fourier Transform¶
The DFT operates on discrete \(n\)-periodic signals, which can be represented by vectors of length \(n\). As such, it is just a linear map, written in Einstein notation as:
\((\mathcal{F}v)_k = \mathcal{F}_{kj}v_j, \quad \mathcal{F}_{kj} = e^{-2\pi i jk/n}\)
(Note that the indices run from \(0\) to \(n-1\).)
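Concretely, here is a sketch (using numpy; np.fft.fft uses exactly this unnormalized convention) building the matrix and checking it against the library FFT:

```python
import numpy as np

n = 8
j, k = np.meshgrid(np.arange(n), np.arange(n))  # index grids
F = np.exp(-2j * np.pi * j * k / n)             # F_{kj} = e^{-2 pi i jk / n}

v = np.random.default_rng(3).standard_normal(n)
print(np.allclose(F @ v, np.fft.fft(v)))           # matches numpy's FFT
print(np.allclose(F.conj().T @ F, n * np.eye(n)))  # unitary up to a factor of n
```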
Discrete Time Fourier Transform¶
For \(f : \Z \to \C\), the DTFT is \(\mathcal{F}f(s) = \sum_{n=-\infty}^{\infty} f(n)\,e^{-2\pi i sn}\); note that the result is \(1\)-periodic in \(s\), matching the type given above.
N-dimensional Fourier transform¶
The Fourier transform (as well as its relatives) is easily extended to functions of type \(\R^n\to\C\) (yielding functions of the same type). The definition, for \(f\) of this type:
\(\mathcal{F}f(k) = \int_{\R^n} e^{-2\pi i k\cdot x} f(x)\,dx\)
In this setting, we have some new facts. First, an extension of the stretch theorem, which more or less follows from the change of variables formula: for an invertible matrix \(A\),
\(\mathcal{F}(f \circ A)(k) = \frac{1}{|\det A|}\,\mathcal{F}f(A^{-\mathsf{T}}k)\)
Second, separability: if \(f = f_1 \otimes \cdots \otimes f_n\), then
\(\mathcal{F}f = \mathcal{F}f_1 \otimes \cdots \otimes \mathcal{F}f_n\)
Note that the tensor product, in this setting, is just \((f \otimes g)(x, y) = f(x)g(y)\), so nothing conceptually fancy here.
Various important functions and facts¶
Step function¶
Define \(\theta(t)\) as \(0\) for \(t < 0\) and \(1\) otherwise.
\(\mathcal{F}(\theta)(s) = \frac{1}{2}(\delta(s) + \frac{1}{\pi i s})\).
\(\Pi\) and sinc¶
\(\Pi_p\) is the function which is \(1\) for \(|x|\lt p\) and \(0\) otherwise; write \(\Pi := \Pi_{1/2}\).
The Fourier transform of \(\Pi\) is \(\mathrm{sinc}(s)=\frac{\sin (\pi s)}{\pi s}\); more generally, \(\mathcal{F}\Pi_p(s) = 2p\,\mathrm{sinc}(2ps)\).
Gaussian function¶
The Gaussian function \(f(x)=e^{-\pi x^2}\) is a fixed point of \(\mathcal{F}\):
\(\mathcal{F}f = f\)
This amounts to the claim that the Gaussian is an eigenfunction of the Fourier transform (with eigenvalue \(1\)). Showing it is a nice application of ODEs.
Start by observing that \(f(t)=e^{-\pi t^2}\) is the unique solution of the initial value problem \(x'(t) = -2\pi t\,x(t)\), \(x(0)=1\).
The goal is to show that \(g := \mathcal{F}f\) solves the same initial value problem, so that by uniqueness of solutions, \(\mathcal{F}f = f\). We do so as follows:
\(g'(s) = \int (-2\pi i t)\,e^{-2\pi i st}f(t)\,dt = i\int e^{-2\pi i st}f'(t)\,dt = i\,(2\pi i s)\,g(s) = -2\pi s\,g(s)\)
using \(f'(t) = -2\pi t f(t)\) in the second step and the differentiation theorem in the third; and \(g(0) = \int e^{-\pi t^2}dt = 1\).
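Here's a numerical check (a numpy sketch; the grid width and size are arbitrary choices, and the fftshift bookkeeping just centers the space and frequency grids at \(0\)):

```python
import numpy as np

# Approximate the continuous transform of f(x) = exp(-pi x^2) by a DFT
# on a wide, fine grid, and check that F f = f (to machine precision).
n, L = 1024, 40.0
x = (np.arange(n) - n // 2) * (L / n)            # grid centered at 0
f = np.exp(-np.pi * x**2)

k = np.fft.fftshift(np.fft.fftfreq(n, d=L / n))  # matching frequency grid
# Riemann sum for Ff(k) = int f(x) e^{-2 pi i k x} dx, via the DFT:
Ff = (L / n) * np.fft.fftshift(np.fft.fft(np.fft.ifftshift(f)))

print(np.max(np.abs(Ff - np.exp(-np.pi * k**2))))  # ~1e-15: Ff = f
```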
Dirac Comb¶
The Dirac comb \({III}_p(x) = \sum_{k=-\infty}^{\infty}\delta(x-kp)\), in particular \(III:=III_1\), is also a fixed point of \(\mathcal{F}\), since, using the Poisson summation formula, \(\mathcal{F}III = III\) (and more generally \(\mathcal{F}III_p = \frac{1}{p}III_{1/p}\)).
Convolving with the comb, i.e. applying the function \((III_p\,*)\), periodizes: \((III_p * f)(x) = \sum_k f(x-kp)\).
Multiplying by the comb samples: \(f \cdot III_p = \sum_k f(kp)\,\tau_{kp}\delta\).
Poisson summation formula¶
For (sufficiently nice) \(f\):
\(\sum_{n=-\infty}^{\infty} f(n) = \sum_{k=-\infty}^{\infty} \mathcal{F}f(k)\)
Linear Systems¶
A linear system \(L\) is just a linear operator on a function space, taking \(v(t)\) to \(Lv(t)=w(t)\). Note that, like an operator on a finite-dimensional space, it can be characterized by how it acts on (the equivalent of) a basis: namely, let \(h(x,y)=L\delta(x-y)\) (the result of applying \(L\) to a shifted \(\delta\)). Then:
\(Lv(x) = \int h(x,y)\,v(y)\,dy\)
Here, \(h\) is known as the impulse response and \(H=\mathcal{F}h\) as the transfer function.
Linear time invariant systems¶
\(L\) is linear and time invariant (LTI) if \(L\tau_b=\tau_bL\) for every shift \(\tau_b\).
\(L\) being LTI is equivalent to \(L=(h\,*)\) for some \(h\). In that case, complex exponentials \(e^{2\pi i st}\) are the eigenvectors, with eigenvalues \(\mathcal{F}h(s)\). To see that, note that for \(w=L(e^{2\pi i sx})=h*e^{2\pi i sx}\):
\(w(x) = \int h(y)\,e^{2\pi i s(x-y)}\,dy = e^{2\pi i sx}\int h(y)\,e^{-2\pi i sy}\,dy = \mathcal{F}h(s)\,e^{2\pi i sx}\)
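A discrete sanity check (a numpy sketch: \(L\) is circular convolution with a random impulse response, and the discrete exponential comes back scaled by the transfer function at its frequency):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 128
h = rng.standard_normal(n)                      # impulse response: L = (h *)

m = 5                                           # a frequency
e = np.exp(2j * np.pi * m * np.arange(n) / n)   # discrete complex exponential

w = np.fft.ifft(np.fft.fft(h) * np.fft.fft(e))  # apply L (circular convolution)

# w is just e scaled by the transfer function at frequency m.
print(np.allclose(w, np.fft.fft(h)[m] * e))  # True
```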
Causal systems¶
A system is causal if, for an input function \(f\), the output \(Lf\) at time \(t\) only depends on the input \(f\) at times \(s\leq t\). A convenient definition: if \(v_1(t)=v_2(t)\) for all \(t\leq t_0\), then \(Lv_1(t)= Lv_2(t)\) for all \(t\leq t_0\).
For linear time invariant systems, we have that a system \(L\) is causal iff whenever \(v(t) = 0\) for all \(t < 0\), also \(Lv(t) = 0\) for all \(t < 0\). This is convenient, since we can view functions with \(\forall t <0, f(t)=0\) as objects in a category with LTI causal systems as morphisms (since they preserve this property).
LTI Causal systems¶
Systems which are both LTI and causal have frequent physical applications and useful properties.
First recall the inversion formula, here in the angular-frequency convention where \(\mathcal{F}f(\omega) = \int f(t)\,e^{i\omega t}\,dt\):
\(f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\mathcal{F}f(\omega)\,e^{-i\omega t}\,d\omega\)
so that if we solve the integral by contour integration in the upper half plane, which is to say, let \(\omega\) range over complex numbers with a positive imaginary part, we see that \(e^{-i\omega t}\) decays to \(0\) when \(t\) is negative, so that the integral is the sum of the residues of \(\mathcal{F}f\) at the poles in the upper half plane.
But for \(t<0\), \(f(t)=0\). This means that \(\mathcal{F}f\) must have no poles in the upper half plane.
Kramers-Kronig¶
The Fourier transform of an LTI causal system has some noteworthy properties that are easy to deduce.
First note that for the step function \(u\) (the \(\theta\) defined above) and a causal function \(f\), we have:
\(f = u \cdot f\)
and so, taking Fourier transforms and using the convolution theorem together with \(\mathcal{F}u(s) = \frac{1}{2}(\delta(s) + \frac{1}{\pi i s})\):
\(\mathcal{F}f = \mathcal{F}u * \mathcal{F}f = \frac{1}{2}\mathcal{F}f + \frac{1}{2}\left(\frac{1}{\pi i s}*\mathcal{F}f\right)\)
so that \(\mathcal{F}f = \frac{1}{\pi i s}*\mathcal{F}f\). Taking real and imaginary parts, this expresses each of \(\mathrm{Re}\,\mathcal{F}f\) and \(\mathrm{Im}\,\mathcal{F}f\) in terms of the other: the Kramers-Kronig relations.
Hilbert transform¶
\(\mathcal{H}(f)(x) := (-\frac{1}{\pi x} * f)(x) = \frac{1}{\pi}\int_{-\infty}^\infty dy \frac{f(y)}{y-x}\)
where we avoid the singularity at \(y=x\) by taking an obvious limit, known as the Cauchy principal value.
Sampling theory¶
Limited functions¶
A function \(f\) is time-limited at \(L\) if for any \(|t|\gt L\), \(f(t)=0\).
A function \(f\) is band-limited at \(B\) if for any \(|k|\gt B\), \(\mathcal{F}f(k)=0\).
A function cannot be both time and band limited (other than the zero function). To see this, suppose \(f\) is bandlimited at \(B\). Then \(\mathcal{F}f = \Pi_B\mathcal{F}f\), which means that \(f = \mathcal{F}^{-1}(\Pi_B\mathcal{F}f) = 2B\,\mathrm{sinc}(2Bt)*f(t)\). In particular \(f(t) = \int_{-B}^{B}\mathcal{F}f(k)\,e^{2\pi i kt}\,dk\) extends to an entire function of \(t\), and an entire function which vanishes on an interval (as a time-limited \(f\) would) vanishes everywhere.
Orthonormal basis of band-limited functions¶
Generalizing the above point, and taking \(f\) bandlimited at \(B\):
\(\mathcal{F}f = \Pi_B\,(III_{2B} * \mathcal{F}f)\)
so that, taking \(\mathcal{F}^{-1}\) of both sides and using the convolution theorem:
\(f = 2B\,\mathrm{sinc}(2Bt) * \left(\frac{1}{2B}\,III_{1/2B}\cdot f\right) = \sum_{k=-\infty}^{\infty} f\!\left(\frac{k}{2B}\right)\mathrm{sinc}(2Bt-k)\)
Note that the first step uses the compact support of \(\mathcal{F}f\) (i.e. the band-limitedness of \(f\)), because it says: periodizing and then chopping back does nothing. The significance of the above result is that a countably infinite set of samples of \(f\) suffices to represent the whole function. That is cool and, moreover, important.
It's further clear that the sampling rate \(2B\) is the width of the band, i.e. twice the max frequency (in absolute value). This is the Nyquist rate.
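Here's the reconstruction formula as a numpy sketch (a signal with max frequency \(B=3\), sampled slightly above the Nyquist rate; the infinite sum is truncated, so the match is approximate rather than exact):

```python
import numpy as np

B = 3.0                                  # max frequency of the signal below
def f(t):
    return np.sin(2 * np.pi * t) + 0.5 * np.cos(2 * np.pi * B * t)

rate = 2 * B * 1.1                       # sample a bit above the Nyquist rate
n = np.arange(-500, 501)                 # truncation of the infinite sample set
samples = f(n / rate)

# f(t) = sum_k f(k / rate) sinc(rate * t - k); np.sinc(x) = sin(pi x)/(pi x)
t = np.linspace(-1, 1, 7)
recon = np.array([np.sum(samples * np.sinc(rate * ti - n)) for ti in t])
print(np.max(np.abs(recon - f(t))))      # small; exact with all samples
```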
One can also obtain a similar result in which a periodic bandlimited function is the sum of a finite number of quotients of trig functions (it's a bit more fiddly).
Solving differential equations¶
PDEs¶
A good example of the usefulness of convolution is that the solution to the heat equation (on an infinite rod), i.e. \(u_t = \frac{1}{2}u_{xx}\), is
\(u(x,t) = (g(\cdot,t) * u(\cdot,0))(x)\)
where \(g(x,t)=\frac{1}{\sqrt{2\pi t}}e^{\frac{-x^2}{2t}}\), i.e. the PDF of a Gaussian with \(\mu=0,\sigma^2=t\). We should understand the above equation as saying: take the initial heat distribution \(u(x,0)\), and convolve it with the kernel \(g(x,t)\) to move forward to time \(t\).
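As a numerical sketch (in numpy, with a wide periodic grid standing in for the infinite rod, and the convolution computed via the Fourier transform as above):

```python
import numpy as np

n, L, t = 1024, 60.0, 2.0
dx = L / n
x = (np.arange(n) - n // 2) * dx

u0 = np.where(np.abs(x) < 1, 1.0, 0.0)                # initial heat: a box
g = np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)  # the Gaussian kernel

# u(., t) = g(., t) * u(., 0), computed via F(g * u0) = Fg . Fu0
u = np.fft.ifft(np.fft.fft(np.fft.ifftshift(g)) * np.fft.fft(u0)).real * dx

print(u.max() < u0.max())      # True: the heat has spread out
print(round(u.sum() * dx, 4))  # ~2.0: total heat is conserved
```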