This post was inspired by a recent email exchange with Leo Ng — a graduate student in Karen Willcox‘s research group in MIT’s Aero/Astro department.
There is no shortage of names for polynomial approximation methods in uncertainty quantification (UQ): polynomial chaos, generalized polynomial chaos (gPC), stochastic Galerkin, stochastic collocation, non-intrusive spectral projection, pseudospectral methods, stochastic expansion methods, polynomial dimensional decomposition — did I miss any? And then there are various tweaks for improving each of these, like sparse grids, sparse sampling strategies, sparse approximation strategies (e.g., compressed sensing), anisotropic methods that preference certain input variables over others, or dimension reductions that ignore unimportant input variables. Everyone has his or her favorite name, and the variations are easy to mix and match to suit a particular application (or publish a new paper).
The common element here is the polynomial approximation; I have an unknown function – presumably sufficiently smooth – that I want to approximate with a polynomial of the input variables. How do I arrange the monomial terms into a set of basis polynomials? And once I decide on that, how do I compute coefficients of a series in that bases such that the resulting polynomial is close to the unknown function? Often there is a system of equations that defines the function implicitly, like a partial differential equation with operators that depend on input parameters representing uncertainty. In such cases, the system residual can be used to build a minimization problem to compute the coefficients — and this gives us the Galerkin approximation. I’ll leave the Galerkin system for another post and instead focus on the case where I have a function that I can evaluate directly given values for the inputs — the so-called non-intrusive variety.
I’m going to discuss these polynomial approximation ideas using terms more familiar to folks in signal processing, like discrete Fourier transform. This is certainly not the first time this has been done with polynomial approximation. In fact, the top-notch team on Nick Trefethen‘s Chebfun project has been analyzing and implementing polynomial approximation in this manner since it got started. They take advantage of the connections between Chebyshev polynomials/points and cosine transforms to implement fast transform algorithms.
In contrast, I’ll discuss similar transforms using Gaussian quadrature points, since these are more common in UQ due to their connections with density functions of random variables. Unfortunately, we won’t get any fast algorithms out of this approach; all of the transform algorithms will be “order en squared” instead of “order en log en.” This isn’t typically a problem in UQ applications where we assume that the function evaluations are significantly more costly than any transform.
Okay, why do I keep mentioning transforms? Let’s get to that.
Assume I have a function that maps the interval into the reals. Equip the input space with a uniform measure of , which seasoned UQ-ers will recognize as the case corresponding to a uniform random variable . I want to approximate by a polynomial in . My approximation model is
where is the degree polynomial from the set of orthonormal Legendre polynomials. Note that I wrote this in linear algebra notation with being an -vector of coefficients and being a vector containing the polynomial bases. This notation will really come in handy.
If we set the coefficients to be the Fourier (or spectral) coefficients,
then the resulting polynomial would be the best th degree polynomial approximation to in the mean-squared sense. But we can’t compute integrals with in the integrand if we don’t know ! So we instead approximate them with a Gaussian quadrature rule. Let and be the points and weights, respectively, of the -point Gauss-Legendre quadrature rule. We’ll come back to why we use a rule with the same number of points as terms in the expansion.
To approximate the Fourier coefficients, we compute
which are the pseudospectral coefficients. (Get it?) We can write all of these sums — one for each — conveniently using linear algebra notation. Let be the vector of all the , i.e., all the evaluations of at the quadrature points. Define the matrix with elements
Then define a diagonal matrix with . The squaring will make sense in a moment. Now we can write the computation of the pseudospectral coefficients as
and I’ll let you verify that if it’s not obvious. The next thing to note is that the matrix is orthogonal, that is . There are two ways to verify this: One is to know — like I do — that is the matrix of normalized eigenvectors of the symmetric, tridiagonal Jacobi matrix of recurrence coefficients for the Legendre polynomials. The other way is to use the polynomial exactness of the Gaussian quadrature rule for one side and the Christoffel-Darboux formula for the other side. I’ll let the interested reader pursue this further.
With , we can write
which is the discrete Fourier transform, and
which is the inverse discrete Fourier transform. Voila! In signal processing parlance, these operations transform the function between physical space (the Gauss points) and coefficient space (the pseudospectral coefficients).
This is the primary reason I keep the number of points in the quadrature rule the same as the number of terms in the polynomial expansion. It is not strictly necessary in other derivations. (At some point I’ll post about least-squares methods.) But you can see from this perspective that you lose the forward/inverse discrete transform relationship without it.
If you look carefully at the inverse discrete transform, you see that it’s just evaluating the expansion at the quadrature points and getting back the true function at the quadrature points. If you remember that the polynomial interpolant of degree on a set of distinct points is unique, then you’ve just proven that the Lagrange interpolant at the Gauss quadrature points — which in UQ we call stochastic collocation — is equivalent to the pseudospectral approximation. Cool, huh?
All of this works for univariate functions, and it extends to multi-variate functions with tensor product constructions; the multivariate transform operators become Kronecker products of the 1d transform operators. However, if you give up tensor product construction, then it gets really messy really quickly. We’ll leave that for the subject of another post.