Exciting news!

We here at Simulation Informatics want to congratulate David for being awarded a National Science Foundation CAREER award. Here’s the announcement at the Purdue Computer Science home page:

http://www.cs.purdue.edu/news/03-02-12_CAREER.html

Congratulations, David!

Some recent work

The lack of activity on this blog should not be interpreted as a lack of activity in our research. David and I are working on a few exciting possibilities for extending the simulation informatics activities, and we’ll update this site if and when they materialize.

In the meantime, here are a few recent preprints of methods I’ve been working on with various collaborators:

I just returned from the Conference on Data Analysis in Santa Fe where I presented a poster entitled MapReduce-enabled Model Reduction for Large Scale Simulation Data. Thanks to Kary Myers and the organizers for a great conference!

Minisymposium at SIAM UQ12

We are very pleased to announce that we will have a minisymposium on simulation informatics at the SIAM Conference on Uncertainty Quantification! Our speakers and their respective talks include:

  • David Gleich (Purdue, CS) and Paul Constantine (Stanford, ME) — that’s us — with an introduction to simulation informatics.
  • Qiqi Wang (MIT, Aero/Astro), Extracting non-conventional information in simulation of chaotic dynamical systems.
  • Barnabas Poczos (CMU, Machine Learning), Machine Learning to Recognize Phenomena in Large Scale Simulations.
  • Chandrika Kamath (LLNL), Scientific Data Mining Techniques for Extracting Information from Simulations.

More info on the session and abstracts for the talks can be found here. We hope to see you in Raleigh in April!

A Discrete Fourier Transform View of Polynomial Approximation Methods in Uncertainty Quantification

This post was inspired by a recent email exchange with Leo Ng — a graduate student in Karen Willcox’s research group in MIT’s Aero/Astro department.

There is no shortage of names for polynomial approximation methods in uncertainty quantification (UQ): polynomial chaos, generalized polynomial chaos (gPC), stochastic Galerkin, stochastic collocation, non-intrusive spectral projection, pseudospectral methods, stochastic expansion methods, polynomial dimensional decomposition — did I miss any? And then there are various tweaks for improving each of these, like sparse grids, sparse sampling strategies, sparse approximation strategies (e.g., compressed sensing), anisotropic methods that favor certain input variables over others, or dimension reductions that ignore unimportant input variables. Everyone has his or her favorite name, and the variations are easy to mix and match to suit a particular application (or publish a new paper).

The common element here is the polynomial approximation; I have an unknown function – presumably sufficiently smooth – that I want to approximate with a polynomial of the input variables. How do I arrange the monomial terms into a set of basis polynomials? And once I decide on that, how do I compute the coefficients of a series in that basis such that the resulting polynomial is close to the unknown function? Often there is a system of equations that defines the function implicitly, like a partial differential equation with operators that depend on input parameters representing uncertainty. In such cases, the system residual can be used to build a minimization problem to compute the coefficients — and this gives us the Galerkin approximation. I’ll leave the Galerkin system for another post and instead focus on the case where I have a function that I can evaluate directly given values for the inputs — the so-called non-intrusive variety.

I’m going to discuss these polynomial approximation ideas using terms more familiar to folks in signal processing, like the discrete Fourier transform. This is certainly not the first time this has been done for polynomial approximation. In fact, the top-notch team on Nick Trefethen’s Chebfun project has been analyzing and implementing polynomial approximation in this manner since the project got started. They take advantage of the connections between Chebyshev polynomials/points and cosine transforms to implement fast transform algorithms.

In contrast, I’ll discuss similar transforms using Gaussian quadrature points, since these are more common in UQ due to their connections with density functions of random variables. Unfortunately, we won’t get any fast algorithms out of this approach; all of the transform algorithms will be order n^2 instead of order n \log n. This isn’t typically a problem in UQ applications where we assume that the function evaluations are significantly more costly than any transform.

Okay, why do I keep mentioning transforms? Let’s get to that.

Assume I have a function f(x) that maps the interval [-1,1] into the reals. Equip the input space with a uniform measure of 1/2, which seasoned UQ-ers will recognize as the case corresponding to a uniform random variable x. I want to approximate f(x) by a polynomial in x. My approximation model is

f(x) \;\approx\; \sum_{i=1}^n \hat{f}_i\, \pi_i(x) \;=\; \hat{\mathbf{f}}^T\mathbf{\pi}(x),

where \pi_i(x) is the degree i-1 polynomial from the set of orthonormal Legendre polynomials. Note that I wrote this in linear algebra notation with \hat{\mathbf{f}} being an n-vector of coefficients and \mathbf{\pi}(x) being a vector containing the basis polynomials. This notation will really come in handy.

If we set the coefficients to be the Fourier (or spectral) coefficients,

\hat{f}_i = \int_{-1}^1 f(x)\,\pi_i(x)\,dx,

then the resulting polynomial would be the best polynomial approximation of degree n-1 to f in the mean-squared sense. But we can’t compute integrals with f in the integrand if we don’t know f! So we instead approximate them with a Gaussian quadrature rule. Let x_j and w_j be the points and weights, respectively, of the n-point Gauss-Legendre quadrature rule. We’ll come back to why we use a rule with the same number of points as terms in the expansion.

To approximate the Fourier coefficients, we compute

\hat{f}_i = \sum_{j=1}^n f(x_j)\,\pi_i(x_j)\,w_j,

which are the pseudospectral coefficients. (Get it?) We can write all of these sums — one for each i — conveniently using linear algebra notation. Let \mathbf{f} be the vector of the values f(x_j), i.e., all the evaluations of f at the quadrature points. Define the matrix \mathbf{P} with elements

\mathbf{P}_{ij} = \pi_i( x_j ).

Then define a diagonal matrix \mathbf{W}^2 with \mathbf{W}^2_{ii}=w_i. The squaring will make sense in a moment. Now we can write the computation of the pseudospectral coefficients as

\hat{\mathbf{f}} = \mathbf{P}\mathbf{W}^2\mathbf{f},

and I’ll let you verify that if it’s not obvious. The next thing to note is that the matrix \mathbf{Q}=\mathbf{P}\mathbf{W} is orthogonal, that is \mathbf{Q}^{-1}=\mathbf{Q}^T. There are two ways to verify this: One is to know — like I do — that \mathbf{Q} is the matrix of normalized eigenvectors of the symmetric, tridiagonal Jacobi matrix of recurrence coefficients for the Legendre polynomials. The other way is to use the polynomial exactness of the Gaussian quadrature rule for one side and the Christoffel-Darboux formula for the other side. I’ll let the interested reader pursue this further.
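
To make this concrete, here’s a minimal Matlab sketch of one way to build these matrices (my own quick construction, with an arbitrary choice of n): it forms the Jacobi matrix, extracts the Gauss-Legendre points and weights from its eigendecomposition, evaluates the orthonormal Legendre polynomials at the points with the three-term recurrence, and then checks numerically that \mathbf{Q}=\mathbf{P}\mathbf{W} is orthogonal and matches the eigenvector matrix up to column signs.

%% Build the Gauss-Legendre rule and check that Q = P*W is orthogonal
n = 8;                              % number of basis polynomials = number of points

% Jacobi matrix of recurrence coefficients for the Legendre polynomials
k = 1:n-1;
b = k./sqrt(4*k.^2-1);              % off-diagonal entries; the diagonal is zero
J = diag(b,1) + diag(b,-1);

% Golub-Welsch: points are the eigenvalues, weights come from the eigenvectors
[V,D] = eig(J);
x = diag(D);                        % Gauss-Legendre points
w = 2*V(1,:)'.^2;                   % Gauss-Legendre weights

% Evaluate the orthonormal Legendre polynomials pi_1,...,pi_n at the points
P = zeros(n,n);
P(1,:) = 1/sqrt(2);                 % pi_1(x) = 1/sqrt(2)
P(2,:) = x'.*P(1,:)/b(1);
for i = 3:n
    P(i,:) = (x'.*P(i-1,:) - b(i-2)*P(i-2,:))/b(i-1);
end

W = diag(sqrt(w));                  % so W^2 has the quadrature weights on its diagonal
Q = P*W;

norm(Q*Q' - eye(n))                 % near machine precision
norm(abs(Q) - abs(V))               % Q matches the Jacobi eigenvectors up to column signs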

With \mathbf{Q}, we can write

\hat{\mathbf{f}} = \mathbf{Q}\mathbf{W}\mathbf{f},

which is the discrete Fourier transform, and

\mathbf{f} = \mathbf{W}^{-1}\mathbf{Q}^T\hat{\mathbf{f}},

which is the inverse discrete Fourier transform. Voila! In signal processing parlance, these operations transform the function between physical space (the Gauss points) and coefficient space (the pseudospectral coefficients).

This is the primary reason I keep the number of points in the quadrature rule the same as the number of terms in the polynomial expansion. It is not strictly necessary in other derivations. (At some point I’ll post about least-squares methods.) But you can see from this perspective that you lose the forward/inverse discrete transform relationship without it.
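
Continuing the Matlab sketch above (and picking an arbitrary smooth test function f just for illustration), the forward and inverse transforms are each a single line:

%% Forward and inverse discrete transforms (continues the sketch above)
f = @(x) 1./(1.1-x);                % an arbitrary smooth test function on [-1,1]
fvals = f(x);                       % evaluations at the Gauss points

fhat = Q*W*fvals;                   % forward transform: pseudospectral coefficients
frec = W\(Q'*fhat);                 % inverse transform: back to physical space

norm(frec - fvals)                  % near machine precision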

If you look carefully at the inverse discrete transform, you see that it’s just evaluating the expansion at the quadrature points and getting back the true function at the quadrature points. If you remember that the polynomial interpolant of degree n-1 on a set of n distinct points is unique, then you’ve just proven that the Lagrange interpolant at the Gauss quadrature points — which in UQ we call stochastic collocation — is equivalent to the pseudospectral approximation. Cool, huh?
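
Still continuing the sketch, here’s a quick numerical check of that equivalence, using polyfit as a lazy way to build the Lagrange interpolant through the Gauss points:

%% Pseudospectral expansion vs. the Lagrange interpolant at the Gauss points
t = linspace(-1,1,201)';            % points where we compare the two polynomials

% evaluate the orthonormal Legendre basis at t with the same recurrence
Pt = zeros(n,numel(t));
Pt(1,:) = 1/sqrt(2);
Pt(2,:) = t'.*Pt(1,:)/b(1);
for i = 3:n
    Pt(i,:) = (t'.*Pt(i-1,:) - b(i-2)*Pt(i-2,:))/b(i-1);
end
pseudospec = Pt'*fhat;              % pseudospectral approximation at t

% Lagrange interpolant through the Gauss points (degree n-1)
lagrange = polyval(polyfit(x,fvals,n-1),t);

norm(pseudospec - lagrange)         % the two polynomials agree up to roundoff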

All of this works for univariate functions, and it extends to multivariate functions with tensor product constructions; the multivariate transform operators become Kronecker products of the 1d transform operators. However, if you give up the tensor product construction, then it gets really messy really quickly. We’ll leave that for the subject of another post.
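
For the tensor product case, here’s a small continuation of the sketch that uses the same 1d rule in each of two dimensions (with an arbitrary bivariate test function) to show the Kronecker product structure of the 2d transform:

%% Tensor product construction in 2d via Kronecker products
f2 = @(x1,x2) 1./((1.1-x1).*(1.2-x2));   % an arbitrary bivariate test function

[X1,X2] = ndgrid(x,x);              % tensor grid of the 1d Gauss points
F = f2(X1,X2);                      % function values on the grid

T = Q*W;                            % 1d forward transform
fhat2 = kron(T,T)*F(:);             % 2d pseudospectral coefficients

Tinv = W\Q';                        % 1d inverse transform
norm(kron(Tinv,Tinv)*fhat2 - F(:))  % the roundtrip recovers the grid values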

Stacked Monte Carlo

One of the themes of simulation informatics is blending ideas from statistics and machine learning with problems in simulation.  A beautiful example of such a fusion is a recent idea from

Brendan Tracey, David Wolpert, Juan J. Alonso.  Using Supervised Learning to Improve Monte Carlo Integral Estimation.  arXiv:1108.4870

These authors tackle the problem of how to reduce the variance in a Monte Carlo estimation of an integral.  The standard Monte Carlo scheme is:

I[f] = \int_D f(x) \, dx \approx \frac{1}{k} \sum_{i=1}^k f(x_i)

where each sample x_i is drawn from a uniform random distribution over the domain D.  (Note that they address more complicated probability spaces in the paper and include a discussion of importance sampling, but these complicate the essence of the idea.)  This estimate is known to be bias-free, by which we mean that the expectation of the sum is equal to the integral.  But errors arise in Monte Carlo estimates because the variance of the random estimate depends on the variance of the function f(x).  In contrast, quadrature integration formulae have zero variance, but have a bias term.  Consequently, the authors hope to improve a Monte Carlo estimate by reducing its variance.

To do so, they combine two ideas.  First, suppose that we have a function g(x) that is a reasonable approximation to f(x).  For any choice of \alpha, we can write:

I[f] = \int_D \alpha g(x) \, dx + \int_D f(x) - \alpha g(x) \, dx

where, if g(x) is a good approximation, the function f(x) - \alpha g(x) will have smaller variance than f(x) itself.  When g(x) is also a function that is easy to integrate exactly, then a Monte Carlo estimate of I[f - \alpha g], combined with the exact value of \alpha I[g], produces a more accurate estimate of the underlying integral.

But where can we get such a function g(x)?  This is where the second idea arises.  We don’t get a single function g(x).  We actually get a set of functions g(x) from a cross-validation study, which brings up another simulation informatics theme: treating functions like data.  Suppose we have 20 evaluations of f(x) over the real interval [-1, 1].  With any four of them, we can fit a unique cubic polynomial.  Such a polynomial is the function g(x), and it is easy to integrate exactly.  Consequently, we have the left-hand term \int_D \alpha g(x) \, dx in our approximation.  (The issue of how to pick \alpha is discussed in the paper, and we don’t deal with it here.)  The remaining 16 points let us compute a Monte Carlo estimate of the residual integral \int_D f(x) - \alpha g(x) \, dx.  In a standard cross-validation study, these values would be used to pick the best cubic model of the data by minimizing the error on the “hold-out” set, the remaining 16 examples.  Here, they constitute useful data!  The cross-validation idea arises because the authors propose to do this for all of the cross-validation sets.  That is, we divide the points into 5 bins of 4 points, and then average the 5 estimates we get from each of the 5 functions g(x).  Hence the term “stacked Monte Carlo”: it’s a Monte Carlo estimate inside of a Monte Carlo estimate.

Here is a Matlab code that exemplifies the idea:

%% A quick example of Stacked Monte Carlo integration

%% Setup the data
f = @(x) 1./(1.1-x);       % integrand with a singularity just outside [-1,1]
xvals = rand(20,1)*2 - 1;  % 20 uniform random samples on [-1,1]
fvals = f(xvals);          % function evaluations at the samples

%% Compute the Monte Carlo estimate
mc = mean(fvals)

%%
% Boy that was easy!
trueint = log(2.1) - log(0.1) 

%%
% But it's really wrong.

%% Compute the stacked Monte Carlo estimate
nsets = 5;
nsetvals = 4;
alpha = 0.5;

ests = zeros(nsets,1);

for seti=1:nsets
    trainset = nsetvals*(seti-1)+1:nsetvals*seti;
    testset = setdiff(1:nsetvals*nsets,trainset);

    % compute the integral of the function g
    xtrain = xvals(trainset);
    ftrain = fvals(trainset);
    gcoeffs = polyfit(xtrain,ftrain,nsetvals-1);
    intgcoeffs = polyint(gcoeffs);
    intg = polyval(intgcoeffs,1)-polyval(intgcoeffs,-1);

    % compute the MC estimate of the remainder
    xtest = xvals(testset);
    ftest = fvals(testset);
    intr = mean(ftest - alpha*polyval(gcoeffs,xtest));

    ests(seti) = alpha*intg + intr;
end

smc = mean(ests)

When I run this, I get

mc =
    1.7526

trueint =
    3.0445

smc =
    2.2405

I repeated this a few times.  The smc estimate is always better.  Playing around with \alpha shows that using a larger \alpha produces a better estimate.  In any case, it’s great to get better accuracy from a rather simple modification.
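
If you want to experiment with \alpha yourself, here’s one way to do it as a continuation of the script above (it reuses xvals, fvals, nsets, and nsetvals, and the \alpha values below are arbitrary):

%% Sweep over alpha values (continues the script above)
alphavals = [0.25 0.5 0.75 1];
smcs = zeros(size(alphavals));

for ai = 1:numel(alphavals)
    alpha = alphavals(ai);
    ests = zeros(nsets,1);
    for seti = 1:nsets
        trainset = nsetvals*(seti-1)+1:nsetvals*seti;
        testset = setdiff(1:nsetvals*nsets,trainset);

        % integral of the cubic fit g on the training points
        gcoeffs = polyfit(xvals(trainset),fvals(trainset),nsetvals-1);
        intgcoeffs = polyint(gcoeffs);
        intg = polyval(intgcoeffs,1) - polyval(intgcoeffs,-1);

        % MC estimate of the remainder on the hold-out points
        intr = mean(fvals(testset) - alpha*polyval(gcoeffs,xvals(testset)));

        ests(seti) = alpha*intg + intr;
    end
    smcs(ai) = mean(ests);
end

[alphavals; smcs]   % compare the estimates across alpha values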

A Simple Test Problem for Multivariate Quadrature

I’ve found it difficult to find really simple test problems for multivariate quadrature ideas. Here’s one that I use over and over. It’s useful for testing Gaussian-type quadrature rules and polynomial approximation. It contains many of the features seen in the solution of an elliptic PDE with parameterized coefficients and a constant forcing term.

Let x=(x_1,x_2)\in [-1,1]^2. Given parameters \delta_1>1 and \delta_2>1, define the function

f(x) = \frac{1}{(x_1-\delta_1)(x_2-\delta_2)}.

Notice that f is monotonic, and f(x)>0. The parameters \delta_1 and \delta_2 determine how quickly f grows near the boundary. The closer \delta_1 and \delta_2 are to 1, the closer the singularity gets to the domain, which determines how large f is at the point (x_1=1,x_2=1). Anisotropy can be introduced by choosing \delta_1\not=\delta_2. In my experience, choosing 1<\delta_1,\delta_2<1.1 yields functions that are difficult to approximate with polynomials, while \delta_1,\delta_2\approx 2 yields functions that are very easy to approximate with polynomials. For any choice of \delta_1,\delta_2>1, the function f is analytic in x, so we expect polynomial approximations to converge exponentially as the degree of approximation increases. However, the rate of exponential convergence and the number of polynomial terms required to achieve a given error tolerance will depend heavily on \delta_1 and \delta_2.
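
As a quick illustration, here’s a minimal Matlab sketch of one way to use it: compare a tensor product Gauss-Legendre rule against the exact integral over [-1,1]^2, which is available in closed form because f is separable. (The values of \delta_1 and \delta_2 below are arbitrary.)

%% A tensor product Gauss-Legendre rule applied to the test problem
d1 = 1.05; d2 = 1.5;                        % delta_1, delta_2 > 1
f = @(x1,x2) 1./((x1-d1).*(x2-d2));

% exact integral over [-1,1]^2, using separability
exact = log((d1-1)/(d1+1))*log((d2-1)/(d2+1));

for n = [2 4 8 16]
    % n-point Gauss-Legendre rule via the Jacobi matrix (Golub-Welsch)
    k = 1:n-1;
    b = k./sqrt(4*k.^2-1);
    [V,D] = eig(diag(b,1) + diag(b,-1));
    x = diag(D);                            % points
    w = 2*V(1,:)'.^2;                       % weights

    % tensor product rule
    [X1,X2] = ndgrid(x,x);
    W2 = w*w';                              % tensor product weights
    approx = sum(sum(W2.*f(X1,X2)));

    fprintf('n = %2d   error = %8.2e\n', n, abs(approx-exact));
end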

I’ll update this with some Matlab examples shortly.