I'm hoping that folks in the class would like
to do a final project. I will describe a couple of options in
class, and some of your colleagues may propose projects as
well. As an alternative to a final project, I give you the
option of doing a relatively larger homework assignment, which
is laid out below. This homework assignment is a bit more open
ended than the previous assignments, so you have an opportunity
to be creative and explore.
If you are concerned with grades or making up for missed work,
we will look most favorably on an ambitious final project
(Option 1), then next most favorably on Option 2 if you go above
and beyond the bare minimum.
For Parts 1a and 1b, you will explore Bayesian optimization
using Gaussian processes. I would like you to get first hand
experience setting priors over hyperparameters, optimizing the
hyperparameters, iteratively running experiments, and
observing the convergence to a global optimum. You are welcome
to write your own software, but you can also use existing
software packages. Here are a few packages I know about:
MOE, Spearmint, BayesOpt, PyBO.
For Parts 2a-c, you will be exploring timeseries models.
Part 1a
Write code to implement a function over
which we will optimize. This function takes an input vector
and outputs a function value. I'd suggest using a
2-dimensional input space in order that you can easily
visualize the function as well as the progress of Bayesian
optimization. For example, your function might be
based on a mixture of Gaussians added to a linear function of
the inputs. Make the function complex enough that there are
some local optima and that the global optimum isn't in one
corner of the space. I'd like your function to add noise to
the returned value, e.g., Gaussian mean zero noise. Make a
plot of the expected value of the function.
Part 1b
Run Bayesian optimization with queries
being answered by a call to your function. For the Bayesian
optimization, do not build in any knowledge you have about the
solution, e.g., allow the optimization procedure to estimate
the function mean and observation noise variance.
Describe the assumptions you made in your model and
report on the outcome of experiments, e.g., report how close
the estimated optimum is from the true optimum as more
experiments are performed. To get a reliable estimate of
convergence, you'll need to run the optimization process
multiple times.
Test at least two different GP covariance functions, and run
the optimization process enough times you can determine which
covariance function converges most rapidly. Summarize your
results in graphs and a few sentences.
Test at least two different acquisition functions (i.e.,
schemes for performing active selection), and run the
optimization procedure enough times that you can determine
which acquisition function is more effective. Summarize your
results in graphs and a few sentences.
Part 2a
Generate a time series using a special case
of a linear dynamical system that is similar to an
autoregressive model in the time series literature.The hidden
state of the dynamical system will be a vector with 2
elements, which evolve over time according to:
y2(t) = y1(t-1) + μ1
y1(t) = α1 y1(t-1)
+ α2 y2(t-1) + μ2
The observation will be a scalar:
z(t) = y1(t) + μ3
where for i={1,2,3}, μi ~ Normal(0,σ2)
and you should pick {α1,α2, σ} to
obtain interesting dynamics.
Plot a couple hundred points that are representative of this
process.
Part 2b
Turn your model into a switched linear
dynamical system with 3 modes.
Each mode has an associated set of {α1,α2}
parameters.
Switching is memoryless, i.e., at each time with some small
probability β, the mode m,
is re-drawn from the set {1,2,3} with uniform probability.
Draw a graphical model that depicts this switched dynamical
system. Show the conditional distributions on each arc,
e.g., write out the transition table for
P(m(t) | m(t-1))
Generate a nice graph illustrating the switching dynamics.
Part 2c
Now try to perform inference using the data
set you generated in either part 2a (easy option) or part 2b
(harder option).
If you use the data set from part 2a, then you can model it
with a Kalman filter. Implement a KF and plot a portion of the
time series along with the posterior predictive distribution.
The posterior predictive distribution is
P(z(t+1) | z(1), ..., z(t)). Since this
distribution is Gaussian, plot the mean and some
representation of uncertainty (e.g., lines at +/- 2 SD, or the
shaded representation one sometimes sees.)
If you use the data set from part 2b, you could be ambitious
and use a particle filter to perform inference, in which case
you'll obtain a set of samples that represent
the posterior predictive distribution. Or you could try
using a model that isn't quite up to the task, e.g., an HMM
with a Gaussian observation model (i.e., each state of the HMM
outputs a constant value corrupted by Gaussian noise). You'd
have to train the HMM, and it would require more than 3
states, since each state represents a constant level, but it
would be interesting to see how poorly it models the data.