CSCI 5822

For Parts 1a and 1b, you will explore Bayesian optimization using Gaussian processes. I would like you to get first hand experience setting priors over hyperparameters, optimizing the hyperparameters, iteratively running experiments, and observing the convergence to a global optimum. You are welcome to write your own software, but you can also use existing software packages. Here are a few packages I know about: MOE, Spearmint, BayesOpt, PyBO.

For Parts 2a-c, you will be exploring timeseries models.

Part 1a

Write code to implement a function over which we will optimize. This function takes an input vector and outputs a function value. I'd suggest using a 2-dimensional input space in order that you can easily visualize the function as well as the progress of Bayesian optimization. For example, your function might be based on a mixture of Gaussians added to a linear function of the inputs. Make the function complex enough that there are some local optima and that the global optimum isn't in one corner of the space. I'd like your function to add noise to the returned value, e.g., Gaussian mean zero noise. Make a plot of the expected value of the function.

Part 1b

Run Bayesian optimization with queries being answered by a call to your function. For the Bayesian optimization, do not build in any knowledge you have about the solution, e.g., allow the optimization procedure to estimate the function mean and observation noise variance. Describe the assumptions you made in your model and report on the outcome of experiments, e.g., report how close the estimated optimum is from the true optimum as more experiments are performed. To get a reliable estimate of convergence, you'll need to run the optimization process multiple times.

Test at least two different GP covariance functions, and run the optimization process enough times you can determine which covariance function converges most rapidly. Summarize your results in graphs and a few sentences.

Test at least two different acquisition functions (i.e., schemes for performing active selection), and run the optimization procedure enough times that you can determine which acquisition function is more effective. Summarize your results in graphs and a few sentences.

Part 2a

Generate a time series using a special case of a linear dynamical system that is similar to an autoregressive model in the time series literature.The hidden state of the dynamical system will be a vector with 2 elements, which evolve over time according to:

y₂(t) = y₁(t-1) + μ₁
y₁(t) = α₁ y₁(t-1) + α₂ y₂(t-1) + μ₂

The observation will be a scalar:

z(t) = y₁(t) + μ₃

where for i={1,2,3}, μ_i ~ Normal(0,σ²) and you should pick {α_1,α₂, σ} to obtain interesting dynamics.

Plot a couple hundred points that are representative of this process.

Part 2b

Turn your model into a switched linear dynamical system with 3 modes. Each mode has an associated set of {α_1,α₂} parameters.
Switching is memoryless, i.e., at each time with some small probability β, the mode m, is re-drawn from the set {1,2,3} with uniform probability.

Draw a graphical model that depicts this switched dynamical system. Show the conditional distributions on each arc, e.g., write out the transition table for P(m(t) | m(t-1))

Generate a nice graph illustrating the switching dynamics.

Part 2c

Now try to perform inference using the data set you generated in either part 2a (easy option) or part 2b (harder option).

If you use the data set from part 2a, then you can model it with a Kalman filter. Implement a KF and plot a portion of the time series along with the posterior predictive distribution. The posterior predictive distribution is P(z(t+1) | z(1), ..., z(t)). Since this distribution is Gaussian, plot the mean and some representation of uncertainty (e.g., lines at +/- 2 SD, or the shaded representation one sometimes sees.)

If you use the data set from part 2b, you could be ambitious and use a particle filter to perform inference, in which case you'll obtain a set of samples that represent the posterior predictive distribution. Or you could try using a model that isn't quite up to the task, e.g., an HMM with a Gaussian observation model (i.e., each state of the HMM outputs a constant value corrupted by Gaussian noise). You'd have to train the HMM, and it would require more than 3 states, since each state represents a constant level, but it would be interesting to see how poorly it models the data.

Probabilistic Models of
Human and Machine Learning

CSCI 5822

Assignment 9

Assigned 4/12/2018
Due 5/3/2018

Your options for this assignment

Option 1: Final Project

Option 2: Bayesian Optimization and Time Series

Part 1a

Part 1b

Part 2a

Part 2b

Part 2c

Probabilistic Models of Human and Machine Learning

CSCI 5822

Assignment 9

Assigned 4/12/2018 Due 5/3/2018

Your options for this assignment

Option 1: Final Project

Option 2: Bayesian Optimization and Time Series

Part 1a

Part 1b

Part 2a

Part 2b

Part 2c

Probabilistic Models of
Human and Machine Learning

Assigned 4/12/2018
Due 5/3/2018