Neural Networks and Deep Learning

Spring 2017

Due: Tue Sep 26, 2017

The goal of this assignment is to build a
one-hidden-layer back propagation network to process real
data. For this assignment, I want you to implement the
neural net (activation function and training code) yourself, not
using TensorFlow or other software tools. The purpose is for you
to understand the nitty gritty of what these tools are doing for
you before we switch over to using the tools.

I picked a data set from the UCI Machine Learning
Repository, a nice source of data sets. It consists of
experimental data used for binary classification of room
occupancy (i.e., room is occupied versus empty) based on
temperature, humidity, light, and CO_{2} sensors. The
training and test data sets were each collected over a one-week
period. Information about the data set and how it has been used
in academic publications can be found here.

The data set includes time stamps with date and hour/minute/second within the day. You are **not to
use time stamp features for predicting occupancy**. Since
this is a commercial office building, the time
stamp is a strong predictor of occupancy. Rather,
the goal is to determine whether occupancy can be
sensed from: (1) temperature, expressed
in degrees Celsius, (2) relative
humidity, expressed as a %, (3) light,
in lux, (4) CO_{2}, in ppm, and
(5) the humidity ratio, which is derived
from the temperature and the relative
humidity.

The training data are to be found here.
The test data are to be found here.
There are 8144 training
examples and 9753 test examples.

Using the perceptron code you wrote for
Assignment 1, train a perceptron (linear activation function
with a binary threshold) using the training set. Your perceptron
should have the 5 input variables described above.

(1a) Report the training and test set performance in terms of % examples classified correctly.

Remember an important property of the perceptron algorithm: it is guaranteed to converge only if there is a setting of the weights that will classify the training set perfectly. (The learning rule corrects errors. When all examples are classified correctly, the weights stop changing.) With a noisy data set like this one, the algorithm will not find an exact solution. Also remember that the perceptron algorithm is not performing gradient descent. Instead, it will jitter around the solution continually changing the weights from one iteration to the next. The weight changes will have a small effect on performance, so you'll see training set performance jitter a bit as well.
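To make the error-correcting rule above concrete, here is a minimal sketch of a perceptron with a binary threshold. It is not the Assignment 1 code, which is not reproduced here; the function names (`predict`, `train_perceptron`, `accuracy`), the zero initialization, the learning rate of 1, and the fixed epoch count are all assumptions chosen for illustration.

```python
import random

def predict(weights, bias, x):
    """Binary threshold on a linear activation: 1 if w.x + b > 0, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(examples, n_epochs=20, lr=1.0, seed=0):
    """examples: list of (features, target) pairs with target in {0, 1}.

    Weights change only when an example is misclassified, so they stop
    changing once every training example is classified correctly.
    """
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    order = list(range(len(examples)))
    for _ in range(n_epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            x, t = examples[i]
            error = t - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

def accuracy(weights, bias, examples):
    """Percentage of examples classified correctly."""
    correct = sum(predict(weights, bias, x) == t for x, t in examples)
    return 100.0 * correct / len(examples)
```

On a data set that is not linearly separable, calling `accuracy` after each epoch rather than only at the end would show the jitter described above, since the weights keep changing whenever an error occurs.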

Write your own code to implement a
feedforward neural net with a single hidden layer. You should
have 5 input units, *H* hidden units, and 1 output
unit. Write your own code to train the network with back
propagation. Write your code so that it loops over training
epochs, and within a training epoch it chooses mini-batches of
size *N*, where *N* can range from 1 to the
total number of examples in the training set. As a way of
testing your code, set the learning rate very small and set *N*
to be the training set size (i.e., you're doing batch
training). In this situation, you should be guaranteed that the
error monotonically decreases over epochs of training.
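The structure described above (epochs on the outside, mini-batches of size *N* inside) can be sketched as follows. This is one possible layout under stated assumptions, not a reference solution: it assumes sigmoid hidden and output units, cross entropy error, uniform random initialization in [-0.5, 0.5], and gradients averaged over each mini-batch. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow for extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def init_net(n_in, n_hidden, rng):
    """Each weight row holds a unit's input weights followed by its bias."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w1, w2

def forward(net, x):
    """Return the hidden activations and the single output for input x."""
    w1, w2 = net
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w1]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + w2[-1])
    return h, y

def mean_error(net, data):
    """Mean cross entropy over a list of (x, t) pairs."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, t in data:
        _, y = forward(net, x)
        total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
    return total / len(data)

def train_epoch(net, data, batch_size, lr, rng):
    """One pass over the data in shuffled mini-batches; updates net in place."""
    w1, w2 = net
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = [data[i] for i in order[start:start + batch_size]]
        g1 = [[0.0] * len(row) for row in w1]  # gradient accumulators
        g2 = [0.0] * len(w2)
        for x, t in batch:
            h, y = forward(net, x)
            # For a sigmoid output with cross entropy, dE/d(net input) = y - t.
            delta_out = y - t
            for j, hj in enumerate(h):
                g2[j] += delta_out * hj
                # Back-propagate through hidden unit j's sigmoid.
                delta_h = delta_out * w2[j] * hj * (1.0 - hj)
                for i, xi in enumerate(x):
                    g1[j][i] += delta_h * xi
                g1[j][-1] += delta_h
            g2[-1] += delta_out
        scale = lr / len(batch)  # average the gradient over the mini-batch
        for j, row in enumerate(w1):
            for i in range(len(row)):
                row[i] -= scale * g1[j][i]
        for j in range(len(w2)):
            w2[j] -= scale * g2[j]
```

For the batch-training check, one would call `train_epoch` with `batch_size=len(data)` and a small `lr`, record `mean_error` after each epoch, and confirm that the recorded values never increase.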

(2a) Decide what error function you wish to use. Two obvious candidates are squared error and cross entropy. Report which you have picked.

(2b) As a way of verifying that your network learns something beyond prior statistics of the training set, let's compute a measure of baseline performance. As a measure of baseline, use the training set to determine the constant output level of the network, call it *C*, that will minimize your error
measure. That is, assume your net doesn't learn to respond
to the inputs, but rather gives its best guess of the output
without regard to the input. Then for any example where the
target is 0 the network will output *C* and for any
example where the target is 1 the network will output *C*. Using
your error measure, solve for *C* and compute the
baseline error. Report the baseline error.
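As a check on the derivation, note that for both candidate error measures, setting the derivative of the total error with respect to *C* to zero gives *C* equal to the fraction of training examples whose target is 1. The sketch below computes that value and the resulting mean error per example; the function name `baseline` and the choice to report a mean rather than a sum are assumptions, so adjust them to match however your own error measure is scaled.

```python
import math

def baseline(targets):
    """Return (C, mean squared error, mean cross entropy) for a constant output C.

    targets: list of 0/1 labels from the training set.
    """
    p = sum(targets) / len(targets)  # fraction of examples with target 1
    c = p  # minimizes both squared error and cross entropy
    # Targets equal to 1 contribute (1 - c)^2; targets equal to 0 contribute c^2.
    mse = p * (1 - c) ** 2 + (1 - p) * c ** 2
    if 0 < c < 1:
        xent = -(p * math.log(c) + (1 - p) * math.log(1 - c))
    else:
        xent = 0.0  # all targets identical, so the constant guess is exact
    return c, mse, xent
```

For example, `baseline([1, 0, 0, 0])` gives *C* = 0.25 with a mean squared error of 0.1875.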

(2c) Using a network with *H=5* hidden units and
mini-batches of size *N=100*, select a learning rate (or a
learning rate schedule) that results in fairly consistent drops
in error from one epoch to the next, and make a plot of the training
error as a function of epochs. On this graph, show a
constant horizontal line for the baseline error. If your network
doesn't drop below this baseline, something is going awry.
For now, train your net until you're pretty sure the training
error isn't dropping further (i.e., a local optimum has been
reached).

(2d) Report the learning rate (or learning rate schedule) you used to produce the plot in (2c).

(2e) Report training and test set performance in terms of % examples classified correctly.

(2f) Now train nets with varying size, *H*, in {1, 2, 5,
10, 20}. You may have to adjust your learning rates based on *H*,
or use one of the heuristics in the text for setting learning
rates to be independent of *H*. Decide when to stop
training the net based on training set performance. Make a plot,
as a function of *H*, of the training and test set
performance in terms of % examples classified correctly.

See how much adding information about time of
day helps the network. Add a new set of inputs that represent
the time of day. (Don't add information about day of week or
absolute date.)

(3a) Determine an appropriate representation for the time of day. Describe the representation you used. For example, you might add one unit with a value ranging from 0 to 1 for times ranging from 00:00 to 23:59. Report the representation you selected.
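To illustrate the kind of representation (3a) asks about, the sketch below converts a time of day into the single 0-to-1 value mentioned above, and also into a sine/cosine pair. The sine/cosine pair is not part of the assignment text; it is shown only as one alternative in which 23:59 and 00:00 end up close together. The function name `time_of_day_features` is hypothetical, and parsing the time stamp column into hour, minute, and second is left out because the file format is not described here.

```python
import math

def time_of_day_features(hour, minute, second=0):
    """Return (fraction_of_day, sin_component, cos_component).

    fraction_of_day runs from 0.0 at 00:00:00 up to just under 1.0 at 23:59:59.
    The sin/cos pair places the same time on a unit circle, so the
    representation wraps smoothly around midnight.
    """
    seconds = hour * 3600 + minute * 60 + second
    frac = seconds / 86400.0  # 86400 seconds in a day
    angle = 2.0 * math.pi * frac
    return frac, math.sin(angle), math.cos(angle)
```

With the single-unit representation the network has 6 inputs; with the sine/cosine pair it has 7. Whichever you choose, describe it in your report.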

(3b) Train your net with *H=5* hidden units and compare training
and test set performance to the net you built in (2e).
