Neural Networks and Deep Learning

Spring 2017

Due: Thu Nov 16, 2017

The goal of this assignment is to explore
recurrent neural nets (RNNs) and to understand their
limitations.

I would like you to implement a recurrent
neural net to learn the parity operator. The net will have a
single input unit and a single output unit, and a
fully-connected layer of *H* hidden units. The
inputs and target outputs are binary. When an input sequence is
presented, the output state at the end of the sequence should be
a parity bit: output should be 1 if the input has an odd
number of '1' values. For example, the sequence 1-0-0-1-0-1
should yield output 1 and the sequence 0-0-0-0-1-1 should yield
output 0. Note that a target is given only at the end of each
sequence. (Parity is easy to learn if there is a target at each
step that indicates parity given the sequence so far.)
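
To make the target concrete, here is a minimal NumPy sketch that computes the parity bit for the two example sequences above, along with the running parity at every step (the easier per-step target mentioned in the note). The helper names `parity` and `running_parity` are illustrative and not part of the assignment code.

```python
import numpy as np

def parity(bits):
    """Return 1 if the sequence contains an odd number of 1s, else 0."""
    return int(np.sum(bits) % 2)

def running_parity(bits):
    """Return the parity of every prefix of the sequence."""
    return np.cumsum(bits) % 2

seq_a = np.array([1, 0, 0, 1, 0, 1])
seq_b = np.array([0, 0, 0, 0, 1, 1])

print(parity(seq_a))          # 1 (three 1s)
print(parity(seq_b))          # 0 (two 1s)
print(running_parity(seq_a))  # [1 1 1 0 0 1]
```

In the assignment only the last entry of the running parity is available as a training signal.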

Parity is a hard problem for neural nets to learn because very similar inputs produce different outputs, and very dissimilar inputs can produce the same output.
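
A small illustration of this point, as a sketch with illustrative helper names: flipping a single bit (Hamming distance 1) flips the parity, while complementing every bit of an even-length sequence (the maximum Hamming distance) leaves the parity unchanged.

```python
import numpy as np

def parity(bits):
    """Return 1 if the sequence contains an odd number of 1s, else 0."""
    return int(np.sum(bits) % 2)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return int(np.sum(a != b))

x = np.array([1, 0, 0, 1, 0, 1])

x_flip = x.copy()
x_flip[2] ^= 1        # flip one bit: a very similar input
x_comp = 1 - x        # flip every bit: a very dissimilar input

print(hamming(x, x_flip), parity(x), parity(x_flip))  # 1 1 0
print(hamming(x, x_comp), parity(x), parity(x_comp))  # 6 1 1
```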

The aspects of the task we will manipulate are *H*,
the number of hidden units; *N*, the length of the input
strings; and the activation function for the hidden units,
either tanh or LSTM-style neurons. The output neuron should have
a logistic activation function.
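
For orientation, the following is a minimal NumPy sketch of the forward pass of such a net with tanh hidden units: one input unit, *H* recurrent hidden units, and a single logistic output read from the final hidden state. It is not the TensorFlow implementation and does no training; the weight names, the random initialization, and the zero initial hidden state are assumptions made for illustration. The LSTM variant is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(x, W_xh, W_hh, b_h, w_hy, b_y):
    """Forward pass of a tanh RNN.

    x has shape (batch, N, 1). Returns an array of shape (batch, 1) holding
    the logistic output computed from the hidden state after the last step.
    """
    batch = x.shape[0]
    H = W_hh.shape[0]
    h = np.zeros((batch, H))  # assumed initial hidden state
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_xh + h @ W_hh + b_h)
    return sigmoid(h @ w_hy + b_y)

rng = np.random.default_rng(0)
H, N = 5, 10
W_xh = rng.normal(scale=0.5, size=(1, H))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(H, H))   # hidden -> hidden (fully connected)
b_h = np.zeros(H)
w_hy = rng.normal(scale=0.5, size=(H, 1))   # hidden -> output
b_y = np.zeros(1)

x = rng.integers(0, 2, size=(4, N, 1)).astype(float)
p = rnn_forward(x, W_xh, W_hh, b_h, w_hy, b_y)
print(p.shape)  # (4, 1)
```

Thresholding `p` at 0.5 gives the predicted parity bit for each sequence.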

TensorFlow has built-in recurrent net
functionality via tf.contrib.rnn.BasicRNNCell and
tf.contrib.rnn.BasicLSTMCell. More help may be on its way: Denis
and I are thinking of providing you with a shell of the code.

Denis wrote some code to generate random data strings for training:

    import numpy as np

    def generate_parity_sequences(N, count):
        """
        Generate :count: sequences of length :N:.
        If odd # of 1's -> output 1
        else -> output 0
        """
        xor = lambda x: 1 if (x % 2 == 1) else 0
        sequences = np.random.choice([0, 1], size=[count, N], replace=True)
        counts = np.count_nonzero(sequences == 1, axis=1)
        # xor each sequence, expand dimensions by 1 to match sequences shape
        y = np.expand_dims(np.array([xor(x) for x in counts]), axis=1)

        # In case you want the answer appended at the end of each sequence:
        # seq_plus_y = np.concatenate([sequences, y], axis=1)
        # print(sequences.shape, y.shape, seq_plus_y.shape)
        # return seq_plus_y

        # sequences has shape (count, N, 1); y has shape (count, 1)
        return np.expand_dims(sequences, axis=2), y

Set your code up to train a net given *H* and *N*. Each time you run the code, it should randomize the initial weights and generate a random training set of 10000 examples of length *N*. Also generate a random test set of 10000 examples of length *N*.

Train your net for *H* ∈ {5, 25} and for *N* ∈ {2, 10, 25, 50}. Use an RNN with tanh activation functions. For each combination of *H* and *N*, run 10 replications of your simulation.
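
One way to organize these runs is sketched below. The function `run_grid` and its argument `run_one` are hypothetical: `run_one(H, N)` stands for whatever code you write that generates fresh data, trains a freshly initialized net, and returns the test accuracy in percent.

```python
import itertools
import numpy as np

H_VALUES = [5, 25]
N_VALUES = [2, 10, 25, 50]
REPLICATIONS = 10

def run_grid(run_one, h_values=H_VALUES, n_values=N_VALUES,
             replications=REPLICATIONS):
    """Call run_one(H, N) `replications` times for every (H, N) pair.

    Returns a dict mapping (H, N) to an array of per-replication accuracies.
    """
    results = {}
    for H, N in itertools.product(h_values, n_values):
        results[(H, N)] = np.array(
            [run_one(H, N) for _ in range(replications)], dtype=float
        )
    return results
```

Keeping the per-replication accuracies, rather than only their mean, makes it straightforward to compute error bars afterwards.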

Make a graph of mean % correct on the test set for the different values of *H* and *N*. I'll be more impressed if you plot not only the mean but also the standard error of the mean (the standard deviation of the 10 replications divided by sqrt(10)).
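
The standard error computation can be sketched as follows. The helper name `mean_and_sem` is illustrative, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the assignment does not say which estimator to use.

```python
import numpy as np

def mean_and_sem(accuracies):
    """Return (mean, standard error of the mean) over replications."""
    acc = np.asarray(accuracies, dtype=float)
    mean = acc.mean()
    # Assumption: sample standard deviation (ddof=1); the handout only says
    # "standard deviation divided by sqrt(10)".
    sem = acc.std(ddof=1) / np.sqrt(acc.size)
    return mean, sem
```

The resulting values can be passed to a plotting routine that supports error bars, for example the `yerr` argument of `matplotlib.pyplot.errorbar`.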

Repeat the experiment of Part 1, but use LSTM neurons instead of standard tanh neurons in the recurrent layer.