Neural Networks and Deep Learning

CSCI 5922
Fall 2017

Tu, Th 9:30–10:45
Muenzinger D430


Professor Michael Mozer
Department of Computer Science
Engineering Center Office Tower 741
Office Hours: Thu 11:00-12:30

Denis Kazakov
Grader and Teaching Assistant

Course Description

Neural networks have enjoyed several waves of popularity over the past half century. Each time they become popular, they promise to provide a general-purpose artificial intelligence: a computer that can learn to do any task that you could program it to do. The first wave of popularity, in the late 1950s, was crushed by theoreticians who proved serious limitations of the techniques of the time. These limitations were overcome by advances that allowed neural networks to discover internal representations, leading to another wave of enthusiasm in the late 1980s. The second wave died out as more elegant, mathematically principled algorithms were developed (e.g., support-vector machines, Bayesian models). Around 2010, neural nets had a third resurgence. What happened in the intervening 20 years? Basically, computers got much faster and data sets got much larger, and the algorithms from the 1980s, with a few critical tweaks and improvements, appear once again to be state of the art, consistently winning competitions in computer vision, speech recognition, and natural language processing. The many accomplishments of the field have helped move research from academic journals into systems that improve our daily lives: apps that identify our friends in photos, automated vision systems that match or outperform humans in large-scale object recognition, phones and home appliances that recognize continuous, natural speech, self-driving cars, and software that translates between many languages.

Neural networks are mentioned regularly in the popular press. Below is a comic strip from around 1990, when neural nets first reached public awareness. You might expect to see the same comic today touting neural nets as the hot new thing, except that now the field has been rechristened "deep learning" to emphasize the architecture of neural nets that leads to the discovery of task-relevant representations.

[Comic strip: Dick Tracy, circa 1990]

In this course, we'll examine the history of neural networks and state-of-the-art approaches to deep learning. Students will learn to design neural network architectures and training procedures via hands-on assignments. Students will read current research articles to appreciate state-of-the-art approaches as well as to question some of the hype that comes with the resurgence of popularity. We will learn and use a critical software tool for modern deep learning: TensorFlow.


The course is open to any students who have some background in cognitive science or artificial intelligence and who have taken introductory probability/statistics and linear algebra. Students must be competent in Python (ironic, given that the professor is not).

Course Readings

The primary text will be Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (abbreviated GBC in the schedule below). The text is available online, chapter by chapter, in HTML, but it is worth purchasing because it will serve as a good reference. If you want additional background reading, I recommend:
In the second half of the course, we'll discuss current articles from the literature, all of which will be available on arXiv or other online sources. I will post links on the class-by-class syllabus.

The grandfather of the modern neural net field is Geoffrey Hinton from the University of Toronto (now at Google). He taught a Coursera class in 2012 that is a bit dated, but Geoff gives such beautiful explanations and intuitions that his lectures are well worth viewing. Many of his tutorials and invited talks are available on the web, and I recommend viewing these talks over pretty much any other way you might spend your time.

Course Discussions and Administration

We will use Piazza for class discussion. Rather than emailing me, please post your questions on Piazza. The Piazza signup page is here. Once you've signed up, the class page is here.

Assignments will be submitted via CU's course management system, Desire2Learn. Here is a link to our course.

Course Requirements


Even though my lecture notes cover most of the material I want you to know, the text provides a more detailed and formal treatment of some of the topics. I'd be very happy if you read the text in advance of class so that you can ask informed questions or ask me to clarify material.

Homework Assignments

We can all delude ourselves into believing we understand some math or algorithm by reading, but implementing and experimenting with the algorithm is both fun and valuable for obtaining a true understanding. Students will implement small-scale versions of as many of the models we discuss as possible. Over the semester, I will give about half a dozen homework assignments that involve implementation; details are to be determined. My preference is for you to work in Python, as we will rely on software that runs under Python. One or more of the assignments may involve writing a commentary on a research article or presenting the article to the class.

Information on submitting assignments can be found here.

Semester Grades

Semester grades will be based 15% on class attendance and participation and 85% on the homework assignments. I will weight each assignment in proportion to its difficulty, in the range of 10-20% of the course grade.

Class-By-Class Plan and Course Readings

Date | Topics / Activity | Hinton Videos | Readings | Lecture Notes | Assignments
Aug 29
  • introduction
  • history
GBC Chapter 1

Aug 31
  • processing units
  • Hebbian learning
  • linear models (regression)
  • LMS algorithm
GBC Chapter 1

Bengio, Learning deep architectures for AI (section 1)

Chronicle of Higher Education article on Deep Learning
homework 1 assigned
Sep 5
  • Perceptrons (classification)
  • limitations of linear nets and perceptrons
GBC Chapter 2

Sep 7
  • activation functions
  • error functions
  • back propagation
GBC Chapter 4
GBC Chapter 6.1-6.4

Sep 12
  • representation

homework 1 due; homework 2 assigned
Sep 14
  • activation and loss functions
  • practical advice
  • optimization
GBC Chapter 8.1-8.5
GBC Chapter 5.5

GBC Chapter 8.6-8.7

Sep 19
  • bias-variance dilemma
  • overfitting
  • inductive bias

GBC Chapter 5.2-5.4

bias-variance trade-off
bias and variance

Sep 21
  • regularization
  • dropout
GBC Chapter 7.{1,2,3,5,8,10,11,12}

pruning algorithms
Sep 26
  • convolutional nets
  • object recognition
GBC Chapter 9
convolutional net demos


Sep 28
  • TensorFlow (Denis Kazakov will lead)

GBC Chapter 6.5, 11
lecture notes delivered via Piazza
homework 2 due; homework 3 assigned
Oct 3,5
  • incorporating domain knowledge into models

GBC Chapter 7.{4,9,14}
GBC Chapter 15.2-15.6
leveraging domain knowledge

Oct 10
  • deep learning
deep nets

Oct 12,17

  • neural nets for sequences
  • recurrent nets
  • LSTM
  • GRU
  • reservoir computing
GBC Chapter 10
recurrent nets 1
recurrent nets 2
homework 3 due (Oct 12); homework 4 assigned
Oct 19
  • unsupervised learning
  • autoencoders
  • variational autoencoders
  • attractor nets
GBC Chapter 14

Oct 24,26
  • language modeling
GBC Chapter 12.4
assignment 4 due (Oct 26)
orphan class
  • speech recognition

GBC Chapter 12.3

Oct 31, Nov 2
  • class participants speak about ongoing research
  • Teddy Weverka: optimal deep learning hardware
  • Sean Kelly, Brent Milne: WootMath
  • Jason Dou
  • Yoshinari Fujinuma (LORELEI)
  • Bo Cao: gesture recognition for robotics
  • Aditya

Hall lecture
optical neural nets
climate/weather prediction
assignment 5
Nov 7
  • probabilistic neural nets
  • Boltzmann machines
  • RBMs
  • sigmoid belief nets
  • generative models
GBC Chapter 3

assignment 5 due Nov 7; assignment 6 handed out
Nov 9
CLASS CANCELLED (Mike traveling)

Nov 14
  • final projects
  • capsule networks

capsule nets

Nov 16
  • memory nets

assignment 6 due; assignment 7 handed out
Nov 21, 23

Nov 28
  • adversarial nets


Nov 30
  • image captioning


Dec 5,7

Dec 12
  • NIPS highlights
  • attention in neural nets


Dec 14
  • deep reinforcement learning

deep RL

Dec 16 [Final exam slot]
  • limitations of deep learning

Assignment 7 due


Other Interesting Papers

Alternative activation functions
Image processing
Alternative training procedures/loss functions

Relevant Links


Popular Press


Modeling tools

TensorFlow - Google contribution
CNTK - Microsoft toolkit
See list at
Caffe -- rapidly evolving, but not terribly well documented; requires GPU
Theano -- general-purpose, but the learning curve may be steep (documentation)
Torch7 -- looks to be pretty solid; a Matlab-like environment, but requires learning Lua (documentation)
deep learning exercises -- code for the Stanford deep learning tutorial; includes convolutional nets
convnet.js -- not the fastest, but may be the easiest
Matlab toolboxes for convolutional nets: matconvnet, cnn, cuda-cnn
Mocha -- deep learning framework for Julia
Spearmint for Bayesian optimization of hyperparameters (code cleaned up by Jan Yperman)
Recurrent net resources

Additional information for students