Deep Learning - Sequence Models - Week 1
After some delay, I’m getting around to finishing the Andrew Ng series of deep learning classes. The Sequence Models class is especially interesting due to the progress being made in applying deep learning to NLP.
What is a recurrent neural network?
In The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy explains it like this:
“The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: Sequences in the input, the output, or in the most general case both.” A basic neural network simulates some unknown function, mapping a fixed-size vector of inputs to a fixed-size vector of outputs. “If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.” RNNs accumulate state over a sequence of steps of arbitrary length, enabling information from previous steps to inform the output at the current step. A common example is subject-verb agreement in constructing grammatical sentences.
Another good resource is Understanding LSTM Networks by Chris Olah.
Applications of RNNs
RNN Structure
An RNN unit uses both the current input x<t> and the activation from the previous step, a<t-1>, to produce the output y<t> along with the new activation a<t> that is passed to the next step.
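As a concrete illustration, here is a minimal numpy sketch of one forward step of a basic RNN cell. The parameter names (Wax, Waa, Wya, ba, by) follow the shape conventions used in the course exercises, but the function itself is just an illustrative assumption, not the graded implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_cell_forward(xt, a_prev, Wax, Waa, Wya, ba, by):
    """One forward step of a basic RNN cell.

    xt:     input at step t, shape (n_x, m)
    a_prev: activation from step t-1, shape (n_a, m)
    """
    # The new activation mixes the current input with the previous activation
    a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)
    # The prediction for this step is a softmax over the new activation
    yt_pred = softmax(Wya @ a_next + by)
    return a_next, yt_pred
```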
Summary of RNN Types
Gated recurrent unit
The gated recurrent unit is simplified relative to the LSTM. It has the advantage of fewer parameters to tune, at the expense of a less expressive model. See: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
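For reference, a rough numpy sketch of one step of the simplified GRU from the lectures (update gate only, no relevance gate). The parameter names here are my own shorthand, not anything from the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_forward(xt, c_prev, Wu, bu, Wc, bc):
    """One forward step of a simplified GRU (update gate only).

    xt:     input at step t, shape (n_x, m)
    c_prev: memory/activation from step t-1, shape (n_c, m)
    """
    concat = np.vstack([c_prev, xt])
    # Update gate: how much of the candidate should overwrite the old memory
    gamma_u = sigmoid(Wu @ concat + bu)
    # Candidate replacement value for the memory cell
    c_tilde = np.tanh(Wc @ concat + bc)
    # Interpolate between keeping the old memory and taking the candidate
    c_next = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    return c_next
```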
Long short-term memory
LSTM units add a bit more sophistication, taking the previous activation a<t-1> and the contents of a memory cell c<t-1>, which may be additively updated or forgotten at each step.
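A similar sketch of one LSTM step, showing the gated, additive update of the memory cell. Again, the parameter names (Wf, Wu, Wc, Wo and their biases) are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(xt, a_prev, c_prev, p):
    """One forward step of an LSTM cell; p is a dict of parameters.

    xt:             input at step t, shape (n_x, m)
    a_prev, c_prev: previous activation and memory cell, shape (n_a, m)
    """
    concat = np.vstack([a_prev, xt])
    forget = sigmoid(p["Wf"] @ concat + p["bf"])    # how much old memory to keep
    update = sigmoid(p["Wu"] @ concat + p["bu"])    # how much candidate to add
    c_tilde = np.tanh(p["Wc"] @ concat + p["bc"])   # candidate memory contents
    c_next = forget * c_prev + update * c_tilde     # additive update / forgetting
    output = sigmoid(p["Wo"] @ concat + p["bo"])    # output gate
    a_next = output * np.tanh(c_next)               # new activation
    return a_next, c_next
```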
Programming exercises
The 1st week of the class has 3 programming exercises. The first covers the mechanics of the RNN and LSTM cells and the corresponding backpropagation. I need to work on understanding the calculus better. In the 2nd exercise, they have you train character-based language models for dinosaur names and Shakespeare sonnets (a rough sketch of the sampling loop appears below).
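To give a feel for the text-generation part, here is a sketch of a character-level sampling loop. The rnn_step helper is hypothetical, standing in for whatever trained model you have; it is assumed to return a probability distribution over the vocabulary plus the next hidden state.

```python
import numpy as np

def sample_name(rnn_step, vocab, char_to_ix, a0, max_len=50):
    """Sample one name from a character-level model, one character at a time.

    rnn_step(x, a) is assumed to return (probs, a_next): a probability
    distribution over the vocabulary and the next hidden state.
    """
    n_x = len(vocab)
    x = np.zeros((n_x, 1))                 # start from a "blank" input
    a = a0
    newline_ix = char_to_ix["\n"]
    chars = []
    for _ in range(max_len):
        probs, a = rnn_step(x, a)
        ix = np.random.choice(n_x, p=probs.ravel())  # sample (not argmax) for variety
        if ix == newline_ix:                         # newline terminates the name
            break
        chars.append(vocab[ix])
        x = np.zeros((n_x, 1))             # feed the sampled character back in
        x[ix] = 1                          # as a one-hot vector
    return "".join(chars)
```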
The 3rd exercise, an LSTM Jazz generator, is near and dear to my heart as a music fanatic.
References
Text generation
- Minimal character-level Vanilla RNN model. Written by Andrej Karpathy
- LSTM text generation example by the Keras team
Jazz
The ideas presented in the Jazz notebook came primarily from the computational music papers cited below. The implementation also took significant inspiration and used many components from Ji-Sung Kim’s GitHub repository.
- Ji-Sung Kim, 2016, deepjazz
- Jon Gillick, Kevin Tang, and Robert Keller, 2009, Learning Jazz Grammars
- Robert Keller and David Morrison, 2007, A Grammatical Approach to Automatic Improvisation
- François Pachet, 1999, Surprising Harmonies