Finally getting my hands on LSTM networks.  Originally developed in the late 1990s by Sepp Hochreiter and Jürgen Schmidhuber, the LSTM block lets part of the neural network store a memory cell, with gates that control whether that cell can be overwritten by new input, forgotten, or passed along to the output, kind of like an actual memory cell in a computer.


Schematic of an LSTM network


The main difference is that a computer’s memory cell is either on or off (1 or 0), whereas an LSTM cell takes values anywhere from zero to one, controlled by a sigmoid function (although even in a hardware memory cell, the actual voltage across the transistors can look closer to a sigmoid than a hard 1 or 0).  The whole network is differentiable, so it can be trained via stochastic gradient descent, with backpropagation through time applied to learn the weights.  The advantage of this network is that memories can be stored indefinitely, while ordinary recurrent networks built only from sigmoid units tend to lose their state (or memory) quickly.
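To make the gating concrete, here is a minimal sketch of a single LSTM time step in NumPy (my own toy version for illustration, not taken from any particular paper or library, and leaving out refinements like peephole connections): each gate is a sigmoid squashing a linear function of the current input and previous hidden state, and the new cell state is a gated blend of the old memory and a new candidate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (toy sketch).

    W, U, b hold the stacked weights for the forget, input, and
    output gates and the candidate cell update, in that order.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # stacked pre-activations
    f = sigmoid(z[0:n])             # forget gate: 0..1, keep old memory?
    i = sigmoid(z[n:2*n])           # input gate: admit new information?
    o = sigmoid(z[2*n:3*n])         # output gate: expose the cell?
    g = np.tanh(z[3*n:4*n])         # candidate cell update
    c = f * c_prev + i * g          # new cell state: gated blend of memory
    h = o * np.tanh(c)              # new hidden state fed to the output
    return h, c
```

Note that when the forget gate saturates near 1 and the input gate near 0, the cell state is carried forward essentially unchanged, which is exactly how the memory can persist indefinitely.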


Wonders can be done with LSTMs, especially in speech recognition, and more recently in image recognition.  What especially interests me is the ability of RNNs to generate content.  I saw this demo from Alex Graves’s website that uses an RNN with LSTM cells to generate handwriting, which is one of the coolest things I have ever seen with neural nets.