Recurrent networks

We can build unfolded recurrent networks using transducers, which are subclasses of Transducer. Their inputs and outputs are sequences of expressions or other objects (e.g., words). A transducer contains not only parameters, as layers do, but also an expression representing its internal state. A transducer should define two methods:

  • start() resets the transducer’s internal state to the initial state.
  • step(inp) reads in inp as an input and returns the output, updating the internal state.

The following convenience method is defined in terms of the above:

  • transduce(inps) reads in a sequence of inputs and returns a sequence of outputs.
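
To make the interface concrete, here is a toy transducer (not part of the library; the class name and sizes are made up for illustration) whose output at each step is the running sum of its inputs, assumed to be vector Expressions:

import numpy
from penne import *             # constant, as used in the examples below
from penne import recurrent

class RunningSum(recurrent.Transducer):
    def __init__(self, dims):
        self.zero = constant(numpy.zeros((dims,)))

    def start(self):
        # Reset the internal state to the initial (all-zero) sum.
        self.total = self.zero

    def step(self, inp):
        # Fold the new input into the state and return the updated expression.
        self.total = self.total + inp
        return self.total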

Module recurrent defines three RNN classes:

  • Simple(ni, no) is a simple (Elman) RNN with a tanh activation. Argument ni is the number of input units, and no is the number of output units.
  • LSTM(ni, no) is a long short-term memory RNN, as defined in Graves, “Generating Sequences with Recurrent Neural Networks.” Arguments are the same as for Simple.
  • GRU(ni, no) is a gated recurrent unit, as defined in Cho et al., “Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation.”

In addition, it defines:

  • Map(f) makes an RNN without any state at all; it just applies the function f to each input.
  • Stack(r1, r2, ...) stacks the RNNs r1, r2, etc., into a deep RNN. Formally, this is the same as FST composition: the output sequence of r1 is the input sequence of r2, and so on.
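
As a quick sketch of how these fit together (not taken from the library documentation; the layer sizes and the choice of tanh as the mapped function are arbitrary), Map and Stack can wrap the classes above:

from penne import *             # tanh, as used throughout this tutorial
from penne import recurrent

# A stateless tanh Map feeding a two-layer recurrent network (an LSTM under a GRU).
deep = recurrent.Stack(recurrent.Map(tanh),
                       recurrent.LSTM(64, 128),
                       recurrent.GRU(128, 128))
deep.start()
# outputs = deep.transduce(inputs)   # inputs: a list of 64-dimensional vector Expressions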

For example, the following builds a character-level LSTM language model; each input symbol is a character code between 0 and 255:

import numpy
from penne import *             # Layer, constant, logsoftmax, Adagrad, compute_values
from penne import recurrent
nh = 100
r = recurrent.LSTM(256, nh)
output_layer = Layer(nh, 256, f=logsoftmax)
r.start()
w = map(ord, "^the cat sat on the mat$")
loss = constant(0.)
for t in xrange(len(w)-1):
    h = r.step(w[t])
    o = output_layer(h)
    loss -= o[w[t+1]]

Normally, to train the model, you would build a new expression for each string (a sketch of this appears after the output below); for this simple example, we just train the model on the same string over and over:

trainer = Adagrad(learning_rate=0.1)
for epoch in xrange(10):
    l = 0.
    for i in xrange(10):
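        # receive() updates the parameters and returns the value of the loss expression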
        l += trainer.receive(loss)
    print l
877.429193148
460.270893697
301.754642841
134.779809251
55.4581145699
20.7747129205
11.5706977893
7.77766847724
5.75939614471
4.52777647117
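
As noted above, real training would rebuild the loss expression for each training string. A sketch of what that loop might look like, reusing r, output_layer, and trainer from above (build_loss is a hypothetical helper that just wraps the loss construction shown earlier):

def build_loss(r, output_layer, s):
    # Rebuild the loss expression for one training string s.
    r.start()
    w = map(ord, "^" + s + "$")
    loss = constant(0.)
    for t in xrange(len(w)-1):
        o = output_layer(r.step(w[t]))
        loss -= o[w[t+1]]
    return loss

for epoch in xrange(10):
    for s in ["the cat sat on the mat", "the dog sat on the log"]:
        trainer.receive(build_loss(r, output_layer, s))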

Obligatory randomly generated strings:

for i in xrange(10):
    c = ord('^')
    r.start()
    values = {}
    w = []
    for t in xrange(40):
        h = r.step(c)
        o = output_layer(h)
        values = compute_values(o, values)
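        # o holds log-probabilities (output_layer uses logsoftmax), so exponentiate before sampling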
        c = numpy.argmax(numpy.random.multinomial(1, numpy.exp(values[o])))
        if c == ord('$'): break
        w.append(chr(c))
    print ''.join(w)
the cat sat on the mat
ethe cat
the cat sat on the mat
the cat
the cat sat on the mat
t e cat sat on the mat
the cat sat on the mat
the cat sat on the mat
the cat sat on the mat
the cat sat on the mat

The implementation of LSTM is not terribly complicated, and illustrates how to implement transducers. The __init__ method creates all the parameters (indirectly, using Layer). The start method sets the initial states (an LSTM has two of them). The step method updates the states according to the LSTM definition.

class LSTM(recurrent.Transducer):
    def __init__(self, input_dims, output_dims):
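        # Each gate sees the input, the previous hidden state h, and the cell state c.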
        dims = [input_dims, output_dims, output_dims]
        self.input_gate = Layer(dims, output_dims, f=sigmoid)
        self.forget_gate = Layer(dims, output_dims, f=sigmoid)
        self.output_gate = Layer(dims, output_dims, f=sigmoid)
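        # The candidate cell update (dims[:-1]) sees only the input and h, not c.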
        self.input_layer = Layer(dims[:-1], output_dims, f=tanh)
        self.h0 = constant(numpy.zeros((output_dims,)))
        self.c0 = constant(numpy.zeros((output_dims,)))

    def start(self):
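        # The two pieces of state: the hidden state h and the memory cell c.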
        self.h = self.h0
        self.c = self.c0

    def step(self, inp):
        i = self.input_gate(inp, self.h, self.c)
        f = self.forget_gate(inp, self.h, self.c)
        self.c = f * self.c + i * self.input_layer(inp, self.h)
        o = self.output_gate(inp, self.h, self.c)
        self.h = o * tanh(self.c)
        return self.h
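
Because transduce is inherited from Transducer, the hand-written class above can be driven just like the built-in one. For example (a sketch, reusing the character-code inputs and sizes from the tutorial above):

r = LSTM(256, 100)                 # the class defined above
r.start()
hs = r.transduce(map(ord, "^the cat sat on the mat$"))
# hs is a list of hidden-state Expressions, one per input character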

Reference

Recurrent neural networks as finite-state transducers.

class penne.recurrent.Transducer

Base class for transducers.

start(state=None)

Prepare the transducer to read a new sequence.

Parameters: state (Expression) – initial state

state()

Return the transducer’s current state.

transduce(inps)

Apply transducer to a sequence of input symbols.

Parameters: inps – list of input symbols

class penne.recurrent.Map(f)

Stateless transducer that just applies a function to every symbol.

Parameters: f – function to apply to every symbol
class penne.recurrent.Stack(*layers)

A stack of recurrent networks, or, the composition of FSTs.

Parameters: layers (list of Transducers) – recurrent networks to stack
class penne.recurrent.Simple(insize, outsize, f=tanh, model=[])

Simple (Elman) recurrent network.

Parameters:
  • insize – number of input units.
  • outsize – number of output units.
  • f – activation function (default tanh)
step(inp)

inp can be either a vector Expression or an int
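
For illustration (a sketch, not from the library documentation; the sizes are arbitrary), both input forms might be used as follows:

import numpy
from penne import *             # constant, as above
from penne import recurrent

r = recurrent.Simple(256, 100)
r.start()
h1 = r.step(ord('a'))                        # int input: a symbol id, as in the tutorial above
h2 = r.step(constant(numpy.zeros((256,))))   # vector Expression input of length insize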

class penne.recurrent.GatedRecurrentUnit(insize, outsize, model=[])

Gated recurrent unit.

Cho et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. EMNLP.

Parameters:
  • insize – size of input vector, or list of sizes of input vectors
  • outsize – size of output vector
penne.recurrent.GRU

alias of GatedRecurrentUnit

class penne.recurrent.LongShortTermMemory(insize, outsize, model=[])

Long short-term memory recurrent network.

This version is from: Alex Graves, “Generating sequences with recurrent neural networks,” arXiv:1308.0850, which has:

  • diagonal peephole connections
  • output activation function
  • unlike Graves (2013), there is no separate forget gate; in its place, one minus the input gate is used.
Parameters:
  • insize – number of input units.
  • outsize – number of output units.
penne.recurrent.LSTM

alias of LongShortTermMemory