Recurrent networks
We can build unfolded recurrent networks using transducers, which are
extensions of the class Transducer. Their inputs and outputs are
sequences of expressions or other objects (e.g., words). A transducer
contains not only parameters, as layers do, but also an expression
representing its internal state. A transducer should define two
methods:

start()
    resets the transducer's internal state to the initial state.
step(inp)
    reads in inp as an input and returns the output, updating the internal state.

The following convenience method is defined in terms of the above:

transduce(inps)
    reads in a sequence of inputs and returns a sequence of outputs.
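For concreteness, transduce can be thought of as a loop over start and step, roughly as in the following sketch (not necessarily penne's exact implementation):

def transduce(self, inps):
    # Reset the internal state, then feed each input through step,
    # collecting the outputs.  (Sketch only; the real method may differ,
    # e.g., in whether it calls start() itself.)
    self.start()
    return [self.step(inp) for inp in inps]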
Module recurrent defines three RNN classes:

Simple(ni, no)
    is a simple RNN, with a tanh inside. Argument ni is the number of input units, and no the number of output units.
LSTM(ni, no)
    is a long short-term memory RNN, as defined in Graves, "Generating Sequences with Recurrent Neural Networks." Arguments are the same as for Simple.
GRU(ni, no)
    is a gated recurrent unit, as defined in Cho et al., "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation."
In addition, it defines:

Map(f)
    makes an RNN without any state at all; it just applies the function f to each input.
Stack(r1, r2, ...)
    stacks the RNNs r1, r2, etc., into a deep RNN. Formally, this is the same as FST composition: the output sequence of r1 is the input sequence of r2, and so on (see the sketch below).
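For example, Stack and Map can be combined into a two-layer character-level LSTM whose outputs are already log-probabilities. This is only a sketch: the layer sizes are arbitrary, and it assumes that the starred import provides Layer and logsoftmax, as in the example that follows.

from penne import *          # assumed to provide Layer and logsoftmax
from penne import recurrent

nh = 100
char_output = Layer(nh, 256, f=logsoftmax)          # stateless output layer
deep = recurrent.Stack(recurrent.LSTM(256, nh),     # characters -> hidden states
                       recurrent.LSTM(nh, nh),      # second LSTM layer
                       recurrent.Map(char_output))  # applied to every output
deep.start()
o = deep.step(ord('a'))   # log-probabilities over the next character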
from penne import *          # Layer, constant, logsoftmax, Adagrad, etc.
from penne import recurrent

nh = 100
r = recurrent.LSTM(256, nh)                  # 256 possible input symbols (bytes)
output_layer = Layer(nh, 256, f=logsoftmax)  # predicts the next symbol
r.start()
w = map(ord, "^the cat sat on the mat$")
loss = constant(0.)
for t in xrange(len(w)-1):
    h = r.step(w[t])        # read symbol w[t] and update the hidden state
    o = output_layer(h)     # log-probabilities of the next symbol
    loss -= o[w[t+1]]       # negative log-likelihood of the true next symbol
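The same loss can also be built using transduce, which runs the whole input sequence through the transducer in one call. The following sketch assumes that the outputs returned by transduce correspond one-to-one with the calls to step above:

r.start()
hs = r.transduce(w[:-1])                 # hidden states for w[0], ..., w[-2]
loss = constant(0.)
for t, h in enumerate(hs):
    loss -= output_layer(h)[w[t+1]]      # subtract log-probability of the true next symbol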
Normally, you would build a new loss expression for each training string, but for this simple example, we just train the model on the same string over and over:
trainer = Adagrad(learning_rate=0.1)
for epoch in xrange(10):
    l = 0.
    for i in xrange(10):
        l += trainer.receive(loss)
    print l
877.429193148
460.270893697
301.754642841
134.779809251
55.4581145699
20.7747129205
11.5706977893
7.77766847724
5.75939614471
4.52777647117
Obligatory randomly generated strings:
import numpy

for i in xrange(10):
    c = ord('^')
    r.start()
    values = {}
    w = []
    for t in xrange(40):
        h = r.step(c)
        o = output_layer(h)
        values = compute_values(o, values)   # evaluate the expression graph
        c = numpy.argmax(numpy.random.multinomial(1, numpy.exp(values[o])))  # sample the next symbol
        if c == ord('$'): break
        w.append(chr(c))
    print ''.join(w)
the cat sat on the mat
ethe cat
the cat sat on the mat
the cat
the cat sat on the mat
t e cat sat on the mat
the cat sat on the mat
the cat sat on the mat
the cat sat on the mat
the cat sat on the mat
The implementation of LSTM
is not terribly complicated, and
illustrates how to implement transducers. The __init__
method
creates all the parameters (indirectly, using Layer
). The start
method sets the initial states (an LSTM has two of them). The step
method updates the states according to the LSTM definition.
class LSTM(recurrent.Transducer):
    def __init__(self, input_dims, output_dims):
        # Each gate sees the input, the previous output, and the cell state.
        dims = [input_dims, output_dims, output_dims]
        self.input_gate = Layer(dims, output_dims, f=sigmoid)
        self.forget_gate = Layer(dims, output_dims, f=sigmoid)
        self.output_gate = Layer(dims, output_dims, f=sigmoid)
        self.input_layer = Layer(dims[:-1], output_dims, f=tanh)
        # Initial output (h) and cell (c) states.
        self.h0 = constant(numpy.zeros((output_dims,)))
        self.c0 = constant(numpy.zeros((output_dims,)))

    def start(self):
        self.h = self.h0
        self.c = self.c0

    def step(self, inp):
        i = self.input_gate(inp, self.h, self.c)
        f = self.forget_gate(inp, self.h, self.c)
        self.c = f * self.c + i * self.input_layer(inp, self.h)
        o = self.output_gate(inp, self.h, self.c)
        self.h = o * tanh(self.c)
        return self.h
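The same pattern carries over to any new recurrent unit. For instance, using the same imports as above, a bare-bones Elman-style transducer could be written as follows; this is only an illustration of the interface, not penne's own Simple class:

class ElmanSketch(recurrent.Transducer):
    def __init__(self, input_dims, output_dims):
        # A single layer that sees the current input and the previous hidden state.
        self.layer = Layer([input_dims, output_dims], output_dims, f=tanh)
        self.h0 = constant(numpy.zeros((output_dims,)))
    def start(self):
        self.h = self.h0
    def step(self, inp):
        self.h = self.layer(inp, self.h)
        return self.h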
Reference
Recurrent neural networks as finite-state transducers.
class penne.recurrent.Transducer
    Base class for transducers.

class penne.recurrent.Map(f)
    Stateless transducer that just applies a function to every symbol.

    Parameters:
        f – function to apply to every symbol

class penne.recurrent.Stack(*layers)
    A stack of recurrent networks, or, the composition of FSTs.

    Parameters:
        layers (list of Transducers) – recurrent networks to stack

class penne.recurrent.Simple(insize, outsize, f=<class 'penne.expr.tanh'>, model=[])
    Simple (Elman) recurrent network.

    Parameters:
        insize – number of input units.
        outsize – number of output units.
        f – activation function (default tanh)

class penne.recurrent.GatedRecurrentUnit(insize, outsize, model=[])
    Gated recurrent unit.

    Cho et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. EMNLP.

    Parameters:
        insize – size of input vector, or list of sizes of input vectors
        outsize – size of output vector

penne.recurrent.GRU
    alias of GatedRecurrentUnit

class penne.recurrent.LongShortTermMemory(insize, outsize, model=[])
    Long short-term memory recurrent network.

    This version is from: Alex Graves, "Generating sequences with recurrent neural networks," arXiv:1308.0850, which has:

    - diagonal peephole connections
    - output activation function
    - differently from Graves 2013, there is no forget gate; in its place is one minus the input gate.

    Parameters:
        insize – number of input units.
        outsize – number of output units.
        f – activation function (default tanh)

penne.recurrent.LSTM
    alias of LongShortTermMemory