Minibatching¶
This isn’t a topic that deserves to have its own page, but here it is.
“Minibatch-first” convention¶
Penne assumes that:
- A minibatch of n-dimensional arrays is an (n+1)-dimensional array, where the first axis ranges over the instances in the minibatch.
- A minibatch of ints (representing one-hot vectors) is a list of ints.
The way that NumPy’s broadcasting rules work (which Penne follows) means that many operations will automatically work on minibatches. For example, elementwise addition works correctly if either or both arguments are minibatches.
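As a minimal sketch of the broadcasting behavior described above, the following uses plain NumPy rather than Penne expressions (an assumption made only so that the example is self-contained). The array names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])          # a single vector, shape (3,)
xb = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])       # a minibatch of 2 vectors, shape (2, 3)
b = np.array([10.0, 20.0, 30.0])       # a single vector, shape (3,)

single = x + b     # vector + vector: shape (3,)
left = xb + b      # minibatch + vector: b is added to every row, shape (2, 3)
right = b + xb     # vector + minibatch: same result
both = xb + xb     # minibatch + minibatch: added row by row, shape (2, 3)
```

Because the minibatch axis comes first, the single vector lines up with the last axis of the minibatch and is repeated across instances.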
But of course there are lots of exceptions.
- dot(x,y) sums over the last axis of x and the second-to-last axis of y, which will behave differently if y is a vector or a minibatch of vectors.
  - If x is a vector or minibatch of vectors, use vecdot instead.
  - If x is a matrix/tensor, the solution that the Layer class uses is to write dot(y, x) instead.
- concatenate and stack default to axis=0. Use negative axis numbers to get code that works with or without minibatches.
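The exceptions above can be reproduced with plain NumPy, which is used here as a stand-in for Penne's own functions. The helper rowwise_dot is hypothetical and only approximates what a vecdot-style product presumably does (summing an elementwise product over the last axis); it is not Penne's implementation.

```python
import numpy as np

def rowwise_dot(x, y):
    """Hypothetical vecdot-style product: sum x*y over the last axis."""
    return np.sum(x * y, axis=-1)

def raises_value_error(fn):
    """Return True if calling fn() raises ValueError (a shape mismatch here)."""
    try:
        fn()
    except ValueError:
        return True
    return False

u = np.array([1.0, 2.0, 3.0])              # single vector, shape (3,)
ub = np.stack([u, 2 * u, 3 * u, 4 * u])    # minibatch of 4 vectors, shape (4, 3)
W = np.ones((3, 2))                        # matrix mapping 3 inputs to 2 outputs

# Vector-vector product: dot works for single vectors but not for minibatches.
single_ip = rowwise_dot(u, u)              # scalar
batch_ip = rowwise_dot(ub, ub)             # one inner product per instance, shape (4,)
batch_dot_fails = raises_value_error(lambda: np.dot(ub, ub))

# Matrix-vector product: put the (possibly minibatched) vector on the left.
single_out = np.dot(u, W)                  # shape (2,)
batch_out = np.dot(ub, W)                  # shape (4, 2)
matrix_first_fails = raises_value_error(lambda: np.dot(W.T, ub))

# Concatenation: a negative axis works with or without a minibatch axis.
single_cat = np.concatenate([u, u], axis=-1)    # shape (6,)
batch_cat = np.concatenate([ub, ub], axis=-1)   # shape (4, 6)
wrong_cat = np.concatenate([ub, ub])            # axis=0 gives shape (8, 3)
```

With the default axis=0, concatenating two minibatches produces a larger minibatch instead of longer vectors, which is why the negative axis is the safer choice.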
The functions and modules that Penne provides are (as far as I know) safe to use on minibatches.
The penne.lm module provides a simple utility function for grouping a sequence of training examples into a sequence of minibatches (lists) of training examples.
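The idea of grouping examples into minibatches can be sketched in a few lines of plain Python. The function name make_minibatches and its size parameter are hypothetical, chosen for illustration; they are not the name or signature of the penne.lm utility.

```python
def make_minibatches(examples, size):
    """Hypothetical helper: split examples into consecutive lists of at most `size` items."""
    examples = list(examples)
    return [examples[i:i + size] for i in range(0, len(examples), size)]

sentences = [["a"], ["b", "c"], ["d"], ["e", "f"], ["g"]]
batches = make_minibatches(sentences, 2)
# The last minibatch is smaller when the data does not divide evenly.
```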
Sequences¶
With sequence models, the sentences in a minibatch are not all the same length. The simplest solution is just to pad all the sentences with a dummy symbol so that they are all the same length. The penne.lm module provides some utility functions for making this easier.
- penne.lm.pack_batch(batch, fillvalue=None)
  Converts a minibatch from “minibatch-first” order to “minibatch-second” order. Right-pads sentences with fillvalue.
  In “minibatch-first” order, batch[i][j] is sentence i, word j. In “minibatch-second” order, batch[i][j] is sentence j, word i.
  Parameters: batch (list of lists) – minibatch in “minibatch-first” order
  Returns: minibatch in “minibatch-second” order
  Return type: list of lists
- penne.lm.unpack_batch(batch, fillvalue=None)
  Converts a minibatch from “minibatch-second” order to “minibatch-first” order. Deletes all trailing fillvalues.
  In “minibatch-first” order, batch[i][j] is sentence i, word j. In “minibatch-second” order, batch[i][j] is sentence j, word i.
  Parameters: batch (list of lists) – minibatch in “minibatch-second” order
  Returns: minibatch in “minibatch-first” order
  Return type: list of lists
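To make the two orders concrete, here is a minimal sketch of the same transformations in plain Python, based only on the descriptions above. The functions pack and unpack are hypothetical re-implementations for illustration and may differ from the actual penne.lm code.

```python
from itertools import zip_longest

def pack(batch, fillvalue=None):
    """Sketch: transpose sentences into per-position lists, right-padding short sentences."""
    return [list(column) for column in zip_longest(*batch, fillvalue=fillvalue)]

def unpack(batch, fillvalue=None):
    """Sketch: transpose back into sentences and delete trailing fill values."""
    sentences = [list(row) for row in zip(*batch)]
    for sentence in sentences:
        while sentence and sentence[-1] == fillvalue:
            sentence.pop()
    return sentences

first = [["the", "cat", "sat"], ["hello"]]   # first[i][j] is sentence i, word j
second = pack(first, "<pad>")                # second[i][j] is sentence j, word i
restored = unpack(second, "<pad>")
```

Note that a sentence which genuinely ends in the fill value would lose it on unpacking, so the dummy symbol should not occur in the data.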
Training¶
You may need to scale some training parameters by the minibatch size:
- learning_rate should be divided by minibatch size
- clip_gradients should be multiplied by minibatch size
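One way to see why these scalings can be needed is the following arithmetic sketch. It assumes that the loss (and therefore the gradient) is summed over the minibatch, not averaged; that assumption, the scalar toy gradients, and the helper functions are all illustrative and are not taken from Penne.

```python
def sgd_step(param, grad, learning_rate):
    """Plain gradient descent update on a scalar parameter."""
    return param - learning_rate * grad

def clip(grad, threshold):
    """Toy scalar clipping to the interval [-threshold, threshold]."""
    return max(-threshold, min(threshold, grad))

per_example_grads = [0.5, 1.5, -1.0, 3.0]
batch_size = len(per_example_grads)
summed_grad = sum(per_example_grads)       # gradient of a summed loss
mean_grad = summed_grad / batch_size       # gradient of an averaged loss

base_lr = 0.1
base_clip = 0.5

# Dividing the learning rate by the minibatch size makes a step on the
# summed gradient match a step on the averaged gradient.
step_mean = sgd_step(1.0, mean_grad, base_lr)
step_sum = sgd_step(1.0, summed_grad, base_lr / batch_size)

# Multiplying the clipping threshold by the minibatch size keeps clipping
# equivalent: the summed gradient is batch_size times larger.
clipped_mean = clip(mean_grad, base_clip)
clipped_sum = clip(summed_grad, base_clip * batch_size)
```

If the loss were averaged over the minibatch instead, neither adjustment would apply.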