Minibatching

This isn’t a topic that deserves to have its own page, but here it is.

“Minibatch-first” convention

Penne assumes that:

  • A minibatch of n-dimensional arrays is an (n+1)-dimensional array, where the first axis ranges over the instances in the minibatch.
  • A minibatch of ints (representing one-hot vectors) is a list of ints.

Because Penne follows NumPy’s broadcasting rules, many operations work on minibatches automatically. For example, elementwise addition works correctly if either or both arguments are minibatches.
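
As a small illustration, here is the same behavior in plain NumPy rather than Penne's own functions. The array names and shapes are made up for this sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])            # one instance, shape (3,)
xs = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])         # minibatch of 2 instances, shape (2, 3)
b = np.array([10.0, 20.0, 30.0])         # e.g. a bias vector, shape (3,)

single = x + b      # shape (3,)
batched = xs + b    # shape (2, 3): b is added to every instance
both = xs + xs      # shape (2, 3): instance i is added to instance i
```

The first row of batched equals single, because broadcasting aligns b with the last axis of xs.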

But of course there are lots of exceptions.

  • dot(x,y) sums over the last axis of x and the second-to-last axis of y (or the only axis of y, if y is a vector), so it behaves differently depending on whether y is a vector or a minibatch of vectors.
    • If x is a vector or minibatch of vectors, use vecdot instead.
    • If x is a matrix/tensor, the solution that the Layer class uses is to write dot(y, x) instead.
  • concatenate and stack default to axis=0. Use negative axis numbers to get code that works with or without minibatches.
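
The exceptions above can be reproduced in plain NumPy. This is only a sketch: W, x and xs are illustrative names, and the row-wise sum of products is an assumed NumPy stand-in for a vecdot-style operation, not Penne's implementation.

```python
import numpy as np

W = np.arange(6.0).reshape(2, 3)         # matrix, shape (2, 3)
x = np.array([1.0, 2.0, 3.0])            # vector, shape (3,)
xs = np.stack([x, 2 * x, 3 * x, 4 * x])  # minibatch of 4 vectors, shape (4, 3)

# dot(W, x) works for a single vector but not for a minibatch:
single = np.dot(W, x)                    # shape (2,)
try:
    np.dot(W, xs)                        # last axis of W (3) vs. second-to-last of xs (4)
    mismatch = False
except ValueError:
    mismatch = True

# Putting the (mini)batch on the left handles both cases:
Wt = W.T                                 # shape (3, 2)
single_left = np.dot(x, Wt)              # shape (2,)
batched_left = np.dot(xs, Wt)            # shape (4, 2)

# Row-wise dot product, summing over the last axis only:
rowwise = np.sum(xs * xs, axis=-1)       # shape (4,)

# A negative axis concatenates along the feature axis either way:
cat_single = np.concatenate([x, x], axis=-1)      # shape (6,)
cat_batched = np.concatenate([xs, xs], axis=-1)   # shape (4, 6)
```

Note that if the minibatch happened to contain exactly 3 vectors, np.dot(W, xs) would not raise an error; it would silently compute the wrong thing.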

The functions and modules that Penne provides are (as far as I know) safe to use on minibatches.

The penne.lm module provides a simple utility function for grouping a sequence of training examples into a sequence of minibatches (lists) of training examples:

penne.lm.batches(data, batch_size)

Iterator over minibatches of training examples.

All batches but the last have batch_size elements; the last batch may have fewer elements.

Parameters:
  • data (iterable) – input data
  • batch_size (int) – batch size
Return type: iterator over lists
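
To make the described behavior concrete, here is a minimal sketch of a function with the same contract. The name batches_sketch is hypothetical, and this is not necessarily how penne.lm.batches is implemented:

```python
from itertools import islice

def batches_sketch(data, batch_size):
    """Yield successive lists of up to batch_size items from an iterable."""
    it = iter(data)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:        # input exhausted
            return
        yield batch

result = list(batches_sketch(range(7), 3))
# result == [[0, 1, 2], [3, 4, 5], [6]]
```

Only the final batch is shorter than batch_size, and an empty input yields no batches at all.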

Sequences

With sequence models, the sentences in a minibatch are generally not all the same length. The simplest solution is to pad the sentences with a dummy symbol so that they are all the same length. The penne.lm module provides some utility functions for making this easier.

penne.lm.pack_batch(batch, fillvalue=None)

Converts a minibatch from “minibatch-first” order to “minibatch-second” order. Right-pads sentences with fillvalue.

In “minibatch-first” order, batch[i][j] is sentence i, word j. In “minibatch-second” order, batch[i][j] is sentence j, word i.

Parameters: batch (list of lists) – minibatch in “minibatch-first” order
Returns: minibatch in “minibatch-second” order
Return type: list of lists
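
The conversion described above amounts to a padded transpose. A minimal sketch, assuming nothing about Penne's actual implementation (pack_batch_sketch and the "<pad>" symbol are hypothetical names chosen for illustration):

```python
from itertools import zip_longest

def pack_batch_sketch(batch, fillvalue=None):
    """Transpose a list of sentences, right-padding short ones with fillvalue."""
    return [list(column) for column in zip_longest(*batch, fillvalue=fillvalue)]

batch = [["the", "cat", "sat"], ["hello"]]
packed = pack_batch_sketch(batch, fillvalue="<pad>")
# packed == [["the", "hello"], ["cat", "<pad>"], ["sat", "<pad>"]]
```

Here packed[i][j] is word i of sentence j, so a sequence model can consume packed[i] as the minibatch of words at time step i.
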

penne.lm.unpack_batch(batch, fillvalue=None)

Converts a minibatch from “minibatch-second” order to “minibatch-first” order. Deletes all trailing fillvalues.

In “minibatch-first” order, batch[i][j] is sentence i, word j. In “minibatch-second” order, batch[i][j] is sentence j, word i.

Parameters: batch (list of lists) – minibatch in “minibatch-second” order
Returns: minibatch in “minibatch-first” order
Return type: list of lists
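
The reverse conversion can be sketched the same way: transpose, then strip the padding from the end of each sentence. The name unpack_batch_sketch and the "<pad>" symbol are hypothetical, and this is not necessarily how Penne implements it:

```python
def unpack_batch_sketch(batch, fillvalue=None):
    """Transpose back to one list per sentence and drop trailing fillvalues."""
    sentences = [list(sentence) for sentence in zip(*batch)]
    for sentence in sentences:
        while sentence and sentence[-1] == fillvalue:
            sentence.pop()
    return sentences

packed = [["the", "hello"], ["cat", "<pad>"], ["sat", "<pad>"]]
unpacked = unpack_batch_sketch(packed, fillvalue="<pad>")
# unpacked == [["the", "cat", "sat"], ["hello"]]
```

One consequence of this approach is that a sentence that legitimately ends in the fill value loses that symbol, so the fill value should be one that never occurs in real data.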

Training

You may need to scale some training parameters by the minibatch size:

  • learning_rate should be divided by the minibatch size
  • clip_gradients should be multiplied by the minibatch size
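
One way to see where these factors come from is the following toy calculation. It assumes that the loss is summed, not averaged, over the instances in a minibatch, which this page does not state explicitly. The model, the numbers, and the function name are all made up for illustration.

```python
def summed_gradient(w, xs, ys):
    """Gradient of sum_i (w * x_i - y_i)**2 with respect to the scalar w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

w = 0.5
batch_size = 4
grad_one = summed_gradient(w, [2.0], [3.0])                             # one instance
grad_batch = summed_gradient(w, [2.0] * batch_size, [3.0] * batch_size) # same instance, 4 times

# Assumption: summed loss, so the gradient grows with the minibatch size.
base_learning_rate = 0.1
step_single = base_learning_rate * grad_one
step_batch = (base_learning_rate / batch_size) * grad_batch   # same step size

base_clip = 5.0
clip_batch = base_clip * batch_size    # threshold grows with the gradient
```

Under that assumption, grad_batch is batch_size times grad_one, so dividing the learning rate keeps the update comparable, and multiplying the clipping threshold keeps clipping equally strict.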