Minibatching
============

This isn't a topic that deserves to have its own page, but here it is.

"Minibatch-first" convention
----------------------------

Penne assumes that:

- A minibatch of n-dimensional arrays is an (n+1)-dimensional array, whose first axis ranges over the instances in the minibatch.

- A minibatch of ints (representing one-hot vectors) is a list of ints.

Because Penne follows NumPy's broadcasting rules, many operations automatically work on minibatches. For example, elementwise addition works correctly if either or both arguments are minibatches. But of course there are lots of exceptions:

- ``dot(x,y)`` sums over the last axis of ``x`` and the second-to-last axis of ``y``, so it behaves differently depending on whether ``y`` is a vector or a minibatch of vectors.

  - If ``x`` is a vector or a minibatch of vectors, use ``vecdot`` instead.

  - If ``x`` is a matrix/tensor, the solution that the ``Layer`` class uses is to write ``dot(y, x)`` instead.

- ``concatenate`` and ``stack`` default to ``axis=0``. Use negative axis numbers to get code that works both with and without minibatches.

(A short NumPy sketch at the end of this page illustrates these shape conventions.)

The functions and modules that Penne provides are (as far as I know) safe to use on minibatches.

The ``penne.lm`` module provides a simple utility function for grouping a sequence of training examples into a sequence of minibatches (lists) of training examples:

.. autofunction:: penne.lm.batches

Sequences
---------

With sequence models, the sentences in a minibatch are not all the same length. The simplest solution is to pad all the sentences with a dummy symbol so that they are the same length. The ``penne.lm`` module provides some utility functions for making this easier.

.. autofunction:: penne.lm.pack_batch

.. autofunction:: penne.lm.unpack_batch

Training
--------

You may need to scale some training parameters by the minibatch size (see the sketch after this list):

- ``learning_rate`` should be divided by the minibatch size.
- ``clip_gradients`` should be multiplied by the minibatch size.
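For instance, if the hyperparameters were tuned for training one example at a time, and the minibatch loss is a sum (not a mean) over the examples in the batch, the adjustment is just the arithmetic below. The names ``base_learning_rate``, ``base_clip``, and ``batch_size`` are placeholders for illustration, not part of Penne's API::

    batch_size = 64

    # Values tuned for single-example (batch size 1) training.
    base_learning_rate = 0.1
    base_clip = 1.0

    # With a summed minibatch loss, the gradient is roughly batch_size
    # times larger than for a single example, so compensate:
    learning_rate = base_learning_rate / batch_size
    clip_gradients = base_clip * batch_size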
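Finally, here is a plain-NumPy sketch of the shape conventions described above. It uses NumPy's ``dot`` and ``concatenate`` as stand-ins for Penne's (which follow the same axis rules), and the shapes and variable names are made up for illustration::

    import numpy as np

    batch_size, d_in, d_out = 3, 5, 4

    W = np.random.rand(d_in, d_out)       # a weight matrix (not a minibatch)
    b = np.random.rand(d_out)             # a bias vector
    x = np.random.rand(d_in)              # a single input vector
    X = np.random.rand(batch_size, d_in)  # a minibatch of input vectors

    # Elementwise operations broadcast over the leading minibatch axis,
    # so the same expression works with or without a minibatch:
    y_single = np.dot(x, W) + b           # shape (d_out,)
    y_batch = np.dot(X, W) + b            # shape (batch_size, d_out)

    # Writing dot(input, W) rather than dot(W, input) is what makes the
    # minibatch case come out right: dot sums over the last axis of its
    # first argument and the second-to-last axis of its second argument,
    # so the leading minibatch axis of X passes through untouched.

    # Negative axis numbers keep concatenate/stack indifferent to whether
    # a leading minibatch axis is present:
    pair = np.concatenate([X, X], axis=-1)   # shape (batch_size, 2 * d_in)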