MLP - Test your knowledge!

Try to answer the following questions:
  1. Are the results in every experiment the same? Why?
  2. What are the main differences between Batch Backpropagation and Stochastic Backpropagation?
  3. What is the influence of a second hidden layer on the training of the network?
  4. What happens when the learning rate is very large or very small?
  5. How can it be that the mapping view of the output layer fails to separate the classes well, even if there is already a high accuracy? (hint: there are more than two dimensions)

Answers:
  1. No. Because the weights are initialised at random, the error surface is traversed differently in each run. Different training runs may therefore converge to different local minima.
  2. The Stochastic Backpropagation algorithm updates the weights after feeding just one data vector to the network. In Batch Backpropagation the weight updates for all vectors are summed and applied only once, after the whole data set has been fed to the network. One epoch takes more time in the latter case, but that is not reflected in the demo. See Mitchell pp. 92-94.
  3. Training is slower, but more complex behavior can be modelled. The data sets in this demo are not that complex and can be classified with just one hidden layer.
  4. You can clearly see what happens in the Mapping View during training. One big difference from the Perceptron algorithm is that convergence toward a (local) minimum is assured for a sufficiently small learning rate, whether or not the data are linearly separable. When the learning rate is too small, however, convergence can be extremely slow. When it is too large, the updates can overshoot the minimum, so that the error oscillates or even grows. The Error View shows the evolution of the Mean Square Error.
  5. It is enough that the classes differ in one dimension for the data to be linearly separable. If classes A and B have attributes {a,b,c}, it is enough that they differ in c. The dimensions that are not shown in the Output Layer of the Mapping View are evaluated at the average of all data vectors. If the classes differ mainly in a dimension that is not shown, the view can look poorly separated even though the accuracy is high. The separation will look better when the most relevant dimensions are shown.
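
The difference described in answer 2 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses a single linear unit with the delta rule instead of the MLP from the demo, and the names predict, batch_epoch and stochastic_epoch are made up for this example.

```python
def predict(w, x):
    """Output of a single linear unit: the weighted sum of the inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))


def batch_epoch(w, data, lr):
    """One epoch of batch training: sum all updates, apply them once."""
    delta = [0.0] * len(w)
    for x, t in data:
        err = t - predict(w, x)          # error uses the OLD weights for every vector
        for i, xi in enumerate(x):
            delta[i] += lr * err * xi    # accumulate, do not apply yet
    new_w = [wi + di for wi, di in zip(w, delta)]
    return new_w, 1                      # exactly one weight update per epoch


def stochastic_epoch(w, data, lr):
    """One epoch of stochastic training: update after every data vector."""
    w = list(w)
    updates = 0
    for x, t in data:
        err = t - predict(w, x)          # error uses the weights changed so far
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        updates += 1
    return w, updates                    # one weight update per data vector


# Two toy data vectors (input, target); the first input component acts as a bias.
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 2.0)]
w_batch, n_batch = batch_epoch([0.0, 0.0], data, lr=0.1)
w_stoch, n_stoch = stochastic_epoch([0.0, 0.0], data, lr=0.1)
print(n_batch, w_batch)
print(n_stoch, w_stoch)
```

Starting from the same weights, the two variants end the epoch with slightly different weights, because the stochastic version already uses the changed weights when it processes the second vector.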

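The influence of the learning rate asked about in question 4 can be tried out on a toy problem. The sketch below is an assumption-laden illustration, not the demo's network: it runs plain gradient descent on the one-dimensional error surface E(w) = w^2, whose minimum is at w = 0, and the function name descend is invented for this example.

```python
def descend(lr, steps=20, w=1.0):
    """Run gradient descent on E(w) = w**2 and return the final weight."""
    for _ in range(steps):
        grad = 2.0 * w      # dE/dw for E(w) = w**2
        w -= lr * grad      # standard gradient descent step
    return w


# Hypothetical learning rates, chosen only to show three regimes.
for lr in (0.01, 0.4, 1.1):
    print(lr, descend(lr))
```

With the very small rate the weight has barely moved toward 0 after 20 steps, with the moderate rate it is practically at the minimum, and with the very large rate every step overshoots the minimum so that the weight moves further away from it.
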
If you have any questions or comments, let me know:
It would help improve these demos.