Thursday, September 4, 2014

Neural Network Dropout Training

Dropout training is a relatively new algorithm which appears to be highly effective for improving the quality of neural network predictions. Dropout training is not yet widely implemented in neural network API libraries. The information presented in this article will enable you to understand how to use dropout training if it's available in an existing system, or add dropout training to systems where it's not yet available.
A major challenge when working with a neural network is training the network in such a way that the resulting model doesn't over-fit the training data -- that is, generate weights and bias values that predict the dependent y-values of the training data with very high accuracy, but predict the y-values for new data with poor accuracy. One interesting approach for dealing with neural network over-fitting is a technique called dropout training. The idea is simple: During the training process, hidden nodes and their connections are randomly dropped from the neural network. This prevents the hidden nodes from co-adapting with each other, forcing the model to rely on only a subset of the hidden nodes. This makes the resulting neural network more robust. Another way of looking at dropout training is that dropout generates many different virtual subsets of the original neural network and then these subsets are averaged to give a final network that generalizes well.
Take a look at the demo run in Figure 1. The demo program creates and trains a neural network classifier that predicts the species of an iris flower (setosa, versicolor or virginica) based on four numeric x-values for sepal length and width and petal length and width. The training set consists of 120 data items. The 4-9-3 neural network uses the back-propagation training algorithm combined with dropout. After training, the resulting neural network model with (4 * 9) + (9 * 3) + (9 + 3) = 75 weights and bias values correctly predicts the species of 29 of the 30 data items (0.9667 accuracy) in the test set. The dropout process occurs behind the scenes.

Figure 1. Neural Network Training Using Dropout

This article assumes you have a solid understanding of neural network concepts, including the feed-forward mechanism and the back-propagation algorithm, and that you have at least intermediate-level programming skills, but it does not assume you know anything about dropout training. The demo is coded using C# but you should be able to refactor the code to other languages such as JavaScript or Visual Basic .NET without too much difficulty. The demo code is too long to present in its entirety, so this article focuses on the methods that use dropout. Most normal error checking has been omitted from the demo to keep the main ideas as clear as possible.
Overall Program Structure
The demo program is a console application. The overall structure of the program, with some minor edits and WriteLine statements removed, is presented in Listing 1. Compared to a neural network class that doesn't use dropout, the neural network code in Listing 1 has additional methods MakeDropNodes and IsDropNode, and methods ComputeOutputs and UpdateWeights have an additional input parameter array named dropNodes.
Listing 1: Dropout Training Demo Program Structure
using System;
using System.Collections.Generic;
namespace DropoutDemo
{
  class DropoutProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin neural network dropout demo");
      double[][] trainData = new double[120][];
      trainData[0] = new double[] { 6.0,3.4,4.5,1.6, 0,1,0 };
      // Etc. ... 
      trainData[119] = new double[] { 5.7,2.8,4.5,1.3, 0,1,0 };

      double[][] testData = new double[30][];
      testData[0] = new double[] { 6.0,2.7,5.1,1.6, 0,1,0 };
      // Etc. ... 
      testData[29] = new double[] { 5.8,2.6,4.0,1.2, 0,1,0 };

      Console.WriteLine("\nFirst 5 rows of training data:");
      ShowMatrix(trainData, 5, 1, true);
      Console.WriteLine("First 3 rows of test data:");
      ShowMatrix(testData, 3, 1, true);

      const int numInput = 4;
      const int numHidden = 9;
      const int numOutput = 3;
      NeuralNetwork nn = new NeuralNetwork(numInput,
        numHidden, numOutput);

      int maxEpochs = 500;
      double learnRate = 0.05;
      nn.Train(trainData, maxEpochs, learnRate);
      Console.WriteLine("Training complete\n");

      double[] weights = nn.GetWeights();
      Console.WriteLine("Final weights and bias values:");
      ShowVector(weights, 10, 3, true);

      double trainAcc = nn.Accuracy(trainData);
      Console.WriteLine("\nAccuracy on training data = " +
        trainAcc.ToString("F4"));

      double testAcc = nn.Accuracy(testData);
      Console.WriteLine("\nAccuracy on test data = " +
        testAcc.ToString("F4"));

      Console.WriteLine("\nEnd dropout demo\n");
      Console.ReadLine();
    }

    static void ShowVector(double[] vector, int valsPerRow,
      int decimals, bool newLine) { . . }
    static void ShowMatrix(double[][] matrix, int numRows,
      int decimals, bool newLine) { . . }
  } // class Program

  public class NeuralNetwork
  {
    private static Random rnd;

    private int numInput;
    private int numHidden;
    private int numOutput;

    private double[] inputs;
    private double[][] ihWeights; // input-hidden
    private double[] hBiases;
    private double[] hOutputs;

    private double[][] hoWeights; // hidden-output
    private double[] oBiases;
    private double[] outputs;

    public NeuralNetwork(int numInput, int numHidden,
      int numOutput) { . . }
    private static double[][] MakeMatrix(int rows,
      int cols) { . . }
    public void SetWeights(double[] weights) { . . }
    private void InitializeWeights() { . . }
    public double[] GetWeights() { . . }

    private int[] MakeDropNodes() { . . }
    private bool IsDropNode(int node, int[] dropNodes) { . . }

    private double[] ComputeOutputs(double[] xValues,
      int[] dropNodes) { . . }
    private static double HyperTanFunction(double x) { . . }
    private static double[] Softmax(double[] oSums) { . . }

    private void UpdateWeights(double[] tValues,
      double learnRate, int[] dropNodes) { . . }

    public void Train(double[][] trainData, int maxEpochs,
      double learnRate) { . . }
    private static void Shuffle(int[] sequence) { . . }
    public double Accuracy(double[][] testData) { . . }
    private static int MaxIndex(double[] vector) { . . }
  } // class NeuralNetwork
} // ns
The Dropout Process
Although neural network training using dropout is conceptually simple, the implementation details are a bit tricky. Take a look at the diagram in Figure 2. The diagram represents a dummy neural network that has three inputs, four hidden nodes and two outputs. When using dropout, as each training data item is presented, some of the hidden nodes are randomly selected to be dropped just for the current training item. Specifically, each node independently has a probability equal to 0.50 of being dropped. This means that no hidden nodes might be selected to be dropped, or all hidden nodes might be selected, but on average, about half of the hidden nodes will be selected as drop-nodes for each training item.

Figure 2. Effect of Dropout Nodes on Feed-Forward
In Figure 2, hidden nodes [1] and [2] were selected to be dropped. The dropped nodes do not participate in the feed-forward computation of the output values, or in the back-propagation computation to update the neural network weights and bias values.
The figure shows how the two dropped nodes affect the feed-forward computation. Hidden node [0] isn't a drop-node, so its hSums value is computed as normal. If the input-to-hidden weights are:
0.01  0.02  0.03  0.04
0.05  0.06  0.07  0.08
0.09  0.10  0.11  0.12
and the four hidden node bias values are -2.0, -2.3, -2.6 and -3.0, and the input x-values are 1.0, 3.0, 5.0, then hSums[0] = (1.0)(0.01) + (3.0)(0.05) + (5.0)(0.09) + (-2.0) = -1.39. If the tanh function is used for hidden node activation, the output for hidden node [0] = tanh(-1.39) = -0.88. The hSums and output values for hidden nodes [1] and [2] aren't computed because those nodes have been selected as drop-nodes for the current training item. Note that hidden nodes selected to be dropped aren't physically removed from the neural network; they're virtually removed by simply being ignored.
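In code, the hidden-layer portion of ComputeOutputs just guards each node with IsDropNode. The article doesn't present the method body, so this is a minimal sketch using the fields and helper methods declared in Listing 1:

// Hidden-layer feed-forward, skipping drop-nodes (sketch).
double[] hSums = new double[numHidden];
for (int j = 0; j < numHidden; ++j)
{
  if (IsDropNode(j, dropNodes)) continue; // ignore dropped node
  for (int i = 0; i < numInput; ++i)
    hSums[j] += inputs[i] * ihWeights[i][j]; // weighted inputs
  hSums[j] += hBiases[j]; // add bias
  hOutputs[j] = HyperTanFunction(hSums[j]); // tanh activation
}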
After hidden node output values are computed, ignoring the drop-nodes, the final output values of the neural network are computed in much the same way. For example, in Figure 2, if the hidden-to-output weights are:
0.13  0.14
0.15  0.16
0.17  0.18
0.19  0.20
and the two output node bias values are 4.0 and 5.0, and the outputs for hidden nodes [0] and [3] are -0.88 and -0.97, then oSums[0] = (-0.88)(0.13) + (-0.97)(0.19) + 4.0 = 3.70. And oSums[1] = (-0.88)(0.14) + (-0.97)(0.20) + 5.0 = 4.68. If the softmax function is used for output layer activation, then the two final outputs of the neural network are softmax(0, 3.70, 4.68) = 0.27 and softmax(1, 3.70, 4.68) = 0.73. In short, hidden nodes selected to be dropped for the current training item are simply ignored.
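The corresponding output-layer code guards the inner loop the same way, so dropped hidden nodes contribute nothing to the output sums. Again, a sketch built from the Listing 1 members rather than the author's verbatim code:

// Output-layer feed-forward (sketch): drop-nodes contribute nothing.
double[] oSums = new double[numOutput];
for (int k = 0; k < numOutput; ++k)
{
  for (int j = 0; j < numHidden; ++j)
  {
    if (IsDropNode(j, dropNodes)) continue; // skip dropped node
    oSums[k] += hOutputs[j] * hoWeights[j][k];
  }
  oSums[k] += oBiases[k]; // add bias
}
outputs = Softmax(oSums); // convert sums to probabilities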
The effect of drop-nodes on the back-propagation pass through the neural network is similar. Hidden node gradients aren't computed for the drop-nodes, and the drop-node input-hidden weights, hidden-output weights, and hidden biases aren't updated.
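In method UpdateWeights this amounts to guarding the hidden-gradient and weight-update loops with IsDropNode. The following partial sketch shows only the dropout-specific guards; it assumes an illustrative local array oGrads that already holds the output-node gradients, computed exactly as in ordinary back-propagation:

// Inside UpdateWeights (sketch). oGrads[] is assumed computed as usual.
double[] hGrads = new double[numHidden];
for (int j = 0; j < numHidden; ++j)
{
  if (IsDropNode(j, dropNodes)) continue; // no gradient for a drop-node
  double sum = 0.0;
  for (int k = 0; k < numOutput; ++k)
    sum += oGrads[k] * hoWeights[j][k];
  double derivative = (1 + hOutputs[j]) * (1 - hOutputs[j]); // tanh'
  hGrads[j] = derivative * sum;
}

// Update input-to-hidden weights; columns for drop-nodes are untouched.
for (int i = 0; i < numInput; ++i)
  for (int j = 0; j < numHidden; ++j)
  {
    if (IsDropNode(j, dropNodes)) continue;
    ihWeights[i][j] += learnRate * hGrads[j] * inputs[i];
  }
// hBiases[j] and hoWeights[j][k] are updated (or skipped) the same way.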
Generating Nodes to Drop
As each training item is presented for training, a new set of randomly selected hidden nodes to drop must be generated. Method MakeDropNodes returns an array of type int whose values are the indices of the drop-nodes. For example, if a neural network has nine hidden nodes and the return from MakeDropNodes is an array of size four with values 0, 2, 6 and 8, then there are four drop-nodes at indices [0], [2], [6] and [8]. Method MakeDropNodes is defined in Listing 2.
Listing 2: The MakeDropNodes Method
private int[] MakeDropNodes()
{
  List<int> resultList = new List<int>();
  for (int i = 0; i < this.numHidden; ++i) {
    double p = rnd.NextDouble();
    if (p < 0.50) resultList.Add(i);
  }

  if (resultList.Count == 0)
    resultList.Add(rnd.Next(0, numHidden));
  else if (resultList.Count == numHidden)
    resultList.RemoveAt(rnd.Next(0, numHidden));
  return resultList.ToArray();
}
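Companion method IsDropNode, declared in Listing 1 but not listed in the article, only needs to report whether a node index appears in the dropNodes array. A minimal sketch:

private bool IsDropNode(int node, int[] dropNodes)
{
  // A linear scan is adequate: dropNodes holds at most numHidden - 1
  // indices. (Sketch; the author's exact implementation isn't shown.)
  for (int i = 0; i < dropNodes.Length; ++i)
    if (dropNodes[i] == node)
      return true;
  return false;
}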
