Thursday, September 4, 2014

The Backpropagation Network


This program is copyright © 1996 by the author. It is made available as is, and no warranty - about the program, its performance, or its conformity to any specification - is given or implied. It may be used, modified, and distributed freely for private and commercial purposes, as long as the original author is credited as part of the final work.

Understanding Neural Network Batch Training: A Tutorial


There are two different techniques for training a neural network: batch and online. Understanding their similarities and differences is important in order to be able to create accurate prediction systems.

namespace DropoutDemo

using System;
using System.Collections.Generic;

namespace DropoutDemo
{
  class DropoutProgram

BackPropTraining

using System;
namespace BackPropTraining
{
  class BackPropTrainingProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin neural network training with back-propagation demo\n");
      Console.WriteLine("\nData is the famous Iris flower set.");
      Console.WriteLine("Data is sepal length, sepal width, petal length, petal width -> iris species");
      Console.WriteLine("Iris setosa = 0 0 1, Iris versicolor = 0 1 0, Iris virginica = 1 0 0 ");
      Console.WriteLine("The goal is to predict species from sepal length, sepal width, petal length, petal width\n");
  

Neural Network Back-Propagation Using C#

using System;
// demonstration of neural network back-propagation
namespace BackProp
{
  class BackPropProgram
  {
    static void Main(string[] args)
    {
      try

Neural Network Back-Propagation Using C#

Understanding how back-propagation works will enable you to use neural network tools more effectively.

Training a neural network is the process of finding a set of weight and bias values so that for a given set of inputs, the outputs produced by the neural network are very close to some target values. For example, if you have a neural network that predicts the scores of two basketball teams in an upcoming game, you might have all kinds of historical data such as turnovers per game for every team, rebounds per game and so on. Each historical data vector would have dozens of inputs and two associated outputs: the score of the first team and the score of the second team. Training the neural network searches for a set of weights and biases that most accurately predicts both teams' scores from the input data. Once you have these weight and bias values, you could apply them to an upcoming game to predict the results of that game.

implementing neural networks

I suck at implementing neural networks in octave

A few days ago I implemented my first full neural network in Octave. Nothing too major, just a three layer network recognising hand-written letters. Even though I finally understood what a neural network is, this was still a cool challenge.

Dive into Neural Networks

James McCaffrey
An artificial neural network (usually just called a neural network) is an abstraction loosely modeled on biological neurons and synapses. Although neural networks have been studied for decades, many neural network code implementations on the Internet are not, in my opinion, explained very well. In this month’s column, I’ll explain what artificial neural networks are and present C# code that implements a neural network.
The best way to see where I’m headed is to take a look at Figure 1 and Figure 2. One way of thinking about neural networks is to consider them numerical input-output mechanisms. The neural network in Figure 1 has three inputs labeled x0, x1 and x2, with values 1.0, 2.0 and 3.0, respectively. The neural network has two outputs labeled y0 and y1, with values 0.72 and -0.88, respectively. The neural network in Figure 1 has one layer of so-called hidden neurons and can be described as a three-layer, fully connected, feedforward network with three inputs, two outputs and four hidden neurons. Unfortunately, neural network terminology varies quite a bit. In this article, I’ll generally—but not always—use the terminology described in the excellent neural network FAQ at bit.ly/wfikTI.
Figure 1 Neural Network Structure
Figure 2 Neural Network Demo Program
Figure 2 shows the output produced by the demo program presented in this article. The neural network uses both a sigmoid activation function and a tanh activation function. These functions are suggested by the two equations with the Greek letters phi in Figure 1. The outputs produced by a neural network depend on the values of a set of numeric weights and biases. In this example, there are a total of 26 weights and biases with values 0.10, 0.20 ... -5.00. After the weight and bias values are loaded into the neural network, the demo program loads the three input values (1.0, 2.0, 3.0) and then performs a series of computations as suggested by the messages about the input-to-hidden sums and the hidden-to-output sums. The demo program concludes by displaying the two output values (0.72, -0.88).
I’ll walk you through the program that produced the output shown in Figure 2. This column assumes you have intermediate programming skills but doesn’t assume you know anything about neural networks. The demo program is coded using the C# language but you should have no trouble refactoring the demo code to another language such as Visual Basic .NET or Python. The program presented in this article is essentially a tutorial and a platform for experimentation; it does not directly solve any practical problem, so I’ll explain how you can expand the code to solve meaningful problems. I think you’ll find the information quite interesting, and some of the programming techniques can be valuable additions to your coding skill set.

Modeling a Neural Network

Conceptually, artificial neural networks are modeled on the behavior of real biological neural networks. In Figure 1 the circles represent neurons where processing occurs and the arrows represent both information flow and numeric values called weights. In many situations, input values are copied directly into input neurons without any weighting and emitted directly without any processing, so the first real action occurs in the hidden layer neurons. Assume that input values 1.0, 2.0 and 3.0 are emitted from the input neurons. If you examine Figure 1, you can see an arrow representing a weight value between each of the three input neurons and each of the four hidden neurons. Suppose the three weight arrows shown pointing into the top hidden neuron are named w00, w10 and w20. In this notation the first index represents the index of the source input neuron and the second index represents the index of the destination hidden neuron.
Neuron processing occurs in three steps. In the first step, a weighted sum is computed. Suppose w00 = 0.1, w10 = 0.5 and w20 = 0.9. The weighted sum for the top hidden neuron is (1.0)(0.1) + (2.0)(0.5) + (3.0)(0.9) = 3.8. The second processing step is to add a bias value. Suppose the bias value is -2.0; then the adjusted weighted sum becomes 3.8 + (-2.0) = 1.8. The third step is to apply an activation function to the adjusted weighted sum. Suppose the activation function is the sigmoid function defined by 1.0 / (1.0 + Exp(-x)), where Exp represents the exponential function. The output from the hidden neuron becomes 1.0 / (1.0 + Exp(-1.8)) = 0.86. This output then becomes part of the weighted sum input into each of the output layer neurons. In Figure 1, this three-step process is suggested by the equation with the Greek letter phi: weighted sums (xw) are computed, a bias (b) is added and an activation function (phi) is applied.
After all hidden neuron values have been computed, output layer neuron values are computed in the same way. The activation function used to compute output neuron values can be the same function used when computing the hidden neuron values, or a different activation function can be used. The demo program shown running in Figure 2 uses the hyperbolic tangent function as the hidden-to-output activation function. After all output layer neuron values have been computed, in most situations these values are not weighted or processed but are simply emitted as the final output values of the neural network.

Internal Structure

The key to understanding the neural network implementation presented here is to closely examine Figure 3, which, at first glance, might appear extremely complicated. But bear with me—the figure is not nearly as complex as it might first appear. Figure 3 shows a total of eight arrays and two matrices. The first array is labeled this.inputs. This array holds the neural network input values, which are 1.0, 2.0 and 3.0 in this example. Next comes the set of weight values that are used to compute values in the so-called hidden layer. These weights are stored in a 3 x 4 matrix labeled i-h weights where the i-h stands for input-to-hidden. Notice in Figure 1 that the demo neural network has four hidden neurons. The i-h weights matrix has a number of rows equal to the number of inputs and a number of columns equal to the number of hidden neurons.
Figure 3 Neural Network Internal Structure
The array labeled i-h sums is a scratch array used for computation. Note that the length of the i-h sums array will always be the same as the number of hidden neurons (four, in this example). Next comes an array labeled i-h biases. Neural network biases are additional weights used to compute hidden and output layer neurons. The length of the i-h biases array will be the same as the length of the i-h sums array, which in turn is the same as the number of hidden neurons.
The array labeled i-h outputs is an intermediate result and the values in this array are used as inputs to the next layer. The i-h outputs array has length equal to the number of hidden neurons.
Next comes a matrix labeled h-o weights where the h-o stands for hidden-to-output. Here the h-o weights matrix has size 4 x 2 because there are four hidden neurons and two outputs. The h-o sums array, the h-o biases array and the this.outputs array all have lengths equal to the number of outputs (two, in this example).
The array labeled weights at the bottom of Figure 3 holds all the input-to-hidden and hidden-to-output weights and biases. In this example, the length of the weights array is (3 * 4) + 4 + (4 * 2) + 2 = 26. In general, if Ni is the number of input values, Nh is the number of hidden neurons and No is the number of outputs, then the length of the weights array will be Nw = (Ni * Nh) + Nh + (Nh * No) + No.
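As a quick check of the formula, here is a minimal C sketch; the function name count_weights is an assumption made for illustration and does not appear in the demo program.

```c
/* Nw = (Ni * Nh) + Nh + (Nh * No) + No */
int count_weights(int num_input, int num_hidden, int num_output)
{
    return (num_input * num_hidden) + num_hidden
         + (num_hidden * num_output) + num_output;
}
```

For the 3-4-2 network in this example, count_weights(3, 4, 2) evaluates to 12 + 4 + 8 + 2 = 26.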

Computing the Outputs

After the eight arrays and two matrices described in the previous section have been created, a neural network can compute its output based on its inputs, weights and biases. The first step is to copy input values into the this.inputs array. The next step is to assign values to the weights array. For the purposes of a demonstration you can use any weight values you like. Next, values in the weights array are copied to the i-h weights matrix, the i-h biases array, the h-o weights matrix and the h-o biases array. Figure 3 should make this relationship clear.
The values in the i-h sums array are computed in two steps. The first step is to compute the weighted sums by multiplying the values in the inputs array by the values in the appropriate column of the i-h weights matrix. For example, the weighted sum for hidden neuron [3] (where I’m using zero-based indexing) uses each input value and the values in column [3] of the i-h weights matrix: (1.0)(0.4) + (2.0)(0.8) + (3.0)(1.2) = 5.6. The second step when computing i-h sum values is to add each bias value to the current i-h sum value. For example, because i-h biases [3] has value -7.0, the value of i-h sums [3] becomes 5.6 + (-7.0) = -1.4.
After all the values in the i-h sums array have been calculated, the input-to-hidden activation function is applied to those sums to produce the input-to-hidden output values. There are many possible activation functions. The simplest activation function is called the step function, which simply returns 1.0 for any input value greater than zero and returns 0.0 for any input value less than or equal to zero. Another common activation function, and the one used in this article, is the sigmoid function, which is defined as f(x) = 1.0 / (1.0 + Exp(-x)). The graph of the sigmoid function is shown in Figure 4.
Figure 4 The Sigmoid Function
Notice the sigmoid function returns a value in the range strictly greater than zero and strictly less than one. In this example, if the value for i-h sums [3] after the bias value has been added is -1.4, then the value of i-h outputs [3] becomes 1.0 / (1.0 + Exp(-(-1.4))) = 0.20.
After all the input-to-hidden output neuron values have been computed, those values serve as the inputs for the hidden-to-output layer neuron computations. These computations work in the same way as the input-to-hidden computations: preliminary weighted sums are calculated, biases are added and then an activation function is applied. In this example I use the hyperbolic tangent function, abbreviated as tanh, for the hidden-to-output activation function. The tanh function is closely related to the sigmoid function. The graph of the tanh function has an S-shaped curve similar to the sigmoid function, but tanh returns a value in the range (-1,1) instead of in the range (0,1).

Combining Weights and Biases

None of the neural network implementations I’ve seen on the Internet maintain separate weight and bias arrays; they instead combine weights and biases into the weights matrix. How is this possible? Recall that the computation of the value of input-to-hidden neuron [3] resembled (i0 * w03) + (i1 * w13) + (i2 * w23) + b3, where i0 is input value [0], w03 is the weight for input [0] and neuron [3], and b3 is the bias value for hidden neuron [3]. If you create an additional, fake input [3] that has a dummy value of 1.0, and an additional row of weights that holds the bias values, then the previously described computation becomes: (i0 * w03) + (i1 * w13) + (i2 * w23) + (i3 * w33), where i3 is the dummy 1.0 input value and w33 is the bias. The argument is that this approach simplifies the neural network model. I disagree. In my opinion, combining weights and biases makes a neural network model more difficult to understand and more error-prone to implement. However, apparently I’m the only author who seems to have this opinion, so you should make your own design decision.
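To make the equivalence concrete, the following C sketch computes the sum for hidden neuron [3] both ways. The function names are hypothetical; the values are the column [3] weights and bias from the example above.

```c
#include <stddef.h>

/* Separate form: weighted sum over n inputs, then add the bias. */
double sum_separate(const double *in, const double *w, size_t n, double bias)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += in[i] * w[i];
    return sum + bias;
}

/* Combined form: in_aug has n + 1 entries, the last being the dummy 1.0;
   w_aug has n + 1 entries, the last being the bias. */
double sum_combined(const double *in_aug, const double *w_aug, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i <= n; ++i)
        sum += in_aug[i] * w_aug[i];
    return sum;
}
```

With inputs {1.0, 2.0, 3.0}, weights {0.4, 0.8, 1.2} and bias -7.0, the separate form and the combined form (inputs {1.0, 2.0, 3.0, 1.0}, weights {0.4, 0.8, 1.2, -7.0}) both give -1.4. Which form is easier to read and maintain is the design decision discussed above.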

Implementation

I implemented the neural network shown in Figures 1, 2 and 3 using Visual Studio 2010. I created a C# console application named NeuralNetworks. In the Solution Explorer window I right-clicked on file Program.cs and renamed it to NeuralNetworksProgram.cs, which also changed the template-generated class name to NeuralNetworksProgram. The overall program structure, with most WriteLine statements removed, is shown in Figure 5.
Figure 5 Neural Network Program Structure
using System;
namespace NeuralNetworks
{
  class NeuralNetworksProgram
  {
    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine("\nBegin Neural Network demo\n");
        NeuralNetwork nn = new NeuralNetwork(3, 4, 2);
        double[] weights = new double[] {
          0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2,
          -2.0, -6.0, -1.0, -7.0,
          1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
          -2.5, -5.0 };
        nn.SetWeights(weights);
        double[] xValues = new double[] { 1.0, 2.0, 3.0 };
        double[] yValues = nn.ComputeOutputs(xValues);
        Helpers.ShowVector(yValues);
        Console.WriteLine("End Neural Network demo\n");
      }
      catch (Exception ex)
      {
        Console.WriteLine("Fatal: " + ex.Message);
      }
    }
  }
  class NeuralNetwork
  {
    // Class members here
    public NeuralNetwork(int numInput, int numHidden, int numOutput) { ... }
    public void SetWeights(double[] weights) { ... }
    public double[] ComputeOutputs(double[] xValues) { ... }
    private static double SigmoidFunction(double x) { ... }
    private static double HyperTanFunction(double x) { ... }
  }
  public class Helpers
  {
    public static double[][] MakeMatrix(int rows, int cols) { ... }
    public static void ShowVector(double[] vector) { ... }
    public static void ShowMatrix(double[][] matrix, int numRows) { ... }
  }
} // ns
I deleted all the template-generated using statements except for the one referencing the System namespace. In the Main function, after displaying a begin message, I instantiate a NeuralNetwork object named nn with three inputs, four hidden neurons and two outputs. Next, I assign 26 arbitrary weights and biases to an array named weights. I load the weights into the neural network object using a method named SetWeights. I assign values 1.0, 2.0 and 3.0 to an array named xValues. I use method ComputeOutputs to load the input values into the neural network and determine the resulting outputs, which I fetch into an array named yValues. The demo concludes by displaying the output values.

The NeuralNetwork Class

The NeuralNetwork class definition starts:
class NeuralNetwork
{
  private int numInput;
  private int numHidden;
  private int numOutput;
...
As explained in the previous sections, the structure of a neural network is determined by the number of input values, the number of hidden layer neurons and the number of output values. The class definition continues as:
private double[] inputs;
private double[][] ihWeights; // input-to-hidden
private double[] ihSums;
private double[] ihBiases;
private double[] ihOutputs;
private double[][] hoWeights;  // hidden-to-output
private double[] hoSums;
private double[] hoBiases;
private double[] outputs;
...
These seven arrays and two matrices correspond to the ones shown in Figure 3. I use an ih prefix for input-to-hidden data and an ho prefix for hidden-to-output data. Recall that the values in the ihOutputs array serve as the inputs for the output layer computations, so naming this array precisely is a bit troublesome.
Figure 6 shows how the NeuralNetwork class constructor is defined.
Figure 6 The NeuralNetwork Class Constructor
public NeuralNetwork(int numInput, int numHidden, int numOutput)
{
  this.numInput = numInput;
  this.numHidden = numHidden;
  this.numOutput = numOutput;
  inputs = new double[numInput];
  ihWeights = Helpers.MakeMatrix(numInput, numHidden);
  ihSums = new double[numHidden];
  ihBiases = new double[numHidden];
  ihOutputs = new double[numHidden];
  hoWeights = Helpers.MakeMatrix(numHidden, numOutput);
  hoSums = new double[numOutput];
  hoBiases = new double[numOutput];
  outputs = new double[numOutput];
}
After copying the input parameter values numInput, numHidden and numOutput into their respective class fields, each of the nine member arrays and matrices is allocated with the sizes I explained earlier. I implement matrices as arrays of arrays rather than using the C# multidimensional array type so that you can more easily refactor my code to a language that doesn’t support multidimensional array types. Because each row of my matrices must be allocated, it’s convenient to use a helper method such as MakeMatrix.
The SetWeights method accepts an array of weights and bias values and populates ihWeights, ihBiases, hoWeights and hoBiases. The method begins like this:
public void SetWeights(double[] weights)
{
  int numWeights = (numInput * numHidden) +
    (numHidden * numOutput) + numHidden + numOutput;
  if (weights.Length != numWeights)
    throw new Exception("xxxxxx");
  int k = 0;
...
As explained earlier, the total number of weights and biases, Nw, in a fully connected feedforward neural network is (Ni * Nh) + (Nh * No) + Nh + No. I do a simple check to see if the weights array parameter has the correct length. Here, “xxxxxx” is a stand-in for a descriptive error message. Next, I initialize an index variable k to the beginning of the weights array parameter. Method SetWeights concludes:
for (int i = 0; i < numInput; ++i)
  for (int j = 0; j < numHidden; ++j)
    ihWeights[i][j] = weights[k++];
for (int i = 0; i < numHidden; ++i)
  ihBiases[i] = weights[k++];
for (int i = 0; i < numHidden; ++i)
  for (int j = 0; j < numOutput; ++j)
    hoWeights[i][j] = weights[k++];
for (int i = 0; i < numOutput; ++i)
  hoBiases[i] = weights[k++];
}
Each value in the weights array parameter is copied sequentially into ihWeights, ihBiases, hoWeights and hoBiases. Notice no values are copied into ihSums or hoSums because those two scratch arrays are used for computation.
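The same unpacking order can be sketched in C. The struct layout, the fixed sizes and the return-code convention below are assumptions for illustration; the C# method above throws an exception instead of returning an error code.

```c
#include <stddef.h>

#define NUM_INPUT  3
#define NUM_HIDDEN 4
#define NUM_OUTPUT 2
#define NUM_WEIGHTS (NUM_INPUT * NUM_HIDDEN + NUM_HIDDEN \
                     + NUM_HIDDEN * NUM_OUTPUT + NUM_OUTPUT)

struct network {
    double ih_weights[NUM_INPUT][NUM_HIDDEN];
    double ih_biases[NUM_HIDDEN];
    double ho_weights[NUM_HIDDEN][NUM_OUTPUT];
    double ho_biases[NUM_OUTPUT];
};

/* Copies a flat weights array into the network in the order
   i-h weights, i-h biases, h-o weights, h-o biases.
   Returns 0 on success, -1 if length is not the expected count. */
int set_weights(struct network *nn, const double *weights, size_t length)
{
    size_t k = 0;
    if (length != (size_t)NUM_WEIGHTS)
        return -1;
    for (int i = 0; i < NUM_INPUT; ++i)
        for (int j = 0; j < NUM_HIDDEN; ++j)
            nn->ih_weights[i][j] = weights[k++];
    for (int i = 0; i < NUM_HIDDEN; ++i)
        nn->ih_biases[i] = weights[k++];
    for (int i = 0; i < NUM_HIDDEN; ++i)
        for (int j = 0; j < NUM_OUTPUT; ++j)
            nn->ho_weights[i][j] = weights[k++];
    for (int i = 0; i < NUM_OUTPUT; ++i)
        nn->ho_biases[i] = weights[k++];
    return 0;
}
```

With the 26 demo weights, ih_weights[2][3] ends up as 1.2, ih_biases[3] as -7.0, ho_weights[0][0] as 1.3 and ho_biases[1] as -5.0.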

Computing the Outputs

The heart of the NeuralNetwork class is method ComputeOutputs. The method is surprisingly short and simple and begins:
public double[] ComputeOutputs(double[] xValues)
{
  if (xValues.Length != numInput)
    throw new Exception("xxxxxx");
  for (int i = 0; i < numHidden; ++i)
    ihSums[i] = 0.0;
  for (int i = 0; i < numOutput; ++i)
    hoSums[i] = 0.0;
...
First I check to see if the length of the input x-values array is the correct size for the NeuralNetwork object. Then I zero out the ihSums and hoSums arrays. If ComputeOutputs is called only once, then this explicit initialization is not necessary, but if ComputeOutputs is called more than once—because ihSums and hoSums are accumulated values—the explicit initialization is absolutely necessary. An alternative design approach is to not declare and allocate ihSums and hoSums as class members, but instead make them local to the ComputeOutputs method. Method ComputeOutputs continues:
for (int i = 0; i < xValues.Length; ++i)
  this.inputs[i] = xValues[i];
for (int j = 0; j < numHidden; ++j)
  for (int i = 0; i < numInput; ++i)
    ihSums[j] += this.inputs[i] * ihWeights[i][j];
...
The values in the xValues array parameter are copied to the class inputs array member. In some neural network scenarios, input parameter values are normalized, for example by performing a linear transform so that all inputs are scaled between -1.0 and +1.0, but here no normalization is performed. Next, a nested loop computes the weighted sums as shown in Figures 1 and 3. Notice that ihWeights is indexed in standard form, where index i is the row index and index j is the column index. I put j in the outer loop so that each hidden sum is accumulated in turn, although either loop order produces the same sums. Method ComputeOutputs continues:
for (int i = 0; i < numHidden; ++i)
  ihSums[i] += ihBiases[i];
for (int i = 0; i < numHidden; ++i)
  ihOutputs[i] = SigmoidFunction(ihSums[i]);
...
Each weighted sum is modified by adding the appropriate bias value. At this point, to produce the output shown in Figure 2, I used method Helpers.ShowVector to display the current values in the ihSums array. Next, I apply the sigmoid function to each of the values in ihSums and assign the results to array ihOutputs. I’ll present the code for method SigmoidFunction shortly. Method ComputeOutputs continues:
for (int j = 0; j < numOutput; ++j)
  for (int i = 0; i < numHidden; ++i)
    hoSums[j] += ihOutputs[i] * hoWeights[i][j];
for (int i = 0; i < numOutput; ++i)
  hoSums[i] += hoBiases[i];
...
I use the just-computed values in ihOutputs and the weights in hoWeights to compute values into hoSums, then I add the appropriate hidden-to-output bias values. Again, to produce the output shown in Figure 2, I called Helpers.ShowVector. Method ComputeOutputs finishes:
for (int i = 0; i < numOutput; ++i)
  this.outputs[i] = HyperTanFunction(hoSums[i]);
double[] result = new double[numOutput];
this.outputs.CopyTo(result, 0);
return result;
}
I apply method HyperTanFunction to each value in hoSums and store the final outputs in the private class array member outputs. I copy those outputs to a local result array and use that array as a return value. An alternative design choice would be to implement ComputeOutputs without a return value, but implement a public method GetOutputs so that the outputs of the neural network object could be retrieved.
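ComputeOutputs performs no input normalization, but the linear transform onto the range -1.0 to +1.0 mentioned in the walkthrough is simple to sketch. The C functions below are hypothetical helpers, not part of the demo, and they assume the caller already knows the minimum and maximum of the raw data and that max is greater than min.

```c
#include <stddef.h>

/* Linearly maps x from [min, max] onto [-1.0, +1.0]. Assumes max > min. */
double scale_to_range(double x, double min, double max)
{
    return 2.0 * (x - min) / (max - min) - 1.0;
}

/* Scales every element of values in place using a caller-supplied range. */
void scale_inputs(double *values, size_t n, double min, double max)
{
    for (size_t i = 0; i < n; ++i)
        values[i] = scale_to_range(values[i], min, max);
}
```

Applied to the demo inputs 1.0, 2.0 and 3.0 with min 1.0 and max 3.0, the scaled values are -1.0, 0.0 and +1.0.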

The Activation Functions and Helper Methods

Here’s the code for the sigmoid function used to compute the input-to-hidden outputs:
private static double SigmoidFunction(double x)
{
  if (x < -45.0) return 0.0;
  else if (x > 45.0) return 1.0;
  else return 1.0 / (1.0 + Math.Exp(-x));
}
Because some implementations of the Math.Exp function can produce arithmetic overflow, the value of the input parameter is usually checked first. The code for the tanh function used to compute the hidden-to-output results is:
private static double HyperTanFunction(double x)
{
  if (x < -10.0) return -1.0;
  else if (x > 10.0) return 1.0;
  else return Math.Tanh(x);
}
The hyperbolic tangent function returns values between -1 and +1, so arithmetic overflow is not a problem. Here the input value is checked merely to improve performance.
The static utility methods in class Helpers are just coding conveniences. The MakeMatrix method used to allocate matrices in the NeuralNetwork constructor allocates each row of a matrix implemented as an array of arrays:
public static double[][] MakeMatrix(int rows, int cols)
{
  double[][] result = new double[rows][];
  for (int i = 0; i < rows; ++i)
    result[i] = new double[cols];
  return result;
}
Methods ShowVector and ShowMatrix display the values in an array or matrix to the console. You can see the code for these two methods in the code download that accompanies this article (available at msdn.microsoft.com/magazine/msdnmag0512).

Next Steps

The code presented here should give you a solid basis for understanding and experimenting with neural networks. You might want to examine the effects of using different activation functions and varying the number of inputs, outputs and hidden layer neurons. You can modify the neural network by making it partially connected, where some neurons are not logically connected to neurons in the next layer. The neural network presented in this article has one hidden layer. It’s possible to create more complex neural networks that have two or even more hidden layers, and you might want to extend the code presented here to implement such a neural network.
Neural networks can be used to solve a variety of practical problems, including classification problems. In order to solve such problems there are several challenges. For example, you must know how to encode non-numeric data and how to train a neural network to find the best set of weights and biases. I will present an example of using neural networks for classification in a future article.

Dr. James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for software engineers working at Microsoft’s Redmond, Wash., campus. He has worked on several Microsoft products including Internet Explorer and MSN Search. He’s the author of “.NET Test Automation Recipes” (Apress, 2006), and can be reached at jammc@microsoft.com.
Thanks to the following Microsoft technical experts for reviewing this article: Dan Liebling and Anne Loomis Thompson

John Bullinaria's Step by Step Guide to Implementing a Neural Network in C


By John A. Bullinaria from the School of Computer Science of The University of Birmingham, UK.

This document contains a step by step guide to implementing a simple neural network in C. It is aimed mainly at students who wish to (or have been told to) incorporate a neural network learning component into a larger system they are building. Obviously there are many types of neural network one could consider using - here I shall concentrate on one particularly common and useful type, namely a simple three-layer feed-forward back-propagation network (multi layer perceptron).
This type of network will be useful when we have a set of input vectors and a corresponding set of output vectors, and our system must produce an appropriate output for each input it is given. Of course, if we already have a complete noise-free set of input and output vectors, then a simple look-up table would suffice. However, if we want the system to generalize, i.e. produce appropriate outputs for inputs it has never seen before, then a neural network that has learned how to map between the known inputs and outputs (i.e. our training set) will often do a pretty good job for new inputs as well.
I shall assume that the reader is already familiar with C, and, for more details about neural networks in general, simply refer the reader to the newsgroup comp.ai.neural-nets and the associated Neural Networks FAQ. So, let us begin...
A single neuron (i.e. processing unit) takes its total input In and produces an output activation Out. I shall take the activation function to be the sigmoid function
    Out = 1.0/(1.0 + exp(-In));         /* Out = Sigmoid(In) */
though other activation functions are often used (e.g. linear or hyperbolic tangent). This has the effect of squashing the infinite range of In into the range 0 to 1. It also has the convenient property that its derivative takes the particularly simple form
    Sigmoid_Derivative = Sigmoid * (1.0 - Sigmoid) ;
Typically, the input In into a given neuron will be the weighted sum of output activations feeding in from a number of other neurons. It is convenient to think of the activations flowing through layers of neurons. So, if there are NumUnits1 neurons in layer 1, the total activation flowing into our layer 2 neuron is just the sum over Layer1Out[i]*Weight[i], where Weight[i] is the strength/weight of the connection between unit i in layer 1 and our unit in layer 2. Each neuron will also have a bias, or resting state, that is added to the sum of inputs, and it is convenient to call this Weight[0]. We can then write
    Layer2In = Weight[0] ;         /* start with the bias */
    for( i = 1 ; i <= NumUnits1 ; i++ ) {         /* i loop over layer 1 units */
      Layer2In += Layer1Out[i] * Weight[i] ;        /* add in weighted contributions from layer 1 */
    }
    Layer2Out = 1.0/(1.0 + exp(-Layer2In)) ;     /* compute sigmoid to give activation */
Normally layer 2 will have many units as well, so it is appropriate to write the weights between unit i in layer 1 and unit j in layer 2 as an array Weight[i][j]. Thus to get the output of unit j in layer 2 we have

    Layer2In[j] = Weight[0][j] ;
    for( i = 1 ; i <= NumUnits1 ; i++ ) {
      Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
    }
    Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
Remember that in C the array indices start from zero, not one, so we would declare our variables as
    double Layer1Out[NumUnits1+1] ;
    double Layer2In[NumUnits2+1] ;
    double Layer2Out[NumUnits2+1] ;
    double Weight[NumUnits1+1][NumUnits2+1] ;
(or, more likely, declare pointers and use calloc or malloc to allocate the memory). Naturally, we need another loop to get all the layer 2 outputs
    for( j = 1 ; j <= NumUnits2 ; j++ ) {
      Layer2In[j] = Weight[0][j] ;
      for( i = 1 ; i <= NumUnits1 ; i++ ) {
        Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
      }
      Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
    }
Three layer networks are necessary and sufficient for most purposes, so our layer 2 outputs feed into a third layer in the same way as above

    for( j = 1 ; j <= NumUnits2 ; j++ ) {         /* j loop computes layer 2 activations */
      Layer2In[j] = Weight12[0][j] ;
      for( i = 1 ; i <= NumUnits1 ; i++ ) {
        Layer2In[j] += Layer1Out[i] * Weight12[i][j] ;
      }
      Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
    }
    for( k = 1 ; k <= NumUnits3 ; k++ ) {         /* k loop computes layer 3 activations */
      Layer3In[k] = Weight23[0][k] ;
      for( j = 1 ; j <= NumUnits2 ; j++ ) {
        Layer3In[k] += Layer2Out[j] * Weight23[j][k] ;
      }
      Layer3Out[k] = 1.0/(1.0 + exp(-Layer3In[k])) ;
    }
The code can start to become confusing at this point - I find that keeping a separate index i, j, k for each layer helps, as does an intuitive notation for distinguishing between the different layers of weights Weight12 and Weight23. For obvious reasons, for three layer networks, it is traditional to call layer 1 the Input layer, layer 2 the Hidden layer, and layer 3 the Output layer. Our network thus takes on the familiar form that we shall use for the rest of this document
Also, to save getting all the In's and Out's confused, we can write LayerNIn as SumN. Our code can thus be written

    for( j = 1 ; j <= NumHidden ; j++ ) {         /* j loop computes hidden unit activations */
      SumH[j] = WeightIH[0][j] ;
      for( i = 1 ; i <= NumInput ; i++ ) {
        SumH[j] += Input[i] * WeightIH[i][j] ;
      }
      Hidden[j] = 1.0/(1.0 + exp(-SumH[j])) ;
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {         /* k loop computes output unit activations */
      SumO[k] = WeightHO[0][k] ;
      for( j = 1 ; j <= NumHidden ; j++ ) {
        SumO[k] += Hidden[j] * WeightHO[j][k] ;
      }
      Output[k] = 1.0/(1.0 + exp(-SumO[k])) ;
    }
Generally we will have a whole set of NumPattern training patterns, i.e. pairs of input and target output vectors,
    Input[p][i] , Target[p][k]
labelled by the index p. The network learns by minimizing some measure of the error of the network's actual outputs compared with the target outputs. For example, the sum squared error over all output units k and all training patterns p will be given by

    Error = 0.0 ;
    for( p = 1 ; p <= NumPattern ; p++ ) {
      for( k = 1 ; k <= NumOutput ; k++ ) {
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
      }
    }
(The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning algorithm.) If we insert the above code for computing the network outputs into the p loop of this, we end up with

    Error = 0.0 ;
    for( p = 1 ; p <= NumPattern ; p++ ) {         /* p loop over training patterns */
      for( j = 1 ; j <= NumHidden ; j++ ) {         /* j loop over hidden units */
        SumH[p][j] = WeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
          SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
        }
        Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
      }
      for( k = 1 ; k <= NumOutput ; k++ ) {         /* k loop over output units */
        SumO[p][k] = WeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
          SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
        }
        Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;     /* Sum Squared Error */
      }
    }
I'll leave the reader to dispense with any indices that they don't need for the purposes of their own system (e.g. the indices on SumH and SumO).
The next stage is to iteratively adjust the weights to minimize the network's error. A standard way to do this is by 'gradient descent' on the error function. We can compute how much the error is changed by a small change in each weight (i.e. compute the partial derivatives dError/dWeight) and shift the weights by a small amount in the direction that reduces the error. The literature is full of variations on this general approach - I shall begin with the 'standard on-line back-propagation with momentum' algorithm. This is not the place to go through all the mathematics, but for the above sum squared error we can compute and apply one iteration (or 'epoch') of the required weight changes DeltaWeightIH and DeltaWeightHO using

    Error = 0.0 ;
    for( p = 1 ; p <= NumPattern ; p++ ) {         /* repeat for all the training patterns */
      for( j = 1 ; j <= NumHidden ; j++ ) {         /* compute hidden unit activations */
        SumH[p][j] = WeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
          SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
        }
        Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
      }
      for( k = 1 ; k <= NumOutput ; k++ ) {         /* compute output unit activations and errors */
        SumO[p][k] = WeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
          SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
        }
        Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
        Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
        DeltaO[k] = (Target[p][k] - Output[p][k]) * Output[p][k] * (1.0 - Output[p][k]) ;
      }
      for( j = 1 ; j <= NumHidden ; j++ ) {         /* 'back-propagate' errors to hidden layer */
        SumDOW[j] = 0.0 ;
        for( k = 1 ; k <= NumOutput ; k++ ) {
          SumDOW[j] += WeightHO[j][k] * DeltaO[k] ;
        }
        DeltaH[j] = SumDOW[j] * Hidden[p][j] * (1.0 - Hidden[p][j]) ;
      }
      for( j = 1 ; j <= NumHidden ; j++ ) {         /* update weights WeightIH */
        DeltaWeightIH[0][j] = eta * DeltaH[j] + alpha * DeltaWeightIH[0][j] ;
        WeightIH[0][j] += DeltaWeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
          DeltaWeightIH[i][j] = eta * Input[p][i] * DeltaH[j] + alpha * DeltaWeightIH[i][j];
          WeightIH[i][j] += DeltaWeightIH[i][j] ;
        }
      }
      for( k = 1 ; k <= NumOutput ; k ++ ) {         /* update weights WeightHO */
        DeltaWeightHO[0][k] = eta * DeltaO[k] + alpha * DeltaWeightHO[0][k] ;
        WeightHO[0][k] += DeltaWeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
          DeltaWeightHO[j][k] = eta * Hidden[p][j] * DeltaO[k] + alpha * DeltaWeightHO[j][k] ;
          WeightHO[j][k] += DeltaWeightHO[j][k] ;
        }
      }
    }
(There is clearly plenty of scope for re-ordering, combining and simplifying the loops here - I will leave that for the reader to do once they have understood what the separate code sections are doing.) The weight changes DeltaWeightIH and DeltaWeightHO are each made up of two components. First, the eta component that is the gradient descent contribution. Second, the alpha component that is a 'momentum' term which effectively keeps a moving average of the gradient descent weight change contributions, and thus smoothes out the overall weight changes. Fixing good values of the learning parameters eta and alpha is usually a matter of trial and error. Certainly alpha must be in the range 0 to 1, and a non-zero value does usually speed up learning. Finding a good value for eta will depend on the problem, and also on the value chosen for alpha. If it is set too low, the training will be unnecessarily slow. Having it too large will cause the weight changes to oscillate wildly, and can slow down or even prevent learning altogether. (I generally start by trying eta = 0.1 and explore the effects of repeatedly doubling or halving it.)
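For readers who want to see where DeltaO and DeltaH in the code come from, here is a compact summary for a single training pattern p. The symbols are my own shorthand for the code variables (T for Target, O for Output, H for Hidden, I for Input, S for SumH and SumO, W for the weights), with the bias handled by taking H_{p0} = I_{p0} = 1.

```latex
\begin{align*}
E &= \tfrac{1}{2}\sum_{p}\sum_{k}\left(T_{pk}-O_{pk}\right)^{2},
  \qquad O_{pk}=\sigma\!\left(S^{O}_{pk}\right),
  \qquad \sigma'(x)=\sigma(x)\bigl(1-\sigma(x)\bigr) \\
\delta^{O}_{k} &= -\frac{\partial E}{\partial S^{O}_{pk}}
  = \left(T_{pk}-O_{pk}\right)O_{pk}\left(1-O_{pk}\right) \\
\delta^{H}_{j} &= -\frac{\partial E}{\partial S^{H}_{pj}}
  = \Bigl(\sum_{k}W^{HO}_{jk}\,\delta^{O}_{k}\Bigr)H_{pj}\left(1-H_{pj}\right) \\
\Delta W^{HO}_{jk} &\leftarrow \eta\,H_{pj}\,\delta^{O}_{k} + \alpha\,\Delta W^{HO}_{jk} \\
\Delta W^{IH}_{ij} &\leftarrow \eta\,I_{pi}\,\delta^{H}_{j} + \alpha\,\Delta W^{IH}_{ij}
\end{align*}
```

The factors O(1 - O) and H(1 - H) are the sigmoid derivatives, the sum over k is the SumDOW term, and the last two lines are the eta and alpha components of the updates in the code.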
The complete training process will consist of repeating the above weight updates for a number of epochs (using another for loop) until some error criterion is met, for example the Error falls below some chosen small number. (Note that, with sigmoids on the outputs, the Error can only reach exactly zero if the weights reach infinity! Note also that sometimes the training can get stuck in a 'local minimum' of the error function and never get anywhere near the actual minimum.) So, we need to wrap the last block of code in something like

    for( epoch = 1 ; epoch < LARGENUMBER ; epoch++ ) {
      /* ABOVE CODE FOR ONE ITERATION */
      if( Error < SMALLNUMBER ) break ;
    }
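One way to keep the stopping logic separate from the weight updates is to hide the per-epoch code behind a function. This is a sketch only: the EpochFn type, the train function and the halving_epoch stand-in are hypothetical names of my own, and a real epoch function would contain the pattern loop shown earlier.

```c
/* One pass over the training set, returning the summed error for that
   pass. The signature is an assumption of this sketch. */
typedef double (*EpochFn)(void *state);

/* Run epochs until the error drops below small_number or large_number
   epochs have been used. Returns the number of epochs actually run and
   stores the last error in *final_error. */
int train(EpochFn run_epoch, void *state, int large_number,
          double small_number, double *final_error)
{
    int epoch = 0;
    double error = 0.0;
    while (epoch < large_number) {
        error = run_epoch(state);
        epoch++;
        if (error < small_number) break;
    }
    *final_error = error;
    return epoch;
}

/* Stand-in for a real training epoch: it just halves a stored error
   value, which is enough to exercise the stopping rule. */
double halving_epoch(void *state)
{
    double *err = state;
    *err *= 0.5;
    return *err;
}
```

Returning the epoch count makes it easy to tell whether training stopped because the error criterion was met or because the epoch limit ran out.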
If the training patterns are presented in the same systematic order during each epoch, it is possible for weight oscillations to occur. It is therefore generally a good idea to use a new random order for the training patterns for each epoch. If we put the NumPattern training pattern indices p in random order into an array ranpat[], then it is simply a matter of replacing our training pattern loop

    for( p = 1 ; p <= NumPattern ; p++ ) {
with

    for( np = 1 ; np <= NumPattern ; np++ ) {
      p = ranpat[np] ;
Generating the random array ranpat[] is not quite so simple, but the following code will do the job

    for( p = 1 ; p <= NumPattern ; p++ ) {         /* set up ordered array */
      ranpat[p] = p ;
    }
    for( p = 1 ; p <= NumPattern ; p++) {         /* swap random elements into each position */
      np = p + rando() * ( NumPattern + 1 - p ) ;
      op = ranpat[p] ; ranpat[p] = ranpat[np] ; ranpat[np] = op ;
    }
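The shuffle above only stays inside the array if rando() never returns exactly 1. Here is a self-contained sketch with that made explicit. The rando() shown is a simple stand-in built on the standard rand(), and shuffle_patterns is a hypothetical name of my own.

```c
#include <stdlib.h>

/* Uniform random double in the half-open range [0, 1).
   Dividing by RAND_MAX + 1 guarantees the result is strictly below 1. */
double rando(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Fill ranpat[1..n] with a random permutation of 1..n (index 0 unused),
   using the same swap scheme as the code above. */
void shuffle_patterns(int *ranpat, int n)
{
    for (int p = 1; p <= n; p++) {
        ranpat[p] = p;                                /* ordered array */
    }
    for (int p = 1; p <= n; p++) {
        int np = p + (int)(rando() * (n + 1 - p));    /* p <= np <= n */
        int op = ranpat[p];
        ranpat[p] = ranpat[np];
        ranpat[np] = op;
    }
}
```

Calling srand() once at program start with a changing seed (for example the current time) gives a different order on each run. rand() is not a high quality generator, but it should be adequate for shuffling a small training set.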
Naturally, one must set some initial network weights to start the learning process. Starting all the weights at zero is generally not a good idea, as that is often a local minimum of the error function. It is normal to initialize all the weights with small random values. If rando() is your favourite random number generator function that returns a flat distribution of random numbers in the range 0 to 1, and smallwt is the maximum absolute size of your initial weights, then an appropriate section of weight initialization code would be
    for( j = 1 ; j <= NumHidden ; j++ ) {         /* initialize WeightIH and DeltaWeightIH */
      for( i = 0 ; i <= NumInput ; i++ ) {
        DeltaWeightIH[i][j] = 0.0 ;
        WeightIH[i][j] = 2.0 * ( rando() - 0.5 ) * smallwt ;
      }
    }
    for( k = 1 ; k <= NumOutput ; k ++ ) {         /* initialize WeightHO and DeltaWeightHO */
      for( j = 0 ; j <= NumHidden ; j++ ) {
        DeltaWeightHO[j][k] = 0.0 ;
        WeightHO[j][k] = 2.0 * ( rando() - 0.5 ) * smallwt ;
      }
    }
Note that it is a good idea to set all the initial DeltaWeights to zero at the same time.
We now have enough code to put together a working neural network program. I have cut and pasted the above code into the file nn.c (which your browser should allow you to save into your own file space). I have added the standard #includes, declared all the variables, hard coded the standard XOR training data and values for eta, alpha and smallwt, #defined an over-simple rando(), added some print statements to show what the network is doing, and wrapped the whole lot in a main(){ }. The file should compile and run in the normal way (e.g. using the UNIX commands 'cc nn.c -O -lm -o nn' and './nn').
I've left plenty for the reader to do to convert this into a useful program, for example:
  • Read the training data from file
  • Allow the parameters (eta, alpha, smallwt, NumHidden, etc.) to be varied during runtime
  • Have appropriate array sizes determined and allocate them memory during runtime
  • Saving of weights to file, and reading them back in again
  • Plotting of errors, output activations, etc. during training
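As an illustration of the weight saving item in the list above, here is a minimal sketch using plain text files. The function names and the file layout (a size line followed by the values) are my own assumptions, not part of nn.c.

```c
#include <stdio.h>

/* Write a rows x cols weight matrix (row-major) as text.
   Returns 0 on success, -1 on a write error. */
int save_weights(FILE *fp, const double *w, int rows, int cols)
{
    if (fprintf(fp, "%d %d\n", rows, cols) < 0) return -1;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            /* 17 significant digits are enough to recover an
               IEEE 754 double exactly when it is read back */
            if (fprintf(fp, "%.17g ", w[r * cols + c]) < 0) return -1;
        }
        if (fputc('\n', fp) == EOF) return -1;
    }
    return 0;
}

/* Read a matrix written by save_weights into w, which must have room
   for rows * cols doubles.
   Returns 0 on success, -1 on a read error or a size mismatch. */
int load_weights(FILE *fp, double *w, int rows, int cols)
{
    int r_in, c_in;
    if (fscanf(fp, "%d %d", &r_in, &c_in) != 2) return -1;
    if (r_in != rows || c_in != cols) return -1;
    for (int n = 0; n < rows * cols; n++) {
        if (fscanf(fp, "%lf", &w[n]) != 1) return -1;
    }
    return 0;
}
```

For the arrays in this document the call would look like save_weights(fp, &WeightIH[0][0], NumInput + 1, NumHidden + 1). The unused index 0 column is never set by the training code, so it is safest to zero it before saving.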
There are also numerous network variations that could be implemented, for example:
  • Batch learning, rather than on-line learning
  • Alternative activation functions (linear, tanh, etc.)
  • Real (rather than binary) valued outputs require linear output functions
    • Output[p][k] = SumO[p][k] ;
      DeltaO[k] = Target[p][k] - Output[p][k] ;
  • Cross-Entropy cost function rather than Sum Squared Error
    • Error -= ( Target[p][k] * log( Output[p][k] ) + ( 1.0 - Target[p][k] ) * log( 1.0 - Output[p][k] ) ) ;
      DeltaO[k] = Target[p][k] - Output[p][k] ;
  • Separate training, validation and testing sets
  • Weight decay / Regularization
But from here on, you're on your own. I hope you found this page useful...

Implement a Neural Net


(Original image by Hljod.Huskona, CC BY-SA 2.0.)
I used to hate neural nets. Mostly, I realise now, because I struggled to implement them correctly. Texts explaining the working of neural nets focus heavily on the mathematical mechanics, and this is good for theoretical understanding and correct usage. However, this approach is terrible for the poor implementer, neglecting many of the details that concern him or her.

A neural net in BASIC

https://cloud.mail.ru/public/9ef7abc2bf4b%2F%D0%9D%D0%B5%D0%B9%D1%80%D0%BE%D1%81%D0%B5%D1%82%D1%8C%20%D0%BD%D0%B0%20%D0%B2%D0%B8%D0%B7%D1%83%D0%B0%D0%BB%D0%B1%D0%B5%D0%B9%D1%81%D0%B8%D0%BA%D0%B5%2F
download