Dive into Neural Networks
James McCaffrey
Download the Code Sample
An artificial neural network (usually just called a neural network) is an abstraction loosely modeled on biological neurons and synapses. Although neural networks have been studied for decades, many neural network code implementations on the Internet are not, in my opinion, explained very well. In this month’s column, I’ll explain what artificial neural networks are and present C# code that implements a neural network.
The best way to see where I’m headed is to take a look at Figure 1 and Figure 2. One way of thinking about neural networks is to consider them numerical input-output mechanisms. The neural network in Figure 1 has three inputs labeled x0, x1 and x2, with values 1.0, 2.0 and 3.0, respectively. The neural network has two outputs labeled y0 and y1, with values 0.72 and -0.88, respectively. The neural network in Figure 1 has one layer of so-called hidden neurons and can be described as a three-layer, fully connected, feedforward network with three inputs, two outputs and four hidden neurons. Unfortunately, neural network terminology varies quite a bit. In this article, I’ll generally—but not always—use the terminology described in the excellent neural network FAQ at bit.ly/wfikTI.
Figure 1 Neural Network Structure
Figure 2 Neural Network Demo Program
Figure 2 shows the output produced by the demo program presented in this article. The neural network uses both a sigmoid activation function and a tanh activation function. These functions are suggested by the two equations with the Greek letters phi in Figure 1. The outputs produced by a neural network depend on the values of a set of numeric weights and biases. In this example, there are a total of 26 weights and biases with values 0.10, 0.20 ... -5.00. After the weight and bias values are loaded into the neural network, the demo program loads the three input values (1.0, 2.0, 3.0) and then performs a series of computations as suggested by the messages about the input-to-hidden sums and the hidden-to-output sums. The demo program concludes by displaying the two output values (0.72, -0.88).
I’ll walk you through the program that produced the output shown in Figure 2. This column assumes you have intermediate programming skills but doesn’t assume you know anything about neural networks. The demo program is coded using the C# language but you should have no trouble refactoring the demo code to another language such as Visual Basic .NET or Python. The program presented in this article is essentially a tutorial and a platform for experimentation; it does not directly solve any practical problem, so I’ll explain how you can expand the code to solve meaningful problems. I think you’ll find the information quite interesting, and some of the programming techniques can be valuable additions to your coding skill set.
Modeling a Neural Network
Conceptually, artificial neural networks are modeled on the behavior of real biological neural networks. In Figure 1 the circles represent neurons where processing occurs and the arrows represent both information flow and numeric values called weights. In many situations, input values are copied directly into input neurons without any weighting and emitted directly without any processing, so the first real action occurs in the hidden layer neurons. Assume that input values 1.0, 2.0 and 3.0 are emitted from the input neurons. If you examine Figure 1, you can see an arrow representing a weight value between each of the three input neurons and each of the four hidden neurons. Suppose the three weight arrows shown pointing into the top hidden neuron are named w00, w10 and w20. In this notation the first index represents the index of the source input neuron and the second index represents the index of the destination hidden neuron.

Neuron processing occurs in three steps. In the first step, a weighted sum is computed. Suppose w00 = 0.1, w10 = 0.5 and w20 = 0.9. The weighted sum for the top hidden neuron is (1.0)(0.1) + (2.0)(0.5) + (3.0)(0.9) = 3.8. The second processing step is to add a bias value. Suppose the bias value is -2.0; then the adjusted weighted sum becomes 3.8 + (-2.0) = 1.8. The third step is to apply an activation function to the adjusted weighted sum. Suppose the activation function is the sigmoid function defined by 1.0 / (1.0 + Exp(-x)), where Exp represents the exponential function. The output from the hidden neuron becomes 1.0 / (1.0 + Exp(-1.8)) = 0.86. This output then becomes part of the weighted sum input into each of the output layer neurons. In Figure 1, this three-step process is suggested by the equation with the Greek letter phi: weighted sums (xw) are computed, a bias (b) is added and an activation function (phi) is applied.

After all hidden neuron values have been computed, output layer neuron values are computed in the same way. The activation function used to compute output neuron values can be the same function used when computing the hidden neuron values, or a different activation function can be used. The demo program shown running in Figure 2 uses the hyperbolic tangent function as the hidden-to-output activation function. After all output layer neuron values have been computed, in most situations these values are not weighted or processed but are simply emitted as the final output values of the neural network.
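To make the three-step neuron computation concrete, here is a minimal C# sketch that reproduces the calculation for the top hidden neuron described above. The variable names are my own illustration and are not part of the demo program.

using System;

class NeuronStepDemo
{
  static void Main()
  {
    double[] inputs = { 1.0, 2.0, 3.0 };            // values emitted by the input neurons
    double[] weightsIntoNeuron = { 0.1, 0.5, 0.9 }; // w00, w10, w20
    double bias = -2.0;

    // Step 1: compute the weighted sum.
    double sum = 0.0;
    for (int i = 0; i < inputs.Length; ++i)
      sum += inputs[i] * weightsIntoNeuron[i];      // 3.8

    // Step 2: add the bias value.
    sum += bias;                                    // 1.8

    // Step 3: apply the sigmoid activation function.
    double output = 1.0 / (1.0 + Math.Exp(-sum));   // approximately 0.86
    Console.WriteLine(output.ToString("F2"));
  }
}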
Internal Structure
The key to understanding the neural network implementation presented here is to closely examine Figure 3, which, at first glance, might appear extremely complicated. But bear with me—the figure is not nearly as complex as it might first appear. Figure 3 shows a total of eight arrays and two matrices.

The first array is labeled this.inputs. This array holds the neural network input values, which are 1.0, 2.0 and 3.0 in this example. Next comes the set of weight values that are used to compute values in the so-called hidden layer. These weights are stored in a 3 x 4 matrix labeled i-h weights, where the i-h stands for input-to-hidden. Notice in Figure 1 that the demo neural network has four hidden neurons. The i-h weights matrix has a number of rows equal to the number of inputs and a number of columns equal to the number of hidden neurons.

The array labeled i-h sums is a scratch array used for computation. Note that the length of the i-h sums array will always be the same as the number of hidden neurons (four, in this example). Next comes an array labeled i-h biases. Neural network biases are additional weights used to compute hidden and output layer neurons. The length of the i-h biases array will be the same as the length of the i-h sums array, which in turn is the same as the number of hidden neurons.
The array labeled i-h outputs is an intermediate result and the values in this array are used as inputs to the next layer. The i-h outputs array also has length equal to the number of hidden neurons.
Next comes a matrix labeled h-o weights where the h-o stands for hidden-to-output. Here the h-o weights matrix has size 4 x 2 because there are four hidden neurons and two outputs. The h-o sums array, the h-o biases array and the this.outputs array all have lengths equal to the number of outputs (two, in this example).
The array labeled weights at the bottom of Figure 3 holds all the input-to-hidden and hidden-to-output weights and biases. In this example, the length of the weights array is (3 * 4) + 4 + (4 * 2) + 2 = 26. In general, if Ni is the number of input values, Nh is the number of hidden neurons and No is the number of outputs, then the length of the weights array will be Nw = (Ni * Nh) + Nh + (Nh * No) + No.
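A one-line helper makes the formula concrete; the method name here is my own and isn't part of the demo program:

// Total number of weights and biases for a fully connected three-layer network.
static int WeightCount(int numInput, int numHidden, int numOutput)
{
  return (numInput * numHidden) + numHidden +
    (numHidden * numOutput) + numOutput;
}
// WeightCount(3, 4, 2) returns (3 * 4) + 4 + (4 * 2) + 2 = 26, matching the demo.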
Computing the Outputs
After the eight arrays and two matrices described in the previous section have been created, a neural network can compute its output based on its inputs, weights and biases. The first step is to copy input values into the this.inputs array. The next step is to assign values to the weights array. For the purposes of a demonstration you can use any weight values you like. Next, values in the weights array are copied to the i-h weights matrix, the i-h biases array, the h-o weights matrix and the h-o biases array. Figure 3 should make this relationship clear.

The values in the i-h sums array are computed in two steps. The first step is to compute the weighted sums by multiplying the values in the inputs array by the values in the appropriate column of the i-h weights matrix. For example, the weighted sum for hidden neuron [3] (where I’m using zero-based indexing) uses each input value and the values in column [3] of the i-h weights matrix: (1.0)(0.4) + (2.0)(0.8) + (3.0)(1.2) = 5.6. The second step when computing i-h sum values is to add each bias value to the current i-h sum value. For example, because i-h biases [3] has value -7.0, the value of i-h sums [3] becomes 5.6 + (-7.0) = -1.4.
After all the values in the i-h sums array have been calculated, the input-to-hidden activation function is applied to those sums to produce the input-to-hidden output values. There are many possible activation functions. The simplest activation function is called the step function, which simply returns 1.0 for any input value greater than zero and returns 0.0 for any input value less than or equal to zero (a short sketch of the step function appears after Figure 4). Another common activation function, and the one used in this article, is the sigmoid function, which is defined as f(x) = 1.0 / (1.0 + Exp(-x)). The graph of the sigmoid function is shown in Figure 4.
Figure 4 The Sigmoid Function
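As an aside, the step function mentioned above is simple enough to sketch in a few lines; this is my own illustration and isn't used by the demo program, which uses the sigmoid function instead:

private static double StepFunction(double x)
{
  // Returns 1.0 for any input greater than zero, otherwise 0.0.
  return x > 0.0 ? 1.0 : 0.0;
}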
After all the input-to-hidden output neuron values have been computed, those values serve as the inputs for the hidden-to-output layer neuron computations. These computations work in the same way as the input-to-hidden computations: preliminary weighted sums are calculated, biases are added and then an activation function is applied. In this example I use the hyperbolic tangent function, abbreviated as tanh, for the hidden-to-output activation function. The tanh function is closely related to the sigmoid function. The graph of the tanh function has an S-shaped curve similar to the sigmoid function, but tanh returns a value in the range (-1,1) instead of in the range (0,1).
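One way to see how closely the two functions are related is that tanh is just a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1. The following short sketch (my own illustration, not part of the demo program) verifies this numerically.

using System;

class TanhSigmoidCheck
{
  static double Sigmoid(double x) { return 1.0 / (1.0 + Math.Exp(-x)); }

  static void Main()
  {
    double x = 0.75;
    double direct = Math.Tanh(x);                     // 0.6351...
    double viaSigmoid = 2.0 * Sigmoid(2.0 * x) - 1.0; // same value
    Console.WriteLine(direct + " " + viaSigmoid);
  }
}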
Combining Weights and Biases
None of the neural network implementations I’ve seen on the Internet maintain separate weight and bias arrays; instead, they combine weights and biases into the weights matrix. How is this possible? Recall that the computation of the value of input-to-hidden neuron [3] resembled (i0 * w03) + (i1 * w13) + (i2 * w23) + b3, where i0 is input value [0], w03 is the weight for input [0] and neuron [3], and b3 is the bias value for hidden neuron [3]. If you create an additional, fake input [3] that has a dummy value of 1.0, and an additional row of weights that holds the bias values, then the previously described computation becomes (i0 * w03) + (i1 * w13) + (i2 * w23) + (i3 * w33), where i3 is the dummy 1.0 input value and w33 is the bias. The argument is that this approach simplifies the neural network model. I disagree. In my opinion, combining weights and biases makes a neural network model more difficult to understand and more error-prone to implement. However, I seem to be the only author who holds this opinion, so you should make your own design decision.
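To illustrate the combined approach anyway, here is a minimal sketch that folds the bias into an extra weight and appends a dummy 1.0 input. The weight and bias values are the ones used for hidden neuron [3] earlier; the variable names are my own and are not part of the demo program.

// Separate bias: (i0 * w03) + (i1 * w13) + (i2 * w23) + b3
double[] inputs = { 1.0, 2.0, 3.0 };
double[] weightsCol3 = { 0.4, 0.8, 1.2 };
double bias3 = -7.0;

double sumSeparate = bias3;
for (int i = 0; i < inputs.Length; ++i)
  sumSeparate += inputs[i] * weightsCol3[i];         // -1.4

// Combined: append a dummy 1.0 input and treat the bias as one more weight.
double[] augInputs = { 1.0, 2.0, 3.0, 1.0 };
double[] augWeightsCol3 = { 0.4, 0.8, 1.2, -7.0 };

double sumCombined = 0.0;
for (int i = 0; i < augInputs.Length; ++i)
  sumCombined += augInputs[i] * augWeightsCol3[i];   // also -1.4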
Implementation
I implemented the neural network shown in Figures 1, 2 and 3 using Visual Studio 2010. I created a C# console application named NeuralNetworks. In the Solution Explorer window I right-clicked on file Program.cs and renamed it to NeuralNetworksProgram.cs, which also changed the template-generated class name to NeuralNetworksProgram. The overall program structure, with most WriteLine statements removed, is shown in Figure 5.
Figure 5 Neural Network Program Structure
using System;
namespace NeuralNetworks
{
  class NeuralNetworksProgram
  {
    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine("\nBegin Neural Network demo\n");
        NeuralNetwork nn = new NeuralNetwork(3, 4, 2);
        double[] weights = new double[] {
          0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2,
          -2.0, -6.0, -1.0, -7.0,
          1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
          -2.5, -5.0 };
        nn.SetWeights(weights);
        double[] xValues = new double[] { 1.0, 2.0, 3.0 };
        double[] yValues = nn.ComputeOutputs(xValues);
        Helpers.ShowVector(yValues);
        Console.WriteLine("End Neural Network demo\n");
      }
      catch (Exception ex)
      {
        Console.WriteLine("Fatal: " + ex.Message);
      }
    }
  }

  class NeuralNetwork
  {
    // Class members here
    public NeuralNetwork(int numInput, int numHidden, int numOutput) { ... }
    public void SetWeights(double[] weights) { ... }
    public double[] ComputeOutputs(double[] xValues) { ... }
    private static double SigmoidFunction(double x) { ... }
    private static double HyperTanFunction(double x) { ... }
  }

  public class Helpers
  {
    public static double[][] MakeMatrix(int rows, int cols) { ... }
    public static void ShowVector(double[] vector) { ... }
    public static void ShowMatrix(double[][] matrix, int numRows) { ... }
  }
} // ns
The NeuralNetwork Class
The NeuralNetwork class definition starts:

class NeuralNetwork
{
  private int numInput;
  private int numHidden;
  private int numOutput;
  ...
  private double[] inputs;
  private double[][] ihWeights; // input-to-hidden
  private double[] ihSums;
  private double[] ihBiases;
  private double[] ihOutputs;
  private double[][] hoWeights; // hidden-to-output
  private double[] hoSums;
  private double[] hoBiases;
  private double[] outputs;
  ...
Figure 6 shows how the NeuralNetwork class constructor is defined.
Figure 6 The NeuralNetwork Class Constructor
public NeuralNetwork(int numInput, int numHidden, int numOutput)
{
  this.numInput = numInput;
  this.numHidden = numHidden;
  this.numOutput = numOutput;
  inputs = new double[numInput];
  ihWeights = Helpers.MakeMatrix(numInput, numHidden);
  ihSums = new double[numHidden];
  ihBiases = new double[numHidden];
  ihOutputs = new double[numHidden];
  hoWeights = Helpers.MakeMatrix(numHidden, numOutput);
  hoSums = new double[numOutput];
  hoBiases = new double[numOutput];
  outputs = new double[numOutput];
}
The SetWeights method accepts an array of weights and bias values and populates ihWeights, ihBiases, hoWeights and hoBiases. The method begins like this:
public void SetWeights(double[] weights)
{
  int numWeights = (numInput * numHidden) +
    (numHidden * numOutput) + numHidden + numOutput;
  if (weights.Length != numWeights)
    throw new Exception("xxxxxx");
  int k = 0;
  ...
  for (int i = 0; i < numInput; ++i)
    for (int j = 0; j < numHidden; ++j)
      ihWeights[i][j] = weights[k++];
  for (int i = 0; i < numHidden; ++i)
    ihBiases[i] = weights[k++];
  for (int i = 0; i < numHidden; ++i)
    for (int j = 0; j < numOutput; ++j)
      hoWeights[i][j] = weights[k++];
  for (int i = 0; i < numOutput; ++i)
    hoBiases[i] = weights[k++];
}
Computing the Outputs
The heart of the NeuralNetwork class is method ComputeOutputs. The method is surprisingly short and simple and begins:

public double[] ComputeOutputs(double[] xValues)
{
  if (xValues.Length != numInput)
    throw new Exception("xxxxxx");
  for (int i = 0; i < numHidden; ++i)
    ihSums[i] = 0.0;
  for (int i = 0; i < numOutput; ++i)
    hoSums[i] = 0.0;
  ...
  for (int i = 0; i < xValues.Length; ++i)
    this.inputs[i] = xValues[i];
  for (int j = 0; j < numHidden; ++j)
    for (int i = 0; i < numInput; ++i)
      ihSums[j] += this.inputs[i] * ihWeights[i][j];
  ...
  for (int i = 0; i < numHidden; ++i)
    ihSums[i] += ihBiases[i];
  for (int i = 0; i < numHidden; ++i)
    ihOutputs[i] = SigmoidFunction(ihSums[i]);
  ...
  for (int j = 0; j < numOutput; ++j)
    for (int i = 0; i < numHidden; ++i)
      hoSums[j] += ihOutputs[i] * hoWeights[i][j];
  for (int i = 0; i < numOutput; ++i)
    hoSums[i] += hoBiases[i];
  ...
  for (int i = 0; i < numOutput; ++i)
    this.outputs[i] = HyperTanFunction(hoSums[i]);
  double[] result = new double[numOutput];
  this.outputs.CopyTo(result, 0);
  return result;
}
The Activation Functions and Helper Methods
Here’s the code for the sigmoid function used to compute the input-to-hidden outputs:

private static double SigmoidFunction(double x)
{
  if (x < -45.0) return 0.0;
  else if (x > 45.0) return 1.0;
  else return 1.0 / (1.0 + Math.Exp(-x));
}

And here’s the hyperbolic tangent function used to compute the hidden-to-output values:

private static double HyperTanFunction(double x)
{
  if (x < -10.0) return -1.0;
  else if (x > 10.0) return 1.0;
  else return Math.Tanh(x);
}
The static utility methods in class Helpers are just coding conveniences. The MakeMatrix method used to allocate matrices in the NeuralNetwork constructor allocates each row of a matrix implemented as an array of arrays:
public static double[][] MakeMatrix(int rows, int cols)
{
  double[][] result = new double[rows][];
  for (int i = 0; i < rows; ++i)
    result[i] = new double[cols];
  return result;
}
Next Steps
The code presented here should give you a solid basis for understanding and experimenting with neural networks. You might want to examine the effects of using different activation functions and varying the number of inputs, outputs and hidden layer neurons. You can modify the neural network by making it partially connected, where some neurons are not logically connected to neurons in the next layer. The neural network presented in this article has one hidden layer. It’s possible to create more complex neural networks that have two or even more hidden layers, and you might want to extend the code presented here to implement such a neural network.

Neural networks can be used to solve a variety of practical problems, including classification problems. In order to solve such problems there are several challenges. For example, you must know how to encode non-numeric data and how to train a neural network to find the best set of weights and biases. I will present an example of using neural networks for classification in a future article.
Dr. James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for software engineers working at Microsoft’s Redmond, Wash., campus. He has worked on several Microsoft products including Internet Explorer and MSN Search. He’s the author of “.NET Test Automation Recipes” (Apress, 2006), and can be reached at jammc@microsoft.com.
Thanks to the following Microsoft technical experts for reviewing this article: Dan Liebling and Anne Loomis Thompson