John Bullinaria's Step by Step Guide to Implementing a Neural Network in C
By John A. Bullinaria from the School of Computer Science of The University of Birmingham, UK.
This document contains a step by step guide to implementing a simple neural network in C. It is aimed mainly at students who wish to (or have been told to) incorporate a neural network learning component into a larger system they are building. Obviously there are many types of neural network one could consider using - here I shall concentrate on one particularly common and useful type, namely a simple three-layer feed-forward back-propagation network (multi-layer perceptron).

This type of network will be useful when we have a set of input vectors and a corresponding set of output vectors, and our system must produce an appropriate output for each input it is given. Of course, if we already have a complete noise-free set of input and output vectors, then a simple look-up table would suffice. However, if we want the system to generalize, i.e. produce appropriate outputs for inputs it has never seen before, then a neural network that has learned how to map between the known inputs and outputs (i.e. our training set) will often do a pretty good job for new inputs as well.
I shall assume that the reader is already familiar with C, and, for more details about neural networks in general, simply refer the reader to the newsgroup comp.ai.neural-nets and the associated Neural Networks FAQ. So, let us begin...
A single neuron (i.e. processing unit) takes its total input In and produces an output activation Out. I shall take this to be the sigmoid function
   Out = 1.0/(1.0 + exp(-In)) ;             /* Out = Sigmoid(In) */

This squashes the unbounded input In into the range 0 to 1. It also has the convenient property that its derivative can be written in terms of the output itself:

   Sigmoid_Derivative = Sigmoid * (1.0 - Sigmoid) ;
For a unit in layer 2 receiving connections from NumUnits1 units in layer 1, the total input is the bias plus the weighted sum of the layer 1 outputs:

   Layer2In = Weight[0] ;                       /* start with the bias */
   for( i = 1 ; i <= NumUnits1 ; i++ ) {        /* i loop over layer 1 units */
      Layer2In += Layer1Out[i] * Weight[i] ;    /* add in weighted contributions from layer 1 */
   }
   Layer2Out = 1.0/(1.0 + exp(-Layer2In)) ;     /* compute sigmoid to give activation */
Normally layer 2 will have many units j, each with its own set of weights, so the weights become a two-dimensional array Weight[i][j]:

   Layer2In[j] = Weight[0][j] ;
   for( i = 1 ; i <= NumUnits1 ; i++ ) {
      Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
   }
   Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
In C, array indices start from zero, so it is convenient to declare the arrays one element larger than needed and use indices 1 to NumUnits, keeping element 0 of each weight array for the bias:

   double Layer1Out[NumUnits1+1] ;
   double Layer2In[NumUnits2+1] ;
   double Layer2Out[NumUnits2+1] ;
   double Weight[NumUnits1+1][NumUnits2+1] ;
The activations of all the layer 2 units are then computed by looping over j:

   for( j = 1 ; j <= NumUnits2 ; j++ ) {
      Layer2In[j] = Weight[0][j] ;
      for( i = 1 ; i <= NumUnits1 ; i++ ) {
         Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
      }
      Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
   }
A three-layer network needs two sets of weights: Weight12 between layers 1 and 2, and Weight23 between layers 2 and 3:

   for( j = 1 ; j <= NumUnits2 ; j++ ) {        /* j loop computes layer 2 activations */
      Layer2In[j] = Weight12[0][j] ;
      for( i = 1 ; i <= NumUnits1 ; i++ ) {
         Layer2In[j] += Layer1Out[i] * Weight12[i][j] ;
      }
      Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
   }
   for( k = 1 ; k <= NumUnits3 ; k++ ) {        /* k loop computes layer 3 activations */
      Layer3In[k] = Weight23[0][k] ;
      for( j = 1 ; j <= NumUnits2 ; j++ ) {
         Layer3In[k] += Layer2Out[j] * Weight23[j][k] ;
      }
      Layer3Out[k] = 1.0/(1.0 + exp(-Layer3In[k])) ;
   }
For our network, layer 1 is the input layer, layer 2 the hidden layer, and layer 3 the output layer, so it is clearer to rename the variables accordingly:

   for( j = 1 ; j <= NumHidden ; j++ ) {        /* j loop computes hidden unit activations */
      SumH[j] = WeightIH[0][j] ;
      for( i = 1 ; i <= NumInput ; i++ ) {
         SumH[j] += Input[i] * WeightIH[i][j] ;
      }
      Hidden[j] = 1.0/(1.0 + exp(-SumH[j])) ;
   }
   for( k = 1 ; k <= NumOutput ; k++ ) {        /* k loop computes output unit activations */
      SumO[k] = WeightHO[0][k] ;
      for( j = 1 ; j <= NumHidden ; j++ ) {
         SumO[k] += Hidden[j] * WeightHO[j][k] ;
      }
      Output[k] = 1.0/(1.0 + exp(-SumO[k])) ;
   }
The training data consists of NumPattern input-target pairs, which we can hold in arrays Input[p][i] and Target[p][k] indexed by the pattern number p. The network's total Sum Squared Error over the training set is then

   Error = 0.0 ;
   for( p = 1 ; p <= NumPattern ; p++ ) {
      for( k = 1 ; k <= NumOutput ; k++ ) {
         Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
      }
   }
Keeping the activations for each pattern (by adding an extra index p), the forward pass and the error computation can be combined:

   Error = 0.0 ;
   for( p = 1 ; p <= NumPattern ; p++ ) {       /* p loop over training patterns */
      for( j = 1 ; j <= NumHidden ; j++ ) {     /* j loop over hidden units */
         SumH[p][j] = WeightIH[0][j] ;
         for( i = 1 ; i <= NumInput ; i++ ) {
            SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
         }
         Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
      }
      for( k = 1 ; k <= NumOutput ; k++ ) {     /* k loop over output units */
         SumO[p][k] = WeightHO[0][k] ;
         for( j = 1 ; j <= NumHidden ; j++ ) {
            SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
         }
         Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
         Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;  /* Sum Squared Error */
      }
   }
The next stage is to iteratively adjust the weights to minimize the network's error. A standard way to do this is by 'gradient descent' on the error function. We can compute how much the error is changed by a small change in each weight (i.e. compute the partial derivatives dError/dWeight) and shift the weights by a small amount in the direction that reduces the error. The literature is full of variations on this general approach - I shall begin with the 'standard on-line back-propagation with momentum' algorithm. This is not the place to go through all the mathematics, but for the above sum squared error we can compute and apply one iteration (or 'epoch') of the required weight changes DeltaWeightIH and DeltaWeightHO using
   Error = 0.0 ;
   for( p = 1 ; p <= NumPattern ; p++ ) {       /* repeat for all the training patterns */
      for( j = 1 ; j <= NumHidden ; j++ ) {     /* compute hidden unit activations */
         SumH[p][j] = WeightIH[0][j] ;
         for( i = 1 ; i <= NumInput ; i++ ) {
            SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
         }
         Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
      }
      for( k = 1 ; k <= NumOutput ; k++ ) {     /* compute output unit activations and errors */
         SumO[p][k] = WeightHO[0][k] ;
         for( j = 1 ; j <= NumHidden ; j++ ) {
            SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
         }
         Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
         Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
         DeltaO[k] = (Target[p][k] - Output[p][k]) * Output[p][k] * (1.0 - Output[p][k]) ;
      }
      for( j = 1 ; j <= NumHidden ; j++ ) {     /* 'back-propagate' errors to hidden layer */
         SumDOW[j] = 0.0 ;
         for( k = 1 ; k <= NumOutput ; k++ ) {
            SumDOW[j] += WeightHO[j][k] * DeltaO[k] ;
         }
         DeltaH[j] = SumDOW[j] * Hidden[p][j] * (1.0 - Hidden[p][j]) ;
      }
      for( j = 1 ; j <= NumHidden ; j++ ) {     /* update weights WeightIH */
         DeltaWeightIH[0][j] = eta * DeltaH[j] + alpha * DeltaWeightIH[0][j] ;
         WeightIH[0][j] += DeltaWeightIH[0][j] ;
         for( i = 1 ; i <= NumInput ; i++ ) {
            DeltaWeightIH[i][j] = eta * Input[p][i] * DeltaH[j] + alpha * DeltaWeightIH[i][j] ;
            WeightIH[i][j] += DeltaWeightIH[i][j] ;
         }
      }
      for( k = 1 ; k <= NumOutput ; k++ ) {     /* update weights WeightHO */
         DeltaWeightHO[0][k] = eta * DeltaO[k] + alpha * DeltaWeightHO[0][k] ;
         WeightHO[0][k] += DeltaWeightHO[0][k] ;
         for( j = 1 ; j <= NumHidden ; j++ ) {
            DeltaWeightHO[j][k] = eta * Hidden[p][j] * DeltaO[k] + alpha * DeltaWeightHO[j][k] ;
            WeightHO[j][k] += DeltaWeightHO[j][k] ;
         }
      }
   }
The complete training process will consist of repeating the above weight updates for a number of epochs (using another for loop) until some error criterion is met, for example when the Error falls below some chosen small number. (Note that, with sigmoids on the outputs, the Error can only reach exactly zero if the weights reach infinity! Note also that sometimes the training can get stuck in a 'local minimum' of the error function and never get anywhere near the actual minimum.) So, we need to wrap the last block of code in something like
   for( epoch = 1 ; epoch < LARGENUMBER ; epoch++ ) {
      /* ABOVE CODE FOR ONE ITERATION */
      if( Error < SMALLNUMBER ) break ;
   }
If the training patterns are presented in the same systematic order during each epoch, it is possible for weight oscillations to occur, so it is generally better to present them in a different random order for each epoch. If we put the pattern numbers into a randomly ordered array ranpat[], we simply replace the training pattern loop

   for( p = 1 ; p <= NumPattern ; p++ ) {

with

   for( np = 1 ; np <= NumPattern ; np++ ) {
      p = ranpat[np] ;
The random array ranpat[] can be generated by first setting up an ordered array and then swapping random elements into each position:

   for( p = 1 ; p <= NumPattern ; p++ ) {   /* set up ordered array */
      ranpat[p] = p ;
   }
   for( p = 1 ; p <= NumPattern ; p++ ) {   /* swap random elements into each position */
      np = p + rando() * ( NumPattern + 1 - p ) ;
      op = ranpat[p] ; ranpat[p] = ranpat[np] ; ranpat[np] = op ;
   }

where rando() returns a random number in the range [0,1).
The weights need to start as small random values, and the weight changes need to start at zero, so that the momentum terms begin with a clean slate. The parameter smallwt sets the maximum absolute size of the initial weights:

   for( j = 1 ; j <= NumHidden ; j++ ) {    /* initialize WeightIH and DeltaWeightIH */
      for( i = 0 ; i <= NumInput ; i++ ) {
         DeltaWeightIH[i][j] = 0.0 ;
         WeightIH[i][j] = 2.0 * ( rando() - 0.5 ) * smallwt ;
      }
   }
   for( k = 1 ; k <= NumOutput ; k++ ) {    /* initialize WeightHO and DeltaWeightHO */
      for( j = 0 ; j <= NumHidden ; j++ ) {
         DeltaWeightHO[j][k] = 0.0 ;
         WeightHO[j][k] = 2.0 * ( rando() - 0.5 ) * smallwt ;
      }
   }
We now have enough code to put together a working neural network program. I have cut and pasted the above code into the file nn.c (which your browser should allow you to save into your own file space). I have added the standard #includes, declared all the variables, hard-coded the standard XOR training data and values for eta, alpha and smallwt, #defined an over-simple rando(), added some print statements to show what the network is doing, and wrapped the whole lot in a main(){ }. The file should compile and run in the normal way (e.g. using the UNIX commands 'cc nn.c -O -lm -o nn' and 'nn').
I've left plenty for the reader to do to convert this into a useful program, for example:
- Reading the training data from a file
- Allowing the parameters (eta, alpha, smallwt, NumHidden, etc.) to be varied at run time
- Determining appropriate array sizes and allocating the memory at run time
- Saving weights to file, and reading them back in again
- Plotting errors, output activations, etc. during training
- Batch learning, rather than on-line learning
- Alternative activation functions (linear, tanh, etc.)
- Real (rather than binary) valued outputs, which require linear output functions
- A Cross-Entropy cost function rather than Sum Squared Error
- Separate training, validation and testing sets
- Weight decay / regularization
Two of these changes are particularly simple. Real-valued outputs just replace the sigmoid output function and its delta with

   Output[p][k] = SumO[p][k] ;
   DeltaO[k] = Target[p][k] - Output[p][k] ;

and the Cross-Entropy cost replaces the Sum Squared Error and its delta with

   Error -= ( Target[p][k] * log( Output[p][k] ) + ( 1.0 - Target[p][k] ) * log( 1.0 - Output[p][k] ) ) ;
   DeltaO[k] = Target[p][k] - Output[p][k] ;