
4. Architectures and Learning Processes

Over the years a whole series of research groups have developed their own characteristic artificial neural networks. Some of these models are closer to their biological counterparts than others; some seem to emulate certain processes in the human brain very well, whereas others bear only a slight similarity to their biological forbears.
Our primary concern here however is not the faithfulness to reality of each neural network model. We wish rather to show which kinds of problems can be processed with the various models, and what capacities these neural networks offer to information processing. An artificial neural network can still have great significance for information processing, even if it bears but little relationship to biological systems.
As we mentioned in the introduction, neural networks can be applied to a whole series of problems: classification, modeling, association, and mapping. Each neural network is more or less suitable for handling these problem types; each has its own characteristic strengths and weaknesses. We wish to expound upon this here and to develop some feeling for which network model is best applied to a particular task.
Three elements essentially characterize every neural network model:

1. the arithmetic operation in a neuron
2. the architecture of the net, that is, the way the individual neurons are connected
3. the learning process that adapts the weights so that the correct result is obtained.

The model of a neuron mentioned in Section 2 is only one of many, albeit a very common one. Even the organization of neurons into layers need not necessarily be that described in Section 2. The learning process, though, is very tightly coupled to the architecture of the neural network. There are essentially two types of learning processes: learning with or learning without instruction ("supervised" and "unsupervised" learning).
In supervised learning the neural net is presented with a series of objects. The input data X from these objects is given to the net, along with the expected output values Y. The weights in the neural net are then adapted so that for the set of p known objects the output values Out conform as closely as possible to the expected values Y (Fig. 11).

Fig. 11. Supervised learning. A comparison with expected values yields an error δ, which determines whether further adjustment cycles are necessary.

In unsupervised learning the input data are passed repeatedly through the network until it has stabilized, and until the input values map an object into certain areas of the network (Fig. 12).

Fig. 12. Unsupervised learning.

4.1. The Hopfield Model

The American physicist Hopfield brought new life into neural net research in 1982 with his model [16]. He pointed out analogies between neural networks and spin systems and was thereby able to apply a whole series of mathematical methods from theoretical physics to research into neural networks. In addition to this he is also responsible for the introduction of nonlinear transfer functions.
The Hopfield net performs one of the most interesting functions of the human brain: it can associate. This means that stored pictures (or any other kind of complex information that can be represented as a multidimensional vector or matrix) can be recognized even from partial information or distorted versions of the original picture. For example, one could recognize one particular face from a collection of faces, even if only the eyes and nose are shown.
The Hopfield net is a one-layered model that has exactly the same number of neurons as input nodes. Because every input node is connected to every neuron (Fig. 7), m x m weights must be determined for m input nodes. The original Hopfield model works with bipolar input data (+1 or -1) (f). The net result in neuron j is obtained by multiplying all input signals xi by the weights wji for that neuron and summing the products, as in Equation (d). The transfer function here is a simple step function (Fig. 13), as may be produced by taking the sign of the value.

(f)

Fig. 13. A step function as a transfer function.

The output signal of a neuron, outj, in a Hopfield net is given in Equation (g), where as already mentioned, xi can only assume the values +1 and -1.

(g)
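
The output rule described for Equation (g) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the names `sign` and `hopfield_output` are hypothetical, the 3 x 3 weights are made up, and mapping a net value of exactly zero to +1 is an assumed convention that the text does not specify.

```python
def sign(value):
    """Step transfer function; mapping 0 to +1 is an assumed convention."""
    return 1 if value >= 0 else -1


def hopfield_output(weights, x):
    """Return out_j = sign(sum_i w_ji * x_i) for every neuron j.

    weights holds one row of m weights per neuron; x is a bipolar
    input vector of length m.
    """
    return [sign(sum(w_ji * x_i for w_ji, x_i in zip(row, x)))
            for row in weights]


# Hand-made example with m = 3 (weights chosen for illustration only).
weights = [[0, 1, -1],
           [1, 0, -1],
           [-1, -1, 0]]
x = [1, 1, -1]
out = hopfield_output(weights, x)  # net values are 2, 2, -2
```

Here the output equals the input, which is the situation the text describes for a picture that was used in training.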

In order to understand the learning process in Hopfield networks we introduce an example at this point: We require a Hopfield net to learn the four pictures in Figure 14.

Fig. 14. Four pictures X1, X2, X3, and X4 used to train a Hopfield network.

Each picture consists of 4 x 4 fields (pixels) that are either black (+1) or white (-1), so that each picture may be represented as a 4 x 4 = 16 dimensional vector. Figure 15 shows the architecture of the corresponding Hopfield net and the input of the first picture with values represented by -1 (for a white pixel) and +1 (for a black one). In the picture used for training, the output values are exactly equal to the input values.

Fig. 15. Architecture of the Hopfield net complete with input and output of the first picture of Figure 14. The given wji values are determined from Equation (h) for the four pictures of Figure 14.

The weights w in a Hopfield net do not have to be derived from a costly, iterative learning process, but can be calculated directly from the pictures: if p pictures are presented to the Hopfield net, then the weights wji for the neuron j may be derived from the input values of the individual pictures Xs (s = 1, ..., p) according to (h) and (i).

(h)

(i)

This means that the weight wji increases by 1 if, in a particular picture, the pixels j and i are either both black or both white, and decreases by 1 if, in that picture, the ith and jth pixels are of different colors. The more pictures there are in which pixels j and i match, the greater is the weight wji. The 4 x 4 pictures from Figure 14 are stored accordingly in a Hopfield net consisting of 16 neurons with 16 weights each, that is, in a weight matrix of dimensions 16 x 16.
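
The counting rule just described translates directly into code. The sketch below is an assumption-laden illustration: the function name `hopfield_weights` and the two 4-pixel "pictures" are invented, and setting the diagonal weights wjj to zero is the usual Hopfield convention, assumed here to be what the second equation expresses.

```python
def hopfield_weights(pictures):
    """Add +1 to w_ji when pixels j and i match in a picture, -1 otherwise.

    For bipolar pixels this is the sum over pictures of x_j * x_i.
    The diagonal (j == i) is left at zero, an assumed convention.
    """
    m = len(pictures[0])
    weights = [[0] * m for _ in range(m)]
    for picture in pictures:
        for j in range(m):
            for i in range(m):
                if i != j:
                    weights[j][i] += picture[j] * picture[i]
    return weights


# Two made-up pictures of four pixels each (+1 black, -1 white).
pictures = [[1, 1, -1, -1],
            [1, -1, 1, -1]]
weights = hopfield_weights(pictures)
# Pixels 0 and 3 differ in both pictures, so weights[0][3] is -2.
# Pixels 0 and 1 match once and differ once, so weights[0][1] is 0.
```

No iteration is involved: one pass over the pictures fills the whole matrix, which is why the text calls the calculation direct.
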
We should first test whether a Hopfield net has stabilized. For this purpose an input vector (e.g., one of the four pictures from Fig. 14) is fed into the Hopfield net, and the output values derived from Equation (g). These output values are compared to the input values. If they are equal then the process can terminate, otherwise the output values are fed into the net again as new input values and the process is repeated (Fig. 16). If after a few cycles one receives as output the same values as the original input, then one can say that the net has stabilized. Of course a Hopfield net is not intended simply to reproduce its own input values as output; this is just a stability test.

Fig. 16. Testing a Hopfield net for stability.
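
The feedback loop of the stability test can be sketched as follows. All names (`recall`, `max_cycles`) and the two 8-pixel patterns are hypothetical; the weight rule with a zero diagonal and the treatment of a zero net value as +1 are assumptions, and the cycle count used here (number of passes through the net, including the final confirming pass) need not match the iteration count N used in the figures.

```python
def sign(value):
    return 1 if value >= 0 else -1


def hopfield_weights(pictures):
    m = len(pictures[0])
    return [[0 if i == j else sum(p[j] * p[i] for p in pictures)
             for i in range(m)] for j in range(m)]


def hopfield_output(weights, x):
    return [sign(sum(w * x_i for w, x_i in zip(row, x))) for row in weights]


def recall(weights, x, max_cycles=20):
    """Feed the output back as input until it no longer changes.

    Returns (pattern, cycles); pattern is None if no stable state is
    reached within max_cycles (for example when the net oscillates).
    """
    for cycle in range(1, max_cycles + 1):
        out = hopfield_output(weights, x)
        if out == x:
            return out, cycle
        x = out
    return None, max_cycles


stored = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, -1, 1, -1, 1, -1, 1, -1]]
weights = hopfield_weights(stored)
distorted = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern, one pixel flipped
result, cycles = recall(weights, distorted)
```

In this small example the distorted input is corrected on the first pass and confirmed as stable on the second, so `result` is the first stored pattern.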

The true value of a Hopfield net becomes evident in the retrieval of the original stored data from incomplete or distorted data; we can for instance retrieve an original picture in the Hopfield net from a blurred or spoiled picture.
We shall investigate this phenomenon with the four pictures from Figure 14 and a Hopfield net in which they are stored in the form of 16 x 16 weights. We produce distorted pictures by altering a certain number of pixels in the original pictures, that is, by changing fields from black to white and vice versa. In Figure 17 we have shown the results obtained when each picture is altered by two, five, or indeed even thirteen pixels.

Fig. 17. Search for original pictures (shown at the top) stored in a Hopfield network after input of pictures with a) two, b) five, and c) thirteen altered fields. N is the number of iterations required to arrive at the result (bottom line in a-c) when presented with the erroneous input (top line in a-c).

One can see that with a distortion of two pixels the original pictures are retrieved correctly after only 1-2 iterations. When five fields are altered (a noise level of 31 %!), the original pictures are still found successfully, but only after 3 - 5 iterations. This is not always so; we have observed cases with a five-pixel distortion where the wrong picture is output, or the right picture in negative form (black and white pixels swapped), or an oscillation occurs between two patterns that represent none of the stored pictures, so that the operation has to be aborted.
If thirteen pixels in the pictures are altered (81 % distortion) the original pictures are retrieved after 2-3 iterations as negatives. How does this come about? After all, significantly more than half the fields have been altered in color. It is remarkable that the negative of the original picture is returned, and not the negative of some other picture.
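
One way to make the result with negatives plausible is to count how far the distorted picture lies from the original and from its negative. The short sketch below uses a made-up 16-pixel picture and a hypothetical helper `hamming`; it only illustrates the counting argument and makes no claim about the pictures of Figure 14.

```python
def hamming(a, b):
    """Number of positions in which two bipolar vectors differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


# Made-up 16-pixel picture (+1 black, -1 white).
original = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1, 1]
negative = [-p for p in original]

# Alter the first thirteen pixels of the original.
distorted = [-p if k < 13 else p for k, p in enumerate(original)]

distance_to_original = hamming(distorted, original)  # 13
distance_to_negative = hamming(distorted, negative)  # 3
```

A picture with thirteen of sixteen pixels altered is thus only three pixels away from the negative of the same original, so from the net's point of view it is a mildly distorted negative.
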
Thus we have seen that as simple a model as a Hopfield net can still reproduce one of the human brain's more interesting faculties, namely the power of association. The Hopfield net does however have one significant drawback: The number of patterns (pictures, vectors) that can be stored is severely limited. In order to store more pictures an increased number of neurons is required; thus the size of the weight matrix grows very rapidly.

4.2. An Adaptive Bidirectional Associative Memory

With the Hopfield net we have shown that a neural network has the power to associate. An adaptive bidirectional associative memory (or ABAM) [21] can do this as well. We shall now show, however, that an ABAM is able, in addition, to combine patterns. An ABAM is, like a Hopfield net, a one-layered network. The number of output neurons n is however usually much lower than the number of input units m.
In order to simplify the notation, and because in the case of ABAMs the meanings of input and output are apt to blur, in the following sections we refer to an input vector as X and the output values as Y. Thus we always observe pairs of input (Xs) and output (Ys) values (the index s characterizes the individual input values and their corresponding outputs):

Xs=(xs1, xs2, ... xsj, ... xsm)

Ys=(ys1, ys2, ... ysi, ... ysn)

Given a series of such pairs {Xs, Ys } where it is known which Ys value is to be expected from a particular Xs value, the weights are determined in a supervised learning process.
First starting values are calculated for the weights according to Equation (j). The weight matrix is now no longer square, as in the Hopfield net, but rather rectangular. It has dimensions m x n.

(j)

The basic idea of learning with an ABAM net is that one can multiply an m x n matrix in two different ways: the standard way by multiplying by an m-dimensional vector, which results in an n-dimensional vector, or in the alternative transposed form, by multiplying by an n-dimensional vector to give an m-dimensional vector (Fig. 18).

Fig. 18. The learning process in an adaptive bidirectional associative memory (ABAM).

The learning process is as follows: An initial weight matrix W(0) is calculated from the predefined pairs {X(0), Y(0)}. From X(0) and W(0) the output values Y(1) are produced, which do not as yet correspond to the goal values. Therefore these Y(1) values are multiplied by the transposed matrix W(0)T to give a set of X(1) values. Applying Equation (j) to the pairs {X(1), Y(1)} we calculate a new weight matrix W(1). This process is repeated until the pair {X(t), Y(t)} corresponds to the predefined values {X(0), Y(0)}.
In the procedure the Y values are calculated in the usual way. Here, too, the individual x and y values are bipolar (+1, -1) [Eq. (k) and (l)].

(k)

(l)
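
The two directions in which the rectangular weight matrix can be used may be sketched as below. This is a simplified illustration under stated assumptions: the starting weights are taken to be the sum over all pairs of the products xj * yi, a net value of zero is mapped to +1, and the names `bam_weights`, `forward`, and `backward` as well as the two small pattern pairs are invented. The repeated recalculation of the weight matrix described in the text is not included.

```python
def sign(value):
    return 1 if value >= 0 else -1


def bam_weights(pairs):
    """Assumed starting weights: w_ji = sum over pairs of x_j * y_i (m x n)."""
    m, n = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * n for _ in range(m)]
    for x, y in pairs:
        for j in range(m):
            for i in range(n):
                weights[j][i] += x[j] * y[i]
    return weights


def forward(weights, x):
    """m-dimensional X -> n-dimensional Y."""
    n = len(weights[0])
    return [sign(sum(weights[j][i] * x[j] for j in range(len(x))))
            for i in range(n)]


def backward(weights, y):
    """n-dimensional Y -> m-dimensional X, using the transposed matrix."""
    return [sign(sum(w_ji * y_i for w_ji, y_i in zip(row, y)))
            for row in weights]


x1, y1 = [1, 1, 1, -1, -1, -1], [1, -1, -1]
x2, y2 = [1, -1, 1, -1, 1, -1], [-1, 1, -1]
weights = bam_weights([(x1, y1), (x2, y2)])

y_from_x1 = forward(weights, x1)
x_from_y1 = backward(weights, y1)
y_from_noisy = forward(weights, [-1, 1, 1, -1, -1, -1])  # x1, one value flipped
```

In this example both directions reproduce the stored partner, and the distorted input is still mapped onto the output belonging to the first pair.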

As with the Hopfield net we also use simple pictures here as an illustration of how an ABAM may be applied. Pictures made of 5 x 5 fields, where each field may be black (+1) or white (-1), will serve as input information. These can thus be represented by a 25-dimensional vector, so that 25 input units will be required. We use only five such pictures and identify them on the output side by a 5-dimensional vector. Each of the five patterns (pictures) is assigned one of the five positions in the vector, thus in the case of the first picture the first position is 1 and all the others are zero (10000), the second picture is represented by (01000) etc. (We are actually dealing with bipolar values +1 and -1 here, but for clarity's sake we use binary notation (1, 0) in the following sections). Figure 19 shows the five patterns and their identifications. The ABAM has a 25 x 5 architecture, and thus 25 x 5 weights.

Fig. 19. Five pictures used to train the ABAM, their symbols, and vector representations.

An ABAM was trained with these five pictures and the five-dimensional vector (consisting of four zeros and one 1 in each case). Then the ABAM's ability to associate distorted pictures with stored undistorted ones was put to the test. All possible 1-pixel errors were generated; in the case of five pictures each of 5 x 5 pixels this makes a total of 125 different distorted pictures. The ABAM identified the correct undistorted output picture in all cases.
Next the ability of the ABAM to combine individual pieces of information was tested: that is, can it recognize, when presented with pictures made of two patterns, which combination of two pictures it is, despite previously having been presented with the patterns only singly? Figure 20 shows all ten combinations of two of the five patterns, with the answer from the ABAM written underneath as a five-bit vector. In eight cases the ABAM returns a 1 in exactly the right two positions. Thus, for example, for the first combined pattern, which consists of the patterns 1 and 2, it returns the answer (11000), meaning that the first and second patterns are contained in it. In the cases of the other two pattern combinations (1 + 3 and 3 + 5) the ABAM also recognizes which individual patterns are present in the combination, but makes the mistake of identifying a third pattern. If we generate these three-pattern combinations, as demonstrated in the bottom row of Figure 20, they differ very little from the two-pattern combinations (1 pixel difference). The ABAM's answers in these two cases are therefore not very far from the truth.

Fig. 20. All combinations of two pictures from Figure 19. The two cases where three bits were activated erroneously are compared with the combination of three pictures drawn in the bottom row (see text for details).

The ABAM's ability to combine two patterns has been discussed here in more detail for two reasons: Firstly, the ability to recognize that a piece of information is a combination of other pieces of information is, of course, an important capability of the human brain. Therefore if the ABAM can do this too, we have reproduced an important aspect of biological, neural information processing. Secondly, the ability to combine single pieces of information is required in many problem tasks in chemistry, especially in the study of relationships between structure and spectral data: If a neural network learns this relationship from pairs of spectra and structures, it should be able to derive all the component substructures from a new spectrum, even if it has never "seen" this particular combination of substructures previously.

4.3. The Kohonen Network

4.3.1. Principles

T. Kohonen [22][23] developed a neural network which of all models has the greatest similarity to its biological counterpart [24]. This is especially true of the way in which the brain processes sensory signals.
There is a broad strip of tissue in the cerebral cortex which specializes in the perception of touch and is known as the somatosensory cortex. Particular areas in this tissue are responsible for particular parts of the body; parts of the body that carry the most sensory receptors are assigned correspondingly large, contiguous areas, whereas areas of skin which are supplied with relatively few sensory nerves are assigned only small areas, even if the part of the body in question is actually very large by comparison. In addition, neighboring parts of the body are assigned neighboring regions in the somatosensory cortex, so that for the sense of touch there is a contorted mapping of the body surface onto the brain (Fig. 21).

Fig. 21. Maps of the human body (top) in the somatosensory cortex of the brain (bottom).

Kohonen introduced the concept of a "self-organized topological feature map" that is able to generate such mappings. Briefly, these are two-dimensional arrays of neurons that reflect as well as possible the topology of information, that is, the relationships between individual pieces of data and not their magnitude. Using the Kohonen model we are able to create a mapping of multidimensional information onto a layer of neurons, that preserves the essential content of the information (relationships). The whole process therefore represents a kind of abstraction. Figure 22 shows the two-dimensional organization of the neurons in a Kohonen net.

Fig. 22. Two-dimensional assignment of the neurons in a Kohonen network.

In this context mapping of information means that the similarity of a pair of signals is expressed in the proximity or "neighborhood relation" of the neurons activated by them: the more alike two signals are, the closer together the neurons they activate should lie. However, as we are talking here about topological and not Euclidean distance, a neuron in a square array (Fig. 23a) has eight direct neighbors in the first "sphere": the four diagonal neurons count as equally close as the four that share an edge with it. In a Kohonen network where the neurons are arrayed in a square the spheres of neighborhood grow through the network as shown in Figure 23b.

Fig. 23. Proximity relations of the neurons in a Kohonen network. a) First sphere of neighbors; b) growth of the spheres; c) neurons at the edge of the network.

We must take the discussion on the topology of the Kohonen net a little further before we can proceed to the learning algorithm because the topology is the decisive factor in a Kohonen net. If every neuron is to have the same number of neighbors, a square, planar array is poorly suited to the task because the neurons at the edges have fewer neighbors than those in the center of the net (Fig. 23c).
From a square or rectangular array of elements we can nevertheless easily construct a topology in which each element has exactly the same number of neighbors. In order to do this we must wrap the surface around on itself and connect ("glue together") its opposite edges. Thus from the surface we produce first a cylinder and then a torus, as shown in Figure 24.

Fig. 24. The transformation of a rectangular array of neurons into a torus in which each neuron has the same number of neighbors.

In a torus each element has the same number of neighbors: eight in the first circle, 16 in the second etc. Of course it is not easy to display in its entirety a mapping onto a torus. We shall therefore continue to represent Kohonen networks as flat surfaces in the knowledge that if one arrives at one of the edges the surface carries on over to the opposite edge (Fig. 25). Both the filled-in squares are therefore immediate neighbors. The same is true in the horizontal axis for those fields marked by crosses.

Fig. 25. Representation of the surface of a torus on a plane. The topmost edge wraps onto the bottom, and the left edge wraps onto the right.
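
The wrap-around neighborhood can be expressed as a small distance function. The sketch below is illustrative only; the name `torus_distance` and the choice of a 15 x 15 array are assumptions made for the example. It counts how many neurons lie in each sphere of neighborhood around one neuron when opposite edges are glued together.

```python
def torus_distance(a, b, rows, cols):
    """Topological distance (sphere number) between cells a and b on a torus.

    a and b are (row, column) pairs; each axis wraps around, and the
    larger of the two wrapped offsets gives the sphere of neighborhood.
    """
    d_row = abs(a[0] - b[0])
    d_col = abs(a[1] - b[1])
    d_row = min(d_row, rows - d_row)
    d_col = min(d_col, cols - d_col)
    return max(d_row, d_col)


rows = cols = 15
corner = (0, 0)
sphere_sizes = {}
for r in range(rows):
    for c in range(cols):
        d = torus_distance(corner, (r, c), rows, cols)
        sphere_sizes[d] = sphere_sizes.get(d, 0) + 1
# Even for a corner neuron: 8 neighbors in sphere 1, 16 in sphere 2.
```

Because of the wrap-around, the cell in the opposite corner, (14, 14), is an immediate neighbor of (0, 0).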

The topology of the Kohonen net has been discussed a little more fully here in order to understand the examples better (see Sections 10.1 and 10.2).

4.3.2. The Learning Process

Learning in a Kohonen net is a competitive process: all the neurons compete to be stimulated by the input signal. The input signal is an object that is described by m single values and can therefore be interpreted as a point in an m-dimensional space, projected onto a plane. Only one single neuron is finally chosen as the "best" ("winner takes all"). Different criteria are applied to find the "best" neuron - the central neuron c. Very often the neuron whose weights are most similar to the input signal Xk is chosen [Eq. (m)].

(m)
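
Choosing the central neuron can be sketched as a search for the smallest distance between the input signal and each neuron's weight vector. The squared Euclidean distance used below is an assumption about the similarity criterion, and the name `find_winner` and the numbers are invented for illustration.

```python
def find_winner(weights, x):
    """Return the index c of the neuron whose weights are most similar to x.

    Similarity is measured here by the squared Euclidean distance
    (an assumed criterion); the smallest distance wins.
    """
    def squared_distance(w):
        return sum((x_i - w_i) ** 2 for x_i, w_i in zip(x, w))

    return min(range(len(weights)), key=lambda j: squared_distance(weights[j]))


# Three neurons with three weights each (made-up values).
weights = [[0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
x = [0.9, 0.1, 0.2]
c = find_winner(weights, x)  # neuron 1 lies closest to x
```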

For this central neuron c, the weights wjc are corrected so that the output value becomes even more similar to the input signal. The weights on the other neurons are also corrected, albeit proportionally less the further they are from the strongly stimulated (central) neuron. Here it is the topological distance which counts, that is, which sphere of neighborhood (as reckoned outwards from the central neuron) includes the neuron under consideration (compare Fig. 23).
Then the process is repeated with the next input data. Every object (that is, every point in the m-dimensional space) stimulates a very specific neuron, so that every object is assigned a definite point in the Kohonen net.
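
A single correction step might look like the sketch below. The text does not give the correction formula, so everything specific here is an assumption: the weights are moved a fraction of the way towards the input, that fraction falls off linearly with the sphere of neighborhood, and neurons beyond `max_sphere` are left unchanged. The names `update_weights`, `rate`, and `max_sphere` are hypothetical, and the wrap-around of the torus is omitted for brevity.

```python
def update_weights(weights, positions, x, c, rate=0.5, max_sphere=2):
    """Move weights towards x, less strongly the further a neuron is from c.

    positions[j] is the (row, column) of neuron j; the linear decay
    1 - d / (max_sphere + 1) is an assumed choice.
    """
    for j, w in enumerate(weights):
        d = max(abs(positions[j][0] - positions[c][0]),
                abs(positions[j][1] - positions[c][1]))
        if d > max_sphere:
            continue
        factor = rate * (1 - d / (max_sphere + 1))
        weights[j] = [w_i + factor * (x_i - w_i) for w_i, x_i in zip(w, x)]
    return weights


# Four neurons in a row, one weight each, all starting at zero.
positions = [(0, 0), (0, 1), (0, 2), (0, 3)]
weights = [[0.0], [0.0], [0.0], [0.0]]
weights = update_weights(weights, positions, x=[1.0], c=0)
```

After this step the central neuron has moved furthest towards the input, its neighbors progressively less, and the neuron three steps away not at all.
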
A simple example, the mapping of the surface of a sphere onto a Kohonen net, will explain the workings and the results of the Kohonen net in more detail (Fig. 26). The spherical surface was divided into eight sectors, as shown in Figure 26b; a point on this surface is defined by its three coordinates (x, y, z values). An array of 15 x 15 = 225 neurons was used to make a Kohonen net, which therefore had three input units distributing their data over 225 neurons. Thus a total of 3 x 225 = 675 weights had to be determined.

Fig. 26. The representation of the surface of a sphere in a Kohonen network. The sphere's surface is divided into eight sectors. Each point on the surface of the sphere is characterized by its assignment to one of these sectors. The point marked in the sphere at the bottom belongs to sector number 4.

Two thousand points were chosen at random from the spherical surface, and their sets of x, y, and z coordinates were used individually to train the Kohonen net. For the graphical representation of the net that had developed after the learning of these 2000 points, each point was identified by the number of the spherical sector from which it came. (This information was not used in the learning process, however, but only at the end for identifying the points.) Because 2000 points have to be mapped onto 225 fields there are several points mapped onto each field. As it turns out, however, at the end of the learning process only points from the same sector of the sphere arrive at the same field, and points from neighboring regions of the spherical surface are projected onto neighboring fields. Figure 27 shows the resulting Kohonen net.
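
Training data of this kind could be generated as sketched below. How the points were actually drawn is not stated, so the sampling method (normalized Gaussian coordinates, which gives a uniform distribution on the sphere) is an assumption, as is the fixed random seed. The sector numbering in `sector` is hypothetical and is not claimed to match the numbering of Figure 26; it merely assigns one of eight labels from the signs of x, y, and z.

```python
import math
import random


def random_point_on_sphere(rng):
    """Uniform random point on the unit sphere (normalized Gaussian triple)."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return (x / norm, y / norm, z / norm)


def sector(point):
    """Hypothetical sector label 1..8 derived from the signs of x, y, z."""
    x, y, z = point
    return 1 + (x < 0) + 2 * (y < 0) + 4 * (z < 0)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
points = [random_point_on_sphere(rng) for _ in range(2000)]
labels = [sector(p) for p in points]  # used for display only, not for training
```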

Fig. 27. Result of the projection of a spherical surface onto a Kohonen network.

Fields with the same number, that is, points from the same sector of the sphere, form contiguous areas in the Kohonen net. It must be remembered that the surface shown here actually forms a torus (Fig. 24); therefore points on the left-hand edge are continued over onto the right hand edge (Fig. 25). Thus the two fields on the left-hand edge marked with the number 8 do indeed have a direct connection to the remaining fields with the number 8. Some fields (neurons) in the Kohonen net were not assigned any points from the spherical surface. That the spherical surface was mapped onto the Kohonen net while preserving the neighborhood relations between the points on the sphere may also be seen from the details of the Kohonen net as shown in Figure 28. Sectors on the sphere which meet at lines of longitude or at the equator are also neighbors in their projection on the Kohonen net, and hold whole borders in common. In several regions four fields meet together; these regions or points correspond to the points where the coordinate axes break through the spherical surface. For example at the circle in Figure 28 (bottom row, center) the sectors 1, 2, 3 and 4 converge. This corresponds to the "North pole" of the sphere.

Fig. 28. Areas on the Kohonen map of Figure 27 where four sectors meet, highlighted by circles.

The mapping of a sphere onto a Kohonen net has been explained in this detail to show how a simple three-dimensional body can be mapped onto a plane. This illustrates well the way a Kohonen net functions as a topology-preserving map and lays the foundation for a good understanding of the application of Kohonen networks to examples from the field of chemistry.

4.4. Back-Propagation

The majority of neural net applications use the "back-propagation" algorithm. This algorithm does not represent any particular kind of network architecture (a multilayered net is generally used) but rather a special learning process. Although this method was not introduced until 1986 by Rumelhart, Hinton, and Williams [17] it quickly gained widespread popularity and contributed decisively to the eventual triumph of neural networks. A few years ago a study carried out on publications about neural networks in chemistry showed that in 90% of them the back-propagation algorithm was used [2].
The attraction of learning through back-propagation stems from the fact that adjustments to the neural net's weights can be calculated on the basis of well-defined equations. Nevertheless this procedure for correcting errors has very little in common with those processes responsible for the adjustment of synaptic weights in biological systems.
The back-propagation algorithm may be used for one- or multilayered networks and is a supervised learning process. The input data are passed through the layers; the output data of a layer l, Outl, form the input data Xl+1 of the layer l+1. The results for the input data should eventually be delivered by the final layer. This will, however, not be the case at first. The output data, Outlast, of the final layer are therefore compared with the expected values Y and the error determined. This error is now used to correct the weights in the output layer. Next the weights in the penultimate layer are corrected with regard to the error from the final layer. Thus the error is fed back layer by layer, from bottom to top, and used to correct the weights at each level (Fig. 29). The error therefore flows counter to the direction of the input data; hence the name back-propagation, or loosely formulated, "error-correction in reverse" [25]. In the following sections we demonstrate the basic features of the back-propagation algorithm. For a detailed explanation we refer the reader to the literature [1][17][18][25].

Fig. 29. The learning process of the back-propagation algorithm. The weights are corrected by feeding the errors back into the network.
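
The forward direction, in which the output of one layer becomes the input of the next, can be sketched as follows. The sigmoid transfer function, the absence of bias terms, and all names and weight values are assumptions made for the illustration.

```python
import math


def sigmoid(net):
    """Assumed transfer function."""
    return 1.0 / (1.0 + math.exp(-net))


def forward_pass(layers, x):
    """Pass x through all layers and return the output of every layer.

    layers is a list of weight matrices, one row of weights per neuron.
    The output of layer l is used as the input of layer l + 1.
    """
    outputs = []
    for weights in layers:
        x = [sigmoid(sum(w * x_i for w, x_i in zip(row, x))) for row in weights]
        outputs.append(x)
    return outputs


layers = [
    [[0.5, -0.5], [0.25, 0.75]],  # hidden layer: two neurons, two inputs
    [[1.0, -1.0]],                # final layer: one neuron
]
outputs = forward_pass(layers, [1.0, 1.0])
expected = [1.0]
errors = [y - out for y, out in zip(expected, outputs[-1])]
```

The list `errors` holds the difference between the expected value and the output of the final layer, which is the quantity that is then fed back through the net.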

The back-propagation algorithm is intended to change the weights until the error in the output values Out is minimized; that is, they must correspond as closely as possible to the provided values Y.
In the case of the last layer the error can be determined directly because the value Y, which is the expected output value, is known. The weight adjustments for the final layer, Δwjilast, are determined by differentiating the error ε [Eq. (n)] with respect to the individual weights. The chain rule leads to Equations (o) and (p).

(n)

(o)

(p)
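
The chain-rule step for the final layer can be checked numerically. The sketch below assumes a squared error with a factor of one half and a sigmoid transfer function; neither is confirmed by the text, and a different constant factor would only rescale the learning rate. The function names and numbers are invented.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def layer_error(weights, x, y):
    """Assumed error: 1/2 * sum_j (Y_j - Out_j)^2 for one input vector x."""
    out = [sigmoid(sum(w * x_i for w, x_i in zip(row, x))) for row in weights]
    return 0.5 * sum((y_j - o_j) ** 2 for y_j, o_j in zip(y, out))


def output_layer_gradient(weights, x, y):
    """Chain rule: d(error)/d(w_ji) = -(Y_j - Out_j) * Out_j * (1 - Out_j) * x_i."""
    gradient = []
    for row, y_j in zip(weights, y):
        out_j = sigmoid(sum(w * x_i for w, x_i in zip(row, x)))
        delta_j = (y_j - out_j) * out_j * (1.0 - out_j)
        gradient.append([-delta_j * x_i for x_i in x])
    return gradient


weights = [[0.2, -0.4], [0.1, 0.3]]
x = [1.0, 0.5]
y = [1.0, 0.0]
analytic = output_layer_gradient(weights, x, y)[0][1]

# Finite-difference check of the same derivative.
step = 1e-6
plus = [[0.2, -0.4 + step], [0.1, 0.3]]
minus = [[0.2, -0.4 - step], [0.1, 0.3]]
numeric = (layer_error(plus, x, y) - layer_error(minus, x, y)) / (2 * step)
```

The analytic and the numerical derivative agree closely, and the weight correction is taken in the direction opposite to this derivative.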

In the hidden layers the error is not directly known because it is not known what output values, Outl, should be produced by those layers. At this point we make the assumption that the error from the layer below was distributed evenly over all the weights in the layer above. This assumption enables the error in one layer to be calculated from the error in the layer below (which has just been calculated). This is the basis of the back-propagation algorithm: The error is carried back through the individual layers (hence "back-propagation of errors") thus enabling us to correct the weights in each layer.
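
Carrying the error back by one layer can be sketched as below. The sketch follows the commonly used formulation, in which each hidden neuron collects the error terms of the following layer through the weights that connect it to that layer; this, the sigmoid derivative, and the names `hidden_deltas`, `next_weights`, and `next_deltas` are assumptions rather than a transcription of the authors' equations.

```python
def hidden_deltas(hidden_out, next_weights, next_deltas):
    """Error terms of a hidden layer from those of the following layer.

    delta_j = Out_j * (1 - Out_j) * sum_k delta_k * w_kj, where w_kj is
    the weight from hidden neuron j to neuron k of the following layer
    (sigmoid transfer function assumed).
    """
    deltas = []
    for j, out_j in enumerate(hidden_out):
        back = sum(d_k * row[j] for d_k, row in zip(next_deltas, next_weights))
        deltas.append(out_j * (1.0 - out_j) * back)
    return deltas


hidden_out = [0.5, 0.8]       # outputs of two hidden neurons
next_weights = [[1.0, -2.0]]  # one neuron in the following layer
next_deltas = [0.1]           # its error term, already calculated
deltas = hidden_deltas(hidden_out, next_weights, next_deltas)
# deltas is approximately [0.025, -0.032]
```
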
All in all we derive the closed form (q) for correcting the weights of a single layer. Here η is a parameter, the learning rate, that determines how rapidly a neural net learns.

(q)

Usually a value between 0.1 and 0.9 is chosen for it. Frequently when correcting the weights, the changes in weights from the cycle before (previous) are also taken into account. This is accomplished by extending Equation (q) by the contribution μΔwjil(previous). The parameter μ is the momentum term. It determines the extent to which the previous weight changes are taken into account; it gives the learning process a certain capacity for inertia. The smaller μ is, the quicker previous changes in weights are forgotten. It can be shown that the sum of η and μ should be about 1 [26].
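
The effect of the momentum term can be seen in a small sketch. It assumes that the weight change of a neuron is the learning rate times its error term times the corresponding input, plus the momentum term times the previous change; the names `weight_changes`, `rate`, and `momentum` and all numbers are invented, with rate and momentum chosen so that their sum is 1.

```python
def weight_changes(deltas, inputs, previous, rate=0.5, momentum=0.5):
    """Assumed update: change_ji = rate * delta_j * x_i + momentum * previous_ji."""
    return [[rate * d_j * x_i + momentum * prev
             for x_i, prev in zip(inputs, prev_row)]
            for d_j, prev_row in zip(deltas, previous)]


deltas = [0.1, -0.2]  # error terms of two neurons
inputs = [1.0, 0.5]   # inputs to the layer
no_history = [[0.0, 0.0], [0.0, 0.0]]

first = weight_changes(deltas, inputs, no_history)
second = weight_changes(deltas, inputs, first)
# With the same error terms, the second change is larger than the first
# because half of the previous change is carried over.
```

With momentum set to zero the second change would equal the first, which corresponds to previous changes being forgotten immediately.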



Johann.Gasteiger@chemie.uni-erlangen.de