
2. Neurons and Networks

In this review the term "neural networks" always refers to artificial neural networks, which were developed in order to emulate the biological neural networks of the human brain. For simplicity, however, the epithet "artificial" is omitted here.

2.1. A Model of a Neuron

Neural networks consist of subelements, the neurons, which are connected together to form a network. The artificial neuron is supposed to model the functions of the biological nerve cell. Although there are at least five physiologically distinct types of nerve cell, we need only present one type here (Fig. 2), since we discuss only the basic structure of a neuron; the physiological processes - and the chemical processes [3] that cause them - cannot be examined in more detail.
The nerve cell possesses a large number of branches, known as dendrites, which receive the signals and pass them on to the cell body. Here the signals are accumulated, and when a particular threshold has been exceeded, the neuron "fires": an electrical excitation is transmitted along the axon. At its end the axon makes contact with the dendrites of the neighboring neurons; this contact point is called a synapse. Neurons are linked with each other across these synapses.

Fig. 2. Much simplified scheme of a nerve cell. The number of dendrites and the number of branches in the dendrites are much higher in reality.

The synapses, however, also present a barrier that alters the intensity of the signal during transmission. The degree of alteration is determined by the synaptic strength. An input signal of intensity xi has an intensity of si after crossing synapse i of strength wi [Eq. (a), Fig. 3]. The synaptic strength may change, even between one impulse and the next.

$s_i = w_i x_i$   (a)

Fig. 3. Transformation of an input signal xi on passage through a synapse of strength wi.

Each neuron has a large number of dendrites, and thus receives many signals simultaneously. These m signals combine into one collective signal. It is not yet known exactly how this net signal, termed Net, is derived from the individual signals.
For the development of artificial neurons the following assumptions are made:

1. The net signal Net is a function of all the signals that arrive at the neuron within a certain time interval and of all the synaptic strengths.

2. This function is usually defined as the sum of the signals si, each of which is given by the product of an input signal xi (i = 1, ... m) and the corresponding synaptic strength wi (i = 1, ... m), from now on referred to as weights [Eq. (b)]. Figure 4 shows the model of a neuron as developed up to this point.

$\mathrm{Net} = \sum_{i=1}^{m} w_i x_i$   (b)

Fig. 4. First stage of a model of a neuron.
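To make Equation (b) concrete, the following minimal Python sketch accumulates the net signal of a single neuron from its m input signals and m weights. The variable names and the numerical values are illustrative choices of ours, not taken from the article.

x = [0.5, 1.0, -0.3]    # input signals x1 ... xm (m = 3), example values
w = [0.8, -0.2, 0.4]    # synaptic strengths (weights) w1 ... wm, example values

# Net signal: the weighted sum of the inputs [Eq. (b)]
net = sum(w_i * x_i for w_i, x_i in zip(w, x))
print(net)              # approximately 0.08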

The net signal Net is, however, not yet the signal that is transmitted, because this collective value can be very large and, in particular, it can also be negative. The latter property especially cannot be a good reflection of reality: a neuron may fire or not, but what would a negative value mean? In order to attain a more realistic model, the value of Net is modified by a transfer function. In most cases a sigmoid function, also known as a logistic or Fermi function, is used. With this transfer function the range of values for the output signal out [Eq. (c)] is restricted to between zero and one, no matter how large, small, or negative Net is.

$\mathrm{out} = \dfrac{1}{1 + e^{-(\alpha \cdot \mathrm{Net} + \vartheta)}}$   (c)

Most importantly, we now have a nonlinear relationship between input and output signals, and can therefore represent nonlinear relationships between properties, a task that can often be carried out only with difficulty by statistical means. Moreover, in α and ϑ we now have two parameters with which to influence the function of the neuron (Fig. 5).

Fig. 5. Influence of α (a) or ϑ (b) on the output signal outj, defined as in Equation (c).
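As an illustration of Equation (c), the short sketch below, assuming the parametrization out = 1/(1 + e^-(α·Net + ϑ)) given above, shows that the output remains between zero and one for any value of Net, and makes the roles of the two parameters explicit: α controls the steepness of the curve and ϑ shifts it.

import math

def transfer(net, alpha=1.0, theta=0.0):
    # Sigmoid (logistic, Fermi) transfer function [Eq. (c)]:
    # alpha controls the steepness, theta shifts the curve.
    return 1.0 / (1.0 + math.exp(-(alpha * net + theta)))

# The output is confined to (0, 1) even for very large or negative Net:
for net in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(net, transfer(net))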

The transfer function completes the model of the neuron. In Figure 6a the synaptic strengths, or weights w, are still depicted as in Figure 4; in the following figures, as in Figure 6b, they will no longer be shown, but they must of course still be used.

Fig. 6. Complete model of a neuron a) with and b) without explicitly shown synaptic strengths w.
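Putting Equations (b) and (c) together yields the complete neuron of Figure 6. The following sketch reuses the transfer function and the example values defined above:

def neuron(x, w, alpha=1.0, theta=0.0):
    # Complete neuron: weighted sum [Eq. (b)] followed by
    # the transfer function [Eq. (c)]
    net = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return transfer(net, alpha, theta)

print(neuron([0.5, 1.0, -0.3], [0.8, -0.2, 0.4]))   # approximately 0.52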

Symbols and Conventions

The literature on neural networks uses a confusing array of terminologies and symbols. In order that the reader may better compare the individual networks, a standard nomenclature will be followed throughout this article:

- Magnitudes that consist of a single value (scalar magnitudes) are represented by lower-case italic letters, e.g., xi. (The only exception is Net, which is capitalized in order to distinguish it from the English word "net". Moreover, although Net and out are symbols for single pieces of data, they are written with three letters so as to be more easily readable.)

- Data types that consist of several related values (vectors or matrices) are symbolized by a capital letter in bold italics: X.

- An input object that is described by several single data (e.g., measured values from sensors) is thus represented by X, whereas the individual values are given by x1, x2, ... xm. A single input value from this series is specified with the index i, thus xi. A single neuron from one group (layer) of n neurons is labeled with the index j; the whole of the output signals from these n neurons is denoted Out (out1, out2, ... outn). The output signal of any one individual neuron thus has the value outj.

- In a layer of n neurons receiving m input data there are n × m weights, which are organized into a matrix W (w11, w12, ... wnm). A single weight from this matrix is labeled wji.

- If there is more than one input object, the objects are distinguished by the index s: thus Xs; the individual data are then xsi.

- In a multilayered network the various layers are labeled with the superscript l, e.g., out^l_j.

- Iterations in a neural network are characterized by the superscript t, written in parentheses, e.g., W(t).

2.2. Creating Networks of Neurons

The 100-step paradox teaches us that the advantage of the human brain stems from the parallel processing of information. The model of a neuron that we have just presented is very simple, but even much more complicated models do not provide greatly increased performance. The essential abilities and the flexibility of neural networks are brought about only by the interconnection of these individual arithmetic units, the artificial neurons, to form networks.
Many kinds of networking strategies have been investigated; we shall present various network models and architectures in the following sections. Since the most commonly applied is a layered model, this network architecture will be used to explain the function of a neural net.
In a layered model the neurons are divided into groups or layers. The neurons of the same layer are not interconnected, but are only linked to the neurons in the layers above and below. In a single-layered network all the neurons belong to one layer (Fig. 7). Each neuron j has access to all input data X (x1, x2, ... xi, ... xm) and generates from these an output value which is specific to this neuron, outj.
In Figure 7 the input units are shown at the top. They do not count as a layer of neurons because they do not carry out any of the arithmetic operations typical of a neuron, namely the generation of a net signal Net, and its transformation by a transfer function into an output signal out. In order to distinguish them from neurons, which are represented as circles in the following diagrams, input units will be shown as squares.

Fig. 7. Neural network with input units (squares) and one layer of active neurons (circles).

The main function of the input units is to distribute input values over all the neurons in the layer below. The values that arrive at the neurons are different, because each connection from an input unit i to a neuron j has a different weight wji representing a specific synaptic strength. The magnitudes of the weights have to be determined by a learning process, the topic of Section 4.
The output value outj of a neuron is determined by Equations (d) and (e), which are generalizations of Equations (b) and (c). The index j covers all n neurons and the index i all m input values.

$\mathrm{Net}_j = \sum_{i=1}^{m} w_{ji} x_i$   (d)

$\mathrm{out}_j = \dfrac{1}{1 + e^{-(\alpha \cdot \mathrm{Net}_j + \vartheta)}}$   (e)

In a single-layered network the output signals outj of the individual neurons are already the output values of the neural network.
Equations (d) and (e) suggest a more formal representation for the neuron and the neural network. The input values can be interpreted as a vector X (x1, x2, ... xi, ... xm) that is transformed by the matrix of weights W with elements wji and the transfer function into the vector of output values Out (out1, out2, ... outj, ... outn) (Fig. 8).

Fig. 8. Matrix representation of a one-layered network, which transforms the input data X into the output data Out by using the weights wji.

Each neuron represents a column in the matrix in Figure 8. This matrix representation emphasizes that every input value is fed into every neuron. Implementations of the layered model as algorithms also follow this matrix representation.
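In code, Equations (d) and (e) for a whole layer reduce to a matrix-vector product followed by an element-wise transfer function. The following NumPy sketch is our own illustration; W is laid out with one row per neuron j and one column per input i, matching the wji convention above, and the weight values are random placeholders, since in practice they are determined by learning (Section 4).

import numpy as np

def layer(x, W, alpha=1.0, theta=0.0):
    # One layer of n neurons on m inputs; W is the (n, m) weight matrix
    # with elements w_ji.
    net = W @ x                                           # Net_j [Eq. (d)]
    return 1.0 / (1.0 + np.exp(-(alpha * net + theta)))  # out_j [Eq. (e)]

x = np.array([0.5, 1.0, -0.3])                 # m = 3 input values
W = np.random.uniform(-1.0, 1.0, size=(4, 3))  # n = 4 neurons, placeholder weights
out = layer(x, W)
print(out)                                     # n = 4 output values, each in (0, 1)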
A single layer of neurons is known as a perceptron model and offers only limited flexibility for the transformation of input values into output values. These limitations can be overcome by using several single layers in succession.
In a multilayered model the architecture chosen usually connects all the neurons of one layer to all the neurons in the layer above and all the neurons in the layer below. Figure 9 shows a two-layered neural network. (As we mentioned previously, the input units do not count here, because they are not neurons but serve only to distribute the input values across the neuron layer below.) The network user cannot access the first layer of neurons, which is therefore known as a hidden layer; the neurons in it are called inner neurons.

Fig. 9. Neural network with input units and two layers of active neurons.

The output values Out1 of the first layer of neurons are the input values X2 of the second layer of neurons. Thus each neuron in the upper layer passes its output value on to every neuron in the layer below. Because of the different weights wji of the individual connections (synapses), the same output value Out1 = X2 has a different effect on each individual neuron [Eq. (d)]. The result of the neural network as a whole is given only by the last layer in the network (here Out2). Figure 10 shows a two-layered network in matrix notation.

Fig. 10. Matrix representation of a two-layered network.
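Stacking two such layers reproduces the network of Figures 9 and 10: the output vector Out1 of the first (hidden) layer serves, as X2, as the input of the second layer. A sketch reusing the layer function and the input vector x from above; the layer sizes are arbitrary examples.

W1 = np.random.uniform(-1.0, 1.0, size=(4, 3))   # hidden layer: 4 inner neurons on 3 inputs
W2 = np.random.uniform(-1.0, 1.0, size=(2, 4))   # output layer: 2 neurons on the 4 hidden outputs

out1 = layer(x, W1)      # Out1 of the first layer ...
out2 = layer(out1, W2)   # ... becomes X2 of the second layer
print(out2)              # the result of the network as a whole (here Out2)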
