
4. One Molecule into One Network

4.1. Maps of molecular surfaces

4.1.1. Methods and results

A Kohonen network can be used to map a molecular surface into a two-dimensional plane. For this mapping, points on the surface are chosen at random and their three Cartesian coordinates are taken as input into a Kohonen neural network (KNN) in which each neuron has three weights [26][27]. As the molecular surface has neither beginning nor end, it was decided to also choose for the projection a two-dimensional plane without beginning and without end, the surface of a torus. For visualization, the torus is cut along two perpendicular lines and the surface is spread into a plane.
With a toroidal network, the maps can be shifted, mirrored and rotated against each other to achieve a similar position of their patterns. The surface of a molecule and the surface of a torus have a different topology and, therefore, this mapping process must lead to topological distortions that result in empty neurons. This feature of the mapping process in a Kohonen network has been analyzed and explained in detail [28].
Once the network has been trained, the entire dataset is sent through the network again and each neuron is colored according to a property of the molecular surface at the point or points that are mapped into that neuron [27]. In this coloring process, any molecular surface property can be chosen, such as the molecular electrostatic potential, the hydrogen-bonding potential, or just a color identifying the surfaces of different types of atoms.
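
The training and coloring steps described above can be illustrated with a minimal sketch. It is not the program used in the cited work: the grid size, learning-rate and neighborhood schedules, the Gaussian neighborhood function, and all function names are assumptions chosen for illustration, and random points on a unit sphere stand in for a real molecular surface.

```python
import numpy as np

def toroidal_grid_distance(rows, cols, winner):
    """Squared grid distance from `winner` to every neuron, wrapping on both axes."""
    r = np.arange(rows)[:, None]
    c = np.arange(cols)[None, :]
    dr = np.abs(r - winner[0])
    dc = np.abs(c - winner[1])
    dr = np.minimum(dr, rows - dr)  # wrap around vertically (torus)
    dc = np.minimum(dc, cols - dc)  # wrap around horizontally (torus)
    return dr**2 + dc**2            # broadcasts to shape (rows, cols)

def find_winner(weights, x):
    """Grid index of the neuron whose three weights are closest to point x."""
    d = np.sum((weights - x) ** 2, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

def train_toroidal_som(points, rows=10, cols=10, epochs=20,
                       lr0=0.5, sigma0=3.0, seed=0):
    """Train a toroidal Kohonen map on 3D surface points (assumed schedules)."""
    rng = np.random.default_rng(seed)
    start = rng.integers(0, len(points), size=rows * cols)
    weights = points[start].reshape(rows, cols, 3).astype(float)
    n_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(points)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            x = points[idx]
            winner = find_winner(weights, x)
            h = np.exp(-toroidal_grid_distance(rows, cols, winner) / (2 * sigma**2))
            weights += lr * h[:, :, None] * (x - weights)
            step += 1
    return weights

def color_map(weights, points, values):
    """Color each neuron with the mean property value of the points mapped into it.
    Neurons that receive no point stay empty (NaN)."""
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols), dtype=int)
    for x, v in zip(points, values):
        w = find_winner(weights, x)
        total[w] += v
        count[w] += 1
    colored = np.full((rows, cols), np.nan)
    filled = count > 0
    colored[filled] = total[filled] / count[filled]
    return colored

# Stand-in data: random points on a unit sphere, "property" = z coordinate.
rng = np.random.default_rng(1)
surface = rng.normal(size=(200, 3))
surface /= np.linalg.norm(surface, axis=1, keepdims=True)
som = train_toroidal_som(surface, rows=8, cols=8, epochs=5)
mep_like_map = color_map(som, surface, surface[:, 2])
```

Averaging the property over all points that fall into one neuron is one possible coloring rule; the text only states that the neuron takes the property of the point or points mapped into it.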
In order to give an idea of the correspondence between a 3D space and its 2D map, we show in Fig. 2 as an example the projected Kohonen maps of the van der Waals surface of corticosterone. The values of the electrostatic potential on the molecular surface (MEP) determine the colors of the map.
Corticosterone has two sites with a large negative value of the MEP: the carbonyl group at position 3 (4) and the side chain COCH2OH at position 17 (1) (Fig. 2). Consistent with this, the Kohonen map (Fig. 2) shows two regions with a red-yellow color for these sites. The spatial separation of these groups is reflected by two differently shaped projections of the MEP into the Kohonen network. The third site with a negative value of the MEP stems from the hydroxyl group at position 11 (3); a region with a yellow color corresponds to this group in the map. Furthermore, the large positive MEP area of corticosterone lies below the D-ring and the side chain at position 17 (2). The projection of the MEP into the Kohonen map places this region (violet color) close to the region of the negative MEP area of COCH2OH at position 17.

Fig. 2. Three-dimensional model of corticosterone, parallel projection of the electrostatic potential on the van der Waals surface of corticosterone and the corresponding Kohonen map indicating the MEP features corresponding to (1) the side chain at position 17 with a large negative value; (2) the area below the D-ring and the side chain at 17 with a large positive value; (3) the hydroxyl group at position 11β; and (4) the C3-carbonyl group with a conjugated double bond at C4.

4.1.2. Visual comparison of Kohonen maps

The comparison of Kohonen maps of molecular surface properties offers a technique for the perception of similarities in ligands binding to the same receptor. Kohonen maps of the molecular electrostatic potential have been generated for four ligands that bind to the muscarinic receptors and four ligands that bind to the nicotinic receptors and are shown in Fig. 3.

Fig. 3. Kohonen maps of the molecular electrostatic potential of four muscarinic (top row: muscarine and the protonated forms of atropine, scopolamine, pilocarpine) and four nicotinic agonists (bottom row: protonated forms of nicotine, anatoxin-a, mecamylamine and pempidine).

Visual inspection of these maps clearly shows characteristics that are common to the four molecules binding to the muscarinic receptors and are not contained in the maps of the four ligands binding to the nicotinic receptors [20]. The nicotinic compounds, for their part, show common features different from those of the muscarinic ligands. Thus, inspection of these eight maps allowed a clear separation of molecules that bind to the muscarinic receptors from those that bind to the nicotinic receptors.

4.1.3. Averaged maps

In the previous example, a visual comparison of the Kohonen maps of the molecular electrostatic potential was made, allowing one to differentiate ligands that bind to two different types of receptors. The question is now whether such a comparison can be put onto a more objective basis. This will be explored with 31 steroids for which the CBG affinities are known [21]-[23]. The distribution of the compounds into high-, intermediate- and low-affinity classes is defined in reference [24].
For each of the 31 steroids, a Kohonen network was trained, using the three Cartesian coordinates of points on the molecular surface as input to the network. The values of the MEP determine the colors of the map. For a more objective analysis, the averaged maps for the sets of compounds with high, intermediate and low activity were generated (Fig. 4). For this purpose, each neuron n of the Kohonen map of a single compound was assigned a color index in a range of ten values, representing the MEP value the neuron obtained in the coloring step. The colors of the neurons in the averaged maps were then obtained by averaging the color indices of the corresponding neurons in the single maps.
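
The averaging step can be sketched as follows. The equal-width binning of the MEP into ten color indices, the code 0 for empty neurons, and the inclusion of empty neurons in the average are assumptions made for this illustration; the function names are hypothetical.

```python
import numpy as np

def to_color_index(mep_map, vmin, vmax, n_colors=10):
    """Bin MEP values into color indices 1..n_colors (assumed equal-width bins).
    Empty neurons (NaN) receive the index 0."""
    idx = np.zeros(mep_map.shape, dtype=int)
    filled = ~np.isnan(mep_map)
    inner_edges = np.linspace(vmin, vmax, n_colors + 1)[1:-1]
    idx[filled] = np.digitize(mep_map[filled], inner_edges) + 1
    return idx

def average_maps(index_maps):
    """Neuron-wise mean of the color indices over a set of single-compound maps."""
    return np.stack(index_maps).astype(float).mean(axis=0)
```

All maps of one activity class must share the same grid size and the same MEP range (`vmin`, `vmax`) for the averaged indices to be comparable.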

Fig. 4. The averaged maps of the electrostatic potential on the van der Waals surface of the sets of (a) high, (b) intermediate and (c) low active compounds of the CBG series.

The MEP pattern of the most polar area is most pronounced in the averaged map of the highly active compounds. Across the three averaged maps, the prominence of the polar regions decreases with decreasing activity of the compounds. Therefore, a comparison of the map of a steroid with the averaged maps allows one to establish whether a molecule belongs to the active or the inactive CBG compounds. The averaged map of the highly active compounds can be used to build a pharmacophore model.

4.1.4. Maps as a two-dimensional representation of molecules

The investigation of the previous section can be taken one step further. If, indeed, the maps of the molecular electrostatic potential allow one to distinguish high-active compounds from low active compounds, why not take those maps as representations of molecules? In other words, we first train a, say, 20 x 20, Kohonen network with the three Cartesian coordinates of points of a molecular surface. Then, the entire dataset is sent again through the network and an extra layer of the network used for labeling the network is colored with the electrostatic potential of the points that are mapped into each neuron. (This is the procedure as outlined in section 4.1.1.) The values of the MEP (or the color) of the 20 rows of this label layer, each consisting of 20 values, are then concatenated to give a 400-dimensional vector. This vector is a two-dimensional representation of a molecule as it has been obtained by projecting a molecular surface into two dimensions and is used to train a second Kohonen network of size 5 x 5. The result is shown in Fig. 5.
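
As a small illustration of the concatenation step, the sketch below flattens each 20 x 20 label layer row by row into a 400-dimensional vector. The random integer codes are placeholder data, not values from the study, and `map_to_vector` is a hypothetical helper name.

```python
import numpy as np

def map_to_vector(label_layer):
    """Concatenate the rows of a label layer into one vector (row-major order)."""
    return np.asarray(label_layer, dtype=float).reshape(-1)

# Placeholder data: 31 label layers of size 20 x 20 with color codes 0..10.
rng = np.random.default_rng(0)
label_layers = rng.integers(0, 11, size=(31, 20, 20))

# One 400-dimensional row per molecule; these rows are the inputs to the
# second (5 x 5) network, whose neurons then carry 400 weights each.
X = np.stack([map_to_vector(layer) for layer in label_layers])
print(X.shape)  # (31, 400)
```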

Fig. 5. Mapping of a dataset of 31 steroids binding to the corticosteroid binding globulin (CBG) receptor into a toroidal 5 x 5 Kohonen network. Each steroid is labeled by its activity range: H = high, M = medium and L = low activity.

It can be seen that the steroids quite nicely separate into groups of compounds of high, medium and low activity. Only one compound of medium activity shows a collision with highly active compounds by being mapped into the same neuron.

4.1.5. Bioisosteric design

The bioisostere database by Istvan Ujvary (BIOISOSTER version 1.3), a database for analog design that includes 1515 bioisosteric groups, was analyzed. The question was whether a relationship between the physicochemical properties and the bioisosteric effect can be deduced from the calculated Kohonen maps of these groups. Figure 6 shows an example of a structure pair in this database. The squares mark those parts of the structures that are defined as bioisosteric groups.

Fig. 6. A structure-pair of the Bioisostere Database. The squares show the bioisosteric groups.

Several hundred pairs from this database were selected such that the structural fragment pairs were as diverse as possible. The bioisosteric groups were then cut out of the selected structures. The 3D structures of the selected fragments were calculated using the program CORINA [6]-[9], and the MEP was calculated on the van der Waals surface. Then, a Kohonen map was calculated for each bioisosteric group using a common color scale. Figure 7 shows some examples of the calculated fragments. As shown there, the fragment pairs are structurally quite different, but their maps show a high similarity in the electrostatic potential patterns.

Fig. 7. Some examples of the bioisosteric groups and their calculated Kohonen maps.

This is an interesting result because it offers the possibility of selecting from this database those fragment pairs that may have general validity as true bioisosteric groups. Therefore, we constructed a 3D database of several hundred fragments and functional groups including their corresponding Kohonen maps. The comparison of the electrostatic potential patterns of the maps of such a library can be used to cluster bioisosteric groups and, consequently, improve the efficiency of the 3D design of bioactive molecules.

4.2. Comparative maps

4.2.1. The template approach

A Kohonen network stores the information on an object that is used for training. This fact inspired the use of a Kohonen network trained with the molecular data of a given molecule as a reference molecule, a template, for the comparison with other molecules [30][31]. The general idea of such a comparative mapping is shown in Fig. 8.

Fig. 8. The idea of comparative Kohonen mapping of two molecules. A template butane molecule (a) trains (black arrow) a Kohonen network (c) allowing for 2D visualization of the surface of two methyl (CH3) and two methylene (CH2) groups (d). Different settings of the parameters (two examples) during training result in only slightly different template patterns (d). The same network (c), if used for processing the molecular data coming from propane (b), gives a comparative pattern of the propane molecule (e). Different settings of the training parameters bring out aspects of similarity that agree with a chemist's simple analysis (f): propane lacks the entire terminal methyl group of butane, which is enclosed in the square in the upper formulas; if, however, the carbon atom of this methyl group is superposed on one of the hydrogen atoms of the propane molecule, the difference consists of three hydrogen atoms, as indicated by the squares in the bottom formulas.

The Cartesian coordinates of the points taken from the molecular surface of a butane molecule (a) are used to train a Kohonen network (c). In the example shown in Fig. 8, neurons of the map (d) are colored by giving the surface belonging to carbon atoms 1 to 4 of butane, and the hydrogen atoms bonded to these carbon atoms, different shades of gray. The network (c) can now be used for the comparison of the surface of molecules other than butane. In our example, it was used for the simulation of a map of the propane molecule (b). Such a map (e) can be seen as a superposition of the compared molecule onto the template molecule, propane and butane in our case. A point from the compared molecule will find a neuron in the template network having weights quite similar to the coordinates of the point from the surface of the compared molecule (cf. Eq. 1). However, neurons corresponding to those parts of the surface of the template molecule that have no counterpart on the surface of the compared molecule will not become excited and, thus, stay empty. In the comparative map that is obtained by filtering the compared molecule through the reference network of the template molecule, the empty neurons show up as white areas indicating where the surface of the reference molecule differs from the surface of the compared molecule.
Different settings for the parameters for training and testing of the Kohonen network program allow one to emphasize certain aspects of the molecular surface to a different extent. In particular, the value for the threshold that determines whether the input data of an object match with the weights of a neuron (cf. Eq. 1) can render the number of non-matching (empty) neurons and, thus, the white area in the comparative map larger or smaller. This is indicated in Fig. 8 where one setting (top map, Fig. 8e) somehow indicates the entire methyl group of butane to be lost in propane, whereas the second setting (bottom map, Fig. 8e) shows the major difference to reside in the three hydrogen atoms of the methyl group of butane being lost in propane.
The basis of the template approach is an analysis of the shapes of molecules and the quantification of a shape similarity or dissimilarity within a series of compounds using a reference molecule. This is of particular merit for the comparison of a series of biologically active compounds. The larger the difference in shape between the reference molecule and the compared molecule, the more empty neurons (blank areas) are obtained in the comparative map. A preceding superposition of the molecules is required for this approach.
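
A minimal sketch of the comparative mapping step is given below. It assumes that a point excites its winning neuron only if the Euclidean distance between the point and the neuron weights does not exceed a threshold; this stands in for the matching criterion of Eq. 1, which is not reproduced in this section, and the function names are hypothetical. The compared molecule is assumed to be already superimposed on the template.

```python
import numpy as np

def comparative_map(template_weights, points, threshold):
    """Send surface points of a compared molecule through a trained template network.
    Returns a boolean (rows, cols) array: True where a neuron was excited."""
    rows, cols, _ = template_weights.shape
    flat = template_weights.reshape(-1, 3)
    excited = np.zeros(rows * cols, dtype=bool)
    for x in points:
        d = np.linalg.norm(flat - x, axis=1)
        winner = np.argmin(d)
        if d[winner] <= threshold:  # assumed matching criterion
            excited[winner] = True
    return excited.reshape(rows, cols)

def count_empty(excited):
    """Number of empty (never excited) neurons in a comparative map."""
    return int(excited.size - excited.sum())
```

Raising the threshold lets more points match and shrinks the white area of the comparative map; lowering it has the opposite effect, in line with the two settings discussed for Fig. 8e.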

4.2.2. Shape analysis of CBG ligands

Again, the dataset of 31 steroids binding to the CBG receptor [21] will be used to demonstrate the template approach for the analysis of molecular shape. The shape will be considered by using a reference molecule within a series of molecules to prepare a template network which then forms a basis for the comparison of the surface of the other molecules. A reference network was trained with the van der Waals surface coordinates of corticosterone, the compound having highest CBG activity. This compound supplies the reference Kohonen neural network (r-KNN), a template, while the analogs are filtered through this r-KNN to produce a series of individual comparative maps. The ring junction atoms 5, 8, 9, 10, 13 and 14 of the steroid system were used to produce the superposition of all molecules onto the template molecule.
The maps obtained in this procedure using the most active compound in this series, corticosterone, as the template molecule show that the compounds with low CBG activity have rather large white areas of empty neurons. In fact, it has been shown that the number of empty neurons can be taken as a quantitative measure of the similarity of the surface of two molecules [31]. Table 1 gives the number of empty neurons for the obtained maps; the molecules are arranged in order of decreasing activity.

Table 1. Number of empty neurons for the maps of CBG compounds (total number of neurons = 2500)

| Name of molecule | No. of empty neurons | Name of molecule | No. of empty neurons |
|---|---|---|---|
| Corticosterone | 0 | 16α,17α-dihydroxyprogesterone | 69 |
| Cortisol | 15 | 19-nortestosterone | 357 |
| 11-deoxycortisol | 52 | Dihydrotestosterone | 324 |
| 17α-hydroxyprogesterone | 61 | 2α-methyl-9α-fluorocortisol | 28 |
| 2α-methylcortisol | 6 | 4-androstenedione | 350 |
| 11-deoxycorticosterone | 50 | Androsterone | 417 |
| Cortisol acetate | 13 | Etiocholanolone | 749 |
| Prednisolone | 149 | Pregnenolone | 108 |
| Progesterone | 58 | 17α-hydroxypregnenolone | 130 |
| Epicorticosterone | 46 | Estriol | 636 |
| 17α-methylprogesterone | 113 | Estrone | 700 |
| Cortisone | 75 | Estradiol | 644 |
| 19-norprogesterone | 152 | Dehydroepiandrosterone | 378 |
| 4-pregnene-3,11,20-trione | 79 | Androstanediol | 344 |
| Testosterone | 296 | 5-androstenediol | 332 |
| Aldosterone | 225 | | |

4.2.3. Backprojection of maps onto molecular shapes

The identification of the pharmacophore in an assembly of structurally diverse ligands is quite a difficult task, especially when the structure of the target molecule, the receptor, is not known. It will be demonstrated how Kohonen networks can be utilized for this problem by elucidating the pharmacophore of allosteric modulators of muscarinic receptors, a group of drugs under development, which can enhance the activity of an antagonist in a very specific manner [32]. In a previous publication [33], it was shown how the pharmacophore can be mapped out with a small number of allosteric modulators. Here, we limit the discussion even further by only exploring alcuronium as the most potent compound, characterized by an almost rigid structure, and the flexible W84 as a representative of a wider range of hexamethonium and bispyridinium compounds [32]-[35].
Two different conformations of W84 were explored: the extended form 4e, obtained from CORINA, and a slightly distorted sandwich form, 4d, found by molecular dynamics calculations and subsequent optimization by the semiempirical AM1 method. For alignment, the rigid alcuronium was chosen as a template. Both conformations of W84 were superimposed onto alcuronium, first using the positively charged nitrogens, because they were assumed to be the most important feature for the first step of ligand-receptor recognition. In addition, both aromatic rings were matched onto each other. The superposition of the extended, linear conformation of W84 (4e) and alcuronium (1), with the two positively charged nitrogens as fixed points, resulted in phthalimido rings protruding at both ends of alcuronium (Fig. 9a). The alignment of alcuronium (1) and the distorted sandwich conformation of W84 (4d) revealed a much better fit (Fig. 9b).

Fig. 9. Superposition of the skeleton of the extended form of W84 (4e) and the distorted sandwich form of W84 (4d) onto alcuronium (1).

In order to find the similarities in the molecular surfaces and, thus, in the properties of these molecules, the surfaces were sent into Kohonen neural networks and colored by labeling the neurons according to the kind of atom the corresponding points belonged to (atomic surface assignment: ASA). The ASA maps showed that the maps of alcuronium and of the distorted sandwich conformation of W84 are similar, whereas the map of the extended form of W84 is quite different.
For a more quantitative comparison of the 3D shape of the molecules, the template approach was applied by sending both conformations of W84 through the Kohonen network of alcuronium as the reference compound. The maps obtained for the surfaces of the molecules show a rather large number of empty neurons, reflecting that the shape of the molecules is fairly different from the shape of the large reference molecule alcuronium.
There is an even more illustrative method for showing the correspondence of two molecular surfaces. The map of the second molecule obtained by sending it through the Kohonen network of the first, the reference molecule, can be projected back onto the three-dimensional surface of the reference structure alcuronium. Fig. 10 shows the 3D models of the surface of alcuronium 1 with a backprojection of the Kohonen map of the extended conformation and that of the distorted sandwich-like conformation of W84, 4e and 4d, respectively. Those places that have empty neurons in the template maps are indicated by a black open mesh on the surface of alcuronium.
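
One way to picture the backprojection is sketched below. Each surface point of the reference molecule is assigned to its winning neuron in the reference network and inherits that neuron's state in the comparative map. The boolean `excited` array (True for neurons hit by the compared molecule) and the function name are assumptions made for this illustration, not the interface of the original software.

```python
import numpy as np

def backproject(template_weights, template_points, excited):
    """Return, for every surface point of the reference molecule, whether the
    neuron it maps into was excited by the compared molecule (True) or stayed
    empty (False). Points flagged False mark the uncovered surface regions."""
    flat = template_weights.reshape(-1, 3)
    flat_excited = np.asarray(excited, dtype=bool).reshape(-1)
    covered = np.empty(len(template_points), dtype=bool)
    for i, x in enumerate(template_points):
        winner = np.argmin(np.linalg.norm(flat - x, axis=1))
        covered[i] = flat_excited[winner]
    return covered
```

In a rendering, the points with `covered == False` would correspond to the open mesh drawn on the reference surface.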

Fig. 10. Backprojection of the Kohonen maps shown in Fig. 28a-c onto the molecular surface of alcuronium 1.

This way of representation impressively exhibits the similarities between alcuronium and the distorted sandwich conformation of W84 (4d). The extended conformation of W84 fills only the center of the alcuronium surface. In contrast, the distorted sandwich-like conformation of W84 covers much more area of the surface of alcuronium. Moreover, the essential features, both aromatic skeletons and both positive charges, color the surface exactly at the same places for this conformation and alcuronium.
Taken together, the following conclusions can be drawn [30]: firstly, the pharmacophore consists of two positively charged nitrogens at a defined distance from each other and two heterocyclic, aromatic rings, both located close to the hydrophobic central chain; secondly, the distorted sandwich geometry appears to be the conformation W84 takes up upon binding to the allosteric binding site; and thirdly, electrostatic interactions have been found to be primarily responsible for the molecular recognition between receptor and ligand.

4.2.4. Descriptors from comparative maps

The more the surfaces of a template (reference) and a compared molecule differ from each other, the higher is the number of empty neurons. Therefore, this value can be taken as a measure describing the difference in the geometry of the molecules input. This approach can be taken one step further also to describe differences in molecular surface properties. In principle, any molecular surface property can be taken but we limit the discussion to the molecular electrostatic potential (MEP). The entire spectrum of the electrostatic potential is divided into ranges (e.g. 10 ranges) each indicated by a specific color. Therefore, the map is coded by a matrix with elements that take discrete values from 1 to 10, while 0 codes empty neurons. Clearly, also the real values of the electrostatic potential could be used, but we wish to keep the discussion here simple.
Figure 11b shows an example of the comparative MEP maps of two similar molecules shown in Fig. 11a. Figure 11c compares the histograms of the occurrence of the neurons colored with the respective colors 0-10.

Fig. 11. Comparative patterns (b) of two molecules (a) and a histogram comparing the frequency of the occurrence of different colors within the maps (c).

We can define descriptors for the differences in the color profiles of the maps by comparing the occurrence of neurons having the same color (range of MEP):

(2)

with ki=1 for neurons colored with the respective color coded by i and ki=0 for all other colors, while the matrix coding a feature map is of size n x n.
The difference between the map of the template and the compared molecule is given by:

(3)

with the index, T, denoting the template, and M the molecule being compared. We can also include the range of the electrostatic potential coded by a certain color and obtain a modified Eq. 4:

(4)

where ci is a value (1-10) defining the range of the electrostatic potential coded by the respective color i and (ci)jk denotes the components of the matrix, of size n x n, coding this respective color within the feature map.
These descriptors can be calculated for a single color or for a group of different ranges of colors - e.g., EP1-3 will give a sum of the EP1+EP2+EP3.
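
The descriptors can be sketched in code under stated assumptions. The maps are taken to be integer matrices with codes 1 to 10 for the MEP ranges and 0 for empty neurons, as described above. The per-color difference is computed here as template minus compared molecule, and the EP value of a color as that difference weighted by the color value ci; the sign convention, the weighting and all function names are assumptions of this sketch and may differ from Eqs. 2-4.

```python
import numpy as np

def color_counts(code_map, n_colors=10):
    """Occurrence of each code 0..n_colors in a map (0 = empty neuron)."""
    return np.bincount(np.asarray(code_map).ravel(), minlength=n_colors + 1)

def descriptors(template_map, molecule_map, n_colors=10):
    """Return (NE, per-color count difference, per-color EP value)."""
    n_template = color_counts(template_map, n_colors)
    n_molecule = color_counts(molecule_map, n_colors)
    ne = int(n_molecule[0])            # empty neurons in the comparative map
    delta_n = n_template - n_molecule  # assumed sign: template minus molecule
    colors = np.arange(len(delta_n))
    ep = colors * delta_n              # assumed weighting by the color value
    return ne, delta_n, ep

def ep_range(ep, first, last):
    """Sum of EP values over a range of colors, e.g. EP1-3 = EP1 + EP2 + EP3."""
    return int(ep[first:last + 1].sum())
```

For example, `ep_range(ep, 7, 10)` would correspond to a descriptor of the EP7-10 type.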
A related global EP descriptor was used by Barlow [36]. In contrast, the EP parameter calculated for a narrow range of EP will bear only the information on the polar character neglecting the shape of the molecules.

4.3. Quantitative structure-activity studies

One of the basic approaches toward modelling SAR and QSAR relationships involves the comparison of a series of bioactive molecules. Table 2 gives an overview of the series of analogs analyzed in previous publications by quantitative structure-activity studies using descriptors developed above (section 4.2.4).
For more details, the reader is referred to the original publications. The major conclusion to be drawn is that the template approach provides a basis for the quantification of the shape and electrostatic effects of molecular surfaces, allowing the calculation of descriptors that are useful for the development of quantitative structure-activity relationships.

Table 2. The SAR and QSAR models obtained using the template Kohonen maps of the molecular electrostatic potential

| Study | Compounds/activity | Model/statistical characterization | Descriptor | Reference |
|---|---|---|---|---|
| 1 | Steroids/CBG | Qualitative | NE | [30] |
| 2 | Histamine analogs/H2 activities | Qualitative | EPall (a) | [36] |
| 3 | Ryanodine derivatives/binding to membrane proteins | Qualitative | NE (b) | [37] |
| 4 | Nitroanilines/sweetness activity | r = 0.99, n = 9 | EP7-10 | [31] |
| 5 | Nitro- and cyanoanilines/sweetness activity | r = 0.96, n = 18 | EP1-10 | [31] |
| 6 | Ethylcarboxylates/Taft’s ES constant | r = 0.94, n = 22 | NE | [31] |
| 7 | Steroids/CBG | r = 0.89, n = 31; r = 0.92, n = 31 | - (c) | [38] |
| 8 | Steroids/TBG | r = 0.92, n = 20 (d) | - (c) | [39] |
| 9 | Arylsulfonylalkanoic acids/sweetness activity | r = 0.94, n = 9 (e) | - (c) | [40] |

(a) The comparison is performed by subtraction of the MEP matrices; the number of EP ranges is not given.
(b) NE is the number of empty neurons.
(c) The comparison is performed by a single instar neuron.
(d) The induced fit between the TBG receptor and the stimulating molecules is simulated using the hypermolecule technique.
(e) The method allows the active pharmacophoric conformation to be determined; the statistical characterization refers to this conformation.



Johann.Gasteiger@chemie.uni-erlangen.de