Analytica Chimica Acta 348 (1997) 409-418

General Type of a Uniform and Reversible Representation of Chemical Structures

Jure Zupan*, Marjana Novic

National Institute of Chemistry, Ljubljana, Hajdrihova 19, SLO-1115 Ljubljana, Slovenia

Received 1 June 1996; received in revised form 4 December 1996; accepted 31 December 1996


In any type of modelling (classical or by artificial neural networks) involving chemical structures and their corresponding properties, the first problem encountered is the representation of the chemical structures. A good structure representation should have a different code for each 3-D structure (uniqueness), should have the same number of variables for all structures (uniformity), should be reversible, and should be invariant to translation and rotation. In the present contribution we discuss a new method for representing chemical structures which, at least in principle and within the limitations imposed by the precision and resolution of the projection, fulfils all of these requirements with the exception (in some cases) of rotational invariance. The representation is based on the projections of atoms onto a sphere with an arbitrary radius. The new structure representation of a molecule with N atoms is defined as an n-dimensional vector $S = (s_1, s_2, \ldots, s_n)$, with each component defined as a cumulative intensity $s_i$ at a given point $i$ on the circle with an arbitrary radius. The cumulative intensity $s_i$ (at the $i$-th point on the circle, at angle $\varphi_i$) is a sum of N contributions $I(i, r_j, \varphi_j)$ from each atom $j$ in the molecule.

$$s_i = \sum_{j=1}^{N} I(i, r_j, \varphi_j), \qquad i = 1, \ldots, n$$

The intensity function $I(i, r_j, \varphi_j)$ can be any bell-shaped function. In our case we have chosen the Lorentzian shape with its maximum at the angle $\varphi_j$, a maximal intensity proportional to $r_j$, and a width $\sigma_j$ related to the type of the atom. The proposed "spectrum-like" representation is additive with respect to the constituent atoms of a given structure and can be easily decoded. Because the representation is additive, the part of the "spectrum-like" representation that belongs to a skeleton structurally identical in all molecules of the study can be subtracted.
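The construction and its additivity can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the text: the Lorentzian is written as r / ((phi_i - phi)^2 + sigma^2), which peaks at the atom's angle with a height proportional to r but is not necessarily the exact expression used by the authors; r is taken to be the atom's distance from the origin; the sampling angles are evenly spaced; the periodicity of the angle is ignored for simplicity; and the function name `spectrum`, the default of 36 points, and all numerical values are hypothetical.

```python
import math


def spectrum(atoms, n=36):
    """Return an n-component spectrum-like code for a list of atoms.

    Each atom is a hypothetical (r, phi, sigma) triple: r is its distance
    from the origin, phi its angle in radians, and sigma the width
    assigned to its atom type.
    """
    code = []
    for i in range(n):
        phi_i = 2.0 * math.pi * i / n  # i-th sampling angle on the circle
        # Cumulative intensity: one assumed Lorentzian term per atom.
        s_i = sum(r / ((phi_i - phi) ** 2 + sigma ** 2)
                  for r, phi, sigma in atoms)
        code.append(s_i)
    return code


# Illustrative data only: a two-atom common skeleton plus one substituent.
skeleton = [(1.0, 0.5, 0.1), (1.5, 2.0, 0.1)]
substituent = [(2.0, 4.0, 0.2)]

full = spectrum(skeleton + substituent)
common = spectrum(skeleton)

# Subtracting the skeleton's code leaves the substituent's contribution.
residual = [a - b for a, b in zip(full, common)]
```

Because each atom contributes an independent term to every component, `residual` agrees with `spectrum(substituent)` up to floating-point rounding, and every structure yields a code of the same length `n` regardless of the number of atoms.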

Keywords: Chemical code; Structure representation; Projection; Uniformity; Reversibility; Kohonen neural network

