1 The Philosophy of EROS

1.1 Introduction

1.1.1 A Knowledge-Based System

The EROS system (Elaboration of Reactions for Organic Synthesis) simulates organic reactions. It can be used to explore the different reaction pathways that given starting materials can follow to provide products. EROS generates one reaction step after another, thus producing sequences of parallel and consecutive reactions. A major advantage is that, in this process, EROS exhaustively explores all possible reaction pathways, never becoming tired or stopping when it has found one feasible reaction pathway.
It has to be realized that EROS is a knowledge-based system, i.e., it can only generate those reaction types that it knows about, and it can only decide between various reaction pathways when rules for the evaluation of reactions have been given to it. The evaluation can range from simple rules on which atom and bond types may be involved in a certain reaction type all the way to mathematical functions allowing the calculation of absolute reaction rates.
Clearly, the development of a comprehensive scheme for the evaluation of the entire range of organic chemical reactions is quite an endeavor and will take some time to achieve. The set-up of EROS for the evaluation of organic reactions, however, is hierarchical in nature: starting from rather general rules, a more in-depth evaluation of a particular reaction type can be added later as more knowledge on a specific type of organic chemistry is developed.
The knowledge base that defines the kind of chemistry, i.e., the types of organic reactions, to which EROS can be applied is kept separate from the program system proper. This allows a flexible development of the chemistry accessible to EROS and an easy exchange between different knowledge bases.

1.1.2 History of Development

EROS can look back at a long history of development spanning now more than two decades. The first version was reported in 1978 after five years of development work.^[1] Already at that time, we relied on a formal treatment of chemical reactions as electron and bond shifting patterns. The first versions of EROS were applicable to two types of problem cases, to reaction simulation (forward search) and to synthesis design (backward search). The differences in the two types of search problems resided largely in the way the generated chemical reactions were evaluated. With continuing program development and increased sophistication of the system it became clear that the two types of application, reaction prediction and synthesis design, should no longer be handled by a single system.^[2] For synthesis design a new approach was taken leading to the development of the WODCA system (Workbench for the Organization of Data for Chemical Applications).^[3]
From then on, the further development of EROS concentrated on the simulation of the course and products of chemical reactions. A major step was made with EROS version 6 when the knowledge base of EROS was separated from the system proper.^[4]
An offspring of EROS was developed for the simulation of mass spectra, MASSIMO (MAss Spectra SIMulatOr).^[5] Work on this specific system eventually convinced us to redesign the EROS system from scratch, building on a new representation of chemical structures. Furthermore, new concepts were developed and incorporated into the new EROS version 7 that allow it to handle different ways of running chemical reactions, encompassing also the features previously contained in MASSIMO. EROS version 7 became operative in the second half of 1998 and has recently been presented.^[6]
During this long period of development, EROS was implemented in different programming languages. The first versions were coded in PL/1, followed by FORTRAN 77, whereas version 7 is coded in the object-oriented language C++ with the knowledge base written in the scripting language Tcl.

1.2 Basic Concepts

1.2.1 The Way to Run a Reaction

Organic reactions can be carried out under a variety of conditions:
- laboratory synthesis
- technical processes in continuous or batch mode
- combinatorial chemistry
- degradation of chemicals in the environment
- metabolism of nutrients and drugs
- fragmentation and rearrangement of ions in the mass spectrometer

EROS can be applied to all those different ways of running a reaction. This was achieved by introducing new concepts such as reactors, phases, and modes.
These concepts will be explained in more detail in Section 1.3, but some brief remarks seem appropriate here. A reactor can be a three-necked flask or a mass spectrometer; it is defined as a place where reactions occur at the same time. EROS can handle sequences of reactors in a single program run. For example, one reactor can perform a synthesis, and a second reactor can be used to model the mass spectra of the various products obtained in the first reactor.
A reactor can consist of several locations, several phases, such as an organic and an aqueous phase with transitions between the phases to be considered. The phases can as well be different compartments of the human body such as the intestinal tract, the blood, the tissue, and the kidney.
The way the starting materials are combined in an EROS run is largely dictated by their concentrations. At high concentrations the dimerisation of a starting material might occur, whereas at very low concentrations the starting materials will react only with compounds that are present in high concentration, such as the solvent or air. These situations are handled by specifying the mode of a phase.
If enough knowledge on a reaction type is available, the evaluation of a reaction can be driven all the way to the calculation of relative or absolute reaction rates. If this is possible, the differential equations automatically derived from the reaction network and the reaction rates are integrated at the end of an EROS run, allowing one to predict the development of the various products over time.

1.2.2 Outline of the EROS System

Figure 1-1 shows the basic outline of the EROS system.

Figure 1-1. Basic outline of the EROS system.

The modules shown in Figure 1-1 are clearly separated from each other communicating through well-defined interfaces. Information between the various building blocks of the EROS system is passed in the form of ASCII files in a format (CTX = clear text) developed in our group quite some time ago.
Information on chemical structures in the CTX format can easily be interconverted with other standard structure exchange formats such as MDL SDfile, SMILES string, JCAMP-CS, SYBYL Molfile etc. Routines for this interconversion are available in the group.
Input to the EROS system can be made by any molecule editor that produces a standard structure exchange format such as the graphical editors ChemDraw, the CACTVS editor csed, etc. Usually, the EROS system is provided with the CACTVS editor csed (see Appendix 7.1).
The results of an EROS run are usually shown as structure diagrams or reaction equations by converting the structure information of a CTX file into a graphical form by programs such as the CACTVS browser csbr.
The relationship between individual reaction steps, i.e., which reactions run in parallel and which reactions follow each other, can be visualized by the CACTVS tree cstr (see Appendix 7.2).
The knowledge base of the EROS system is twofold. One part is procedural in nature and consists of a variety of empirical methods for the calculation of physicochemical effects such as heats of reaction, charge distribution, and inductive, resonance, or polarizability effects.
The other knowledge base gives information on the types of reactions EROS can be applied to. It consists of a header that contains information on the number of reactors, phases, and the modes. This is followed by rules on the various reaction types that will be used in the simulation of reactions by EROS. These rules specify the bond and electron shifting pattern of a reaction type, and the kind of atoms and bonds involved in such an electron reorganization. Furthermore, it can contain methods for the evaluation of such a reaction type: from no evaluation at all (useful for the generation of all possible reactions or isomers) through neural networks for deciding between reaction alternatives, all the way to mathematical functions for the calculation of absolute rate constants.

1.2.3 Reaction Generation

A major characteristic of EROS is that reactions are handled as formal bond and electron shifting patterns. A chemical reaction breaks bonds between atoms and makes new ones. The specific nature of the atoms and bonds involved in this reorganization of bonds makes up the large variety of reaction types. This handling of chemical reactions is quite analogous to the way organic chemists specify reaction mechanisms by drawing curved arrows for the shifting of electrons. An example is the reaction scheme shown in Figure 1-2a.

Figure 1-2. A general reaction scheme (a), and two instances: general hydrolysis (b), and amide hydrolysis (c).

In the course of a reaction as shown in Figure 1-2, two bonds are broken and two bonds are made. It is estimated that nearly 50% of all organic reactions follow this scheme: additions to double bonds, elimination reactions, nucleophilic aliphatic substitutions, and electrophilic aromatic substitutions all break two bonds and make two new bonds. Note that nothing is said here about the timing of these events; the reaction can be concerted or stepwise.
When no restrictions are imposed on this reaction scheme both conceivable alternatives for making two bonds will be generated (Figure 1-2a).
In order to define more specific reaction types, constraints on the types of atoms that can be involved in such a bond rearrangement scheme can be imposed. Thus, Figure 1-2b shows the case of a general hydrolysis with atoms K and L now being H and O, respectively, and requiring an additional hydrogen atom to be bonded to the oxygen atom.
Furthermore, rules can be imposed on which atom will be bonded to the hydrogen atom and which to the oxygen atom in the course of the reaction. Such rules could, for example, be derived from simple electronegativity considerations.
Even more specific reaction types can be obtained when additional restrictions are imposed onto the atoms I and J. Thus, when I is required to be a nitrogen atom and J to be an sp² carbon atom having as an additional neighbor a doubly bonded oxygen atom, the case of hydrolysis of an amide is obtained (Figure 1-2c).
All these restrictions can be specified in the rules for a reaction contained in the external reaction rule file (see Section 1.3).
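The unrestricted scheme of Figure 1-2a can be pictured with a few lines of Python. This is only an illustrative sketch, not the EROS implementation: the function name `rearrange`, the atom labels, and the representation of bonds as pairs of labels are all assumptions made here. It breaks two bonds I-J and K-L that share no atom and returns both conceivable ways of making two new bonds.

```python
from itertools import combinations

def rearrange(bonds):
    """Break two bonds I-J and K-L and return both ways of re-bonding.

    `bonds` is a list of 2-tuples of atom labels (hypothetical encoding).
    Only pairs of bonds with four distinct atoms are considered.
    """
    results = []
    for (i, j), (k, l) in combinations(bonds, 2):
        if {i, j} & {k, l}:
            continue  # the two broken bonds must not share an atom
        results.append({frozenset((i, k)), frozenset((j, l))})  # I-K + J-L
        results.append({frozenset((i, l)), frozenset((j, k))})  # I-L + J-K
    return results

# Hypothetical labels: the C-N bond of an amide and an O-H bond of water.
alternatives = rearrange([("C1", "N2"), ("H3", "O4")])
for made in alternatives:
    print(sorted("-".join(sorted(bond)) for bond in made))
```

Without further restrictions both alternatives are produced; the constraints described above would then discard, for example, the alternative that bonds hydrogen to the carbonyl carbon atom.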

1.2.4 Evaluation of Reactions

The restrictions on the types of atoms and bonds at the reaction center basically are yes/no decisions on whether a reaction can occur or not. More sophisticated evaluation procedures are based on physicochemical properties of the atoms and bonds of the reaction scheme, calculated by rapid empirical procedures.
These methods have been developed over the last 15 years, are described in the literature, and are contained in the program package PETRA (Parameter Estimation for the Treatment of Reactivity Application). Specifically, they involve the calculation of charge distribution,^[7,8] estimations of the magnitude of the inductive effect,^[9] as well as of the resonance stabilization of charges produced on heterolysis,^[10] and of the influence of polarizability on charge stabilization.^[11] Extensive correlations with physical and chemical data have shown the significance of these calculated values for the physicochemical effects.^[12-18]
These numerical values calculated for the physicochemical effects exerted onto the atoms and bonds involved in the bond rearrangement scheme can then be used for the assignment of a reactivity index, a numerical value for the ease of a reaction to occur. The rule base can contain a mathematical function for the calculation of a reactivity value from the physicochemical descriptors for a certain reaction type. Such functions may have been derived by a statistical analysis of reactivity values for a set of reaction instances. Insertion of the physicochemical values of a given reaction instance into such a mathematical function leads to a specific reactivity value. Instead of explicit mathematical functions, neural networks can be appended to the rule base, both for classifying a reaction as reactive or nonreactive or for calculating a numerical reactivity value.
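As a rough illustration of such a function, the sketch below combines physicochemical descriptors linearly into a reactivity value and applies a threshold for a reactive/nonreactive decision. The descriptor names, the coefficients, the intercept, and the threshold are hypothetical values invented for this example; they are not taken from EROS or PETRA.

```python
# Hypothetical coefficients; in practice they would come from a statistical
# analysis of reactivity data for one specific reaction type.
COEFFICIENTS = {
    "charge_difference": 2.0,
    "resonance_stabilization": 0.5,
    "polarizability": 0.1,
}
INTERCEPT = -1.0

def reactivity(descriptors):
    """Linear function of the descriptors of one reaction instance."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in descriptors.items()
    )

def is_reactive(descriptors, threshold=0.0):
    """Yes/no classification based on the reactivity value."""
    return reactivity(descriptors) > threshold

instance = {
    "charge_difference": 1.0,
    "resonance_stabilization": 2.0,
    "polarizability": 0.0,
}
print(reactivity(instance), is_reactive(instance))
```

A neural network appended to the rule base would take the place of `reactivity` or `is_reactive` while consuming the same descriptors.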

1.2.5 Further Development

Clearly, the major efforts in the further development of EROS have to go into the extension and refinement of the knowledge base on chemical reactions.
Furthermore, an editor is planned to facilitate the definition of reaction rules.

1.3 Reaction Rule File

The kind of chemistry EROS can handle is laid down in the file that describes the reaction setup and the reaction types that are incorporated into the EROS system. The specific implementations of EROS, and their further development, will largely concentrate on the extension and refinement of this reaction rule file. Therefore, a basic understanding of the concepts and the specific status of the reaction rule file in use is important. System managers and experienced users should have some knowledge on how to interpret the reaction rules in order to be able to develop reaction rules of their own.
In this chapter, we will outline the major features of this knowledge base. Further details are then given in the next two chapters. The reaction rule file is written in the scripting language Tcl.^[19] It consists of a rule header that contains information pertinent to the entire EROS run and specifying how reactions are performed, how many reactors or phases are used, etc. This is followed by the reaction rules specifying the various reaction types that EROS can work with (Figure 1-3). Each reaction type may contain restrictions to limit the scope of its application and procedures for the evaluation of the reactions.

Figure 1-3. Basic set-up of a reaction rule file.

1.3.1 How to Run Reactions: The Rule Header

It has already been said that EROS can be applied to a wide variety of ways for running a reaction, from laboratory synthesis, through combinatorial chemistry to mass spectra simulation.
In order to achieve this, specific concepts have been defined and incorporated into the implementation of the EROS system.

1.3.1.1 Reactors

Definition: A reactor is a place (vessel, etc.) where reactions occur at the same time.
Note that a reactor is defined by time, not just as a physical container. If the way a reaction is run changes, a new reactor has to be introduced. Thus, if a reaction is carried out by adding starting materials over a certain period of time and the mixture is then stirred for an additional period, the system has to be modeled by two reactors: one for the period of addition of the compounds, and a second one for the period of continued stirring without further addition of compounds.
Two reactors are also needed when an organic reaction is followed by aqueous work-up (Figure 1-4).

Figure 1-4. The running of a reaction which is followed by an aqueous work-up has to be modeled by two reactors.

Another case for the simulation of reactions by two reactors is given when the products of a reaction are analyzed by a mass spectrometer: The first reactor is used for modeling the reaction, the second reactor for the simulation of mass spectra (Figure 1-5).

Figure 1-5. A reaction that is followed by GC-MS.

1.3.1.2 Phases

Definition: A phase is a place where a reaction is run that is clearly separated from another such place.
A phase is usually characterized by a homogeneous concentration of starting materials. A reactor can consist of one or several phases. In the latter case, transitions between phases have to be considered.
The simulation of a reaction in a flask containing an organic and an aqueous phase has to be modeled by two phases (Figure 1-6). The transfer of each compound between the two phases has to be considered and is handled as a reaction with a rate corresponding to the rate of diffusion.

Figure 1-6. A stirred tank reactor consisting of two phases.

Another situation with a reactor consisting of two phases is given when the metabolism of a drug in the blood serum and the subsequent excretion of the drug and its metabolites is modeled (Figure 1-7).

Figure 1-7. The metabolism and excretion of a drug is modeled by a reactor consisting of two phases.

A more elaborate set-up has to be chosen when further details of the events occurring in the distribution and metabolism of a compound in the body should be considered (Figure 1-8). The various compartments of the body important for the pharmacokinetics are represented by phases.

Figure 1-8. The compartments for the pharmacokinetics of a drug as phases and the transitions between them.

A cascade of stirred tank reactors is modeled by one reactor consisting of several phases because the reactions in the various phases are occurring at the same time. Input and output to the phases have to be considered (Figure 1-9).

Figure 1-9. A sequence of three (physical) stirred tank reactors (STR) (a) modeled by three phases of a single reactor (b).

Phases play an important role in the modeling of combinatorial chemistry experiments. The various sets of starting materials are assigned to different phases that are specified as having the mode INERT (see section 1.3.1.3) as no reactions are allowed for the compounds assigned to these phases. Basically, these phases are taken as storage devices where single compounds from each set of compounds can be drawn to react with other compounds in subsequent phases. The number of phases in a combinatorial chemistry experiment is given by the number of different sets of starting materials plus the number of reaction steps that have to be performed.
Thus, the combinatorial synthesis of esters from a set of acid chlorides and a set of alcohols requires three phases (Figure 1-10). The set of acid chlorides is assigned to phase 1, and the alcohols are assigned to phase 2. Then, one after another, one acid chloride is taken from phase 1 and one alcohol is taken from phase 2, and both compounds are allowed to react to give an ester (and HCl); the products are then stored in phase 3.
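The bookkeeping behind such an experiment can be sketched in Python as follows. The function name `phase_count` and the compound names are placeholders chosen for illustration, and the products are represented simply as strings rather than as structures.

```python
from itertools import product

def phase_count(n_sets, n_steps):
    """Phases needed: one per set of starting materials plus one per step."""
    return n_sets + n_steps

# Placeholder compound names, chosen only for illustration.
phase1 = ["acetyl chloride", "benzoyl chloride"]   # set of acid chlorides
phase2 = ["methanol", "ethanol", "2-propanol"]     # set of alcohols

# Phase 3 collects the product of every acid chloride / alcohol pairing.
phase3 = [
    f"ester of {acid} and {alcohol}"
    for acid, alcohol in product(phase1, phase2)
]

print(phase_count(n_sets=2, n_steps=1), len(phase3))
```

The same count gives four phases for an experiment with two sets of starting materials and two reaction steps.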

Figure 1-10. The combinatorial synthesis of esters from acid chlorides and alcohols.

The synthesis of tripeptides from activated amino acids (such as esters) and amino acids accordingly has to be handled by four phases (Figure 1-11). The first phase stores the activated amino acids, the second the amino acids. The third phase is used to hold the dipeptides, and phase 4 to store the tripeptides that result from the reaction of the dipeptides with activated amino acids.

Figure 1-11. The combinatorial synthesis of tripeptides.

1.3.1.3 Modes

The starting materials of a reaction can be combined in a variety of ways that are strongly influenced by the concentration of the species involved. The concentrations govern the kinetic mode, whether monomolecular or bimolecular reactions can occur. Various settings for the mode of a phase can be specified to take care of this situation and combine the starting materials in the desired fashion. The same is true for all subsequent reaction steps, taking the products of the previous reaction steps as starting materials for the next one and combining them in the fashion specified by the selected mode.
Note that the mode parameter only determines which combinations of starting materials are examined for possible reactions with each other. This is not to say that reactions between these combinations of starting materials will indeed be generated: in the end, the reaction types contained in the reaction rules decide whether a reaction is generated. If a combination of starting materials does not contain any of the reaction centers required by the reaction rules, no reaction can be obtained.

Mode = MIX
In this mode, all combinations of starting materials are explored in the generation of reactions. If three starting materials, A, B, and C are given, the following reactions will be investigated:

Figure 1-12. Combination of starting materials in the mode MIX.

Clearly, this mode has to be chosen when the starting materials are given at high concentrations. Note that no combinations of three starting materials will be investigated, as the simultaneous reaction of three molecules is rather unlikely.

Mode = MIX_NO_A_A
With this mode, reactions between molecules of the same sort will not be investigated. Thus, with the three starting materials, A, B, and C, the following combinations (Figure 1-13) will be explored.

Figure 1-13. Combination of starting materials in the mode MIX_NO_A_A.

This mode comes into play when the concentration of the starting materials is at some intermediate value, which makes reactions between molecules of the same sort less likely.
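The difference between the two modes can be sketched in Python as follows. The function `bimolecular_pairs` is a hypothetical helper, not part of EROS, and the sketch assumes that MIX includes pairs of identical molecules, since MIX_NO_A_A is described as excluding exactly those. Reactions of a single molecule on its own are left out of the sketch.

```python
from itertools import combinations, combinations_with_replacement

def bimolecular_pairs(materials, mode):
    """Pairs of starting materials examined for bimolecular reactions."""
    if mode == "MIX":
        # every pair, including two molecules of the same sort (assumption)
        return list(combinations_with_replacement(materials, 2))
    if mode == "MIX_NO_A_A":
        # pairs of different starting materials only
        return list(combinations(materials, 2))
    raise ValueError(f"mode not covered by this sketch: {mode}")

print(bimolecular_pairs(["A", "B", "C"], "MIX"))
print(bimolecular_pairs(["A", "B", "C"], "MIX_NO_A_A"))
```

In neither case are combinations of three molecules produced.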

Mode = MONOMOLEC
In this mode only monomolecular or pseudo-monomolecular reactions will be generated. In the case of three starting materials, A, B, and C, only the following three reactions (Figure 1-14) will be generated.

Figure 1-14. Reactions explored in MONOMOLEC mode.

This is the situation with highly diluted solutions. Furthermore, reactions with compounds that are specified as being in high excess such as the solvent, water, oxygen, etc. can be explored. As an example, this mode should be chosen when the degradation of compounds in the environment or the metabolism of a drug is explored. This mode leads to a remarkable speed-up of an EROS run as fewer combinations of compounds and fewer reaction centers have to be analyzed.
Each product of a reaction is individually processed to subsequent reaction steps; no reactions between products are investigated. Thus, a tree of reaction steps will be generated as shown in Figure 1-15.

Figure 1-15. Handling of consecutive reactions in the MONOMOLEC mode.

Such a handling of reaction steps is required in the simulation of mass spectra as the high vacuum prevents bimolecular reaction of the products of a fragmentation with each other.
In pseudo-monomolecular reactions, such as the degradation of chemicals in the environment, the reaction steps are handled in an analogous manner, as shown in the following Figure 1-16.

Figure 1-16. Handling of consecutive pseudo-monomolecular reactions.

The scheme shows the fate of a compound A, and its degradation products, P, Q, etc., under hydrolysis and reduction (e.g., reductive dealkylation).

Mode = TUBE
In a laminar tube reactor the products of a reaction are held together and can further react with each other. However, no reactions with the starting materials or the products of other reactions are allowed. The mode = TUBE achieves such a behavior and the following tree of reaction steps is generated (Figure 1-17).

Figure 1-17. Handling of consecutive reactions in the TUBE mode.

Note the difference from the mode MONOMOLEC: in the mode TUBE, the products Q + R may react with each other, whereas in the mode MONOMOLEC this possibility is not explored. Another difference is that no kinetics are available with the mode TUBE.
In the case of a turbulent flow through the tube the reactions and kinetics are the same as for a stirred tank reactor. Then the time for the tank reactor represents the distance in the tube reactor with turbulent flow.
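The way the products of one reaction are passed on to the next reaction step in the modes MONOMOLEC and TUBE can be contrasted in a short Python sketch. The function `next_ensembles` is hypothetical, and it assumes that in the TUBE mode each product may also react on its own in addition to reacting with a product of the same reaction.

```python
def next_ensembles(products, mode):
    """Ensembles of species handed to the next reaction step.

    `products` is the list of products of ONE reaction; products of other
    reactions and the starting materials are never mixed in.
    """
    singles = [(p,) for p in products]
    if mode == "MONOMOLEC":
        return singles  # each product is processed individually
    if mode == "TUBE":
        pairs = [
            (a, b)
            for i, a in enumerate(products)
            for b in products[i + 1:]
        ]
        return singles + pairs  # products may also react with each other
    raise ValueError(f"mode not covered by this sketch: {mode}")

print(next_ensembles(["Q", "R"], "MONOMOLEC"))
print(next_ensembles(["Q", "R"], "TUBE"))
```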

Mode = SURFACE
Reactions can occur at the interface of two phases: one molecule from one phase reacts with one molecule from another phase. This mode is also used for modeling combinatorial chemistry experiments. Two phases each contain a set of molecules; reactions are then generated by drawing one molecule after another from the first set and having it react consecutively with each of the molecules of the second set in phase 2.

Figure 1-18. Reactions in the SURFACE mode: phase 1 has the mode SURFACE, phases 2 and 3 the mode INERT.

In order to achieve this result, one phase has to be specified with the mode SURFACE, the other as INERT (see below). The results of these reactions are then stored in a third phase.

Mode = INERT
Phases can also be assigned as mode INERT. Then, no reactions are generated in this phase, but such a phase can be used for storing molecules. This feature can be used in combination with a phase having the mode SURFACE for modeling combinatorial chemistry experiments (see also above).
This is explained in Figure 1-19 with the combinatorial synthesis of esters from acid chlorides and alcohols, already mentioned in connection with Figure 1-10.

Figure 1-19. The assignment of modes to the phases of a combinatorial chemistry experiment: phase 1 has the mode INERT, phase 2 the mode SURFACE, and phase 3 the mode INERT.
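The hierarchy of reactors, phases, and modes can be pictured as a small data structure. The Python sketch below is not the Tcl syntax of the rule header, which is not shown in this section; the class names `Reactor` and `Phase` and their fields are assumptions made for illustration. The mode assignment follows the caption of Figure 1-19.

```python
from dataclasses import dataclass, field

# Mode names as introduced in this section.
MODES = {"MIX", "MIX_NO_A_A", "MONOMOLEC", "TUBE", "SURFACE", "INERT"}

@dataclass
class Phase:
    name: str
    mode: str
    compounds: list = field(default_factory=list)

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode}")

@dataclass
class Reactor:
    name: str
    phases: list

# Combinatorial ester synthesis with the modes named for Figure 1-19.
ester_library = Reactor(
    "ester synthesis",
    [
        Phase("phase 1", "INERT"),
        Phase("phase 2", "SURFACE"),
        Phase("phase 3", "INERT"),
    ],
)
print([phase.mode for phase in ester_library.phases])
```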

1.3.1.4 Kinetic Modeling

Most organic compounds have a variety of functional groups and, therefore, many reaction pathways are open to ensembles of starting materials. Among these competing reactions, the fastest one will prevail. Therefore, a full modeling of a reacting system should account for the kinetics of the processes.
Clearly, a full kinetic modeling is in most cases beyond our present insight into chemical reactivity. However, by careful analysis of the experimental data and evidence, estimates of relative, or sometimes even absolute,^[20] reaction rate constants can be obtained for quite a few reaction types.
When evaluation mechanisms for estimating reaction rates are included in the rule files, equations for the rates of the different reaction channels are obtained. These differential equations can then be integrated to monitor the development of the products over time (see Figure 1-20).

Figure 1-20. The development of products in the degradation of atrazine in soil.

Four different methods are available in the EROS system for the overall evaluation of reaction sequences, the first three for the integration of differential equations, the last one for the evaluation of probabilities of reaction sequences:
- the GEAR algorithm ^[21]
- the Runge-Kutta-method ^[22]
- the Runge-Kutta-Merson-method
- probability evaluation
The GEAR algorithm is slower than the other two methods but more robust. Usually, it will be the method of choice.
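To make the integration step concrete, the following Python sketch derives the rate equations from a small network of first-order reactions and integrates them with the classical fourth-order Runge-Kutta scheme. It is an illustration only: the network A -> B -> C, the rate constants, and all function names are assumptions, and the sketch does not reproduce the integrators of EROS. Stiff systems, where rate constants differ by many orders of magnitude, are the typical case in which a Gear-type method is preferred.

```python
def derivatives(conc, reactions):
    """Rate equations for first-order steps given as (educt, product, k)."""
    d = {species: 0.0 for species in conc}
    for educt, product, k in reactions:
        rate = k * conc[educt]
        d[educt] -= rate
        d[product] += rate
    return d

def rk4_step(conc, reactions, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(slope, factor):
        return {s: conc[s] + factor * slope[s] for s in conc}

    k1 = derivatives(conc, reactions)
    k2 = derivatives(shifted(k1, h / 2), reactions)
    k3 = derivatives(shifted(k2, h / 2), reactions)
    k4 = derivatives(shifted(k3, h), reactions)
    return {
        s: conc[s] + h / 6 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s])
        for s in conc
    }

def integrate(conc, reactions, t_end, steps):
    """Integrate from t = 0 to t_end with a fixed number of steps."""
    h = t_end / steps
    for _ in range(steps):
        conc = rk4_step(conc, reactions, h)
    return conc

# Hypothetical network A -> B -> C with invented rate constants.
reactions = [("A", "B", 1.0), ("B", "C", 0.5)]
final = integrate({"A": 1.0, "B": 0.0, "C": 0.0}, reactions, t_end=2.0, steps=200)
print(final)
```

Because every step only moves material from one species to another, the sum of all concentrations stays constant, which is a convenient sanity check for such a network.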
Quite often, the estimation of reaction rates is not possible. As an alternative, probabilities for the different reaction pathways to occur can be given. This is particularly true for the simulation of mass spectra where methods for the evaluation of the probabilities for the different fragmentations or rearrangements of cations and radical cations have been developed.
Based on these probabilities of individual steps, probabilities for entire sequences of steps can be calculated. In the simulation of mass spectra these probabilities are then used for the estimation of peak intensities. The probability kinetics can only be used for monomolecular reactions.
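One simple way to obtain the probability of a sequence is to multiply the probabilities of its individual steps. The Python sketch below does this for a small fragmentation tree; the multiplication rule, the species labels, and the step probabilities are assumptions made for illustration and are not taken from EROS or MASSIMO.

```python
def sequence_probabilities(tree, species, path=(), p=1.0):
    """Probability of every reaction sequence starting at `species`.

    `tree` maps a species to a list of (product, step probability) pairs
    and must not contain cycles.
    """
    path = path + (species,)
    result = {path: p}
    for product, step_p in tree.get(species, []):
        result.update(sequence_probabilities(tree, product, path, p * step_p))
    return result

# Hypothetical fragmentation tree of a molecular ion M.
tree = {
    "M": [("F1", 0.6), ("F2", 0.3)],
    "F1": [("F3", 0.5)],
}
probabilities = sequence_probabilities(tree, "M")
for path, p in probabilities.items():
    print(" -> ".join(path), p)
```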

1.3.2 Which Reactions: The Reaction Rules

1.3.2.1 Structure Representation

It has become standard practice to represent chemical structures in the form of a connection table (CT), by lists of the atoms and bonds in a molecule. From the very beginning of the development of EROS we have augmented this information by a list of the free electrons. Thus, we accounted for all valence electrons in a molecule and could also model reactions that involve a shift from free to bonding electrons and vice versa.
Although connection tables are used ubiquitously for structure representation, it should not be overlooked that such a representation also has its limitations. In fact, a connection table is basically a valence bond (VB) structure and must fail where a species cannot reasonably well be represented by a single VB structure.
This is true for organometallic structures and for electron-deficient molecules such as the boranes. Most organic structures can be represented sufficiently well by a single VB structure; in cases like benzene, additional rules can be used to take care of cyclic conjugation. However, there are also situations in organic species where a CT fails: a connection table cannot distinguish between a singlet and a triplet carbene, cannot handle the ionization of a σ-bond, and can only insufficiently represent radical cations.
Just to give an example: The oxygen atom of an enol ether has two free electron pairs, one in conjugation with the double bond, the other orthogonal to it. A connection table cannot distinguish between these two lone pairs. However, it makes quite a difference whether an electron is taken out from the conjugated electron system or from the isolated lone pair.

Figure 1-21. A connection table representation of enol ether (VB) augmented with a specification of the number of free electrons cannot distinguish between the two types of orbitals on the oxygen atom (MO).

We became painfully aware of the deficiencies of a connection table in handling radical cations in the course of the development of the MASSIMO system for the simulation of mass spectra.
We therefore developed a novel structure representation that overcomes the deficiencies of a connection table.^[23]
Molecules are handled as species consisting of atoms that are held together by electron systems containing a specified number of electrons distributed over a fixed number of atom centers. Various types of electron systems are handled:
- σ-systems consisting of two atoms and containing two electrons (normal σ-bonds)
- σ-systems consisting of two atoms and containing one electron (ionized σ-bonds)
- σ-systems consisting of three atoms and containing two electrons (electron-deficient three-center bonds such as those in boranes)
- π-systems with one, two, three, etc. atoms containing no, one, two, three, etc. electrons (empty π-orbitals, radicals, free electron pairs, π-bonds, conjugated systems)
- coordinative bonds
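The idea of electron systems can be illustrated with a small data structure. The Python sketch below is not the C++ implementation used in EROS; the class `ElectronSystem`, the kind labels "sigma" and "pi", the atom labels, and the particular partitioning of the enol ether skeleton (hydrogen atoms omitted) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElectronSystem:
    kind: str        # "sigma" or "pi" (labels assumed for this sketch)
    atoms: tuple     # atom centers spanned by the system
    electrons: int   # number of electrons in the system

def total_electrons(systems):
    return sum(system.electrons for system in systems)

def ionize(systems, index):
    """Return a copy with one electron removed from the chosen system."""
    system = systems[index]
    if system.electrons == 0:
        raise ValueError("electron system is already empty")
    changed = ElectronSystem(system.kind, system.atoms, system.electrons - 1)
    return systems[:index] + [changed] + systems[index + 1:]

# Skeleton C1=C2-O3-C4 of an enol ether, hydrogen atoms omitted.
enol_ether = [
    ElectronSystem("sigma", ("C1", "C2"), 2),
    ElectronSystem("sigma", ("C2", "O3"), 2),
    ElectronSystem("sigma", ("O3", "C4"), 2),
    ElectronSystem("pi", ("C1", "C2", "O3"), 4),  # double bond conjugated with one lone pair
    ElectronSystem("pi", ("O3",), 2),             # lone pair orthogonal to the double bond
]

from_conjugated = ionize(enol_ether, 3)
from_lone_pair = ionize(enol_ether, 4)
print(total_electrons(enol_ether), from_conjugated != from_lone_pair)
```

Removing an electron from the conjugated system and removing it from the isolated lone pair give two different lists of electron systems, which is precisely the distinction a connection table cannot express.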
Details on how this representation can be used for the coding of boranes, organometallic complexes, carbenes, radical cations, etc. can be found in ref. 23 and in the Ph.D. thesis of Susanne Bauerschmidt, which is available on the internet (http://www2.chemie.uni-erlangen.de/services/dissonline/data/dissertation/Susanne_Bauerschmidt/html).
Two examples, an enol ether and furan, are given to explain the new form of representation.

Figure 1-22. Connection table and MOSES representation of an enol ether and of furan.

With this novel representation the different nature of the two lone pairs on the oxygen atom is distinguished both in an enol ether and in furan. This also allows one to account for two types of ionization at the oxygen atom, from two types of electron systems.
It should be realized that this novel structure representation corresponds more to the description of a molecule by molecular orbitals, hence its name MOSES: Molecular Orbitals: Structures as Electron Systems.
The entire EROS system version 7 is founded on the novel MOSES representation, which has been implemented in the object-oriented language C++. However, interconversion routines have been incorporated into the EROS system so that it can also be accessed through a traditional connection table, since a VB structure is quite a reasonable representation for most organic structures.
Thus, although all internal structure manipulations, in particular the generation of reactions, are made on the MOSES data structure, reaction schemes can be specified as bond and free electron shifting patterns in a VB notation or as changes in electron systems in a MOSES notation.

1.3.2.2 Reaction Generation

As has been detailed in the previous section, reactions can be specified as patterns of bond and free electron shifting schemes changing the connection tables of the starting materials (Figure 1-23) or as changes in electron systems working on the MOSES representation as indicated in Figure 1-24.

Figure 1-23. Reactions described as changes in the connection tables of the ensemble of starting materials.

Figure 1-24. Reactions described as changes in the electron systems of the starting materials.

A major characteristic of the EROS system is that reactions are handled in a formal manner, as electron shifting patterns. Organic reactions cover a limited number of such shifting schemes; it is the different nature of the atoms and bonds involved in those reaction schemes that accounts for the large variety of reaction types.
The atoms and bonds involved in the electron rearrangement make up the reaction center or, as it is called in the EROS system, the reaction substructure. The reaction rule has to specify the atoms and bonds (or electron systems) that are part of the reaction substructure. First, the number of atoms of the reaction substructure has to be given, together with how they are bonded to each other. Then, constraints on the nature of the atoms may be given, either as a list of permitted atoms, such as O, N, S, Cl, Br, or I, or as a single atom type, such as only C. Furthermore, an atom may be restricted to a certain hybridization state, such as C sp². Constraints can also be given for the bonds. Thus, it may be required that a bond will be broken only if it is part of a multiple bond.
Constraints can be specified not only for the atoms and bonds of the reaction substructure (reaction center) but also for the atoms bonded to the reaction substructure and the bonds between those atoms. Figure 1-25 shows reaction types that have more and more restrictions and thus become more and more specific.

Figure 1-25. A very general reaction scheme (a) becomes more and more focused as more specific restrictions are imposed onto the reaction substructure and its neighborhood.

The reaction scheme of Figure 1-25a is very general, breaking any combination of two bonds. Clearly, such a reaction scheme should only be used in exceptional cases (such as the generation of isomers starting from a given molecule), as usually far too many reactions, including many unreasonable ones, will be generated.
The restrictions on the types of atoms shown in Figure 1-25b for the atoms I, J, K, and L of Figure 1-25a lead to a reaction type that, among others, covers all nucleophilic substitutions (aliphatic and aromatic). The reaction type of Figure 1-25c is even more specific, and constraints on atoms adjacent to the reaction substructure (Figure 1-25d) finally limit the reaction type to the hydrolysis of amides.
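How such constraints narrow a reaction type can be sketched in Python. The encoding below is hypothetical and is not the rule syntax of EROS: the class `Atom`, the keyword names, and the dictionary `AMIDE_RULE` are assumptions. The constraints themselves follow the amide hydrolysis case described in Section 1.2.3: I is a nitrogen atom, J an sp² carbon atom with a doubly bonded oxygen neighbor, K a hydrogen atom, and L an oxygen atom carrying an additional hydrogen atom.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    element: str
    hybridization: str = ""
    neighbors: tuple = ()  # (element, bond order) of atoms outside the reaction center

def matches(atom, elements=None, hybridization=None, needs_neighbor=None):
    """Check one atom of the reaction substructure against its constraints."""
    if elements is not None and atom.element not in elements:
        return False
    if hybridization is not None and atom.hybridization != hybridization:
        return False
    if needs_neighbor is not None and needs_neighbor not in atom.neighbors:
        return False
    return True

# Hypothetical encoding of the amide hydrolysis constraints.
AMIDE_RULE = {
    "I": {"elements": {"N"}},
    "J": {"elements": {"C"}, "hybridization": "sp2", "needs_neighbor": ("O", 2)},
    "K": {"elements": {"H"}},
    "L": {"elements": {"O"}, "needs_neighbor": ("H", 1)},
}

def rule_applies(rule, center):
    return all(matches(center[label], **c) for label, c in rule.items())

water_h = Atom("H")
water_o = Atom("O", "sp3", (("H", 1),))
carbonyl_c = Atom("C", "sp2", (("O", 2),))
amide = {"I": Atom("N"), "J": carbonyl_c, "K": water_h, "L": water_o}
ester = {"I": Atom("O"), "J": carbonyl_c, "K": water_h, "L": water_o}
print(rule_applies(AMIDE_RULE, amide), rule_applies(AMIDE_RULE, ester))
```

Dropping entries from `AMIDE_RULE` widens the rule again, down to the fully general scheme with no constraints at all.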

Prof. Dr. J. Gasteiger
Computer Chemie Centrum, Org. Chem., Uni. Erlangen
Nägelsbachstraße 25
D-91052 Erlangen

Gasteiger@CCC.Chemie.Uni-Erlangen.DE