- Substructure searches provide an additional method to search for
suitable starting materials in a catalog of chemicals. The focus of
a substructure search is the design of the synthesis of a
combinatorial library for a lead structure (see Tutorial
B), whereas similarity searches play an important role during
the design of a synthesis for a single target structure (see
Tutorial A). Thus, both methods
complement each other depending on the application of WODCA.
A substructure search in a catalog of chemicals is useful if someone
is interested in all available starting materials containing a
certain structure fragment (the substructure). In contrast to
similarity searches, the user has to define this substructure
explicitly. The definition of a substructure query allows the
specification of open sites or atom lists for a certain
position in a chemical structure. While a substructure search always
analyses the entire molecular structure of a compound from the
catalog of chemicals, a similarity search considers only the largest
fragment which results from the application of a similarity
criterion to the compound from the catalog of chemicals.
Substructure searches are implemented in the WODCA system as an
external tool (CACTVS Substructure Search).
- After a representative of a lead structure of a combinatorial
library has been disconnected to give a set of precursors during the
retrosynthetic analysis with WODCA, a substructure search can
provide the structural variation for each of these precursors. Thus,
substructure searches are useful to find a series of representatives
of certain classes of compounds in a catalog of available starting
materials that can act as precursors for the synthesis of an entire
library of compounds.
The following list describes some typical features of a substructure
search in general terms:
- In the following, the various features provided for the
definition of a substructure query are explained.
- An open site allows any type of atom to be attached to a
position of the query structure. For example, if a carbon atom
within a substructure query carries three substituents as well as
one open site, the fourth substituent of the carbon atom can be any
element (including hydrogen) attached to the carbon atom. Further
examples are given in Figure 1.
Figure 1: Definition of various substructure
queries with open sites. On the right-hand side some examples for
hits and non-hits are given
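The effect of open sites on a single atom can be sketched as a counting check. This is a minimal illustration, not the algorithm used by the CACTVS tool: the function name `atom_matches` and its arguments are hypothetical, bond orders are ignored, and it is assumed that an open site may either stay unused or be filled by exactly one atom of any element.

```python
from collections import Counter


def atom_matches(query_element, query_substituents, open_sites,
                 target_element, target_substituents):
    """Return True if the target atom is compatible with the query atom.

    Assumed, simplified model: bond orders are ignored, and each open
    site may stay unused or be filled by one atom of any element
    (including hydrogen).
    """
    if query_element != target_element:
        return False
    required = Counter(query_substituents)
    present = Counter(target_substituents)
    # Every substituent drawn in the query must be present on the target atom.
    if required - present:
        return False
    # Additional substituents on the target are only tolerated on open sites.
    extra = sum((present - required).values())
    return extra <= open_sites


# A carbon atom with three defined substituents and one open site:
# the fourth substituent may be any element, including hydrogen.
print(atom_matches("C", ["C", "C", "H"], 1, "C", ["C", "C", "H", "O"]))
print(atom_matches("C", ["C", "C", "H"], 0, "C", ["C", "C", "H", "O"]))
```

With one open site the oxygen substituent is accepted; without an open site the same target atom is rejected because it carries a substituent that the query does not allow.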
- It is possible to define a list of atom types instead of a
single atom for a certain position of the substructure query. Such
atom lists define a set of atoms which are allowed (positive atom
list) or which are forbidden (negative atom list) for a
certain position of the substructure during the substructure search.
Furthermore, it is also possible to allow any atom type at a certain
position of a substructure (Figure 2).
Figure 2: Definition of various substructure
queries with atom lists. On the right-hand side some examples for
hits and non-hits are given
- Different specifications can be defined for each atom of a
substructure query. If none of these specifications is set, the
default settings are used in which each specification is not
considered during the substructure search. The following list shows
some examples of settings for atom specifications:
- In order to specify bonds during the definition of a
substructure query the following settings are provided:
- If none of these specifications is set, the default settings are
used in which each specification is not considered during the
substructure search.
- In many cases, the definition of open sites on atoms linked by a
single bond influences the bond order of this bond as well, since
two open sites on adjacent atoms can be combined to a double bond.
The treatment of bond orders during a substructure search can be
controlled by the CACTVS Substructure Search tool. Depending
on the setting of the switch Bond Order in the main
window of the CACTVS Substructure Search tool the
following three cases have to be distinguished. In case A, the
switch Bond Order is activated (default setting) which means
that bond orders in the substructure query are considered, in case B
and C the switch is disabled.
Figure 3: Influence of bond orders and open sites
on a substructure search. Bond orders are considered.
Case A: In Figure 3, the atoms
directly connected to the bond marked as bold are all defined. In
other words, there are no open sites at the atoms which form the
bond. In this case, the exact bond order is considered during the
superimposition of the substructure query and the compounds from the
catalog of starting materials. As we can see from the result of the
substructure search, the bond marked in bold has to be a single
bond. Thus, 1-methyl-cyclohexene (3) or
5-methyl-cyclohexadiene (4) are not considered as hits. Only
cyclohexane derivatives like 4-methyl-cyclohexanol (1) or
2-methyl-cyclohexanone (2) are found by the substructure search.
Figure 4: Influence of bond orders and open sites
on a substructure search. Bond orders are not considered.
Case B: Now, the switch for the consideration of bond orders
is disabled. Since two open sites at adjacent atoms linked by a
single bond represent both a saturated single bond and an
unsaturated double bond, 3-methyl-1-cyclohexene (1) is found
as well as 3,5-dimethyl-2-cyclohexen-1-one (2) by the
substructure search (Figure 4). Since the number of hydrogen atoms
on the bond marked in bold is defined, 1-methyl-cyclohexene (3)
and toluene (4) are excluded from the list of hits.
Figure 5: Influence of bond orders and open sites
on a substructure search. Bond orders are not considered.
Case C: The substructure query in Figure 5 contains open
sites on the atoms of the bond marked in bold. Thus, the result of
the substructure search is completely different since all possible
combinations of superimpositions of single bonds, double bonds, and
aromatic bonds are allowed.
- The previous discussion has shown how the global setting for the
switch Bond Order influences the results of a substructure
search. Other global settings for the treatment of tautomers and
chiral compounds are also provided and will be explained later (see
section Option Panel).
- A substructure search is initiated from the main
window of WODCA. By clicking with the left mouse button on the
command Substructure Search in the searches
menu the external CACTVS tool for substructure searches is
started (Figure 6). The current compound and the information which
catalog of chemicals is currently loaded are automatically
transferred to the tool for substructure searches.
Figure 6: CACTVS Substructure Search tool for the
application of substructure searches
- Figure 7 shows all important window elements of the graphical
user interface of the CACTVS Substructure Search tool for the
definition of a substructure query. Each window element is explained
in the following.
Figure 7: Window elements of the CACTVS
Substructure Search tool
- The molecule canvas (Figure 7, A) is the most important
window element of the CACTVS Substructure Search tool. It
allows the definition of a substructure query which is then
displayed in this window area. Initially, a molecular structure is
always displayed with all hydrogen atoms to indicate that no open
sites have been defined yet. By changing this molecular structure,
the substructure query can easily be defined.
Definition of open sites. The easiest
way to define an open site on a certain atom of the query structure
is to delete a hydrogen atom or another terminal atom which is
linked to this atom. Thus, each atom which is deleted from the query
structure represents an open site. Before an atom can be removed it
is necessary to switch the canvas
mode to eraser mode (Figure 7, B). In eraser mode, an
atom can be deleted just by a single mouse click with the left mouse
button on its element symbol. The atom is then removed and an open
site is defined on its corresponding position. For example, the
nitrogen atom and a bond of the ring system are marked by white
arrows (see Figure 7). After removing the hydrogen atoms from the
nitrogen atom and from the two carbon atoms of the indicated bond,
the nitrogen atom carries one open site and both carbon atoms of the
bond carry two open sites.
Not only hydrogen atoms can be removed by a single mouse click but
also any other kind of atom in the structure query, for example carbon
atoms (1st column, Figure 8). It is also possible to remove bonds
(2nd column, Figure 8) or entire atom groups (3rd column, Figure 8)
from the structure query. If a single mouse click on the center atom
of an atom group (e.g. carbon in the methylene group) is performed,
the center atom as well as all hydrogen atoms of this group will be
removed (3rd column, Figure 8). Each of these operations defines
additional open sites in the direct neighborhood of the deleted
elements. The numbers in Figure 8 indicate the numbers of open sites
on each atom.
Figure 8: Definition of open sites by deleting
atoms, bonds, and groups from the molecular structure. The numbers
indicate the numbers of open sites on the corresponding atom.
Canvas mode. The canvas mode can be
controlled by two buttons in the upper right corner of the graphical
user interface of the CACTVS Substructure Search tool (see
Figure 7, B). The button marked with a pencil activates the
drawing mode. The button with the eraser icon is used to switch the
canvas to eraser mode. If the CACTVS Substructure Search tool
is started for the first time, it is automatically switched to
eraser mode (default setting).
- Eraser mode. In eraser mode it is possible to remove
atoms, bonds, or entire groups by a single mouse click from the
molecular structure displayed in the molecule
canvas. The eraser mode is useful for the modification of a
query structure, e.g. for the definition of open sites.
Drawing mode. In drawing mode it is possible to draw
atoms, to link atoms, and to change the bond order of existing
bonds. Thus, the drawing mode can be used for the definition of
a new substructure query or for the modification of an existing one.
- Drawing of atoms and bonds. Before a new atom can be
created, an atom type has to be chosen from the element panel in the
main window of the CACTVS Substructure Search tool (Figure 7,
C). To draw a new atom which is linked to an existing one
just click with the left mouse button onto an atom and keep the
mouse button pressed. A grid around the atom is displayed which
shows the positions at which an additional atom can be placed (Figure 9: (1)).
Move the mouse pointer to one of the grid items and release the
mouse button. A new atom is created which is linked by a single bond
to the atom considered (Figure 9: (2)).
Figure 9: Drawing a new atom
Changing the bond order. If the drawing mode of the molecule
canvas is activated, the bond order can be increased by one by a
single click with the left mouse button onto the bond. A single bond
will be changed then to a double bond, a double bond will be
transformed to a triple bond, and a triple bond will be returned to
a single bond. All this will be done only if allowed on terms of
Definition of stereochemistry. A bond can be represented as a
solid line, as a broken wedge (in both directions), as a solid wedge
(in both directions), and as a broken line. These different
representations of a bond define the stereochemistry of the
corresponding atoms. The representation of a bond can be changed by
clicking with the right mouse button onto the bond. The
representation changes then from one mode to the next mode.
Changing an atom type. The element panel contains a list of
frequently used atom types which can be selected either for the
drawing of new atoms or for the modification of existing atoms.
Select an element symbol from the element panel and click on an atom
of the query structure to change the atom type to the selected one.
Other atom types at the element panel. If other atom types
are needed which are not contained in the element panel, click on
the button with the PSE icon (button at the bottom of the element
panel). The PSE panel is then displayed. Select the required atom type
and it will become part of the element panel.
Changing the grid. The grid panel (Figure 7, D) is
used for switching the grid to another geometry. The default setting
for the grid is a hexagonal geometry. That means, if someone follows
the grid items suggested during the drawing of a molecular structure
a cyclohexane structure can be created (see Figure 10).
Figure 10: Drawing atoms and bonds within a
hexagonal grid and other grids provided
Alternative grid geometries are provided by the grid panel which is
shown in Figure 11. The grid geometry can be changed by a single
mouse click with the left mouse button.
Figure 11: The grid panel
- The SMILES panel of the CACTVS Substructure Search tool
(Figure 6 and Figure 7, E) represents the substructure query
from the molecule canvas in SMILES notation (further information
about the SMILES notation is available at:
If the substructure query is modified in the molecule canvas, the
SMILES panel will immediately be updated. Furthermore, it is
possible to define or modify the substructure query directly in the
SMILES panel. After clicking in the entry field of the SMILES panel
it is possible to add or delete characters of the SMILES string. The
input is terminated by pressing the Return-key on the
keyboard. The substructure query is then immediately updated in the
molecule canvas. Be aware of the following feature of the SMILES
panel: Each atom symbol of the SMILES string is interpreted as if it
were enclosed in square brackets, e.g. the SMILES string 'CCC' is
interpreted as '[C][C][C]' when the input is finished. That means
that no hydrogen atoms will be automatically added to any atom of the
structure. Thus, specify hydrogen atoms explicitly in SMILES notation
to add hydrogen atoms at a certain position of the structure. Click on
the Add H button to the right of the entry field to add hydrogen to all
atoms with free sites. The Del H button is used to delete all
hydrogen atoms from the structure query.
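The bracket interpretation of the SMILES panel can be illustrated with a short sketch. It is a toy rewrite that covers only the common organic-subset symbols; the function name `bracket_atoms` is hypothetical, and it is not claimed that the CACTVS tool parses its input this way.

```python
import re

# Tokens: an existing bracket atom, a two-letter halogen, a one-letter
# organic-subset atom (upper case or aromatic lower case), or any other
# single character (bond symbols, ring-closure digits, parentheses).
_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]|.")
_ATOMS = {"B", "C", "N", "O", "P", "S", "F", "I", "Br", "Cl",
          "b", "c", "n", "o", "p", "s"}


def bracket_atoms(smiles):
    """Wrap every bare atom symbol in square brackets.

    Atoms that are already written in brackets, such as [CH3], are kept
    unchanged, so explicitly specified hydrogen atoms survive.
    """
    parts = []
    for token in _TOKEN.findall(smiles):
        parts.append(f"[{token}]" if token in _ATOMS else token)
    return "".join(parts)


print(bracket_atoms("CCC"))         # [C][C][C]
print(bracket_atoms("[CH3]C[CH3]")) # [CH3][C][CH3]
```

In the second call the terminal carbon atoms keep their three hydrogen atoms because they were entered in bracket notation, while the central carbon atom carries no hydrogen atoms.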
- The option panel (Figure 7, F) provides some global
settings to control the result of a substructure search.
Tautomers. If this option is set, the CACTVS Substructure
Search tool tries to consider all tautomeric forms of a query
during the substructure search. Since it is quite difficult to
determine exactly the tautomeric form of a substructure query
containing open sites or atom lists, the result of such a search may
sometimes be surprising.
Stereochemistry. If the substructure query contains chiral
centers the Stereo option at the Option panel can be
used. If this option is set, all stereo centers and their
descriptors are determined and then considered in the substructure
search. Furthermore, the descriptors for each bond (E/Z) are derived
(if possible). During the search only structures with stereo
descriptors matching the query are found as hits.
Bond order. In the default settings of the CACTVS
Substructure Search tool, this option is already set. Thus,
during the substructure search the bond orders are considered (see
section Bond Order).
Overlap. It is possible to define more than one separate
substructure query in the molecule canvas. During a substructure
search only those compounds from the catalog of chemicals matching
each of these substructure fragments are considered as hits. If the
Overlap option is set, such multiple substructure queries are
allowed to overlap during the superimposition with the compounds
from the catalog of chemicals.
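The effect of the Overlap option can be sketched with sets of atom indices. The sketch assumes that "overlap" means that two query fragments are mapped onto at least one common atom of the catalog compound; the function name `fragments_match` and the data layout are hypothetical.

```python
from itertools import product


def fragments_match(embeddings_per_fragment, allow_overlap):
    """Decide whether all query fragments can be placed on one compound.

    embeddings_per_fragment holds, for each query fragment, a list of
    sets; each set contains the atom indices of the catalog compound
    covered by one possible placement of that fragment.
    """
    for combination in product(*embeddings_per_fragment):
        if allow_overlap:
            # Every fragment has at least one placement; sharing is fine.
            return True
        used = set()
        disjoint = True
        for atoms in combination:
            if used & atoms:
                disjoint = False
                break
            used |= atoms
        if disjoint:
            return True
    return False


# Two fragments whose only placements share atom 1 of the compound.
placements = [[{0, 1}], [{1, 2}]]
print(fragments_match(placements, allow_overlap=True))
print(fragments_match(placements, allow_overlap=False))
```

The compound is a hit only when overlapping is allowed, because both fragments need atom 1.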
- Performing a substructure search. After a substructure
query is completely defined, the Search button in the command
panel can be pressed to start the substructure search (Figure 7, G).
The substructure search is then performed in the catalog of
chemicals which is currently loaded in WODCA. At the end of a search
the message at the Match List icon
is replaced by the number of hits found during the search.
Location of Search. It is possible to perform the
substructure search not only in a catalog of chemicals but also in a
match list. One of the following items can be selected in the option
menu of the command panel: Search in Catalog (default setting),
Search in Last Match List, and Exclude from Last Match List.
The search options Search in Last Match List and
Exclude from Last Match List are useful if the result of a
substructure search is to be narrowed down by revising the substructure
query and repeating the substructure search only in a match list.
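The three search locations can be pictured as simple list filters. This is a sketch under assumptions: `run_search` and `matches_query` are hypothetical names, compounds are represented by plain strings, and Exclude from Last Match List is assumed to remove the compounds that match the revised query from the last match list.

```python
def run_search(matches_query, catalog, last_match_list,
               location="Search in Catalog"):
    """Return the new match list for the selected search location.

    matches_query is a stand-in for the real substructure test: a
    function that returns True if a compound contains the query.
    """
    if location == "Search in Catalog":
        return [c for c in catalog if matches_query(c)]
    if location == "Search in Last Match List":
        return [c for c in last_match_list if matches_query(c)]
    if location == "Exclude from Last Match List":
        return [c for c in last_match_list if not matches_query(c)]
    raise ValueError(f"unknown search location: {location}")


# Toy data: compound names stand in for structures, and a text test
# stands in for the substructure match.
catalog = ["phenol", "toluene", "aniline", "benzene"]
first = run_search(lambda name: "ol" in name, catalog, [])
refined = run_search(lambda name: name.startswith("p"), catalog, first,
                     location="Search in Last Match List")
excluded = run_search(lambda name: name.startswith("p"), catalog, first,
                      location="Exclude from Last Match List")
print(first, refined, excluded)
```

Refining a result therefore never needs a second pass over the whole catalog; only the previous match list is filtered.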
- The menu bar of the CACTVS Substructure Search tool
contains a File menu and an Edit menu. In the
following all functions of these menus are explained.
File Menu. The File menu contains read and write
commands for the substructure query specified by the user.
If the Read Query ... entry is selected, a dialog box
appears which contains a list of files and subdirectories in the
current directory. Select a file by a single click with the left
mouse button and press the Read button to load the
corresponding structure query into the CACTVS Substructure
Search tool. If a molecule file contains more than one
molecule entry it is possible to switch between each of these
entries by the cascaded Buffer menu (see below).
In order to save the current substructure query use the Write
Query ... option. The substructure is then saved with
all its open sites, attributes, and specifications. Since a
special file format for substructures (*.cbin) is used, the file
cannot be read by WODCA.
If a molecule file with more than one structure is read into
the CACTVS Substructure Search tool (as described above),
the cascaded Buffer menu controls which structure is
loaded into the molecule canvas.
- Edit Menu. The Edit menu provides some global
operations for the substructure query displayed in the molecule canvas.
- The Undo command neutralizes the last operation
performed in the molecule canvas (e.g. drawing or deleting of
atoms and bonds, setting of specifications for atoms and bonds).
The current substructure query can be removed by the Clear
Canvas / Delete Query command.
The Add All Hydrogen Atoms command adds hydrogen
atoms to each atom with open valences. Thus, after this
operation has been applied to the substructure query the
structure carries no open sites anymore. This function is
identical to that of the Add H button in the SMILES panel.
The Delete All Hydrogen Atoms command removes each
hydrogen atom of the current structure query in the molecule
canvas. On positions where a hydrogen atom has been removed an
open site is created. This function is identical to that of the
Del H button in the SMILES panel.
The Beautify command recalculates the coordinates of the
current substructure query and replots the structure in the molecule canvas.
- As described before, it is possible to provide further
specifications for atoms and bonds of the substructure query.
- If an atom of the substructure query in the molecule canvas is
selected by a double click with the left mouse button on its element
symbol, a window called Flags and Search Specs for Atom appears
that allows one to modify the atom selected. At this stage of
development the design of the window is not quite finished. Thus,
some window elements and their functions are not usable in the
current version. In Figure 12 the most important window elements are
marked by a box. In the following, a short description for each of
these elements is given.
Figure 12: The pop-up window for the specification
of atom types. The numbers correspond to the explanation in the text
Atom type. If the atom type is set to Element (1), the
atom type that was originally drawn at this position of the query
structure is used during the substructure search. Element is
the default setting for the atom type. If the atom type is switched
to List (2), the user can define a list of elements instead
of a single atom type for a certain position of the substructure
query. The list of elements in the entry field has to be input as
element symbols separated by a space. As described in chapter How
to Define a Substructure, it is possible either to define a
positive atom list (atoms allowed at a certain position) or to
define a negative atom list (atoms excluded from
a certain position). For example, the atom list 'C N O' defines that
one of the elements carbon, nitrogen, or oxygen has to be at a
certain position of the molecular structure to be a hit in a
substructure search. By the same token, if these atom types have to
be excluded from a certain position of the molecular structure, each
element of the list has to carry an exclamation mark as a prefix,
e.g. '!C !N !O'. Furthermore, some predefined atom lists are
provided which can be used as templates (3):
Any defines that there has to be an atom on a certain
position but it can be of any kind. Thus, the list contains the
entire periodic table. HDonor and HAcceptor define a
list of atoms which are expected to be either a donor of a hydrogen
atom or an acceptor of a hydrogen atom, respectively. The option
Insulator is quite similar to Any atom type but it
does not allow the atom to be part of a conjugated electron system.
A range of hydrogen atoms can be defined on a certain atom by the
entry field Hydrogen Count (4). For example, if an atom should carry
one or two hydrogen atoms, '1-2' has to be input in the entry field.
In the same manner, the valence range (5) and the number of
ligands without hydrogen atoms (6) or including hydrogen atoms can be
defined. If an atom should be part of a ring system, a range for the
ring size can be specified (7). If the atom should be part of a chain,
the Chain button of the button panel (8) has to be pressed. If the
Cyclic button is pressed, the ring size is defined to be greater than
or equal to three. The default setting of the button panel is Don't
Care. Thus, an atom can be located either in a ring system or in a
chain. The Ligand Fuzz entry field has no function yet.
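The atom-list syntax described above (element symbols separated by a space, with an exclamation mark as prefix for excluded elements) can be sketched as follows. The function names are hypothetical, and rejecting a list that mixes positive and negative entries is an assumption of this sketch, since the text does not say how such a list would be treated.

```python
def parse_atom_list(text):
    """Split an atom list such as 'C N O' or '!C !N !O'.

    Returns a pair (allowed, forbidden) of sets of element symbols.
    """
    allowed, forbidden = set(), set()
    for token in text.split():
        if token.startswith("!"):
            forbidden.add(token[1:])
        else:
            allowed.add(token)
    if allowed and forbidden:
        # Assumption of this sketch: mixed lists are not handled.
        raise ValueError("mixed positive and negative atom lists")
    return allowed, forbidden


def element_allowed(element, text):
    """Check one element symbol against a positive or negative atom list."""
    allowed, forbidden = parse_atom_list(text)
    if forbidden:
        return element not in forbidden
    return element in allowed


print(element_allowed("N", "C N O"))      # True: nitrogen is in the positive list
print(element_allowed("C", "!C !N !O"))   # False: carbon is excluded
```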
Search flags. A number of search flags
for an atom can be specified (9). If the Not-to-match
flag is activated the atom considered is excluded from the
superimposition of the query structure with the compounds from the
catalog of chemicals during the substructure search. Additional
flags allow the specification that an atom is part of an aliphatic
structure fragment or of an aromatic system. The Match Stereo
flag enforces that the stereochemistry at an atom is considered
in a substructure search. If the Match Charge flag is set,
the charge of an atom at a certain position has to be the same both
in the query structure and in the compounds from the catalog of
chemicals. The Unsaturated flag is useful if an atom carries
open sites but should not match saturated atoms. This flag is not
useful in cases where the atom is already saturated. The Must Map
flag has no function yet.
The Set button (10)
is used to transmit all specifications that have been defined in the
window to the atom considered. After pressing the Set button
the window is closed. The Cancel button (11)
closes the window without any changes of the query structure.
- If a bond of the substructure query in the
molecule canvas is selected by a double click with the left mouse
button on a bond, a window called Flags and Search Specs of Bond
appears that allows one to modify the bond selected. At this
stage of development the design of the window is not quite finished.
Thus, some window elements and their functions are not usable in the
current version. In Figure 13 the most important window elements are
marked by a box. In the following a short description for each of
these elements is given.
Figure 13: The pop-up window
for the specification of bond types. The numbers correspond to the
explanation in the text below
Bond order. A range for the bond order can be set in this
entry field, e.g. '1-2'. The default setting is 'as entered' (1)
which corresponds to the bond order which is defined in the molecule
canvas during the drawing of the substructure query.
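Range entries such as '1-2' are used for the bond order as well as for the hydrogen count, the valence, and the ring size. A minimal sketch of how such an entry can be read is given below; the function names are hypothetical, and accepting a single number such as '3' as the range 3 to 3 is an assumption that is not stated in the text.

```python
def parse_range(text):
    """Turn an entry such as '1-2' into a pair (low, high) of integers."""
    text = text.strip()
    if "-" in text:
        low_text, high_text = text.split("-", 1)
        low, high = int(low_text), int(high_text)
    else:
        # Assumption: a single number stands for an exact value.
        low = high = int(text)
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return low, high


def in_range(value, text):
    """Check whether a value (e.g. a bond order) lies within the range."""
    low, high = parse_range(text)
    return low <= value <= high


print(in_range(2, "1-2"))  # True: a double bond fits the range '1-2'
print(in_range(3, "1-2"))  # False: a triple bond does not
```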
Bond environment. If the bond should be located in a ring
system, a range for the ring size can be defined in the Ring Sizes
entry field (2). If the bond should be part of a chain, the Chain
button of the button panel (3) has to be used. If the
Cyclic button is pressed, the ring size is defined to be
greater than or equal to three. The default setting of the button panel
is Don't Care. Thus, a bond can be either part of a ring
system or a chain.
Search flags. A bond can be set to be aliphatic or aromatic
(4). If the Stereo flag
(5) is set the stereo descriptor
of this bond has to be identical to those in the hits found by the
substructure search. The Single/Aro option as well as the
Double/Aro option (6) are useful if the global setting Bond Order
in the option panel is switched off. If the RingMatch flag (7) is
activated, both the considered bond of the substructure query and the
corresponding bond in the compound from the catalog of chemicals
have to be part of the same number of rings. Thus, a bond in
cyclohexane cannot be superimposed onto the central bond of decalin
since the former bond belongs to a single ring, whereas the latter
belongs to both rings of decalin.
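The RingMatch comparison for the cyclohexane and decalin example can be sketched by counting the rings that contain a bond. The rings are given by hand as tuples of atom indices; the atom numbering and the function names are hypothetical, and ring perception itself is not part of this sketch.

```python
def ring_count(bond, rings):
    """Count the rings that contain the given bond.

    A ring is a tuple of atom indices in ring order; the last atom is
    bonded to the first one.
    """
    wanted = frozenset(bond)
    count = 0
    for ring in rings:
        size = len(ring)
        ring_bonds = {frozenset((ring[i], ring[(i + 1) % size]))
                      for i in range(size)}
        if wanted in ring_bonds:
            count += 1
    return count


def ring_match(query_bond, query_rings, target_bond, target_rings):
    """True if both bonds are part of the same number of rings."""
    return (ring_count(query_bond, query_rings)
            == ring_count(target_bond, target_rings))


cyclohexane = [(0, 1, 2, 3, 4, 5)]
# Decalin: two six-membered rings fused at the bond between atoms 4 and 5.
decalin = [(0, 1, 2, 3, 4, 5), (4, 5, 6, 7, 8, 9)]

print(ring_match((0, 1), cyclohexane, (4, 5), decalin))  # central bond
print(ring_match((0, 1), cyclohexane, (0, 1), decalin))  # outer bond
```

The cyclohexane bond is rejected for the central bond of decalin (one ring versus two) but accepted for an outer bond of decalin, which belongs to one ring only.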
The Set button (8) is
used to transmit all specifications that have been defined in the
window to the bond considered. After pressing the Set button
the window is closed. The Cancel button (9)
closes the window without any change of the query structure.
- Before a substructure search can be performed, a query compound
has to be defined and a catalog of chemicals has to be loaded into
WODCA. If a query compound is already saved in a CTX structure file,
it can be loaded into WODCA with the help of the file
menu. Otherwise, it can directly be exported from the CACTVS
Molecule Editor into WODCA. The file
menu is also useful to load a catalog of chemicals. Whether or not
a catalog of chemicals is already loaded into WODCA is
indicated by the Catalog icon in WODCA's Information Area.
The next step is to click with the left mouse
button on the command Substructure Search ... in the searches
menu. The window of the CACTVS Substructure Search
tool appears with the current compound displayed in the molecule
canvas. Before a substructure search can be started the
substructure query has to be defined (see also How
to Define a Substructure). After pressing the Search
button on the right-hand side at the bottom of the window, WODCA
searches in the catalog of chemicals for compounds that contain the
substructure of the query. If a substructure search was successful,
WODCA lists all the compounds found in the WODCA console, and the
Match List icon in WODCA's Information Area indicates the number of
hits. A single mouse click on the Match List icon opens the CACTVS
Match List Browser to view the molecular structures of the hits.
- Last change: 2000-06-27