Help File and Information
Dates and Features
- 2002-04-20: Choice of aromatic vs. Kekule representation of aromatic rings added
- 2002-03-21: Major upgrade - now allows additional input and output formats, in both single-structure and multi-structure form.
- 2001-07-24: Added the SD file upload function for larger datasets.
- 1999-10-22: Launch of Online Service.
Input (SMILES strings and single- and multi-structure SD, PDB, MOL files etc.)
The upper left input field accepts SMILES strings as structure specifications. If
you are familiar with the syntax, you can type in simple queries manually.
However, most of the time you will want to use a graphical structure editor.
If your favorite desktop molecule editor supports copying and pasting
SMILES strings, you can draw the structure there,
copy it to the clipboard as a SMILES string, and
paste it into the entry field. Editors that support this operation include
ChemWindow and ChemDraw.
Alternatively, you can use the Java Molecular Editor, which opens when you click the
Start Structure Editor button.
For larger datasets, use the multi-structure SD file upload function on
the upper right-hand side. Files in PDB and MOL format (and, in fact, in any format
CACTVS recognizes) are also accepted.
The resulting SMILES strings will be returned as a text file or in the format specified.
The system will automatically add hydrogens to your input structure(s) according to
standard valences before generating SMILES string(s). This will prevent very strange looking,
and probably not intended, SMILES strings from being generated (such as [C][C][C][C]O[C]O[C][C][C][C]
vs. CCCCOCOCCCC). If you ever truly need SMILES strings for structures with
explicitly missing hydrogens, please contact us and we may add this as an option.
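The hydrogen-filling step described above can be illustrated with a minimal Python sketch. It is not the CACTVS implementation: the valence table is a small illustrative subset, the function name implicit_hydrogens is hypothetical, and charges, aromaticity, and higher valence states are ignored.

```python
# Default valences for a few common elements (illustrative subset only,
# not the table used by the service).
STANDARD_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def implicit_hydrogens(atoms, bonds):
    """Return the number of hydrogens to add to each heavy atom.

    atoms: list of element symbols, e.g. ["C", "O"]
    bonds: list of (i, j, order) tuples with 0-based atom indices
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    # Fill each atom up to its standard valence; never return a negative count.
    return [max(0, STANDARD_VALENCE[el] - u) for el, u in zip(atoms, used)]


# Heavy-atom skeleton of CCCCOCOCCCC: an unbranched chain of single bonds.
atoms = list("CCCCOCOCCCC")
bonds = [(i, i + 1, 1) for i in range(len(atoms) - 1)]
hydrogens = implicit_hydrogens(atoms, bonds)
print(hydrogens)
```

Without this step, every carbon in the skeleton would have unfilled valences and would have to be written as a bracketed atom, which is what produces strings like [C][C][C][C]O[C]O[C][C][C][C].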
The service has several output options. You can choose between Unique SMILES
(USMILES), displayed on-screen or saved to a text file;
MDL SD and MOL file formats; and PDB file format. For the last three formats, you can
select between 2D and 3D coordinates. The 3D coordinates are computed
with the program CORINA of Prof. Gasteiger, Erlangen.
If the input file contains a single structure, the output
will also be a single structure. Multi-structure input will
generate multi-structure output for those formats that support it;
otherwise, only the first structure will be used. SD files will contain a
UNIQUE_SMILES field for the unique SMILES and a USER_SUPPLIED_SMILES field for the user-supplied
SMILES (if available).
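As a rough illustration of how these fields could be read back from a downloaded multi-structure SD file, the Python sketch below relies only on the general SD file conventions: records end with a line containing $$$$, and each data item starts with a header line of the form "> <NAME>" followed by its value lines and a blank line. The function name read_sd_fields is hypothetical, and the sample text is a made-up placeholder with the connection tables omitted, not actual output of the service.

```python
def read_sd_fields(text):
    """Split multi-structure SD text into records and collect their data fields."""
    records = []
    # Naive split on the record terminator; adequate for a sketch.
    for block in text.split("$$$$"):
        if not block.strip():
            continue
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                # Data header line, e.g. "> <UNIQUE_SMILES>"
                name = line[line.index("<") + 1 : line.rindex(">")]
                i += 1
                values = []
                # The value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                fields[name] = "\n".join(values)
            i += 1
        records.append(fields)
    return records


# Placeholder input: atom and bond blocks are omitted and the values are invented.
sample = (
    "record 1\n\n\nM  END\n"
    "> <UNIQUE_SMILES>\nCCO\n\n"
    "> <USER_SUPPLIED_SMILES>\nOCC\n\n"
    "$$$$\n"
    "record 2\n\n\nM  END\n"
    "> <UNIQUE_SMILES>\nc1ccccc1\n\n"
    "$$$$\n"
)

records = read_sd_fields(sample)
for fields in records:
    # USER_SUPPLIED_SMILES is present only when the input was a SMILES string.
    print(fields["UNIQUE_SMILES"], fields.get("USER_SUPPLIED_SMILES"))
```

Using dict.get for USER_SUPPLIED_SMILES reflects that the field appears only when a user-supplied SMILES is available.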
Even within Unique SMILES, you have the choice between aromatic and Kekule representation
of aromatic rings, which produce non-identical USMILES strings.
Example: NSC# 5 is
Choose whichever suits you better. The
Enhanced NCI Database Browser
uses the Kekule representation for output SMILES format.
SMILES and Unique SMILES Definition
An (incomplete) SMILES string definition can be found
A SMILES Tutorial
can be found on Daylight's Web site. USMILES is briefly mentioned
The best reference is probably still the 1989 publication
Please note that the definition of USMILES has been changed by Daylight since 1989,
but has not been published. USMILES generated here will therefore be different from
Daylight-generated ones for some compounds (an informal test showed this to be the
case for approximately 30% of a typical organic compound set).
Should the current USMILES definition become available, we will have a look at
updating our algorithms.
Technology and Acknowledgments
The CGI script connected to the form runs on a non-GUI version of a class of
programmable (in Tcl/Tk and extensions)
general-purpose chemical structure handling programs of the
Using the powerful scripting language interface of these programs,
it is possible to implement nearly every graphical or structure handling
application very rapidly. The CACTVS program was developed by
3D atomic coordinates are computed by the algorithm of the
CORINA program, which is used here as a dynamically loaded module.
We thank Peter Ertl,
Novartis Crop Protection AG, for kindly allowing us to use the
JME molecular editor.
and CC to Marc C. Nicklaus,
if you have any questions or comments.
Last Change: 2002-04-29,