NCI Database Browser Help File and Information
This is a summary description of the functions available in the NCI Screening
Data 3D Miner of NCI Open Database structures at this Web site.
- System Requirements
- The new database interface is a rather complicated
project, and will require you to use a reasonably recent browser (Netscape,
Mozilla or IE, on any platform). A few advanced features will only work
For the 3D applet you will need the JAVA plug-in and the JAVA3D extensions.
We recommend JAVA plug-in version 1.3.1 or higher (version 1.4 appears to
have problems with Netscape). Any version of JAVA3D will work, but we
recommend the OpenGL versions.
- Multiple Panes
- This tool provides a completely revamped navigation
interface which allows you to switch between different result windows
("panes") at will. Anytime you click on one of the buttons on the top
navigation bar, the current content of that pane will display. However,
some of those panes are not accessible at all times - for example, you
will not be able to open the Hitlist and Detail display panes if you
have not obtained any query results. Most of the panes will be opened
in the same browser window to save the user from a proliferation of
newly created windows (as could happen in the old service). The Help
text pane (the file you're reading right now) is an exception: it opens
its own window so that you can read the help text while looking at the
part of the service it describes.
The field below the navigation bar is a status area.
After you have started an operation, information about the status
of that operation will be displayed in that area. After a few seconds,
the window will automatically switch back to display the global database
The actual content (i.e. input forms, query results,
visualizations) will appear in the third and largest area in the lower
part of the browser window.
Caution: Resizing the browser window may
does not apply to minimization of the window via the upper right hand
button and subsequent restoring of the window to its previous size,
but to any other resizing such as (accidentally) dragging the window
border. We therefore recommend starting the session in a fully maximized
welcome page (http://cactus.nci.nih.gov/) and start a new session.
Also, do not use the Back and Forward buttons of
your browser instead of the navigation bar buttons. This will work
in many instances, but can have unexpected consequences, such as resetting
the entire session (especially if you accidentally back or forward
out of this site).
- Links to the Enhanced NCI
- From the Data Display Settings page and by pressing
the detail button within the applet, you can directly transfer your
structure to the NCI Enhanced Database browser that will display all
available detail information about the specific compound. Furthermore
you will find a variety of other databases and computational services
that can be accessed from the browser.
Query Form and Structure Editor
- Basic Query Specification
- A database query is built by selecting a query type
from the popup menu to the left of the four rows below and specifying
a parameter in the entry field to the right of the same line. Rows where
no data is input in the entry field are completely ignored, regardless
of the selected query type. Many of the principal query methods have
additional parameters. The option menu below the data input field is
automatically updated to reflect the available options whenever you
change the principal query method. Some of the more advanced query methods
will pop up separate input forms, which will write some gibberish in
the data input field when they are closed. You are not supposed to edit
this text.
After filling in all relevant form elements, press
one of the buttons labeled Start Search. Depending on the selected
output format and whether any records meeting your criteria were found,
an answer page or a structure file is generated. If you selected the
output format Simply Count Hits (Entire DB), you will receive
the resulting count in the status window.
- If this field is checked, the query result of the
associated input field is inverted, i.e. only records which do not
contain a specific substructure, or do not contain data of the selected
type or range, are considered hits.
- Boolean Operations
- You can fill any subset of the four basic query specification
data entry rows. On the Connect query fields by line, you can
specify the boolean operator with which you want to connect the Query
Type rows if you specified more than one. By default, this is the logical
AND. You cannot select different operators to connect different subsets
of rows; however, since you can specify lists of query values (and value
ranges), and the implicit connection between list items is a logical
OR, you can build complex queries quite easily. These mechanisms should
provide for the vast majority of the searches most users will typically
conduct. If you need to perform more complex searches, you can use the
Hitlist Manager mechanism. The XOR mode lets
records pass where an odd number of criteria are fulfilled. This
is mostly useful in connection with Hitlist Management.
The order of the queries is not important. The database
will optimize it and use, for example, fast queries for the presence
of a data field to filter those records which are submitted to a more
demanding substructure match procedure.
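The combination rules described above can be sketched in a few lines of Python. This is only an illustration of the logic (implicit OR within a list of values, one global operator across rows, XOR meaning an odd number of fulfilled criteria); the function names are hypothetical and are not part of this service.

```python
from typing import Sequence


def row_matches(item_results: Sequence[bool]) -> bool:
    """A row with a list of values is a hit if ANY list item matches (implicit OR)."""
    return any(item_results)


def combine_rows(row_results: Sequence[bool], operator: str = "AND") -> bool:
    """Connect the filled-in query rows with a single global operator."""
    if not row_results:
        raise ValueError("at least one query row must be filled in")
    if operator == "AND":
        return all(row_results)
    if operator == "OR":
        return any(row_results)
    if operator == "XOR":
        # XOR lets a record pass when an odd number of criteria are fulfilled.
        return sum(1 for r in row_results if r) % 2 == 1
    raise ValueError(f"unknown operator: {operator}")


# Two rows: the first holds a list of two values, the second a single value.
rows = [row_matches([False, True]), True]
print(combine_rows(rows, "AND"))
print(combine_rows(rows, "XOR"))
```

Because the order of evaluation does not change the result of any of these operators, the database is free to run the cheap criteria first.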
- Limits to Hits and
- The maximum number of hits which can be retrieved
or tabulated can be specified in the Max. Number of Hits field.
We recommend a maximum of only 10-30 hits, because the number of
resulting datapoints may otherwise increase dramatically.
- NSC Number Searches
- You can type in lists of individual numbers or open
(e.g. '-100') or closed (e.g. '120-130') number ranges. If more than
one number range is given, hits are produced from records which match
any of the numbers or number ranges.
The currently highest NSC number is just above 700,000.
If you don't find an entry for a given NSC number within this range,
this can have two reasons: First, you may have hit on one of the non-open
("discreet") compounds in the NCI Database; secondly, large stretches
of NSC numbers were set aside in the past but then never really used.
Particularly the range 400000-600000 is sparsely populated.
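As an illustration of the number and range syntax, here is a minimal Python sketch. It assumes that list items are separated by whitespace or commas (the separator is not specified above), and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (low, high), both inclusive; None marks an open end of the range.
NumberRange = Tuple[Optional[int], Optional[int]]


def parse_number_query(text: str) -> List[NumberRange]:
    """Parse a query such as '-100 120-130 740' into inclusive ranges."""
    ranges: List[NumberRange] = []
    for token in text.replace(",", " ").split():
        if "-" in token:
            low_text, high_text = token.split("-", 1)
            low = int(low_text) if low_text else None
            high = int(high_text) if high_text else None
        else:
            low = high = int(token)
        ranges.append((low, high))
    return ranges


def number_matches(number: int, ranges: List[NumberRange]) -> bool:
    """A record is a hit if its number falls into ANY of the ranges."""
    return any(
        (low is None or number >= low) and (high is None or number <= high)
        for low, high in ranges
    )


query = parse_number_query("-100 120-130 740")
print(query)
print(number_matches(125, query), number_matches(110, query))
```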
- CAS Number Searches
- One or more CAS numbers, with or without hyphens,
are accepted for this type of search. If more than one number is specified,
hits are produced from records which match any of the specified CAS numbers.
Please be aware that only about half of the compounds
in the database have a CAS number associated with them. This does
not necessarily mean that they do not possess a CAS number;
it just means that none was entered when the compound was originally
keyed in. On the other hand, there definitely are compounds
in the database that truly do not have any CAS number. Many of the
samples that NCI received were, e.g., from ongoing research projects,
and these compounds were not necessarily published or patented - so
they may never have entered the Chemical Abstracts Registry.
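Since CAS numbers are accepted with or without hyphens, it can be handy to normalize them on your side before building a query list. The Python sketch below is an assumption about how one might do this and is not the code used by this service; the function names are hypothetical. The check-digit rule is a general property of CAS Registry Numbers (the last digit equals the weighted sum of the preceding digits, counted from the right, modulo 10).

```python
def normalize_cas(text: str) -> str:
    """Return a CAS number in hyphenated form, e.g. '7732185' -> '7732-18-5'."""
    digits = text.replace("-", "").strip()
    if not digits.isdigit() or not 5 <= len(digits) <= 10:
        raise ValueError(f"not a plausible CAS number: {text!r}")
    return f"{digits[:-3]}-{digits[-3:-1]}-{digits[-1]}"


def cas_check_digit_ok(text: str) -> bool:
    """Verify the trailing check digit of a CAS number."""
    digits = normalize_cas(text).replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weights 1, 2, 3, ... are applied from the rightmost body digit leftwards.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(normalize_cas("7732185"))
print(cas_check_digit_ok("7732-18-5"))
```

A failed check digit usually points to a typing error rather than to a compound that is missing from the database.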
- Molecular Formula Searches
- The general formula range syntax is symbol[low count][-][high count]
(each bracketed part is optional), repeated for every specified element and
written in arbitrary order. So C7-8 is seven to eight carbons,
C-7 is up to seven carbons, C7- is seven or more carbons,
C or C1 are exactly one carbon, C- is any number
of carbons, including none. There are two types of formula searches.
If you allow other elements, any number of elements which were
not mentioned in the query formula are allowed. The other type disallows
any additional elements, so your formula must be fully specified, including
hydrogen atoms. Two-letter elements must be written with the second
letter in lowercase, otherwise Cu (copper) and CU (one carbon, one uranium)
would not be distinguishable.
It is also possible to use sums and differences
of elements. For example, the query C4(F+Cl+Br+I)2 will retrieve
all C4-compounds with any combination of exactly two halogens.
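The range syntax can be made concrete with a small Python sketch. It covers only the plain symbol/range form, not the sum and difference expressions, and it assumes that an open lower bound means zero. The names are hypothetical and this is not the parser used by the database.

```python
import re
from typing import Dict, Optional, Tuple

# (low, high); high=None means "no upper limit".
CountRange = Tuple[int, Optional[int]]

# Element symbol (second letter lowercase), then optional low, dash and high.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)(-?)(\d*)")


def parse_formula_query(query: str) -> Dict[str, CountRange]:
    """Parse e.g. 'C7-8N-2Cl' into {'C': (7, 8), 'N': (0, 2), 'Cl': (1, 1)}."""
    query = query.replace(" ", "")
    ranges: Dict[str, CountRange] = {}
    pos = 0
    while pos < len(query):
        match = _TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"cannot parse formula query at position {pos}")
        symbol, low, dash, high = match.groups()
        if dash:
            ranges[symbol] = (int(low) if low else 0, int(high) if high else None)
        else:
            count = int(low) if low else 1  # 'C' and 'C1' both mean exactly one
            ranges[symbol] = (count, count)
        pos = match.end()
    return ranges


def formula_matches(counts: Dict[str, int], ranges: Dict[str, CountRange],
                    allow_other_elements: bool) -> bool:
    """Check the element counts of one compound against a parsed query."""
    for symbol, (low, high) in ranges.items():
        n = counts.get(symbol, 0)
        if n < low or (high is not None and n > high):
            return False
    if not allow_other_elements:
        return all(symbol in ranges for symbol, n in counts.items() if n > 0)
    return True


query = parse_formula_query("C7-8N-2Cl")
compound = {"C": 7, "H": 10, "N": 1, "Cl": 1}
print(formula_matches(compound, query, allow_other_elements=True))
print(formula_matches(compound, query, allow_other_elements=False))
```

The second call fails because hydrogen was not mentioned in the query, which is why a formula must be fully specified when additional elements are disallowed.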
- Molecular Weight Searches
- This type of query accepts one or more molecular
weights or weight ranges in g/mol. Ranges are processed with full precision,
but single weights are compared with rounded weight numbers.
- Atom and Ring Counts, Donor/Acceptor
- Once more, ranges or single numbers are permitted
with these search options. The atom count is the total number of atoms,
including hydrogen. The ring count is the number of ESSR rings in the
structure. An ESSR ring is any ring which does not share three consecutive
atoms with any other ring in the structure. This filter is also applied
to fused rings such as naphthalene - according to this convention,
three rings (two phenyl fragments and the 10-membered envelope) result.
Biphenyl will yield only a count of two rings.
The definitions of donor and acceptor atoms and
rotatable bonds are somewhat flexible, but should match common practice.
If in doubt, extend the range and see whether you get extra hits which
are interesting. The rotatable bond count excludes all bonds where
the rotation is possible, but does not have a major impact on the
shape of the molecule. For example, all terminal or linear bonds are excluded.
Please be aware that the definition of flexibility
that underlies the rotatable bonds count here, and the definition
of flexibility that was used by the program Catalyst (MSI)
in calculating the conformers whose number is reported in the Detail
window (and which can be searched for with the 3D pharmacophore search)
have nothing to do with each other. Issues like terminal groups, hydrogens,
large ring flexibilities play a role here. You may therefore encounter
cases where the number of rotatable bonds (CACTVS) vs. the number
of conformers (Catalyst) seems non-intuitive if not inconsistent.
- The complexity rating of the compounds is a rough
estimate of how complicated a structure is, seen from both the point
of view of the elements contained and the displayed structural features
including symmetry. The value is computed using the Bertz/Hendrickson/Ihlenfeldt
formula (1). It is a floating point value, ranging
from 0 (simple ions) to several thousand (complex natural products).
The most complex compound in this database is NSC 277816 (C128H164BrN25O86P12)
with a complexity rating of about 10,515. The average complexity of
the structures in this database is about 402.
- Name Fragment Searches
- About 45,000 compound names are associated with the
structures from the original NCI database, and for most other compounds,
an IUPAC name was computed by the ACD/Name (V4.0) program from ACD/Labs.
Generally, because of the usual problems with structure naming conventions,
name search is of somewhat limited value. Only the original NCI name
set contained common names and sometimes trade names. If there happens
to be an NCI name for a structure, often it has not one but multiple
names. Name searches are automatically performed on the full name set,
and for a hit it is sufficient for any single name to yield a positive
result. Search is always case-insensitive and ignores whitespace. We
support six different kinds of searches. A full name search must match
the name, either with or without numbers and punctuation. Simple substring
search is as simple as it sounds. The default name substring search
will ignore all punctuation and numbers in the name. The second variety
of substring search will also ignore punctuation, but preserve and match
digits in the compound names. Shell syntax works like the command line
in a Unix Bourne shell. The special characters and character sequences
'*' (zero or more arbitrary characters), '?' (single arbitrary character)
and '[...]' (character range) are recognized. Note that this search (in
contrast to substring and regular expression search) is anchored, i.e.
if your query value string does not start with '*', the first character
of a structure name must match the first query character. Regular expression
search is even more powerful, but also rather complicated. With this
search, you can, for example, specify that there should be either
'fluoro' or 'chloro' to the right of another fragment. Refer to a Unix
manpage (for example, for the commands sed or egrep) to
get more information about this topic if you are not familiar with it.
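The difference between the anchored shell syntax and the unanchored regular expression search can be tried out locally. The Python sketch below uses the standard fnmatch and re modules as stand-ins; the matching engine of the server may differ in details, and the helper names are hypothetical.

```python
import fnmatch
import re


def normalize(text: str) -> str:
    """Name searches are case-insensitive and ignore whitespace."""
    return "".join(text.lower().split())


def shell_match(pattern: str, name: str) -> bool:
    """Anchored match: the pattern must cover the whole name."""
    return fnmatch.fnmatchcase(normalize(name), normalize(pattern))


def regex_match(pattern: str, name: str) -> bool:
    """Unanchored match: the pattern may hit anywhere in the name."""
    return re.search(pattern, normalize(name), re.IGNORECASE) is not None


name = "2-Chloro benzene"
print(shell_match("chloro*", name))   # name starts with '2', so no match
print(shell_match("*chloro*", name))  # leading '*' removes the anchoring
print(regex_match(r"benz.*(fluoro|chloro)", "Benzene, 1-chloro-4-nitro-"))
```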
- PASS Searches -- Predictions
of Biological Activities
- You can search for specific ranges in predictions
for a very large number of biological activities. The program PASS
(Prediction of Activity Spectra for Substances) was used to calculate
predictions for up to 565 different activities for nearly all the structures
in the database. PASS calculates the probability for both activity
and inactivity of the compound for a given mechanism. These comprise
specific enzymatic inhibitory potencies, therapeutic uses for various
diseases, toxicities, and others. Counting the activity and inactivity
predictions separately (they can be searched for separately), a total
of 64,188,212 predicted values are offered on this site. Because the
training set that underlies PASS is large but still limited (on the
order of 35,000 compounds), the program cannot reliably predict each
activity for every compound in the database. Here is the list
of the activities, together with the number of compounds for which each
activity was predicted.
If you select the Query Type PASS Prediction
Range..., a popup window will appear that will allow you to select
the activity for which you want to search. There should be a scroll
bar at the right side of the list. If your browser doesn't show it,
enlarge the popup window manually. At the top of the window, you can
select the Query Probability type to search for, Activity
or Inactivity. You can only select one activity or inactivity
at a time. If you want to conduct combined searches, such as "activity
[probability] > 0.8" AND "inactivity [probability] < 0.2", you have
to use separate query input lines in the Query Form. Since the predictions
are calculated as probabilities, you have to use number ranges between
0.0 and 1.0.
We have observed the possibility, under certain
circumstances, that the PASS selection popup window will not come
up any more, even if you reset the Query Form. If this happens to
you, simply re-enter the server URL (e.g. http://cactus.nci.nih.gov
for the U.S. mirror) in your web browser window, and start the search
session from there anew.
It is obviously totally impossible for us to test
even a small subset of these predictions for all the NCI compounds
ourselves. If you use this feature of our service, we would therefore
be interested in hearing about success (and also not-so-great-success)
stories, which we would compile and post, e.g., on this server. This
need not include disclosure of individual compounds tested, but merely
of the success rate of the predictions for the activity analyzed.
You can e-mail Marc Nicklaus
and/or Prof. Vladimir Poroikov
with results and questions.
In this vein, we want to emphasize that these values
are predictions, to which all the usual caveats pertinent to
QSAR-type calculations should be applied. The user should never make
the mistake of assuming that a specific prediction for a single compound
means that this molecule has this activity. The PASS predictions
can only be responsibly used in a statistical manner for sets of compounds,
and should be treated as scientific "food for thought."
- The effect value query
- Besides the first three query fields, there is a specific
query field related to the screening values - growth inhibition, cytostatic
effect and cytotoxic effect. You can use this field to define ranges
for all or any cell lines. This query will force a pre-search on the
SQL database containing the screening results and therefore may take
up to 30 seconds computing time.
- Structure Input
- There are several possibilities to input a structure
for full-structure, substructure or similarity search. The upper three
input fields accept SMILES
strings as structure specifications. If you are familiar with the syntax,
you can type in simple queries manually. However, most of the time you
will want to use some graphical structure editor. If your favorite desktop
molecule editor supports Copy&Paste of SMILES strings, you can simply
use this editor, put the structure on the clipboard as a SMILES string
and paste it into the entry field. Editors which support this operation
include ChemWindow and ChemDraw.
As a third option, you can start a Java editor by
clicking on the Start Editor button below any input field.
You must use a WWW browser with Java support (Netscape, Internet Explorer)
for this to work, and you must have Java enabled, which is an option
in the browser configuration panel. The input frame will switch to
the editor panel. Read the editor
instructions to learn how to use the program. Structures are exported
from the editor by clicking on the Transfer to Form button
on the editor panel, or by using the navigation bar to switch back
to the query input panel. The editor remains associated with the last
input field where you pressed the Start Editor or Transfer
to Form buttons. If your current search option is not structure-based,
the query method will automatically change to substructure search
upon structure import.
We thank Peter
Ertl from Novartis Crop Protection AG for kindly allowing us to
use this remarkable applet.
- Java Editor Comments
- We are now using the 2000/10 version of the JME Java
editor. It has much enhanced capabilities - for example, now you can
input disconnected structures (use the 'New' button), and you can number
the atoms (use the '123' button). Numbering is very helpful for the
input of 3D query constraints. Just draw your structure any way you
like it, and then number the atoms which participate in 3D constraints.
These can be specified on the input fields to the right. The 'Qry' button
pops up a window which allows you to input many more atom and bond properties.
Please read the editor documentation.
- Supported SMILES features
- All standard SMILES features, including stereochemistry
and isotope labeling, are supported. However, since there are neither
stereochemical descriptors nor isotope labeling in the database, these
search features are disabled and stereo descriptors or isotope specifications
will be ignored. Some basic SMARTS extensions are also recognized, most
notably the R, a, A, X, V and H descriptors for bracketed atoms. All
such descriptors are or'ed; boolean attribute link logic (indicated by
the characters , ; &) is ignored. For example, to specify that a
nitrogen atom should not be part of a ring, you could use a SMILES descriptor
'[N;R0]'. The special bond symbol '~' forces the bond to match an aromatic
bond. Otherwise, aromatic bonds without any additional search attributes
will match single and double bonds from both the substructure and structure
side. The exclamation mark '!' used as a bond symbol is a 'non-bond'
which must not be present in the database structure for the substructure
to match.
- A set of check boxes is available on the editor panel
to globally modify the structure search parameters.
First, you have a choice whether matched substructures
should be highlighted in the displayed result structures or not. Highlighting
applies both to 2D plots and 3D displays in Chime or as VRML file.
Note that, if multiple substructures are combined by an OR statement,
only the first successful substructure match is actually performed
on that record and subsequently displayed, even if additional fragments
would also match. Highlighting is activated by default.
If you allow multi-fragment overlap, substructures
which consist of disconnected fragments may overlap when matching
the target structures. By default they will not, so that if you specify
two nitro groups as substructure, only compounds with two or more
nitro groups are found. Note that this feature applies only to substructures
which were entered in a single input field as an entity. If you specify
two substructures on two different fields, their match relationship
is not influenced by the setting of this switch.
The third option is whether to suppress the matching
of aromatic bonds on plain single or double bonds with no auxiliary
attributes. By default, aromatic bonds will match such bonds, provided
that no other attributes (such as 'not in a ring') prevent the match.
If you desire the behavior of NCI's older DIS system, which will match
aromatic bonds in the database structures only on aromatic bonds in
your query, you should activate this switch.
Finally, the option for the enforcement of ring
embedding equality means that the ring count of the bonds of a substructure
must match the ring count of the database structures. If this switch
is on, a simple phenyl fragment will not match naphthalene (only benzene,
or biphenyl). Also, it implies that all bonds in your substructure
which are not in a closed ring can only match non-ring bonds in the
database molecules. The same effect could be achieved by explicitly
specifying for each bond that it must not be in a ring, but this global
option is often more convenient.
- Structure Search
- The basic structure query types in this database
are full-structure search, substructure search and similarity
search. Full-structure search is fastest, substructure searches
can take up to a few minutes depending on the character of your query
structure. Hydrogens will be added automatically for all searches except
substructure search, where you will have to specify them explicitly.
You should know that adding explicit hydrogens to all sites where you
do not want any substituents will both focus your search and speed it
up. The similarity searches operate on the Tanimoto distance of the
substructure filtering screen bitvectors. For full-structure search,
you have the choice between looking for the complete structure (e.g.
salts plus specific counterion) or any isolated molecule in the record.
If the record contains only one molecule, which is true for the large
majority of the database entries, these two search types deliver identical
results.
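For readers unfamiliar with the Tanimoto measure used by the similarity search, here is a minimal Python sketch on two screen bitvectors stored as integers. The bit patterns are invented for illustration, and the actual screen definitions of the database are not shown here. The sketch computes the Tanimoto coefficient, where 1.0 means identical bit patterns.

```python
def tanimoto(bits_a: int, bits_b: int) -> float:
    """Tanimoto coefficient: common set bits divided by bits set in either vector."""
    common = bin(bits_a & bits_b).count("1")
    either = bin(bits_a | bits_b).count("1")
    # Two empty bitvectors are treated as identical.
    return common / either if either else 1.0


query_screen = 0b1101      # hypothetical screen bits of the query structure
record_screen = 0b1001     # hypothetical screen bits of a database record
print(round(tanimoto(query_screen, record_screen), 3))
```

Because only the screen bits are compared, two different structures can reach a high score if they set the same screen bits.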
- Tautomer-Tolerant Searches
- This database now also supports tautomer-tolerant
queries for substructure and full-structure search. You can draw any
tautomeric form of your query structure, and if the button is checked,
the database will retrieve all compounds which are tautomers of your
input form, regardless of internal coding. Note that in the case of
substructures, you have to draw tautomeric hydrogen atoms explicitly.
Inputting, for example, the enol of acetone without a hydrogen at the
oxygen, or the keto form without any hydrogen at one of the carbons
will not yield the expected results, since the open valences could be
occupied by ligands which lock the form (say, some silyl group).
If there are no potential tautomeric atoms, the search will proceed
as if the box had not been checked. If there are such systems, screening
will be performed less aggressively, and the match procedure adapted
to allow positional variations of the hydrogens in the tautomeric systems.
This will cost some 30% of extra computer time.
- 3D Pharmacophore Searches
- You can conduct a 3D pharmacophore search in this
database. Using the program Catalyst
by MSI, up to 25 conformations were
calculated for those compounds in the open NCI database that Catalyst
could handle. Catalyst conformers have been included for 211,857 compounds.
To prepare a query for a 3D pharmacophore search,
you can either create a query file externally and submit it to this
service, or you can use the Local Query Parameters area of the Editor
pane. The first possibility is probably the somewhat easier way at
this time to enter more complex queries.
To create a query file, you can use programs such
as Catalyst or ISIS/Draw etc. and generate a file in .mol format.
Most of the additional features in query files are supported, such
as exclusion spheres, centroids, points on lines, angles, planes...
Once you have this file available on the machine from which you started
the Browser, go to the bottommost query line, select the option Substructure
and/or 3D Search..., click on the Browse button to the
right of it, and select the query file on your machine. Then start
the search.
To generate a query, proceed along the lines of
the following examples. From any of the query input lines, call up
the Editor pane. To generate a query that consists of a triangle of
three oxygen atoms:
1. select O from the list of elements, place it on the drawing area;
2. click on the NEW button at the top of the JME Structure Editor;
3. place another O atom;
4. repeat steps 2 and 3;
5. click on the 123 button;
6. click on the three placed O atoms: this will generate atom numbers;
7. in the Local Query Parameters area, enter "1 2" in the topmost field;
8. in the Value Range field below it, enter (e.g.) "2.5-3.5";
9. repeat steps 7 and 8 with the values (e.g.) "2 3", "3.5-4.5" and
"1 3", "4.5-5.5";
10. click on the button (below the Editor area) Transfer to Query
Now, in the query line you used, you should see, in the Query Data
Value field, the entry "[OH2:1].[OH2:2].[OH2:3]". This would search
for three water molecules -- which is probably not what you want.
(The Editor automatically adds hydrogens to all unfilled valences.)
Go into this field, and manually edit out the hydrogens, so that you
have the string "[O:1].[O:2].[O:3]". Now start the search (after possibly
adding other search criteria). The constraints you specified are transferred
to the search engine behind the scenes.
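If you prepare such query strings outside the browser, the manual removal of the hydrogens can be mimicked with a regular expression. The Python sketch below is only a convenience illustration with a hypothetical function name. It handles simple bracketed atoms with an atom number, such as "[OH2:1]", and nothing more elaborate (for example, no aromatic lowercase atoms or charges).

```python
import re

# A bracketed atom: element symbol, an explicit hydrogen count, ':' and atom number.
_NUMBERED_ATOM_WITH_H = re.compile(r"\[([A-Z][a-z]?)H\d*:(\d+)\]")


def strip_numbered_hydrogens(smiles: str) -> str:
    """Turn '[OH2:1]' into '[O:1]' for every numbered atom in the string."""
    return _NUMBERED_ATOM_WITH_H.sub(r"[\1:\2]", smiles)


print(strip_numbered_hydrogens("[OH2:1].[OH2:2].[OH2:3]"))
```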
You should make sure that the ensemble of constraint
values you're entering amounts to a meaningful 3D arrangement of atoms.
For example, the values used above are a triangle with side lengths
of 3, 4, and 5 Angstroms, resp., with a 0.5 Angstrom tolerance for
each side. Values of 3, 4, and 10 Angstrom, on the other hand, do
not produce a valid triangle, and thus do not result in any hits.
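Whether three distance ranges can describe a triangle at all can be checked before submitting the query. The Python sketch below applies the triangle inequality to the ranges: an arrangement exists only if the lower limit of each side does not exceed the sum of the upper limits of the other two. The function name is hypothetical, and the check is not part of this service.

```python
from typing import Sequence, Tuple


def triangle_feasible(ranges: Sequence[Tuple[float, float]]) -> bool:
    """ranges holds three (low, high) distance ranges in Angstroms."""
    if len(ranges) != 3:
        raise ValueError("exactly three distance ranges are required")
    for i, (low, _high) in enumerate(ranges):
        # Sum of the upper limits of the two other sides.
        others = sum(high for j, (_low, high) in enumerate(ranges) if j != i)
        if low > others:
            return False
    return True


# The example from the text: sides of 3, 4 and 5 Angstroms, each +/- 0.5.
print(triangle_feasible([(2.5, 3.5), (3.5, 4.5), (4.5, 5.5)]))
# Sides of 3, 4 and 10 Angstroms with the same tolerance cannot close.
print(triangle_feasible([(2.5, 3.5), (3.5, 4.5), (9.5, 10.5)]))
```

For queries with more than three constrained atoms, every triple of atoms has to pass this test, although that alone does not guarantee that a 3D arrangement exists.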
Once you have obtained hits from your search, the
best way to view the results is probably to choose, from the Detail
pane, the Visualization option Chime Display/All Conformers.
This will show you all conformations calculated by Catalyst, with
the one that was found to match the 3D query highlighted by a light
red background. (Once the search algorithm has found one match for
a molecule, it will not look for additional conformers that could
potentially also match the query.) Superimposition of the query onto
the displayed conformers is planned for the future but not yet implemented.
This capability is not a replacement for full-fledged,
dedicated 3D pharmacophore search programs. One of its main limitations
is obviously that it doesn't allow one to conduct any conformational
search on-the-fly -- there is only a fixed set of pre-calculated conformers
available. On the other hand, this allows for very rapid searching
-- few of the more sophisticated programs will return a hit set from
a 250,000-compound database within a few seconds.
- Plain HTML Table
- The standard tabular output, displayed in the Data
Display Settings pane, includes the NSC number, formula, CAS number,
number of names available for the structure, and one sample name. Note
that the NSC number is a live hyperlink, which will lead you to the
detail display of the NCI Enhanced Database browser.
- HTML Table with
- Like Plain table but also images of some random structures.
- HTML Table with
- Like Plain table. However, an image with the structure
of the corresponding compound will be displayed additionally.
- Sorting the Structure Hitlist
- At the bottom of the Query Form, a menu lets you select
the sorting order of hitlists. It is only used when more than one result
record is produced. The default sort order is by NSC registry number
in ascending order, but you can also select atom counts, structural
complexity, molecular weight (all in ascending order), similarity to
the query structure (in descending order) and the effect values like
gi50, lc50 and tgi (averaged or maximum values). Note that similarity
sorting can only be used in conjunction with a similarity query. Otherwise
the default NSC ordering is used. On the Data Display Settings
page the user can decide to use the sort order also in the applet.
Data Display Settings
The Data Display Settings Panel consists of three subpanels
- the Display Settings panel, the Include Data Panel and the Structure Hitlist
Panel. The Structure Hitlist panel is the result of the database query.
- The Display Settings Pane
- This pane allows some general option settings like
the display style and the sort order of the compounds. Furthermore,
it allows the selection of the JAVA plug-in version. However, this option
should only be used if Internet Explorer cannot identify the right
version.
- The Include Data Pane
- This pane is the most important pane. It controls
which datapoints will be displayed within the applet. The user may select
the cell lines, the compounds from the structure hitlist pane, the available
concentrations and also the effect values like gi50, lc50 or tgi that
should be included in the dataset. The number of datapoints will be
approximated. However, the real number of resulting datapoints may be
much smaller. The user can also decide which additional molecular properties
should be available for mining within the applet.
- The Structure Hitlist Pane
- This list contains the NSC number, CAS number (if
available), molecular formula, number of names and one representative
name, if any names are available. However it also can include other
properties, if these properties have been chosen for the sort order. Clicking
on an NSC number will open the NCI Enhanced browser, which provides detail
information about the corresponding compound.
The leftmost column of the compound listing contains
checkboxes. These checkboxes control whether a specific compound should
be included in the visualization dataset. When you click a checkbox,
the approximate number of datapoints will be recalculated.
- The Applet
- For a detailed explanation of applet functionalities
see the applet help.
About the Database
- Data Origin
- The service is based on a MySQL database that
contains the May 2002 release of the DTP screening data, with gi50,
lc50 and tgi values for 41,000 structures, and on the database of the
Enhanced NCI Database Browser. All searches are done by combining both
databases. The original structure data and screening results are all
maintained by NCI's Developmental Therapeutics
Program. Additional information and downloadable files (such as
the Standard Agent Database and the Mechanism of Action Database) can
be obtained from that site.
- Database Size and Content
- The Enhanced NCI Database Browser database contains
250,251 open records. The MySQL database contains screening results
for 41,000 structures. Some of the 41,000 structures are not included
in release 2 of the enhanced database and will not be shown in search
results. This will change after the release of the mark III of the
enhanced NCI browser. Every record contains at least the NSC number
and the chemical structure. Records without a chemical structure, which
exist in the NCI DIS system, have not been included.
The database in its first release contained 216,089
names (of 45,229 compounds) coming from the original DTP tables, 44,804
AIDS antiviral screening results, 41,000 anti-tumor, and 122,631 CAS
numbers from the original DTP sources.
About the Software and Hardware
- Required Browser Software
browser. If you want to use the Java structure editor, your browser
manually from the browser configuration panel. Besides that, any reasonably
modern browser (Netscape, Mozilla and IE in version 4 or higher) should
be able to use this service. The browser must have the JAVA plug-in
and also the JAVA3D extensions. You should have at least 32 MB main
memory to avoid the risk of crashing with this particular display style.
The visualization applet takes advantage of the 3D capabilities of your
graphics hardware. If you have a 3D card like a GeForce2, you can display
a higher number of datapoints within the applet.
- Server Software Environment
- This database was implemented exclusively using software
of the CACTVS
chemical structure processing toolkit. Secondary, derived information
(GIF images etc.) is dynamically computed when the query is run. The
CACTVS toolkit has extensive scripting capabilities, employing TCL
as language core with sophisticated chemical command enhancements. All
response pages are generated by a single, compact CACTVS/TCL CGI script
of about 3550 lines. If you are interested, have a look at the script
source. The database is currently stored in the compact flat-file
CACTVS/BASE streamable scan format. It is a single, easily transferable
3 GB file. Direct scans for simple query data such as NSC numbers take
less than a second for the 250,251 compounds in the open database (*).
The database was generated from NCI DIS database MDL Molfile dumps and
various text files containing auxiliary information such as names. The
total conversion time for the full open database is about 16 hours on
an 800 MHz Pentium III Linux system.
- Server Hardware
- Currently, the database is served from the US mirror
by a dual-CPU 500 MHz Pentium III Linux SCSI system. The European mirror
employs a Silicon Graphics Origin 200 R10000 180 MHz single-processor
(*) Note: The Linux system still has some performance
problems because the 2.4 kernel (absolutely needed because the scan
file is larger than 2 GB) displays severe performance problems in
repositioning the file pointer on such big files. We are trying to
work around this problem. This problem has been solved.
1. J.B. Hendrickson,
P. Huang, A.G. Toczko, Molecular Complexity - A Simplified Formula Adapted
to Individual Atoms. J. Chem. Inf. Comput. Sci. 27, 63-67
W.D. Ihlenfeldt, Computergestützte Syntheseplanung durch Erkennung
synthetisch nutzbarer Möglichkeit von Molekülen. Dissertation,
TU Munich 1991.
This service was implemented by Frank Oellien (Homepage)
in the course of a continuing collaboration with the CADD
Group of the Laboratory
of Medicinal Chemistry, Division of Basic Sciences, NCI, NIH, Frederick,
USA, headed by Marc
C. Nicklaus. The support of many collaborators is kindly acknowledged.
You are welcome to mail
me (Frank Oellien) and/or Marc
Nicklaus for comments, questions, suggestions and bug reports.