NCI Screening Data 3D Miner

NCI Database Browser Help File and Information
This is a summary description of the functions available in the NCI Screening Data 3D Miner of NCI Open Database structures at this Web site.
General Stuff

System Requirements

The new database interface is a rather complicated project and requires a reasonably recent browser (Netscape, Mozilla or IE, on any platform). A few advanced features work only on Netscape. Both JavaScript and Java must be enabled in your browser. For the 3D applet you will need the JAVA plug-in and the JAVA3D extensions. We recommend JAVA plug-in version 1.3.1 or higher (version 1.4 seems to have problems with Netscape). All versions of JAVA3D can be used, but we recommend the OpenGL versions.

Multiple Panes

This tool provides a completely revamped navigation interface which allows you to switch between different result windows ("panes") at will. Anytime you click on one of the buttons on the top navigation bar, the current content of that pane will display. However, some of those panes are not accessible at all times - for example, you will not be able to open the Hitlist and Detail display panes if you have not obtained any query results. Most of the panes will be opened in the same browser window to save the user from a proliferation of newly created windows (as could happen in the old service). The Help text pane (the file you're reading right now) is an exception: it opens its own window so that you can read the help text while looking at the part of the service it describes.
The field below the navigation bar is a status area. After you have started an operation, information about the status of that operation will be displayed in that area. After a few seconds, the window will automatically switch back to display the global database status.
The actual content (i.e. input forms, query results, visualizations) will appear in the third and largest area in the lower part of the browser window.
Caution: Resizing the browser window may reset all entries and even lead to subsequent JavaScript errors. This does not apply to minimization of the window via the upper right hand button and subsequent restoring of the window to its previous size, but to any other resizing such as (accidentally) dragging the window border. We therefore recommend starting the session in a fully maximized window. If you encounter the JavaScript error, simply go back to the welcome page (http://cactus.nci.nih.gov/) and start a new session.
Also, do not use the Back and Forward buttons of your browser instead of the navigation bar buttons. This will work in many instances, but can have unexpected consequences, such as resetting the entire session (especially if you accidentally back or forward out of this site).

Links to the Enhanced NCI Database Browser

From the Data Display Settings page and by pressing the detail button within the applet, you can directly transfer your structure to the NCI Enhanced Database browser that will display all available detail information about the specific compound. Furthermore you will find a variety of other databases and computational services that can be accessed from the browser.

Query Form and Structure Editor

Basic Query Specification Procedure

A database query is built by selecting a query type from the popup menu to the left of the four rows below and specifying a parameter in the entry field to the right of the same line. Rows where no data is input in the entry field are completely ignored, regardless of the selected query type. Many of the principal query methods have additional parameters. The option menu below the data input field is automatically updated to reflect the available options whenever you change the principal query method. Some of the more advanced query methods will pop up separate input forms, which will write some gibberish in the data input field when they are closed. You are not supposed to edit such content.
After filling in all relevant form elements, press one of the buttons labeled Start Search. Depending on the selected output format and whether any records meeting your criteria were found, an answer page or a structure file is generated. If you selected the output format Simply Count Hits (Entire DB), you will receive the resulting count in the status window.

Negate

If this field is checked, the query result of the associated input field is inverted, i.e. only records which do not contain the specified substructure, or do not contain data of the selected type or range, are considered hits.

Boolean Operations

You can fill any subset of the four basic query specification data entry rows. On the Connect query fields by line, you can specify the boolean operator with which you want to connect the Query Type rows if you specified more than one. By default, this is the logical AND. You cannot select different operators to connect different subsets of rows; however, since you can specify lists of query values (and value ranges), and the implicit connection between list items is a logical OR, you can build complex queries quite easily. These mechanisms should provide for the vast majority of the searches most users will typically conduct. If you need to perform more complex searches, you can use the Hitlist Manager mechanism. The XOR mode lets records pass where an odd number of criteria are fulfilled. This is mostly useful in connection with Hitlist Management.
The order of the queries is not important. The database will optimize it and use, for example, fast queries for the presence of a data field to filter those records which are submitted to a more demanding substructure match procedure.
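As an illustration only, the way the row results are combined can be sketched in Python. The function names combine_rows and apply_negate are hypothetical and are not part of this service; the sketch merely mirrors the Negate, AND, OR and XOR semantics described above, assuming that each filled row yields a true/false result for a record.

```python
def apply_negate(row_result, negate):
    """Invert a row's result when its Negate box is checked."""
    return (not row_result) if negate else bool(row_result)


def combine_rows(row_results, operator="AND"):
    """Combine the results of all filled query rows for one record."""
    results = [bool(r) for r in row_results]
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    if operator == "XOR":
        # A record passes when an odd number of criteria are fulfilled.
        return sum(results) % 2 == 1
    raise ValueError("unknown operator: " + operator)


# Three filled rows that all match, the second one negated.
rows = [apply_negate(True, False), apply_negate(True, True), apply_negate(True, False)]
print(combine_rows(rows, "AND"))  # False
print(combine_rows(rows, "XOR"))  # False (two criteria fulfilled, an even number)
```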

Limits to Hits and Execution Time

The maximum number of hits which can be retrieved or tabulated can be specified in the Max. Number of Hits field. We recommend a maximum of only 10-30 hits, because the number of resulting datapoints may otherwise increase dramatically.

Query Types

NSC Number Searches

You can type in lists of individual numbers or open (e.g. '-100') or closed (e.g. '120-130') number ranges. If more than one number range is given, hits are produced from records which match any of the numbers or number ranges.
The currently highest NSC number is just above 700,000. If you don't find an entry for a given NSC number within this range, this can have two reasons: First, you may have hit on one of the non-open ("discreet") compounds in the NCI Database; secondly, large stretches of NSC numbers were set aside in the past but then never really used. Particularly the range 400000-600000 is sparsely populated.
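The list and range syntax can be pictured with a small Python sketch. This is not the parser used by the server; the function names are hypothetical, and the sketch assumes that list items are separated by whitespace or commas and that a trailing hyphen (e.g. '500-') denotes a range open at the upper end.

```python
import re


def parse_number_ranges(text):
    """Turn e.g. '-100 120-130 740' into a list of (low, high) pairs.

    None stands for an open end of a range.
    """
    ranges = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        m = re.fullmatch(r"(\d*)(-?)(\d*)", token)
        if m is None or not (m.group(1) or m.group(3)):
            raise ValueError("not a number or range: %r" % token)
        low_s, dash, high_s = m.groups()
        low = int(low_s) if low_s else None
        if dash:
            high = int(high_s) if high_s else None
        else:
            high = low  # a single number is a range of length one
        ranges.append((low, high))
    return ranges


def number_matches(number, ranges):
    """A record is a hit if it matches ANY of the numbers or ranges."""
    return any(
        (low is None or number >= low) and (high is None or number <= high)
        for low, high in ranges
    )


ranges = parse_number_ranges("-100 120-130 740")
print(number_matches(125, ranges))  # True
print(number_matches(110, ranges))  # False
```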

CAS Number Searches

One or more CAS numbers, with or without hyphens, are accepted for this type of search. If more than one number is specified, hits are produced from records which match any of the specified CAS numbers.
Please be aware that only about half of the compounds in the database have a CAS number associated with them. This does not necessarily mean that they do not possess a CAS number; it just means that none was entered when the compound was originally keyed in. On the other hand, there definitely are compounds in the database that truly do not have any CAS number. Many of the samples that NCI received were, e.g., from ongoing research projects, and these compounds were not necessarily published or patented - so they may never have entered the Chemical Abstract Registry.

Molecular Formula Searches

The general formula range syntax is symbol<low count><-><high count>, repeated for every specified element and written in arbitrary order. So C7-8 is seven to eight carbons, C-7 is up to seven carbons, C7- is seven or more carbons, C or C1 are exactly one carbon, C- is any number of carbons, including none. There are two types of formula searches. If you allow other elements, any number of elements which were not mentioned in the query formula are allowed. The other type disallows any additional elements, so your formula must be fully specified, including hydrogen atoms. Two-letter elements must be written with the second letter in lowercase, otherwise Cu (copper) and CU (one carbon, one uranium) would not be distinguishable.
It is also possible to use sums and differences of elements. For example, the query C4(F+Cl+Br+I)2 will retrieve all C4-compounds with any combination of exactly two halogens.
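The following Python sketch shows one way to read the simple range syntax and to test a compound against it. It is an illustration under stated assumptions, not the code used by the server: the function names are hypothetical, element counts of the compound are assumed to be available as a dictionary, and the sum and difference syntax (e.g. (F+Cl+Br+I)2) is deliberately left out.

```python
import re

# One term: element symbol, optional low count, optional hyphen, optional high count.
TERM = re.compile(r"([A-Z][a-z]?)(\d*)(-?)(\d*)")


def parse_formula_query(query):
    """Map each element symbol to a (low, high) count range; None = unbounded."""
    constraints = {}
    pos = 0
    while pos < len(query):
        m = TERM.match(query, pos)
        if m is None:
            raise ValueError("cannot parse formula query at position %d" % pos)
        symbol, low_s, dash, high_s = m.groups()
        if dash:
            low = int(low_s) if low_s else 0
            high = int(high_s) if high_s else None
        else:
            # 'C' and 'C1' both mean exactly one carbon.
            low = high = int(low_s) if low_s else 1
        constraints[symbol] = (low, high)
        pos = m.end()
    return constraints


def formula_matches(counts, constraints, allow_other_elements=True):
    """counts: element counts of one compound, e.g. {'C': 7, 'H': 8}."""
    for symbol, (low, high) in constraints.items():
        n = counts.get(symbol, 0)
        if n < low or (high is not None and n > high):
            return False
    if not allow_other_elements:
        return all(symbol in constraints for symbol in counts)
    return True


query = parse_formula_query("C7-8Cl-2N")
compound = {"C": 7, "H": 8, "N": 1}
print(formula_matches(compound, query, allow_other_elements=True))   # True
print(formula_matches(compound, query, allow_other_elements=False))  # False, H was not specified
```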

Molecular Weight Searches

This type of query accepts one or more molecular weights or weight ranges in g/mol. Ranges are processed with full precision, but single weights are compared with rounded weight numbers.

Atom and Ring Counts, Donor/Acceptor Counts, etc.

Once more, ranges or single numbers are permitted with these search options. The atom count is the total number of atoms, including hydrogen. The ring count is the number of ESSR rings in the structure. An ESSR ring is any ring which does not share three consecutive atoms with any other ring in the structure. This filter is also applied to fused rings such as naphthalene - according to this convention, three rings (two phenyl fragments and the 10-membered envelope) result. Biphenyl will yield only a count of two rings.
The definitions of donor and acceptor atoms and rotatable bonds are somewhat flexible, but should match common practice. If in doubt, extend the range and see whether you get extra hits which are interesting. The rotatable bond count excludes all bonds where the rotation is possible, but does not have a major impact on the shape of the molecule. For example, all terminal or linear bonds are excluded.
Please be aware that the definition of flexibility that underlies the rotatable bonds count here, and the definition of flexibility that was used by the program Catalyst (MSI) in calculating the conformers whose number is reported in the Detail window (and which can be searched for with the 3D pharmacophore search) have nothing to do with each other. Issues like terminal groups, hydrogens, large ring flexibilities play a role here. You may therefore encounter cases where the number of rotatable bonds (CACTVS) vs. the number of conformers (Catalyst) seems non-intuitive if not inconsistent.

Complexity
The complexity rating of the compounds is a rough estimate of how complicated a structure is, seen from both the point of view of the elements contained and the displayed structural features including symmetry. The value is computed using the Bertz/Hendrickson/Ihlenfeldt formula (1). It is a floating point value, ranging from 0 (simple ions) to several thousand (complex natural products). The most complex compound in this database is NSC 277816 (C₁₂₈H₁₆₄BrN₂₅O₈₆P₁₂) with a complexity rating of about 10,515. The average complexity of the structures in this database is about 402.

Name Fragment Searches

About 45,000 compound names are associated with the structures from the original NCI database, and for most other compounds, an IUPAC name was computed by the ACD/Name (V4.0) program from ACD/Labs. Generally, because of the usual problems with structure naming conventions, name search is of somewhat limited value. Only the original NCI name set contained common names and sometimes trade names. If there happens to be an NCI name for a structure, often it has not one but multiple names. Name searches are automatically performed on the full name set, and for a hit it is sufficient for any single name to yield a positive result. Search is always case-insensitive and ignores whitespace. We support six different kinds of searches. A full name search must match the name, either with or without numbers and punctuation. Simple substring search is as simple as it sounds. The default name substring search will ignore all punctuation and numbers in the name. The second variety of substring search will also ignore punctuation, but preserve and match digits in the compound names. Shell syntax works like the command line in a Unix Bourne shell. The special characters and character sequences '*' (zero or more arbitrary characters), '?' (single arbitrary character) and '[]' (character range) are recognized. Note that this search (in contrast to substring and regular expression search) is anchored, i.e. if your query value string does not start with '*', the first character of a structure name must match the first query character. Regular expression search is even more powerful, but also rather complicated. With this search, you can, for example, specify that there should be either 'fluoro' or 'chloro' to the right of another fragment. Refer to a Unix manpage (for example, for the commands sed or egrep) to get more information about this topic if you are not familiar with it.
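The difference between the anchored shell syntax and the unanchored substring and regular expression searches can be tried out locally with Python's standard fnmatch and re modules. This is only a sketch of the behavior described above: the helper names are hypothetical, the server's own matching code is not shown here, and Python's fnmatch anchors the pattern at both ends of the name, which is assumed to correspond to the shell-style search.

```python
import fnmatch
import re


def normalize(text):
    """Name searches are case-insensitive and ignore whitespace."""
    return re.sub(r"\s+", "", text).lower()


def shell_match(pattern, name):
    """Anchored match with '*', '?' and '[]' as in a Unix shell."""
    return fnmatch.fnmatchcase(normalize(name), normalize(pattern))


def regex_match(pattern, name):
    """Unanchored regular expression search."""
    return re.search(pattern, normalize(name), re.IGNORECASE) is not None


def substring_match(query, name):
    """Default substring search: punctuation and digits are ignored."""
    def letters_only(text):
        return re.sub(r"[^a-z]", "", normalize(text))
    return letters_only(query) in letters_only(name)


print(shell_match("benzene", "chlorobenzene"))    # False, the pattern is anchored
print(shell_match("*benzene", "chlorobenzene"))   # True
print(regex_match(r"benzyl.*(fluoro|chloro)", "Benzyl chloroformate"))  # True
print(substring_match("2,4-dinitro", "1-Chloro-2,4-dinitrobenzene"))    # True
```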

PASS Searches -- Predictions of Biological Activities

You can search for specific ranges in predictions for a very large number of biological activities. The program PASS (Prediction of Activity Spectra for Substances) was used to calculate predictions for up to 565 different activities for nearly all the structures in the database. PASS calculates the probability for both activity and inactivity of the compound for a given mechanism. These comprise specific enzymatic inhibitory potencies, therapeutic uses for various diseases, toxicities, and others. Counting the activity and inactivity predictions separately (they can be searched for separately), a total of 64,188,212 predicted values are offered on this site. Because the training set that underlies PASS is large but still limited (on the order of 35,000 compounds), the program cannot reliably predict each activity for every compound in the database. Here is the list of the activities, together with the number of compounds for which each activity was predicted.
If you select the Query Type PASS Prediction Range..., a popup window will appear that will allow you to select the activity for which you want to search. There should be a scroll bar at the right side of the list. If your browser doesn't show it, enlarge the popup window manually. At the top of the window, you can select the Query Probability type to search for, Activity or Inactivity. You can only select one activity or inactivity at a time. If you want to conduct combined searches, such as "activity [probability] > 0.8" AND "inactivity [probability] < 0.2", you have to use separate query input lines in the Query Form. Since the predictions are calculated as probabilities, you have to use number ranges between 0.0 and 1.0.
We have observed the possibility, under certain circumstances, that the PASS selection popup window will not come up any more, even if you reset the Query Form. If this happens to you, simply re-enter the server URL (e.g. http://cactus.nci.nih.gov for the U.S. mirror) in your web browser window, and start the search session from there anew.
It is obviously totally impossible for us to test even a small subset of these predictions for all the NCI compounds ourselves. If you use this feature of our service, we would therefore be interested in hearing about success (and also not-so-great-success) stories, which we would compile and post, e.g., on this server. This need not include disclosure of individual compounds tested, but merely of the success rate of the predictions for the activity analyzed. You can e-mail Marc Nicklaus and/or Prof. Vladimir Poroikov with results and questions.
In this vein, we want to emphasize that these values are predictions, to which all the usual caveats pertinent to QSAR-type calculations should be applied. The user should never make the mistake of assuming that a specific prediction for a single compound means that this molecule has this activity. The PASS predictions can only be responsibly used in a statistical manner for sets of compounds, and should be treated as scientific "food for thought."

The effect value query

Besides the first three query fields, there is a specific query field related to the screening values: growth inhibition, cytostatic effect and cytotoxic effect. You can use this field to define ranges for all or any cell lines. This query forces a pre-search on the SQL database containing the screening results and may therefore take up to 30 seconds of computing time.

Structure Input

There are several possibilities to input a structure for full-structure, substructure or similarity search. The upper three input fields accept SMILES strings as structure specifications. If you are familiar with the syntax, you can type in simple queries manually. However, most of the time you will want to use some graphical structure editor. If your favorite desktop molecule editor supports Copy&Paste of SMILES strings, you can simply use this editor, put the structure on the clipboard as a SMILES string and paste it into the entry field. Editors which support this operation include ChemWindow and ChemDraw.
As a third option, you can start a Java editor by clicking on the Start Editor button below any input field. You must use a WWW browser with Java support (Netscape, Internet Explorer) for this to work, and you must have Java enabled, which is an option in the browser configuration panel. The input frame will switch to the editor panel. Read the editor instructions to learn how to use the program. Structures are exported from the editor by clicking on the Transfer to Form button on the editor panel, or by using the navigation bar to switch back to the query input panel. The editor remains associated with the last input field where you pressed the Start Editor or Transfer to Form buttons. If your current search option is not structure-based, the query method will automatically change to substructure search upon structure import.
We thank Peter Ertl from Novartis Crop Protection AG for kindly allowing us to use this remarkable applet.

Java Editor Comments

We are now using the 2000/10 version of the JME Java editor. It has much enhanced capabilities - for example, now you can input disconnected structures (use the 'New' button), and you can number the atoms (use the '123' button). Numbering is very helpful for the input of 3D query constraints. Just draw your structure any way you like it, and then number the atoms which participate in 3D constraints. These can be specified on the input fields to the right. The 'Qry' button pops up a window which allows you to input many more atom and bond properties. Please read the editor documentation.

Supported SMILES features

All standard SMILES features, including stereochemistry and isotope labeling, are supported. However, since there are neither stereochemical descriptors nor isotope labeling in the database, these search features are disabled and stereo descriptors or isotope specifications will be ignored. Some basic SMARTS extensions are also recognized, most notably the R, a, A, X, V and H descriptors for bracketed atoms. All such descriptors are or'ed; boolean attribute link logic (indicated by the characters , ; &) is ignored. For example, to specify that a nitrogen atom should not be part of a ring, you could use a SMILES descriptor '[N;R0]'. The special bond symbol '~' forces the bond to match an aromatic bond. Otherwise, aromatic bonds without any additional search attributes will match single and double bonds from both the substructure and structure side. The exclamation mark '!' used as a bond symbol is a 'non-bond' which must not be present in the database structure for the substructure to match.

Structure Match Options

A set of check boxes is available on the editor panel to globally modify the structure search parameters.
First, you have a choice whether matched substructures should be highlighted in the displayed result structures or not. Highlighting applies both to 2D plots and 3D displays in Chime or as VRML file. Note that, if multiple substructures are combined by an OR statement, only the first successful substructure match is actually performed on that record and subsequently displayed, even if additional fragments would also match. Highlighting is activated by default.
If you allow multi-fragment overlap, substructures which consist of disconnected fragments may overlap when matching the target structures. By default they will not, so that if you specify two nitro groups as substructure, only compounds with two or more nitro groups are found. Note that this feature applies only to substructures which were entered in a single input field as an entity. If you specify two substructures on two different fields, their match relationship is not influenced by the setting of this switch.
The third option is whether to suppress the matching of aromatic bonds on plain single or double bonds with no auxiliary attributes. By default, aromatic bonds will match such bonds, provided that no other attributes (such as 'not in a ring') prevent the match. If you desire the behavior of NCI's older DIS system, which will match aromatic bonds in the database structures only on aromatic bonds in your query, you should activate this switch.
Finally, the option for the enforcement of ring embedding equality means that the ring count of the bonds of a substructure must match the ring count of the database structures. If this switch is on, a simple phenyl fragment will not match naphthalene (only benzene, or biphenyl). Also, it implies that all bonds in your substructure which are not in a closed ring can only match non-ring bonds in the database molecules. The same effect could be achieved by explicitly specifying for each bond that it must not be in a ring, but this global option is often more convenient.

Structure Search Types

The basic structure query types in this database are full-structure search, substructure search and similarity search. Full-structure search is fastest, substructure searches can take up to a few minutes depending on the character of your query structure. Hydrogens will be added automatically for all searches except substructure search, where you will have to specify them explicitly. You should know that adding explicit hydrogens to all sites where you do not want any substituents will both focus your search and speed it up. The similarity searches operate on the Tanimoto distance of the substructure filtering screen bitvectors. For full-structure search, you have the choice between looking for the complete structure (e.g. salts plus specific counterion) or any isolated molecule in the record. If the record contains only one molecule, which is true for the large majority of the database entries, these two search types deliver identical results.
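For readers unfamiliar with the Tanimoto measure, the following Python sketch computes it for two bitvectors. It is a generic illustration, not the CACTVS implementation: the screen bitvectors are represented here as plain Python integers, the function names are hypothetical, and the result for two empty bitvectors (1.0) is a convention chosen for this example.

```python
def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def tanimoto_similarity(a, b):
    """Bits set in both vectors divided by bits set in either vector."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # convention for two empty bitvectors (assumption)
    return popcount(a & b) / union


def tanimoto_distance(a, b):
    """Distance form of the same measure."""
    return 1.0 - tanimoto_similarity(a, b)


query_bits = 0b101101
record_bits = 0b100111
print(tanimoto_similarity(query_bits, record_bits))  # 0.6 (3 common bits, 5 bits in total)
```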

Tautomer-Tolerant Searches

This database now also supports tautomer-tolerant queries for substructure and full-structure search. You can draw any tautomeric form of your query structure, and if the button is checked, the database will retrieve all compounds which are tautomers of your input form, regardless of internal coding. Note that in the case of substructures, you have to draw tautomeric hydrogen atoms explicitly. Inputting, for example, the enol of acetone without a hydrogen at the oxygen, or the keto form without any hydrogen at one of the carbons will not yield the expected results, since the open valences could be occupied by ligands which lock the form (say, a silyl group). If there are no potential tautomeric atoms, the search will proceed as if the box had not been checked. If there are such systems, screening will be performed less aggressively, and the match procedure adapted to allow positional variations of the hydrogens in the tautomeric systems. This will cost some 30% of extra computer time.

3D Pharmacophore Searches

You can conduct a 3D pharmacophore search in this database. Using the program Catalyst by MSI, up to 25 conformations were calculated for those compounds in the open NCI database that Catalyst could handle. Catalyst conformers have been included for 211,857 compounds.
To prepare a query for a 3D pharmacophore search, you can either create a query file externally and submit it to this service, or you can use the Local Query Parameters area of the Editor pane. The first possibility is probably the somewhat easier way at this time to enter more complex queries.
To create a query file, you can use programs such as Catalyst or ISIS/Draw etc. and generate a file in .mol format. Most of the additional features in query files are supported, such as exclusion spheres, centroids, points on lines, angles, planes... Once you have this file available on the machine from which you started the Browser, go to the bottommost query line, select the option Substructure and/or 3D Search..., click on the Browse button to the right of it, and select the query file on your machine. Then start the search.
To generate a query, proceed along the lines of the following examples. From any of the query input lines, call up the Editor pane. To generate a query that consists of a triangle of oxygen atoms,
1. select O from the list of elements, place it on the drawing area;
2. click on the NEW button at the top of the JME Structure Editor;
3. place another O atom;
4. repeat steps 2 and 3;
5. click on the 123 button;
6. click on the three placed O atoms: this will generate atom numbers;
7. in the Local Query Parameters area, enter "1 2" in the topmost Atoms field;
8. in the Value Range field below it, enter (e.g.) "2.5-3.5";
9. repeat steps 7 and 8 with the values (e.g.) "2 3", "3.5-4.5" and "1 3", "4.5-5.5";
10. click on the button (below the Editor area) Transfer to Query Form.
Now, in the query line you used, you should see, in the Query Data Value field, the entry "[OH2:1].[OH2:2].[OH2:3]". This would search for three water molecules -- which is probably not what you want. (The Editor automatically adds hydrogens to all unfilled valences.) Go into this field, and manually edit out the hydrogens, so that you have the string "[O:1].[O:2].[O:3]". Now start the search (after possibly adding other search criteria). The constraints you specified are transferred to the search engine behind the scenes.
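If you prepare such strings outside the browser, the manual removal of the hydrogens can be mimicked with a small regular expression. The Python sketch below is only a convenience for simple atom-mapped bracket atoms like the ones in this example; the function name is hypothetical, and bracket atoms with charges or isotope labels are intentionally left untouched.

```python
import re

# A bracket atom with an element symbol, a hydrogen count and an optional
# atom number, e.g. [OH2:1] or [NH:3].
_BRACKET_WITH_H = re.compile(r"\[([A-Z][a-z]?)H\d*((?::\d+)?)\]")


def strip_bracket_hydrogens(smiles):
    """Remove the hydrogen count from simple bracket atoms."""
    return _BRACKET_WITH_H.sub(r"[\1\2]", smiles)


print(strip_bracket_hydrogens("[OH2:1].[OH2:2].[OH2:3]"))  # [O:1].[O:2].[O:3]
```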
You should make sure that the ensemble of constraint values you're entering amounts to a meaningful 3D arrangement of atoms. For example, the values used above are a triangle with side lengths of 3, 4, and 5 Angstroms, resp., with a 0.5 Angstrom tolerance for each side. Values of 3, 4, and 10 Angstrom, on the other hand, do not produce a valid triangle, and thus do not result in any hits.
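A quick plausibility check of three distance ranges can be done before submitting the query. The Python sketch below applies the triangle inequality to the ranges: a triangle can exist only if the smallest allowed value of each side does not exceed the sum of the largest allowed values of the other two sides. The function name is hypothetical, and the check is not part of this service.

```python
def triangle_is_feasible(ranges):
    """ranges: three (low, high) distance ranges, e.g. in Angstrom."""
    if len(ranges) != 3:
        raise ValueError("exactly three distance ranges are required")
    for low, high in ranges:
        if low < 0 or low > high:
            raise ValueError("invalid range: (%s, %s)" % (low, high))
    for i in range(3):
        shortest_i = ranges[i][0]
        longest_others = sum(ranges[j][1] for j in range(3) if j != i)
        if shortest_i > longest_others:
            return False
    return True


# Sides of 3, 4 and 5 Angstrom with a tolerance of 0.5 Angstrom each.
print(triangle_is_feasible([(2.5, 3.5), (3.5, 4.5), (4.5, 5.5)]))   # True
# Sides of 3, 4 and 10 Angstrom with the same tolerance.
print(triangle_is_feasible([(2.5, 3.5), (3.5, 4.5), (9.5, 10.5)]))  # False
```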
Once you have obtained hits from your search, the best way to view the results is probably to choose, from the Detail pane, the Visualization option Chime Display/All Conformers. This will show you all conformations calculated by Catalyst, with the one that was found to match the 3D query highlighted by a light red background. (Once the search algorithm has found one match for a molecule, it will not look for additional conformers that could potentially also match the query.) Superimposition of the query onto the displayed conformers is planned for the future but not yet implemented.
This capability is not a replacement for full-fledged, dedicated 3D pharmacophore search programs. One of its main limitations is obviously that it doesn't allow one to conduct any conformational search on-the-fly -- there is only a fixed set of pre-calculated conformers available. On the other hand, this allows for a very rapid searching -- few of the more sophisticated programs will return a hit set from a 250,000-compound database within a few seconds.

Output Options

Plain HTML Table

The standard tabular output, displayed in the Data Display Settings pane, includes the NSC number, formula, CAS number, number of names available for the structure, and one sample name. Note that the NSC number is a live hyperlink, which will lead you to the detail display of the NCI Enhanced Database browser.

HTML Table with samples

Like the plain table, but images of some randomly chosen structures are also displayed.

HTML Table with images

Like the plain table, but an image with the structure of the corresponding compound is additionally displayed.

Sorting the Structure Hit Lists
At the bottom of the Query Form, a menu lets you select the sorting order of hitlists. It is only used when more than one result record is produced. The default sort order is by NSC registry number in ascending order, but you can also select atom counts, structural complexity, molecular weight (all in ascending order), similarity to the query structure (in descending order) and effect values such as gi50, lc50 and tgi (averaged or maximum values). Note that similarity sorting can only be used in conjunction with a similarity query. Otherwise the default NSC ordering is used. On the Data Display Settings page, the user can choose to apply the sort order in the applet as well.

Data Display Settings

The Data Display Settings panel consists of three subpanels: the Display Settings pane, the Include Data pane and the Structure Hitlist pane. The Structure Hitlist pane shows the result of the database query.

The Display Settings Pane

This pane provides some general option settings, such as the display style and the sort order of the compounds. It also allows the selection of the JAVA plug-in version. However, this option should only be used if Internet Explorer cannot identify the right plug-in version.

The Include Data Pane

This pane is the most important pane. It controls which datapoints will be displayed within the applet. The user may select the cell lines, the compounds from the structure hitlist pane, the available concentrations and also the effect values like gi50, lc50 or tgi that should be included in the dataset. The number of datapoints will be approximated. However, the real number of resulting datapoints may be much smaller. The user can also decide which additional molecular properties should be available for mining within the applet.

The Structure Hitlist Pane

This list contains the NSC number, CAS number (if available), molecular formula, number of names and one representative name, if any names are available. It can also include other properties, if these properties have been chosen for the sort order. Clicking on an NSC number opens the NCI Enhanced browser, which provides detailed information about the corresponding compound.
The leftmost column of the compound listing contains checkboxes. These checkboxes control whether a specific compound is included in the visualization dataset. When a checkbox is clicked, the approximate number of datapoints is recalculated.

3D Visualization/Mining

The Applet

For a detailed explanation of applet functionalities see the applet help.

About the Database

Data Origin

The service is based on a MySQL database that contains the May 2002 release of the DTP screening data, with gi50, lc50 and tgi values for 41,000 structures, and on the database of the Enhanced NCI Database Browser. All searches are done by combining both databases. The original structure data and screening results are all maintained by NCI's Developmental Therapeutics Program. Additional information and downloadable files (such as the Standard Agent Database and the Mechanism of Action Database) can be obtained from that site.

Database Size and Content

The Enhanced NCI Database Browser database contains 250,251 open records. The MySQL database contains screening results for 41,000 structures. Some of the 41,000 structures are not included in release 2 of the enhanced database and will not be shown in search results. This will change after the release of the mark III of the enhanced NCI browser. Every record contains at least the NSC number and the chemical structure. Records without a chemical structure, which exist in the NCI DIS system, have not been included.
The database in its first release contained 216,089 names (of 45,229 compounds) coming from the original DTP tables, 44,804 AIDS antiviral screening results, 41,000 anti-tumor screening results, and 122,631 CAS numbers from the original DTP sources.

About the Software and Hardware

Required Browser Software

In order to access this server, you must use a JavaScript-capable browser. If you want to use the Java structure editor, your browser must also support Java. Both Java and JavaScript may have to be enabled manually from the browser configuration panel. Besides that, any reasonably modern browser (Netscape, Mozilla and IE in version 4 or higher) should be able to use this service. The browser must have the JAVA plug-in and also the JAVA3D extensions. You should have at least 32 MB main memory to avoid the risk of crashing with this particular display style. The visualization applet takes advantage of 3D capabilities of your graphic hardware. If you have a 3D card like Geforce2 you can display a higher number of datapoints within the applet.

Server Software Environment

This database was implemented exclusively using software of the CACTVS chemical structure processing toolkit. Secondary, derived information (GIF images etc.) is dynamically computed when the query is run. The CACTVS toolkit has extensive scripting capabilities, employing TCL as language core with sophisticated chemical command enhancements. All response pages are generated by a single, compact CACTVS/TCL CGI script of about 3550 lines. If you are interested, have a look at the script source. The database is currently stored in the compact flat-file CACTVS/BASE streamable scan format. It is a single, easily transferable 3 GB file. Direct scans for simple query data such as NSC numbers take less than a second for the 250,251 compounds in the open database (*). The database was generated from NCI DIS database MDL Molfile dumps and various text files containing auxiliary information such as names. The total conversion time for the full open database is about 16 hours on an 800 MHz Pentium III Linux system.

Server Hardware

Currently, the database is served from the US mirror by a dual-CPU 500 MHz Pentium III Linux SCSI system. The European mirror employs a Silicon Graphics Origin 200 R10000 180 MHz single-processor server.
(*) Note: The Linux system used to have performance problems because the 2.4 kernel (absolutely needed because the scan file is larger than 2 GB) was very slow in repositioning the file pointer on such big files. This problem has been solved.

References
1. J.B. Hendrickson, P. Huang, A.G. Toczko, Molecular Complexity - A Simplified Formula Adapted to Individual Atoms. J. Chem. Inf. Comput. Sci. 27, 63-67 (1987); and
W.D. Ihlenfeldt, Computergestützte Syntheseplanung durch Erkennung synthetisch nutzbarer Möglichkeit von Molekülen. Dissertation, TU Munich 1991.

Contact
This service was implemented by Frank Oellien in the course of a continuing collaboration with the CADD Group of the Laboratory of Medicinal Chemistry, Division of Basic Sciences, NCI, NIH, Frederick, USA, headed by Marc C. Nicklaus. The support of many collaborators is kindly acknowledged.
You are welcome to mail me (Frank Oellien) and/or Marc Nicklaus for comments, questions, suggestions and bug reports.

Frank Oellien
last modified: 12.09.2004 11:17 PM