Utilities

Utility Programs


 

The utils subdirectory contains utility programs for working with DockIt output in CEX format. Source code is included so that the programs can be modified. Most of the programs require the CEX distribution tree (included in the DockIt distribution as a tar.gz file). Before building the programs, use gunzip and tar to unpack the CEX distribution, then set the CX_ROOT environment variable to point to the root of the CEX tree. Prebuilt executables of the utility programs are also provided in the bin subdirectory.

Programs:

cex2tdt

Converts CEX files to Daylight tdt files for use in Daylight tools and database servers. Input is a CEX file on stdin and output is a tdt file on stdout. Multiple conformations in the CEX stream are output as $D3D records in the tdt.

cex2pdb and cex2pdb_split

cex2pdb is a CEX utility program (the source is included in the CEX tar.gz file). It converts CEX to PDB format as a filter: stdin is a CEX file and stdout is a PDB format file. Each docked ligand conformation in the input is written as a separate PDB molecule. CEX properties can be mapped onto PDB fields using the -m flag. For example, the cex2pdb_split script maps the molecule name field (name) onto the PDB molecule name record and the dockscore into a PDB REMARK record. Examine cex2pdb_split to see how it is done.

One potentially confusing aspect of CEX format is the way tag names are handled. Each CEX property is first defined in the CEX stream by a $D record. Each tag has a short internal name and a longer external name, together with fields giving the type of the property and a string description of it. For example, the molecular name field is defined by the CEX record:

$D<NAM>
/_P<name>
/_V<Name>
/_S<1>
/_L<STRING>
/_X<Name as arbitrary ASCII text>
|

which defines the NAM tag which is also known by the external name "name". While CEX uses the NAM tag internally, programs use "name" for that property. So to map the CEX molecular name to the PDB name record would require the following arguments to cex2pdb:

cex2pdb -m molname name <input.cex >output.pdb
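The relationship between internal tags and external names can be illustrated with a small Python sketch that reads the $D record quoted above and recovers the name to pass on the command line. This is not part of the CEX toolkit: the helper external_names is hypothetical, and the parsing assumes the simple layout shown here (one TAG<value> item per line, no ">" or "|" inside values).

```python
import re

# The $D record quoted above, exactly as it appears in the CEX stream.
RECORD = """$D<NAM>
/_P<name>
/_V<Name>
/_S<1>
/_L<STRING>
/_X<Name as arbitrary ASCII text>
|"""

# One TAG<value> item. Assumption: values never contain '>'.
ITEM = re.compile(r"([$/]\w+)<([^>]*)>")


def external_names(stream: str) -> dict:
    """Map internal tag -> external name for every $D record in the text.

    Hypothetical helper for illustration; records are assumed to end with '|'.
    """
    names = {}
    for record in stream.split("|"):
        fields = dict(ITEM.findall(record))
        if "$D" in fields and "/_P" in fields:
            names[fields["$D"]] = fields["/_P"]
    return names


print(external_names(RECORD))  # {'NAM': 'name'}
```

The value printed for NAM is the string that appears as the property argument in the cex2pdb command above.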

tdt2cexid [-n nametag]

Converts from a tdt file to a CEX file. Includes the $NAM property from the tdt file as a "name" tag in the CEX file. This tag is carried through all the DockIt programs, so it can be used after docking to connect the docked ligands with their database identifiers. To use a different tag from the tdt file as the name property, specify it with the -n flag. If stereochemistry is available in the tdt file (either in an isomeric SMILES or from 3D coordinates), it is preserved in the CEX output file and in the resulting docked ligands. Input is a tdt file on stdin and output is a CEX file on stdout.

pdb2cex [options] [-i in.pdb [-o out.cex]]

Convert PDB format input to CEX format output.

Options:
-fd

write dump format (eot newlines)

-fl

write list format (eod and eot newlines)

-fr

write raw format (no newlines)

-i in.pdb

specify .pdb input file

-o out.cex

specify .cex output file

-t datatypes.cex

specify .cex datatypes file

Input and output default to stdin and stdout if not specified. PDB datatypes are generated if no datatypes file is specified.

cexsplit prefix [-f listfile] [-m]

One file is created for each molecule object in the input CEX file on stdin. The output files are named prefix0000.cex, prefix0001.cex, etc. A subset of the input CEX objects can be selected with the -f listfile option. listfile contains a list (one per line) of the sequence numbers of the CEX objects (molecules) which you wish to split out. So a file containing

5
10
11

could be used to select only those molecules from the input CEX stream, which would then be put out into files prefix0005.cex, prefix0010.cex, etc.

By default, each conformation in the CEX input stream is treated separately in splitting up the file. However, if the -m flag is given, the splitting is done on a molecule by molecule basis, so each output molecule might have multiple conformations.

cexsort scorename

Sorts the CEX stream on stdin according to the integer value of the property specified. For example:

cexsort dockscore <docked.cex >docked.sorted

Input molecules for which the property is not defined are not output. Note that the entire input file is sorted in memory so if you wish to only pick a part of the file out you should use cextop or consensus which are much more efficient for taking a subset of a large file.
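The behavior described above (sort on the integer value of a property, drop records that lack it) can be sketched in Python. This is only an illustration, not the cexsort source: records are assumed to be already parsed into dictionaries, the sample names and scores are made up, and ascending order (best score first) is an assumption, since the sort direction is not stated here.

```python
def sort_by_score(records, scorename):
    """Sort records by the integer value of `scorename`.

    Records that do not define the property are dropped, as described above.
    Ascending order is an assumption made for this sketch.
    """
    scored = [r for r in records if scorename in r]
    return sorted(scored, key=lambda r: int(r[scorename]))


# Made-up stand-ins for parsed CEX molecules.
docked = [
    {"name": "lig1", "dockscore": "-42"},
    {"name": "lig2"},                      # no dockscore: not output
    {"name": "lig3", "dockscore": "-97"},
]

result = sort_by_score(docked, "dockscore")
print([r["name"] for r in result])  # ['lig3', 'lig1']
```

Because the whole list is held and sorted in memory, this mirrors why sorting is the expensive route when only a small subset is wanted.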

cextop # scorename [-a] [-u]

Picks the best (lowest scoring) # molecules from the CEX stream on stdin according to the integer value of the specified property and outputs them to stdout. For example:

cextop 100 dockscore <docked.cex >best.cex

picks the best 100 dockings according to the DockIt score. The default mode for cextop picks the best n scoring dockings regardless of whether or not they come from the same ligand molecule. If you want to select the n top scoring molecules, the -u flag will only take the top scoring docking from each molecule so that each docked molecule selected will come from a different ligand. The -a flag does what -u does but puts out all the conformations for each ligand selected.
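The difference between the default mode, -u and -a can be made concrete with a short Python sketch. It is not the cextop implementation: dockings are assumed to be parsed into dictionaries, ligands are assumed to be identified by their "name" property, and the function names and sample data are hypothetical.

```python
def top_dockings(dockings, n, scorename):
    """Default mode: the n lowest-scoring dockings, regardless of ligand."""
    return sorted(dockings, key=lambda d: int(d[scorename]))[:n]


def top_unique(dockings, n, scorename):
    """-u style: keep each ligand's best docking, then take the n best."""
    best = {}
    for d in dockings:
        cur = best.get(d["name"])
        if cur is None or int(d[scorename]) < int(cur[scorename]):
            best[d["name"]] = d
    return top_dockings(best.values(), n, scorename)


def top_all_conformations(dockings, n, scorename):
    """-a style: choose ligands as -u does, then emit all their dockings."""
    chosen = {d["name"] for d in top_unique(dockings, n, scorename)}
    return [d for d in dockings if d["name"] in chosen]


# Made-up example: ligand A has two docked conformations.
dockings = [
    {"name": "A", "conf": 1, "dockscore": -90},
    {"name": "A", "conf": 2, "dockscore": -85},
    {"name": "B", "conf": 1, "dockscore": -80},
    {"name": "C", "conf": 1, "dockscore": -60},
]

def label(ds):
    return [(d["name"], d["conf"]) for d in ds]

print(label(top_dockings(dockings, 2, "dockscore")))           # [('A', 1), ('A', 2)]
print(label(top_unique(dockings, 2, "dockscore")))             # [('A', 1), ('B', 1)]
print(label(top_all_conformations(dockings, 2, "dockscore")))  # [('A', 1), ('A', 2), ('B', 1)]
```

In the default mode both selected dockings come from ligand A, while the -u style selection returns two different ligands.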

consensus # [-c #] [-a] [-u]

For example:

consensus 10 -c 2 <in.cex >out.cex

writes to out.cex every docking that is in the best 10 according to at least two of the three scoring functions. The -c flag takes an integer argument of 1, 2 or 3. By default, the consensus criterion is applied to each ligand individually: the command above examines each ligand in turn and outputs each conformation that is in the top 10 scores for at least two of the three scoring functions. This tends to filter out conformations that score anomalously well in only one scoring function. With the -a flag, the consensus criterion is applied to all ligands at the same time. Thus,

consensus 100 -a <in.cex >out.cex

would output all the ligand docked conformations which are in the top 100 scores (for at least 2 out of 3 of the scoring functions since -c 2 is the default) for all the ligands in the file in.cex taken together. This can be used to extract the best overall docked ligands from an input set. If you just want to get a list of which ligands score best, the -u flag will cause consensus to output only one arbitrary conformation for each ligand which meets the consensus criterion. Thus the -u flag is typically used if you want to extract, for example, the SMILES or compound identifiers from the output in order to make a list. consensus -u doesn't necessarily put out the "best" conformation for each ligand since, when using several scores, the concept of "best" is not well defined.
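The consensus criterion can be sketched in Python as a vote count over the three scores. This is a minimal illustration under stated assumptions, not the consensus source: the property names dockscore, plpscore and pmfscore are taken to be the three scoring functions, lower values are assumed to be better for all of them, ligands are grouped by a "name" property, and the sample data is made up.

```python
from collections import defaultdict

SCORES = ("dockscore", "plpscore", "pmfscore")  # assumed property names


def consensus(dockings, n, c=2):
    """Return dockings ranked in the best n by at least c of the scores.

    Applied to a whole pool at once, this corresponds to the -a behavior.
    """
    votes = defaultdict(int)
    for score in SCORES:
        ranked = sorted(range(len(dockings)), key=lambda i: dockings[i][score])
        for i in ranked[:n]:
            votes[i] += 1
    return [d for i, d in enumerate(dockings) if votes[i] >= c]


def consensus_per_ligand(dockings, n, c=2):
    """Default behavior: apply the criterion to each ligand separately."""
    groups = defaultdict(list)
    for d in dockings:
        groups[d["name"]].append(d)
    out = []
    for confs in groups.values():
        out.extend(consensus(confs, n, c))
    return out


pool = [
    {"name": "A", "conf": 1, "dockscore": -90, "plpscore": -50, "pmfscore": -10},
    {"name": "A", "conf": 2, "dockscore": -70, "plpscore": -60, "pmfscore": -30},
    {"name": "B", "conf": 1, "dockscore": -80, "plpscore": -40, "pmfscore": -20},
]

def label(ds):
    return [(d["name"], d["conf"]) for d in ds]

print(label(consensus_per_ligand(pool, 1)))  # [('A', 2), ('B', 1)]
print(label(consensus(pool, 1)))             # [('A', 2)]
```

Conformation 1 of ligand A has the best dockscore but is best in no other score, so it is filtered out in both modes.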

cex_split_scores # [-u] -f prefix

Example:

cex_split_scores 100 -f prefix <in.cex

writes the best 100 dockings in in.cex according to each of the DockIt score, plpscore and pmfscore into the files prefix.dock, prefix.plp and prefix.pmf respectively. The -u flag causes only the best score (for each of the scoring functions) to be used for each ligand, so only one copy of each ligand is written to each output file, even if more than one docked conformation for that ligand falls in the top n.

cex_split_scores, cextop and consensus use a priority queue algorithm and so should be relatively efficient even for large input files. For picking subsets from a large input file, cextop or consensus will be much more efficient than sorting the whole input file.
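To show why a priority queue helps on large inputs, here is a minimal Python sketch of one common approach: a bounded heap that never holds more than n scores while the input streams past. The text above says only that a priority queue algorithm is used, so this is an assumption about the general technique rather than a description of the actual programs, and stream_top_n is a hypothetical name.

```python
import heapq


def stream_top_n(scores, n):
    """Keep the n lowest values from an iterable using at most n items of memory.

    Python's heapq is a min-heap, so values are stored negated to make the
    worst (highest) kept score sit at heap[0].
    """
    heap = []
    for s in scores:
        if len(heap) < n:
            heapq.heappush(heap, -s)
        elif -s > heap[0]:
            # s is lower (better) than the worst kept score: replace it.
            heapq.heapreplace(heap, -s)
    return sorted(-v for v in heap)


print(stream_top_n([5, -3, 9, -7, 0, 2], 3))  # [-7, -3, 0]
```

Each input item costs at most O(log n) work and memory stays proportional to n, whereas a full sort must hold the entire input.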

summarize

Shell script which summarizes and sorts the output from docking.

babelcx

babelcx is a modified version of the babel program for converting molecular structure formats. The source is provided in babelcx.tar.gz. It has been modified to read and write CEX format. Use -icex or -ocex to read or write CEX files. This program might be of particular use in converting various connection table formats (like MDL Mol files or Tripos Sybyl Mol2 files) into CEX input files for ligands. You will then need to run the ligands through the atom type and charge assignment programs (see the setup script for an example of how to do this).

Note that babelcx is of less use in processing the output from DockIt, since it isn't set up to save the various CEX properties (like multiple docked conformations, scores and rms values) into other molecular structure formats. In particular, babelcx only outputs the first conformation found for each ligand in a CEX file. The post-processing programs cextop and consensus put out CEX files with only one conformation per ligand (even if this results in repetition of the ligand information) so that they can be processed by babelcx. In general, however, it is better to use the cex2tdt program to convert the DockIt output if you want to preserve all the information (for example, to put the information into a database).

© Metaphorics, LLC
info@metaphorics.com