- cex2tdt
Converts CEX files to Daylight tdt files for use in
Daylight tools and database servers. Input is a CEX file on stdin and
output is a tdt file on stdout. Multiple conformations in the CEX
stream are output as $D3D records in the
tdt.
- cex2pdb and cex2pdb_split
cex2pdb is a CEX utility program (the source is included
in the CEX tar.gz file). It runs as a filter that converts CEX to PDB
format: stdin is a CEX file and stdout is a PDB format file. Each
docked ligand conformation in the input is written as a separate PDB
molecule. CEX properties can be mapped onto PDB fields using
the -m flag. For example, the cex2pdb_split script maps the molecule
name field (name) onto the PDB molecule name record and the dockscore
onto a PDB REMARK record. Examine cex2pdb_split to see how it is
done. One potentially confusing aspect of CEX format is the way
tag names are handled. Each CEX property is first defined in the CEX
stream by a $D record. Each tag has a short internal name and a
longer external name, together with the type of the property and a
string description of it. So, for
example, the molecular name field is defined by the CEX record:
$D<NAM>
/_P<name>
/_V<Name>
/_S<1>
/_L<STRING>
/_X<Name as arbitrary ASCII text>
|
which defines the NAM tag, also known by the
external name "name". While CEX uses the NAM tag
internally, programs use "name" for that property. So mapping
the CEX molecular name to the PDB name record requires the
following arguments to cex2pdb:
cex2pdb -m molname name <input.cex >output.pdb
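The relationship between the internal tag and the external name can be illustrated with a small parser. This is a minimal Python sketch, not part of the CEX distribution: `parse_datatype` and the `FIELD` pattern are hypothetical names, and the sketch assumes that the /_P field carries the external name (as the example record suggests) and that values never contain a ">" character.

```python
import re

# One "NAME<value>" datafield, introduced by "$" (record root) or "/" (property).
FIELD = re.compile(r'([$/])(\w+)<([^>]*)>')

def parse_datatype(record):
    """Return (tag, properties) for one $D datatype record (hypothetical helper)."""
    fields = FIELD.findall(record)
    if not fields or fields[0][:2] != ('$', 'D'):
        raise ValueError('not a $D datatype record')
    tag = fields[0][2]
    # Collect the property fields that follow the root, keyed by field name.
    props = {name: value for sigil, name, value in fields[1:] if sigil == '/'}
    return tag, props

record = """$D<NAM>
/_P<name>
/_V<Name>
/_S<1>
/_L<STRING>
/_X<Name as arbitrary ASCII text>
|"""

tag, props = parse_datatype(record)
external_name = props['_P']   # assumed to be the name programs use
print(tag, external_name)     # prints "NAM name"
```

Under these assumptions, the value passed to cex2pdb -m is the external name ("name"), not the internal NAM tag.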
- tdt2cexid [-n nametag]
Converts from a tdt file to a CEX file. Copies the $NAM
property from the tdt file into a "name" tag in the CEX file.
This tag is carried through all the DockIt programs so it can be used
after docking to connect the docked ligands with their database
identifiers. If you wish to use a different tag from the tdt file as
the name property, use the -n flag to specify it. If stereochemistry
is available in the tdt file (either in an isomeric SMILES or from 3D
coordinates), it is preserved in the CEX output file and in the
resulting docked ligands. Input is a tdt file on stdin and output is a
CEX file on stdout.
- pdb2cex [options] [-i in.pdb [-o out.cex]]
Converts PDB format input to CEX format output.
Input and output default to stdin and stdout if not
specified. PDB datatypes are generated if no datatypes file is
specified.
- cexsplit prefix [-f listfile] [-m]
One file is created for each molecule object in the input CEX file on stdin.
The output files are named prefix0000.cex, prefix0001.cex, etc. A
subset of the input CEX objects can be selected by using the -f
listfile option. listfile contains a list (one per line) of sequence
numbers of the CEX objects (molecules) which you wish to split out. So
a file listing sequence numbers such as 0 and 5
could be used to select
only those molecules from the input CEX stream, which would then be
put out into files prefix0000.cex, prefix0005.cex, etc.
By default, each conformation in the CEX input stream is
treated separately when splitting up the file. However, if the -m flag is
given, the splitting is done molecule by molecule, so each
output file might hold a molecule with multiple conformations.
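The naming scheme for the output files can be sketched in a few lines of Python. This is an illustration only, not the cexsplit source: `split_filenames` and `read_listfile` are hypothetical helpers, and the sketch assumes sequence numbers start at zero and are zero-padded to four digits, as the file names prefix0000.cex and prefix0001.cex suggest.

```python
def read_listfile(text):
    """Parse list-file contents: one sequence number per line, blank lines ignored."""
    return [int(line) for line in text.splitlines() if line.strip()]

def split_filenames(prefix, n_objects, selected=None):
    """Map each selected sequence number to the name of its output file."""
    wanted = range(n_objects) if selected is None else sorted(set(selected))
    # Sequence numbers outside the input stream are silently skipped here.
    return {i: f"{prefix}{i:04d}.cex" for i in wanted if 0 <= i < n_objects}

# A hypothetical list file selecting objects 0 and 5 from a 10-object stream.
names = split_filenames("prefix", 10, read_listfile("0\n5\n"))
print(names)  # {0: 'prefix0000.cex', 5: 'prefix0005.cex'}
```

Note that in this sketch the file name keeps the object's original sequence number, so selected files are not renumbered consecutively.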
- cexsort scorename
Sorts the CEX stream on stdin according to the integer value of
the property specified. For example:
cexsort dockscore <docked.cex >docked.sorted.
Input molecules for which the property is not defined are not output.
Note that the entire input file is sorted in memory, so if you only
wish to pick out part of a large file you should use cextop or
consensus, which are much more efficient for taking a subset.
- cextop # scorename [-a] [-u]
Picks the best (lowest scoring) # molecules from the CEX stream
on stdin according to the integer value of the specified property
and outputs them to stdout. For example:
cextop 100 dockscore <docked.cex >best.cex
picks the best 100 dockings according to the DockIt score. The
default mode for cextop picks the best n scoring dockings regardless
of whether or not they come from the same ligand molecule. If you
want to select the n top scoring molecules instead, use the -u flag,
which takes only the top scoring docking from each molecule, so that
each docking selected comes from a different ligand. The -a flag
selects ligands as -u does but puts out all the conformations for each
ligand selected.
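The difference between the three selection modes can be made concrete with a short Python sketch. It is not the cextop implementation: the function names are hypothetical, dockings are represented as (ligand, conformation, score) tuples for illustration, plain sorting is used instead of a priority queue for clarity, and the output order in the -a style is an assumption.

```python
def top_dockings(dockings, n):
    """Default mode: the n lowest-scoring dockings, whatever their ligand."""
    return sorted(dockings, key=lambda d: d[2])[:n]

def top_unique(dockings, n):
    """-u style: only the best docking of each ligand competes for the top n."""
    best = {}
    for d in dockings:
        if d[0] not in best or d[2] < best[d[0]][2]:
            best[d[0]] = d
    return top_dockings(best.values(), n)

def top_all(dockings, n):
    """-a style: choose ligands as in top_unique, then emit all their conformations."""
    chosen = [d[0] for d in top_unique(dockings, n)]
    return [d for ligand in chosen for d in dockings if d[0] == ligand]

dockings = [("ligA", 0, -50), ("ligA", 1, -48), ("ligB", 0, -45), ("ligC", 0, -30)]

print(top_dockings(dockings, 2))  # both conformations of ligA
print(top_unique(dockings, 2))    # best of ligA, then best of ligB
print(top_all(dockings, 2))       # every conformation of ligA and ligB
```

With this toy input the default mode returns two dockings of the same ligand, which is exactly the case the -u and -a flags are meant to avoid.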
- consensus # [-c #] [-a] [-u]
For example:
consensus 10 -c 2 <in.cex >out.cex
writes to out.cex all the dockings which are in the best 10 according
to any two out of the three scoring functions.
-c takes an integer argument of 1, 2 or 3. The default
behavior is to apply the consensus criterion to each ligand
individually. Thus, the command above would examine each ligand in
turn and output each conformation which was in the top 10 scores for
at least two of the three scoring functions. This tends to filter out
conformations which score anomalously well in only one scoring function. By
using the -a flag, the consensus criterion is applied to all ligands
at the same time. Thus,
consensus 100 -a <in.cex >out.cex
would output all the ligand docked conformations which
are in the top 100 scores (for at least 2 out of 3 of the scoring
functions since -c 2 is the default) for all the ligands in the file
in.cex taken together. This can be used to extract the best overall
docked ligands from an input set. If you just want to get a list of
which ligands score best, the -u flag will cause consensus to output
only one arbitrary conformation for each ligand which meets the
consensus criterion. Thus the -u flag is typically used if you want to
extract, for example, the SMILES or compound identifiers from the
output in order to make a list. consensus -u doesn't necessarily put
out the "best" conformation for each ligand since, when
using several scores, the concept of "best" is not well
defined.
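The consensus criterion itself is simple to express in code. The following Python sketch is an illustration under stated assumptions, not the consensus source: the function names and the dictionary layout are hypothetical, and lower values are assumed to be better for all three scores. `consensus_pooled` corresponds to the -a behavior, and `consensus_per_ligand` to the default.

```python
def consensus_pooled(dockings, n, c=2):
    """Keep dockings ranked in the best n for at least c of the scores."""
    if not dockings:
        return []
    hits = [0] * len(dockings)
    for k in range(len(dockings[0]["scores"])):
        # Indices of the dockings ordered by the k-th score, best (lowest) first.
        order = sorted(range(len(dockings)), key=lambda i: dockings[i]["scores"][k])
        for i in order[:n]:
            hits[i] += 1
    return [d for d, h in zip(dockings, hits) if h >= c]

def consensus_per_ligand(dockings, n, c=2):
    """Apply the same criterion to the conformations of each ligand separately."""
    by_ligand = {}
    for d in dockings:
        by_ligand.setdefault(d["ligand"], []).append(d)
    return [d for group in by_ligand.values() for d in consensus_pooled(group, n, c)]

dockings = [
    {"ligand": "ligA", "conf": 0, "scores": (-50, -90, -120)},
    {"ligand": "ligA", "conf": 1, "scores": (-20, -95, -60)},
    {"ligand": "ligB", "conf": 0, "scores": (-45, -40, -110)},
    {"ligand": "ligB", "conf": 1, "scores": (-10, -30, -50)},
]

pooled = [(d["ligand"], d["conf"]) for d in consensus_pooled(dockings, 2)]
print(pooled)  # [('ligA', 0), ('ligB', 0)]
```

In this toy data, conformation 1 of ligA has the best second score but ranks poorly on the other two, so it is dropped with -c 2 and kept only with -c 1.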
- cex_split_scores # [-u] -f prefix
Example:
cex_split_scores 100 -f prefix <in.cex
selects the best 100 dockings in in.cex according to each of DockIt score,
plpscore and pmfscore and writes the resulting molecules to the files
prefix.dock, prefix.plp and prefix.pmf, respectively. The -u flag causes only the
best score (for each of the scoring functions) to be used for each
ligand so only one copy of each ligand will be put out in each output
file, even if there is more than one docked conformation for that
ligand which falls in the top n.
cex_split_scores, cextop and consensus use a priority
queue algorithm and so should be relatively efficient even for large
input files. For picking subsets from a large input file, cextop or
consensus will be much more efficient than sorting the whole input
file.
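The efficiency argument can be seen in a sketch of a bounded priority queue. This Python example is a generic illustration of the technique using the standard heapq module, not the code used by cextop, consensus or cex_split_scores; `best_n` is a hypothetical name. Only n entries are held in memory at any time, and each input item costs at most O(log n) work, whereas a full sort must hold and order the entire input.

```python
import heapq

def best_n(scored_items, n):
    """Stream (score, item) pairs and keep the n lowest scores in a bounded heap."""
    if n <= 0:
        return []
    heap = []  # entries are (-score, arrival_index, item); heap[0] is the worst kept
    for index, (score, item) in enumerate(scored_items):
        if len(heap) < n:
            heapq.heappush(heap, (-score, index, item))
        elif -score > heap[0][0]:
            # Strictly better than the worst kept entry: replace it.
            heapq.heapreplace(heap, (-score, index, item))
    # Report the survivors best (lowest score) first.
    return [(-neg, item) for neg, _, item in sorted(heap, reverse=True)]

stream = [(-30, "c"), (-50, "a"), (-10, "d"), (-45, "b")]
print(best_n(stream, 2))  # [(-50, 'a'), (-45, 'b')]
```

Scores are negated because heapq provides a min-heap, so the root is always the entry that should be evicted next.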
- summarize
Shell script which summarizes and sorts the output from
docking.
- babelcx
babelcx is a modified version of the babel
program for converting molecular structure formats. The source is
provided in babelcx.tar.gz. It has been
modified to read and write CEX format. Use -icex or -ocex to read or write
CEX files. This program may be particularly useful
for converting various connection table formats (like MDL Mol files or
Tripos Sybyl Mol2 files) into CEX input files for ligands. You will then
need to run the ligands through the atom type and charge
assignment programs (see the setup script for an example of how
to do this). Note that babelcx is less useful for processing the output from
DockIt, since it isn't set up to save the various CEX properties (like multiple
docked conformations, scores and rms values) into other molecular structure
formats. In particular, babelcx only outputs the first conformation
found for each ligand in a CEX file. The post-processing programs
cextop and consensus
put out CEX files with only one conformation per ligand (even if this results
in repetition of the ligand information) so they can be processed by babelcx.
However, in general, it is better to use the cex2tdt
program to convert the DockIt output if you want to preserve all the
information (for example, to put the information into a database).