Selection Clauses
Each selection is made up of various phrases which combine to specify a set of atoms to which the other clauses apply. Currently, the right selection is required although it may be made optional in other implementations. An empty selection clause (e.g. {}) selects no atoms but the rule is triggered as if one atom matches (so a rule with two empty clauses will be matched once). The all() phrase stands alone in a selection clause and causes all atoms in the corresponding molecule to be selected. The currently implemented phrases are:
chain, residue, resname and atname are synonyms for the respective plural forms. A QLIST is a list of quoted terms (e.g. "A", " ") which specifies one or more chain identifiers. Note that in PDB records these chain identifiers are one character long (but the SEA language imposes no such restriction). An NLIST is a list of positive integers, including ranges (e.g. 1,4,5-9,10).
There are a few unusual cases in PDB files which are not yet handled. Insertion codes can occur in residue numbers (e.g. 10A); they are recognized by the SEA grammar but not currently by the DockIt PDB parser. This will be fixed in an upcoming DockIt release. In rare cases, PDB entries contain negative residue numbers. These are not currently handled by the SEA grammar (but may be in a future release). The current workaround for these residue number issues is to renumber the residues as increasing integers.
The near phrase is valid only for the right-hand selection and is implemented strictly to improve computational efficiency. For example, near(4.2) prevents the subsequent selection phrases from being applied to any atoms in the right-hand molecule (generally the receptor, which is a large molecule) which are farther than 4.2 angstroms from any atom in the left-hand selection result. When the subsequent selection phrases would potentially apply to a large number of atoms, use of the near phrase can result in significant speedups.
With the exception of the smarts phrase, the selection phrases listed above imply no particular connectivity. Thus, without a smarts phrase, each atom which meets the selection criteria is taken one at a time for subsequent expressions and actions. The smarts phrase allows for specification of substructures using the full SMARTS syntax. Each atom in a smarts pattern match can be subsequently referred to in the actions clause. In addition, the atoms specified by preceding selection phrases restrict which atoms can be matched by the smarts pattern.
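Two of the mechanics described above, NLIST ranges and the near cutoff, can be pictured with a small Python sketch. This is not DockIt or SEA code: the function names expand_nlist and near_prefilter are hypothetical, atoms are reduced to bare (x, y, z) tuples, and the sketch assumes that a right-hand atom is kept when it lies within the cutoff of at least one left-hand atom.

```python
import math


def expand_nlist(text):
    """Expand an NLIST-style string such as "1,4,5-9,10" into sorted integers.

    Hypothetical helper; only positive integers and low-high ranges are
    handled, mirroring the restrictions described in the text.
    """
    numbers = set()
    for term in text.split(","):
        term = term.strip()
        if "-" in term:
            low, high = (int(part) for part in term.split("-"))
            numbers.update(range(low, high + 1))
        else:
            numbers.add(int(term))
    return sorted(numbers)


def near_prefilter(right_atoms, left_atoms, cutoff):
    """Keep right-hand atoms within `cutoff` angstroms of any left-hand atom.

    Assumption: "near" means within the cutoff of at least one left-hand atom.
    """
    return [
        atom
        for atom in right_atoms
        if any(math.dist(atom, other) <= cutoff for other in left_atoms)
    ]


# Toy coordinates (made up for illustration): one ligand atom, three receptor atoms.
ligand = [(0.0, 0.0, 0.0)]
receptor = [(1.0, 0.0, 0.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

residues = expand_nlist("1,4,5-9,10")
close_atoms = near_prefilter(receptor, ligand, 4.2)
```

In this toy case only the receptor atom at 5.0 angstroms is dropped, so any later, more expensive phrase (such as a smarts match) would be tried on two atoms instead of three. A real implementation would likely use a spatial grid rather than the all-pairs loop shown here.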
A vector binding (named SEL) is defined by the atoms which satisfy the preceding selection phrases (chain, etc). This binding can be used in the smarts pattern to restrict the matching atoms. For example, the smarts phrase smarts("[$SEL]") would mean that each atom which satisfies the selection clause would be taken one at a time for further processing in the expression and action clauses. This is also the default in the absence of a smarts phrase. Some further examples to clarify the selection language:
residue(10) resname("ASN") smarts("[C;$SEL](=[O;$SEL])[N;$SEL]")
matches only the side-chain amide in ASN 10 while
residue(10) resname("ASN") smarts("[C;$SEL](=O)N")
will also match the main chain amide between ASN 10 and residue 11 since only the carbon atom is required to be in ASN 10. Note that the first atom in the smarts is always required to satisfy the rest of the selection phrases so the selection clause
residue(10) resname("ASN") smarts("C(=O)N")
is equivalent to the second example above, not the first. The selection clauses
residue(10) resname("ASN")
residue(10) resname("ASN") smarts("*")
residue(10) resname("ASN") smarts("[$SEL]")
are all equivalent and all select each atom of ASN 10, one atom at a time.
The atom names are matched after stripping leading and trailing white space. This is somewhat incompatible with how atom names are represented in PDB files, since the PDB standard specifies that the atom name contains the element symbol right-justified in the first two places. Thus "CA  " and " CA " are different: the former represents a calcium atom whereas the latter represents an alpha carbon. It will therefore be necessary to use other information to disambiguate those two cases in the current version of SEA. This could be done with residue names, for example, or by using smarts. Thus
{atname("CA") smarts("C")}
and
{atname("CA") smarts("[Ca]")}
would apply to the two different cases.
The define statement can be used to create other vector bindings which can be used in subsequent rules. For example:
define hets "[N,O]"
smarts("[$hets]")
Besides being a notational convenience, defining vector bindings can also improve performance since the vector binding is matched once and then the matched atoms are used in the subsequent rules.
The begin statement allows action clauses to be executed before processing starts on any ligands. This is primarily useful for calling printf() functions to write headers to the output before any other output is created.
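The performance remark about define (match once, reuse in later rules) can be illustrated with a minimal Python sketch. It does not reflect how SEA is implemented internally: make_binding and is_heteroatom are hypothetical names, atoms are reduced to (element, residue name) pairs, and a plain element test stands in for the SMARTS pattern "[N,O]".

```python
def make_binding(predicate, atoms):
    """Evaluate `predicate` once per atom and store the matching indices."""
    return frozenset(i for i, atom in enumerate(atoms) if predicate(atom))


# Toy atoms as (element, residue name) pairs, made up for illustration.
atoms = [("N", "ASN"), ("C", "ASN"), ("O", "ASN"), ("C", "GLY"), ("O", "GLY")]

calls = 0  # counts how often the (potentially expensive) match is evaluated


def is_heteroatom(atom):
    """Stand-in for matching the pattern "[N,O]" against a single atom."""
    global calls
    calls += 1
    return atom[0] in ("N", "O")


# Analogue of: define hets "[N,O]" -- the pattern is matched exactly once.
hets = make_binding(is_heteroatom, atoms)

# Two hypothetical rules reuse the stored binding instead of re-matching.
rule_asn = [i for i in sorted(hets) if atoms[i][1] == "ASN"]
rule_gly = [i for i in sorted(hets) if atoms[i][1] == "GLY"]
```

After both rules have run, the counter shows one evaluation per atom, regardless of how many rules refer to the binding. Without the stored binding, each rule would repeat the match over the whole molecule.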