
10 precautions while doing molecular modeling and avoiding further mistakes

by Kundan Ingale on August 12th, 2011

Computational techniques are rapidly gaining popularity, implementation and appreciation in the drug discovery and development process. Computational and experimental techniques both play important roles in drug discovery and development and represent complementary approaches.

The latest technological advances (QSAR/QSPR, SBDD, combinatorial library design, cheminformatics and bioinformatics), the growing number of chemical and biological databases, and an explosion in available software tools are providing a much improved basis for designing ligands and inhibitors with a desired specificity. Expertise in handling these technologies comes only after persistent effort and a sound grasp of the basics of the field. Here I list a few of the lessons I learned through trial and error in my early days of molecular modeling.

1. Selection of correct PDB structure
When searching for a protein structure in the Protein Data Bank (PDB), you may run into several challenges. For example, structures may come from different species, and many structures, particularly those determined by crystallography, include information about only part of the functional biological assembly. The most significant challenge is selecting a structure that is truly predictive of the disease under investigation.

When more than one structure is available for a target within a metafold, domains that comprise an entire PDB chain can be preferred over domains taken from multimers and multi-domain proteins. Resolution, as a measure of the underlying data quality in a diffraction experiment, should also be considered when selecting a PDB structure.
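To make these selection criteria concrete, here is a minimal Python sketch that filters candidate structures by resolution and then prefers domains that comprise an entire chain. The `Candidate` record, the entry labels and the 2.5 Å cutoff are hypothetical choices for illustration, not values recommended by the PDB; in practice the resolution and chain information would be read from each PDB entry.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    entry_id: str             # hypothetical labels, not real PDB IDs
    resolution: float         # Å; lower means better diffraction data
    whole_chain_domain: bool  # True if the domain comprises an entire chain

def rank_candidates(cands, max_resolution=2.5):
    """Drop structures worse than max_resolution (an assumed cutoff), then list
    whole-chain domains first and order each group by resolution."""
    kept = [c for c in cands if c.resolution <= max_resolution]
    # False sorts before True, so "not whole_chain_domain" puts whole chains first
    return sorted(kept, key=lambda c: (not c.whole_chain_domain, c.resolution))

cands = [
    Candidate("cand_A", 2.1, False),  # domain cut from a multi-domain protein
    Candidate("cand_B", 1.8, True),
    Candidate("cand_C", 3.2, True),   # excluded by the resolution cutoff
    Candidate("cand_D", 2.4, True),
]
ranked = rank_candidates(cands)
print([c.entry_id for c in ranked])  # ['cand_B', 'cand_D', 'cand_A']
```

Ordering by a tuple key keeps the two criteria explicit; swapping the tuple order would make resolution the primary criterion instead.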

2. Cleaning of the selected PDB structure
The typical structure file from the PDB is not suitable for immediate use in molecular modeling calculations. A typical X-ray structure file contains only heavy atoms and may include co-crystallized ligands, water molecules, metal ions, and cofactors. PDB files often have missing atoms, missing residues, or incomplete residues, which may or may not be part of the site of interest.
Special care must be taken here, because many programs read only the ATOM and HETATM records, not the SEQRES records, and therefore will not detect or rebuild missing structure (hydrogens, heavy atoms, residues, or loops).
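Because many programs ignore SEQRES, it is worth comparing SEQRES against the residues that actually have coordinates before modeling. The sketch below does this with plain fixed-column parsing of the PDB format (chain ID in column 22, residue number in columns 23 to 26); the function names and the tiny demo file are hypothetical, and a real workflow would more likely use a structure-handling library.

```python
from collections import defaultdict

def seqres_counts(lines):
    """Number of residues listed in SEQRES records, per chain."""
    counts = defaultdict(int)
    for line in lines:
        if line.startswith("SEQRES"):
            # chain ID is column 12; residue names start at column 20
            counts[line[11]] += len(line[19:].split())
    return dict(counts)

def observed_residues(lines):
    """Sorted residue numbers that have ATOM coordinates, per chain."""
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("ATOM"):
            # chain ID is column 22; residue number is columns 23-26
            seen[line[21]].add(int(line[22:26]))
    return {chain: sorted(nums) for chain, nums in seen.items()}

def numbering_gaps(resnums):
    """Pairs of consecutive observed residue numbers that jump by more than 1."""
    return [(a, b) for a, b in zip(resnums, resnums[1:]) if b - a > 1]

def ca_line(serial, resname, chain, resseq):
    """Build a minimal fixed-column ATOM record for a C-alpha atom (demo only)."""
    return (f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

pdb_lines = [
    "SEQRES   1 A    5  MET ALA GLY LYS LEU",
    ca_line(1, "MET", "A", 1),
    ca_line(2, "ALA", "A", 2),
    ca_line(3, "LEU", "A", 5),  # GLY 3 and LYS 4 have no coordinates
]
print(seqres_counts(pdb_lines))       # {'A': 5}
observed = observed_residues(pdb_lines)
print(observed)                       # {'A': [1, 2, 5]}
print(numbering_gaps(observed["A"]))  # [(2, 5)]
```

A jump in residue numbering only hints at missing residues: numbering can skip legitimately and insertion codes are ignored here, so confirm against SEQRES before rebuilding anything.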

3. Elimination of water molecules
The inclusion of specific crystallographic water molecules has been reported to improve the accuracy of cross-docking predictions for some complexes.
In general, however, removing all water molecules within the binding pocket avoids user-derived bias towards specific bound complexes and can help avoid false positive results.
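Here is a minimal sketch of removing binding-pocket waters, assuming the ligand can be identified by its HETATM residue name. The 5 Å cutoff, the residue name `LIG` and the helper names are assumptions for illustration; whole water residues are dropped so that any water hydrogens go with their oxygen.

```python
import math

def hetatm(serial, name, resname, chain, resseq, x, y, z):
    """Build a simplified fixed-column HETATM record (demo only)."""
    return (f"HETATM{serial:>5}  {name:<3} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def xyz(rec):
    # x, y, z occupy columns 31-38, 39-46 and 47-54 of ATOM/HETATM records
    return float(rec[30:38]), float(rec[38:46]), float(rec[46:54])

def strip_pocket_waters(lines, ligand_resname, cutoff=5.0):
    """Remove every HOH residue with any atom within `cutoff` Å of the ligand."""
    het = [rec for rec in lines if rec.startswith("HETATM")]
    ligand = [xyz(rec) for rec in het if rec[17:20] == ligand_resname]
    drop = {(rec[21], rec[22:26])  # (chain ID, residue number) of a pocket water
            for rec in het
            if rec[17:20] == "HOH"
            and any(math.dist(xyz(rec), p) <= cutoff for p in ligand)}
    return [rec for rec in lines
            if not (rec.startswith("HETATM") and rec[17:20] == "HOH"
                    and (rec[21], rec[22:26]) in drop)]

complex_lines = [
    hetatm(1, "C1", "LIG", "A", 301, 0.0, 0.0, 0.0),
    hetatm(2, "C2", "LIG", "A", 301, 1.5, 0.0, 0.0),
    hetatm(3, "O", "HOH", "A", 401, 3.0, 0.0, 0.0),   # 1.5 Å from C2: in the pocket
    hetatm(4, "O", "HOH", "A", 402, 20.0, 0.0, 0.0),  # far from the ligand
]
kept = strip_pocket_waters(complex_lines, "LIG")
print(len(kept))  # 3
```

Keeping selected waters instead, as in the cross-docking reports above, would only mean excluding their residue numbers from `drop`.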

4. Identification and extraction of the correct co-crystallized ligand
When enzyme structures are determined by X-ray crystallography or NMR, the resulting structures may or may not include bound ligands. When present, these ligands are often inhibitors or substrate analogues, mainly used to identify the binding site. Depending on the structure, the non-protein components shown can be modified protein residues, ligands, metal ions, or cofactors.
Unwanted components must be deleted before the structure is used, to avoid spurious interactions.
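The sketch below lists the HETATM components of a structure and pulls out one ligand by residue name, leaving everything else in place. Names such as `LIG` and the helper functions are hypothetical; the demo deliberately includes a modified residue (MSE, selenomethionine) and a metal ion to show that not every HETATM component is a ligand to be deleted.

```python
from collections import Counter

def hetatm(serial, name, resname, chain, resseq):
    """Build a simplified fixed-column HETATM record with dummy coordinates (demo only)."""
    return (f"HETATM{serial:>5}  {name:<3} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

def het_components(lines):
    """Count atoms per HETATM residue name, ignoring waters."""
    return Counter(rec[17:20].strip() for rec in lines
                   if rec.startswith("HETATM") and rec[17:20] != "HOH")

def extract_ligand(lines, resname, chain=None):
    """Split records into (ligand, everything else) for one HETATM component."""
    def is_target(rec):
        return (rec.startswith("HETATM") and rec[17:20].strip() == resname
                and (chain is None or rec[21] == chain))
    ligand = [rec for rec in lines if is_target(rec)]
    rest = [rec for rec in lines if not is_target(rec)]
    return ligand, rest

structure = [
    hetatm(1, "SE", "MSE", "A", 57),   # selenomethionine: a modified protein residue
    hetatm(2, "ZN", "ZN", "A", 401),   # metal ion
    hetatm(3, "C1", "LIG", "A", 501),  # hypothetical inhibitor, two atoms
    hetatm(4, "C2", "LIG", "A", 501),
    hetatm(5, "O", "HOH", "A", 601),
]
print(het_components(structure))
ligand, rest = extract_ligand(structure, "LIG")
print(len(ligand), len(rest))  # 2 3
```

Listing the components first makes the decision explicit: here MSE belongs to the protein chain and should stay, while the zinc ion may be essential to the binding site.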

5. Sequence alignment for homology modeling
Sequence alignments are accurate for proteins of high sequence similarity but become unreliable as they approach the so-called ‘twilight zone’, where sequence similarity becomes indistinguishable from random.
It is necessary to examine the alignment carefully to see whether it makes biological sense.
Knowledge of the functional regions of the sequences being aligned can be used to assess the quality of the alignment. Functional regions are usually more or less conserved between relatively closely related sequences, so only a few gaps should fall in closely conserved areas, and most gaps should fall in less well conserved areas.
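One way to put these checks into numbers is sketched below: percent identity over ungapped columns (the twilight zone is often quoted as roughly 20 to 35% identity) and a count of gaps inside versus outside known functional regions. The sequences and the motif column range are hypothetical; in practice the conserved regions would come from functional annotation of the target family.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def gap_placement(a, b, conserved):
    """Count gap characters inside and outside conserved column ranges.
    `conserved` holds half-open (start, end) alignment-column ranges."""
    inside = outside = 0
    for col, (x, y) in enumerate(zip(a, b)):
        n_gaps = (x == "-") + (y == "-")
        if any(start <= col < end for start, end in conserved):
            inside += n_gaps
        else:
            outside += n_gaps
    return inside, outside

target   = "MKTAYIAKQ--R"
template = "MKSAYIAKQLLR"
motif = [(0, 9)]  # hypothetical functional region: alignment columns 0-8
print(percent_identity(target, template))      # 90.0
print(gap_placement(target, template, motif))  # (0, 2)
```

A healthy alignment of close homologues should show few or no gaps inside the conserved ranges, as in this example.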

6. Loop modeling
Loop modeling is the part of protein structure prediction concerned with predicting the conformations of loop regions without the use of a structural template.
The knowledge-based (database search) approach benefits enormously from the steadily growing PDB and has been shown to be strong competition for ab initio loop prediction.

The loop conformations extracted from the database are ranked by a combined quality score that considers sequence similarity, bumps with the rest of the structure, and the fit to the terminal loop anchor points.
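The ranking step can be sketched as a weighted score over the three criteria named above. The weights, the field names and the sign conventions are assumptions for illustration; real loop-modeling programs define their own scoring.

```python
from dataclasses import dataclass

@dataclass
class LoopCandidate:
    name: str
    seq_similarity: float  # 0..1, higher is better
    n_bumps: int           # clashes with the rest of the structure, lower is better
    anchor_rmsd: float     # Å misfit to the terminal anchor points, lower is better

# Hypothetical weights; the post does not specify any weighting
W_SIM, W_BUMP, W_ANCHOR = 1.0, 0.2, 0.5

def combined_score(c):
    """Higher is better: reward similarity, penalize bumps and anchor misfit."""
    return W_SIM * c.seq_similarity - W_BUMP * c.n_bumps - W_ANCHOR * c.anchor_rmsd

def rank_loops(cands):
    return sorted(cands, key=combined_score, reverse=True)

loops = [
    LoopCandidate("loopA", 0.8, 3, 0.4),  # similar sequence, but many bumps
    LoopCandidate("loopB", 0.6, 0, 0.3),
    LoopCandidate("loopC", 0.9, 1, 1.2),  # poor fit to the anchors
]
print([c.name for c in rank_loops(loops)])  # ['loopB', 'loopC', 'loopA']
```

The example shows why a combined score matters: the loop with the best sequence similarity is not ranked first once bumps and anchor fit are taken into account.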

7. Geometry Check and Correction for the Protein structure/Homology Model
During preparation or correction of a protein structure, side chains may not be in the correct geometry and may have bad contacts with surrounding atoms or residues. Addition, modification, or deletion of residues during structure refinement may also lead to deviations from favored geometry. It is therefore necessary to correct the side-chain geometry to a more reasonable form.
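A simple way to find bad contacts is to flag heavy-atom pairs whose van der Waals spheres overlap by more than a threshold. This sketch uses Bondi radii and a 0.4 Å overlap cutoff (the kind of threshold clash checkers use, taken here as an assumption); the atom tuples are hypothetical input, and the simple sequence-separation rule would still misreport disulfide S-S bonds.

```python
import itertools
import math

# Bondi van der Waals radii (Å) for common protein heavy atoms
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def find_clashes(atoms, max_overlap=0.4, min_seq_sep=2):
    """Return (label_a, label_b, distance) for atom pairs whose van der Waals
    spheres overlap by more than max_overlap Å.
    atoms: list of (label, element, residue_number, (x, y, z)).
    Pairs within the same or neighbouring residues are skipped so covalent
    neighbours are not reported (intra-residue clashes are missed as a result)."""
    clashes = []
    for a, b in itertools.combinations(atoms, 2):
        if abs(a[2] - b[2]) < min_seq_sep:
            continue
        dist = math.dist(a[3], b[3])
        if VDW_RADIUS[a[1]] + VDW_RADIUS[b[1]] - dist > max_overlap:
            clashes.append((a[0], b[0], round(dist, 2)))
    return clashes

atoms = [
    ("LEU10 CD1", "C", 10, (0.0, 0.0, 0.0)),
    ("PHE40 CZ", "C", 40, (2.5, 0.0, 0.0)),    # 0.9 Å overlap: a clash
    ("SER12 OG", "O", 12, (10.0, 0.0, 0.0)),
    ("ASP50 OD1", "O", 50, (12.7, 0.0, 0.0)),  # 2.7 Å: H-bond distance, not flagged
    ("ALA20 C", "C", 20, (0.0, 10.0, 0.0)),
    ("ALA21 N", "N", 21, (0.0, 11.33, 0.0)),   # peptide bond to next residue, skipped
]
print(find_clashes(atoms))  # [('LEU10 CD1', 'PHE40 CZ', 2.5)]
```

Residues flagged this way are the candidates for rotamer changes or local minimization.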

8. Ligand structures for docking studies
To give the best docking results, the docked structures must be good representations of the actual ligand structures as they would appear in a protein-ligand complex. The ligand must be in a 3D format compatible with the docking program, and it should have correct bond lengths and bond angles, because flexible-ligand docking typically modifies only torsion angles.
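Before docking, a quick bond-length sanity check can catch ligands built with distorted geometry. The reference lengths below are approximate textbook values and the 0.1 Å tolerance is an assumption; the atom indices and coordinates describe a hypothetical three-atom fragment.

```python
import math

# Approximate textbook bond lengths in Å, keyed by (element, element, bond order)
# with the element pair sorted alphabetically. Illustrative values only.
REFERENCE_LENGTH = {
    ("C", "C", 1): 1.54,
    ("C", "C", 2): 1.34,
    ("C", "N", 1): 1.47,
    ("C", "O", 1): 1.43,
    ("C", "O", 2): 1.21,
}

def check_bonds(coords, elements, bonds, tol=0.1):
    """Return (i, j, length, reference) for bonds deviating by more than tol Å.
    bonds: list of (i, j, order) with 0-based atom indices."""
    problems = []
    for i, j, order in bonds:
        key = (*sorted((elements[i], elements[j])), order)
        ref = REFERENCE_LENGTH.get(key)
        if ref is None:
            continue  # bond type not covered by this small table
        length = math.dist(coords[i], coords[j])
        if abs(length - ref) > tol:
            problems.append((i, j, round(length, 2), ref))
    return problems

# Hypothetical heavy-atom fragment C0-C1-O2 with a stretched C0-C1 bond
elements = ["C", "C", "O"]
coords = [(0.0, 0.0, 0.0), (1.80, 0.0, 0.0), (2.27, 1.35, 0.0)]
bonds = [(0, 1, 1), (1, 2, 1)]
print(check_bonds(coords, elements, bonds))  # [(0, 1, 1.8, 1.54)]
```

Since a torsion-only docking search cannot repair such a bond, the ligand should be cleaned up (for example by minimization) before it is docked.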

9. Post-Docking Energy Minimization (Calculating Binding Energy)
The free energy of binding is the change in free energy that occurs on binding,

$$\Delta G_{\text{binding}} = G_{\text{complex}} - G_{\text{separated}}$$

where $G_{\text{complex}}$ and $G_{\text{separated}}$ are the free energies of the complex and of the non-interacting protein and ligand, respectively.
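The equation translates directly into arithmetic. The energies below are hypothetical, and post-docking minimization usually yields force-field potential energies, which only approximate free energies; a negative result indicates favourable binding.

```python
def binding_energy(g_complex, g_protein, g_ligand):
    """ΔG_binding = G_complex - G_separated, with G_separated taken as the sum
    of the non-interacting protein and ligand energies (same units throughout)."""
    g_separated = g_protein + g_ligand
    return g_complex - g_separated

# Hypothetical values in kcal/mol, for illustration only
dg = binding_energy(g_complex=-5230.4, g_protein=-5180.1, g_ligand=-38.6)
print(f"{dg:.1f} kcal/mol")  # -11.7 kcal/mol
```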
Ligand poses generated during docking are rarely exactly at a local minimum, and optimizing the complex allows the protein to relax to a certain extent, which can account for the conformational changes that occur in the protein structure when the ligand binds.
Treating the binding site together with the ligand as an aggregate (a set of atoms held fixed during minimization) is a way of estimating the free energy of the complex without altering the ligand geometry during post-docking energy minimization.
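To illustrate minimizing a complex while holding a chosen set of atoms fixed, here is a toy steepest-descent sketch on a harmonic-spring energy. It is not a force field: the springs, the step size and the `frozen` set (standing in for the aggregate; here just two ligand atoms) are assumptions chosen only to show that frozen coordinates stay put while the rest relaxes.

```python
import math

def energy_and_grad(pos, springs):
    """Toy energy: sum of k * (d - d0)**2 over harmonic springs (i, j, k, d0)."""
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j, k, d0 in springs:
        diff = [pos[i][c] - pos[j][c] for c in range(3)]
        d = math.sqrt(sum(x * x for x in diff))
        energy += k * (d - d0) ** 2
        factor = 2 * k * (d - d0) / d  # dE/dd divided by d (chain rule)
        for c in range(3):
            grad[i][c] += factor * diff[c]
            grad[j][c] -= factor * diff[c]
    return energy, grad

def minimize(pos, springs, frozen, step=0.05, n_steps=500):
    """Steepest descent that never moves atoms whose index is in `frozen`."""
    pos = [list(p) for p in pos]
    for _ in range(n_steps):
        _, grad = energy_and_grad(pos, springs)
        for i, g in enumerate(grad):
            if i in frozen:
                continue
            for c in range(3):
                pos[i][c] -= step * g[c]
    return pos

positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
springs = [(0, 1, 10.0, 1.5),  # bond inside the (frozen) ligand
           (1, 2, 1.0, 3.0)]   # ligand-protein contact with relaxed length 3 Å
result = minimize(positions, springs, frozen={0, 1})
print(result)  # atom 2 relaxes to about x = 4.5; atoms 0 and 1 are unchanged
```

Real programs implement the same idea by excluding the fixed atoms from the coordinate update, usually with better minimizers than plain steepest descent.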

10. Application of Docking to virtual screening
Virtual screening (VS) is a computational technique used in drug discovery research. It involves the rapid in silico assessment of large libraries of chemical structures to identify those most likely to bind to a drug target, typically a protein receptor or enzyme. Docking combined with a scoring function can be used to quickly screen large databases of potential drugs and identify molecules that are likely to bind to the protein target of interest.

Applying an ADME filter during the docking-based virtual screen helps eliminate ligands that may exhibit poor ADME properties in vivo.
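A screening pass can be sketched as "filter, then rank". Here Lipinski's rule of five, one widely used drug-likeness filter related to oral absorption, stands in for the ADME filter (the post does not name one), allowing at most one violation; the compound records and docking scores are hypothetical, and lower (more negative) scores are assumed to be better.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    dock_score: float  # assumed: more negative means better predicted binding
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int

def ro5_violations(h):
    """Number of rule-of-five criteria the compound fails."""
    return sum([h.mol_weight > 500, h.logp > 5, h.h_donors > 5, h.h_acceptors > 10])

def screen(hits, max_violations=1, top_n=None):
    """Drop compounds with too many violations, then rank by docking score."""
    passed = [h for h in hits if ro5_violations(h) <= max_violations]
    return sorted(passed, key=lambda h: h.dock_score)[:top_n]

hits = [
    Hit("cpd1", -9.1, 620.0, 6.2, 2, 8),  # two violations: filtered out
    Hit("cpd2", -8.4, 410.0, 3.1, 2, 6),
    Hit("cpd3", -7.2, 350.0, 2.0, 1, 4),
    Hit("cpd4", -8.9, 520.0, 4.5, 3, 7),  # one violation: kept
]
print([h.name for h in screen(hits)])  # ['cpd4', 'cpd2', 'cpd3']
```

Filtering before ranking means the best-scoring compound (cpd1) never reaches the shortlist, which is exactly the intent of the ADME filter.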

I hope budding modelers find these precautions helpful in avoiding many unwanted results. Happy modeling!


Kundan Ingale

I am a medicinal chemist exploring applications of CADD, working as an Application Scientist at VLife Sciences Technologies Pvt. Ltd. I am closely associated with technology development for computer-aided drug discovery and its application.


  1. Chandrika

    For Point 3, retaining unnecessary water molecules in the active site during docking can actually result in false negatives, due to steric hindrance by water molecules to some conformations/poses of the ligand.

  2. GopiMohan

    Point 10: ADME/T should also be included in VS, because toxic compounds can later lead to withdrawal at a late stage of clinical trials.
