Over a year ago we introduced a scripting facility in VLifeMDS, allowing end-user customization of the VLifeMDS platform [see: http://blog.vlifesciences.com/?p=73]. Using Python scripting and the more than 70 (and growing) functions provided by the VLifeMDS platform, our customers have benefited immensely, especially when customizations of the base platform were needed. One recent instance was the revamping of the IM-TECH Toxipred on-line server (http://crdd.osdd.net/oscadd/toxipred/), which uses the VLifeMDS platform at the backend for a range of descriptor calculations.
Textual scripting, though it gives absolute control over what you want to do with the VLifeMDS platform, is not intuitive for the many users who are less inclined towards programming. This set us out to develop an environment within the VLifeMDS platform that would be GUI driven and more intuitive for a large section of users, and at the same time provide an easy interface for all users to quickly customize and search for functionality. One of the best ways to do this is to use the concept of workflows (http://en.wikipedia.org/wiki/Workflow). The scientific community at large has developed a number of generic workflow management systems (for instance KNIME and Project Trident). Most of these systems are available for use by the scientific community, and one can easily use the scripting interfaces provided with VLifeMDS to write plugins for them. Being generic, however, does not make them easy to configure for specific needs. To simplify matters, we have built a new integrated, domain-specific workflow system for the VLifeMDS platform; it is available to all customers running version 4.2.x of the platform.
The remainder of this post gives a brief overview of how to use the built-in workflows in VLifeMDS to simplify the way you work, typically when multi-stage processing is required.
A workflow example
Let's take the example of cleaning a PDB file. Cleaning a PDB file typically involves a number of steps; for simplicity, let's narrow this down to the following:
- Read in the PDB file
- Add hydrogens
- Retain only one or more chains
- Fix incomplete residues
- Complete missing residues
- Save the file (typically in .mol2 format)
Performing the above activities using the standard VLifeMDS GUI, or for that matter other molecular modeling packages, involves shuffling through a number of dialogs. These are, however, repetitive activities and can be composed as a workflow, provided the above steps are packaged as workunits that can be connected by a user.
The above figure shows the same steps depicted as a workflow in VLifeMDS. The visual representation is much easier to comprehend at a glance and gives an overview of the activities that make up the whole process of PDB cleaning. Notice that the workunits are colored according to whether the user is required to provide any additional parameters. A light orange color with thick red dotted lines indicates that the workunit has user-editable parameters that must be provided by the user for execution to proceed. A white workunit has no user-editable parameters. A light green workunit has user-editable parameters, each of which has a preset default value.
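To make the idea of connectable workunits concrete, here is a minimal Python sketch of the PDB-cleaning steps chained into one workflow. It is only an illustration: the workunit names and the dictionary passed between them are hypothetical stand-ins, not the VLifeMDS implementation.

```python
def make_workunit(name):
    """Return a stand-in workunit that only records that it ran."""
    def workunit(state):
        # A real workunit would modify the structure; here we just log the step.
        return dict(state, history=state["history"] + [name])
    workunit.__name__ = name
    return workunit


# The steps listed above, in order (names are illustrative only).
PDB_CLEANING = [make_workunit(n) for n in (
    "read_pdb", "add_hydrogens", "retain_chains",
    "fix_incomplete_residues", "complete_missing_residues", "save_mol2",
)]


def run_workflow(workunits, state):
    """Feed the output of each workunit into the next one."""
    for unit in workunits:
        state = unit(state)
    return state


result = run_workflow(PDB_CLEANING, {"source": "input.pdb", "history": []})
print(result["history"])
```

Once the steps are packaged this way, repeating the whole procedure on another file is a single call rather than a tour through several dialogs.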
Domain specific language for workflows in VLifeMDS
Workflows in VLifeMDS use a custom domain-specific language rather than a generic workflow management system. This allows strict data type checking, together with some degree of semantic checking, when connecting the individual workunits. It also allows for self-describing workunits that can be connected automatically, without the user worrying about which outputs of one workunit to connect to which inputs of the next.
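The following Python sketch shows, under simplifying assumptions, how self-describing workunits with typed ports could be connected automatically and checked for type compatibility. The class, port names, and type names are hypothetical; the actual VLifeMDS workflow language is not shown here.

```python
class WorkUnit:
    """A workunit that describes its own input and output ports."""

    def __init__(self, name, inputs, outputs):
        self.name = name
        self.inputs = dict(inputs)    # port name -> data type name
        self.outputs = dict(outputs)  # port name -> data type name


def connect(upstream, downstream):
    """Match each downstream input to an upstream output of the same type.

    Returns {input_port: output_port}; raises TypeError when an input
    cannot be satisfied by any upstream output.
    """
    wiring = {}
    for in_port, in_type in downstream.inputs.items():
        matches = [p for p, t in upstream.outputs.items() if t == in_type]
        if not matches:
            raise TypeError("%s.%s needs a %s, which %s does not produce"
                            % (downstream.name, in_port, in_type, upstream.name))
        wiring[in_port] = matches[0]
    return wiring


reader = WorkUnit("ReadPDB", inputs={"path": "FilePath"},
                  outputs={"structure": "Molecule"})
add_h = WorkUnit("AddHydrogens", inputs={"mol": "Molecule"},
                 outputs={"mol": "Molecule"})

print(connect(reader, add_h))
```

Connecting the two units in the wrong order raises a TypeError, which is the kind of mistake strict type checking is meant to catch before a workflow runs.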
Search: it's tags all the way
All workunits are tagged. When you build a workflow using these workunits, the workflow is automatically tagged as well. Tags in VLifeMDS workflows are central to how you search and organize the workflows you have created. The user can add or remove tags as needed.
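As a rough illustration of tag-based search, the Python sketch below lets a workflow inherit the tags of its workunits, adds user tags, and looks workflows up by tag. The data layout and tag names are assumptions made for the example only.

```python
def workflow_tags(workunits, user_tags=()):
    """A workflow inherits the tags of its workunits plus any user tags."""
    tags = set(user_tags)
    for unit in workunits:
        tags.update(unit["tags"])
    return tags


def search(workflows, wanted):
    """Return names of workflows that carry every wanted tag."""
    wanted = set(wanted)
    return sorted(name for name, tags in workflows.items() if wanted <= tags)


units = [
    {"name": "ReadPDB", "tags": {"pdb", "io"}},
    {"name": "AddHydrogens", "tags": {"protein", "hydrogens"}},
]
workflows = {
    "pdb_cleaning": workflow_tags(units, user_tags=["cleanup"]),
    "conformers": {"ligand", "conformer"},
}

print(search(workflows, ["pdb", "cleanup"]))
```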
Overview of the interface
The above image gives a glimpse of UI for workflows in VLifeMDS.
Quick video tour
A quick video tour is available on YouTube: http://www.youtube.com/watch?v=hjkoV9PXDrw
There is more coming, visit us during the VLife user summit for a live demo at Hyatt Regency, Pune on 9th June 2012, 3pm onwards.
VLife now announces the VLifeMDS Elite User Group. The VLifeMDS Elite User Group not only showcases a highly recommended research resource centre but also recognizes the efforts of researchers with rewards.
In the last post on scripting I added a note on parallelization. As processor technology moves towards wide-scale adoption of more than one core on a single desktop machine, it is important for us to be able to use this extra processing power. Towards this end, we have rewritten many of our compute-intensive algorithms to take advantage of multi-core processors when available. Along with this, we have also made a number of optimizations to our existing algorithms for faster performance. The end result is that you, as a researcher, benefit from faster algorithms, resulting in less wait time and better research productivity.
You may also be interested to know that we have recently worked closely with Intel to optimize our code specifically for Intel's second generation Core architecture, code-named Sandy Bridge.
For the technical details of the work done (which is now shipping with VLifeMDS 4.1), I would refer you to the white paper available here [1,2].
Enjoy a faster new year with VLifeMDS
Optimizing VLife* Molecular Design Suite Using Intel® Parallel Studio XE, available at http://www.vlifesciences.com/support/whitepaper_multicore.php
Wish you all “A Very Happy and Prosperous New Year!!!”
Since my last post called for further discussion of its individual threads, in this post I will discuss modeled protein structures.
Why model protein structures?
To understand the biological function of proteins, including their interactions with other proteins, ligands, substrates and inhibitors, it is necessary to determine their 3D structure. Classical methods of structure determination take a great deal of time and money.
Thus, computational methods for predicting protein structure from sequence, such as ab-initio modeling, threading, and comparative modeling, are becoming increasingly important.
How accurate are modeled protein structures?
For reasons such as the huge number of possible conformations of protein structures and our incomplete understanding of their physical stability, the chances of errors in modeled structures are high. The extent of error in a model may differ from region to region: non-conserved surface loops, for example, are expected to be the least reliable parts of a protein structure and may deviate markedly from the experimentally determined control structure.
Modeled helices and strands are ideally straight, whereas bends, caused by factors such as steric interactions between side chains or interactions with solvent molecules, are often found in the secondary structures of real proteins. Conversely, a tricky secondary structure assignment can produce large bends in the modeled structure, which can be identified by visual inspection.
When the local atomic geometry of a structure is optimized, deviation from the global topology is to be expected, and when a small number of discrete states is used to represent a protein structure, the structure is necessarily distorted. Although individually small, these distortions can be cumulative: small distortions in a larger target can lead to substantial structural changes.
Use of modeled protein structures?
Inclusion of predicted protein structures with local structural distortions in the screening process may yield much lower enrichment of known actives, especially for structural distortions present near the binding site, resulting in significant drop-off in the ability to recognize ligands. Interpretation of the in-vivo expression profiles of proteins on the basis of such a model may be misleading.
Identifying the sources and magnitude of errors/variations in predicting biological profiles for small molecules could prove critical in cases where modeled proteins are the only solution.
This exclusive webinar covers:
• GPCR modeling challenges
• Introduction to GQSAR
• Model building leads to lead optimization
• Required: Windows® 2000, XP Home, XP Pro, 2003 Server, Vista
• Macintosh®-based attendees
• Required: Mac OS® X 10.4 (Tiger®) or newer
• 03:00 PM (Sydney Time)
• 10:30 AM / 03:00 PM (India Time)
• 10:30 AM (UK Time)
• 12:30 PM (US Eastern Time)
Reserve your Webinar seat now at:
In the last post, we took a first look at scripting support in VLifeMDS. Since then, we have added a number of small improvements to the scripting interface, the most noticeable being support for command line scripting on Windows. With this you now get a consistent scripting experience across Windows and Linux.
In this post, I will delve further into writing scripts that you can use to simplify your workflow.
What functions are available?
As with Python, you can get help about any scripting functionality in VLifeMDS using the help() function:
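As a minimal sketch of how help() behaves, the example below uses a standard-library function so that it runs anywhere. Inside VLifeMDS, the same call on the mds package, help(mds), should list what the package provides; that is an assumption, since the exact output is not shown here.

```python
import os.path
import pydoc

# Interactively, help(os.path.join) pages the documentation to the console.
# pydoc.render_doc() returns the same documentation as a plain string, which
# is convenient inside scripts or when output should be captured.
text = pydoc.render_doc(os.path.join, renderer=pydoc.plaintext)
print(text)
```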
Doing more with inbuilt Python functions
In the last post, we looked at a few low-level and module-level functions provided by VLifeMDS. In this post, I will give an example of how to combine these functions with what is already available in Python to perform batch processing.
Let us consider a typical batch processing scenario where you have a bunch of 2D .sdf files stored in a directory. You want to convert these into 3D .mol2 structures and subsequently generate conformers for each of these files.
The following listing gives an idea of how to do this using the scripting facilities in VLifeMDS.
import os
import mds
import sys

# file extensions accepted as molecule files
molFileFilters = ["mol2"]

def isMolFile(molFile):
    # True if the file name ends with one of the accepted extensions
    for mff in molFileFilters:
        if molFile.endswith(mff):
            return True
    return False

def myJob(indir, outdir):
    # convert every 2D .sdf file in indir to a 3D .mol2 file in outdir
    mds.convert2Dto3D(indir, outdir)
    for molfl in os.listdir(outdir):
        if not isMolFile(molfl):
            continue
        molFileName = os.path.join(outdir, molfl)
        mol = mds.readMolecule(molFileName)
        mds.generateConformer(mol, caType=0, noOfSeeds=100, FF=mds.FF.MMFF, torsianRMS=70.0, dieleFunc=1)
        # release the memory held by the molecule object
        mds.deleteMolecule(mol)

# execute this script:
myJob("path/to/sdf", "path/to/mol2")

Let's walk through the above code snippet. The first three statements import the packages required for our script. Next, we define a list of accepted file extensions (molFileFilters) and a function (isMolFile()) that checks a given file name against them. The myJob() function that follows does what we stated above: it converts 2D structures to 3D and then generates conformers for each.
mds.convert2Dto3D(indir, outdir) converts all the 2D .sdf files in the folder specified by indir into .mol2 files in another folder, outdir. Next, we get a list of all the files present in folder outdir using the os.listdir() function from Python. The subsequent code then iterates through the files, reads each molecule object, generates conformers, and finally frees the memory held by the molecule object.
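The file-filtering step can be tried on its own, without VLifeMDS. The sketch below is a slightly more compact variant of the extension check; the helper name list_mol_files and the demo file names are made up for illustration.

```python
import os
import tempfile

# str.endswith accepts a tuple, so several extensions can be checked at once.
MOL_EXTENSIONS = (".mol2",)


def list_mol_files(folder):
    """Return full paths of files in `folder` with an accepted extension."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(MOL_EXTENSIONS)
    )


# Small self-contained demonstration using a temporary folder.
demo_dir = tempfile.mkdtemp()
for name in ("a.mol2", "b.MOL2", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

found = [os.path.basename(p) for p in list_mol_files(demo_dir)]
print(found)
```

Lower-casing the name before the check also picks up files whose extension is written in capitals, which the simple endswith("mol2") test would skip.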
How to control conformer generation?
In the listing above, the call to mds.generateConformer() does the necessary background work to generate conformers. As with the GUI for generating conformers, you have complete control over the parameters of this script function. Let us quickly run through some of the parameters used in the above listing:
mds.generateConformer(mol, caType=0, noOfSeeds=100, FF=mds.FF.MMFF, torsianRMS=70.0, dieleFunc=1)
The first argument is the molecule object returned from a call to the mds.readMolecule() function. This is the only mandatory argument; all other arguments are optional and have default values. Optional arguments should be provided in name=value format, as in caType=0. The following is a quick description of the optional arguments used in the above call:
caType: VLifeMDS currently supports two methods for generating conformers: Monte-Carlo Metropolis (caType=0, default) and Systematic (caType=1). See the reference at the end of this post.
noOfSeeds: If caType=0, then number of seeds used for Monte-Carlo run (default value is 50), else has no effect.
FF: The force field type used to compute energy of generated conformer by Monte-Carlo as well as Systematic methods (default is
torsianRMS: torsion RMS cutoff used when generating conformers (default is 50.0 degrees). This parameter is used to identify geometrically similar conformers so that they can be filtered out of the final output. A lower cutoff value produces more conformers that are close to one another.
dielecFunc: use a dielectric function when computing energy (default is 0, signifying false)
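To illustrate why a lower torsion RMS cutoff keeps more conformers, here is a small Python sketch of RMS-based filtering on lists of torsion angles. This is a generic illustration under my own assumptions (a greedy filter on plain angle lists); it is not the algorithm VLifeMDS uses internally.

```python
import math


def torsion_rms(a, b):
    """RMS difference in degrees between two equal-length torsion lists,
    taking the 360-degree periodicity of angles into account."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        d = min(d, 360.0 - d)
        total += d * d
    return math.sqrt(total / len(a))


def filter_similar(conformers, cutoff):
    """Keep a conformer only if it differs from every kept one by >= cutoff."""
    kept = []
    for conf in conformers:
        if all(torsion_rms(conf, k) >= cutoff for k in kept):
            kept.append(conf)
    return kept


# Four made-up conformers, each described by two torsion angles (degrees).
conformers = [[60.0, 180.0], [65.0, 175.0], [-60.0, 180.0], [180.0, 60.0]]

print(len(filter_similar(conformers, cutoff=10.0)))
print(len(filter_similar(conformers, cutoff=130.0)))
```

With the small cutoff only the near-duplicate second conformer is dropped, while the large cutoff treats all of them as similar to the first.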
When the call to mds.generateConformer() completes, a .ca file with the same name as the molecule file is generated. This file can be opened using the worksheet available in the VLifeMDS GUI or processed using a script. In later posts I will explain how to process a .ca file using scripting.
Note on parallelization
Compute-intensive functions such as mds.generateConformer() have been parallelized and automatically use all the cores available on a multi-core processor. For best performance, it is wise to start only one VLifeMDS instance and refrain from submitting more than one job simultaneously. If one starts multiple instances of VLifeMDS that all try to perform compute-intensive jobs, overall performance will be severely hampered, as all these processes try to steal CPU cycles from one another. If you have multiple jobs to run, a better way is to create a batch of these jobs and run them sequentially.
More to come: how to handle file types not natively supported by VLifeMDS, how to process .ca files, etc.
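A sequential batch can be sketched in a few lines of Python. The example assumes the Linux command line launcher runmds.sh described in the earlier scripting post; the script name conformers.py and its arguments are hypothetical, and the demonstration is a dry run that does not start VLifeMDS.

```python
import subprocess


def run_sequentially(commands, runner=subprocess.call):
    """Run each command only after the previous one has finished.

    `runner` defaults to subprocess.call, which blocks until the command
    exits; a different callable can be passed in for a dry run.
    """
    results = []
    for cmd in commands:
        results.append((cmd, runner(cmd)))
    return results


# Hypothetical jobs: one script applied to two data sets, one after the other.
jobs = [
    ["runmds.sh", "conformers.py", "set1"],
    ["runmds.sh", "conformers.py", "set2"],
]

# Dry run: report the commands without launching anything.
for cmd, code in run_sequentially(jobs, runner=lambda cmd: 0):
    print(" ".join(cmd), "->", code)
```

Because each job already uses all available cores, running the jobs one after another avoids the contention described above.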
[In the mean time the following are links for general information on Python scripting]
1. ‘Byte of Python’ by C. H. Swaroop available at: [External Link: http://www.ibiblio.org/swaroopch/byteofpython/files/120/byteofpython_120.pdf ]
2. ‘Dive into Python’ available at: [External Link: http://diveintopython.org/]
- For information on Monte-Carlo Metropolis algorithm see: Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. (1953). “Equations of State Calculations by Fast Computing Machines“. Journal of Chemical Physics 21 (6): 1087–1092. doi:10.1063/1.1699114
The VLifeMDS 4.0 free license coupon offer is now valid for just 2 weeks; the offer ends on 9th September 2011. Only privileged members are offered a free license of VLifeMDS 4.0. Check out some of the exciting features of version 4.0 here. To avail of the VLifeMDS 4.0 free license, please fill in the form before 9th September 2011 with the coupon number that you have. If you do not have a coupon number, write us an email at email@example.com and brief us about your research project and how you plan to use the VLifeMDS 4.0 free license. You might be the lucky one to receive a free copy of VLifeMDS 4.0. Alternatively, request a limited-period evaluation of VLifeMDS 4.0. Also spare a moment for the terms and conditions mentioned in the free license coupon form.
Computational techniques in drug discovery and development process are rapidly gaining popularity, implementation and appreciation. Computational and experimental techniques have important roles in drug discovery and development and represent complementary approaches.
The latest technological advances (QSAR/QSPR, SBDD, combinatorial library design, cheminformatics and bioinformatics), the growing number of chemical and biological databases, and an explosion in currently available software tools are providing a much improved basis for the design of ligands and inhibitors with a desired specificity. Expertise in handling these technologies comes only after persistent effort and a grasp of the basics of the field. Here I list a few of my early trials and lessons learned while doing molecular modeling.
1. Selection of correct PDB structure
When searching for a protein structure in the Protein Data Bank you may run into several challenges. For example, there may be structures from different species, and structures, particularly those determined by crystallography, may include information about only part of the functional biological assembly. The most significant challenge is selecting a structure that is truly predictive of the disease under investigation.
When selecting a PDB entry for a target that has more than one structure available within a metafold, domains comprising an entire chain in the PDB can be preferred over domains from multimers and multi-domain proteins. Resolution, as a measure of the underlying data quality in a diffraction experiment, should also be considered when selecting a PDB structure.
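As one small, concrete aid to this selection, the resolution of a crystal structure is recorded in the REMARK 2 record of a PDB file. The Python sketch below reads it and picks the best-resolved candidate; the entry names and values are invented for the example, and resolution is of course only one of the criteria discussed above.

```python
import re

# REMARK 2 looks like: "REMARK   2 RESOLUTION.    1.80 ANGSTROMS."
_RESOLUTION = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")


def resolution(pdb_lines):
    """Return the resolution in Angstroms, or None if it is not stated
    (for example, for NMR structures)."""
    for line in pdb_lines:
        match = _RESOLUTION.match(line)
        if match:
            return float(match.group(1))
    return None


def best_resolved(entries):
    """entries maps a name to its PDB lines; a lower value is better."""
    scored = [(resolution(lines), name) for name, lines in entries.items()]
    scored = [(res, name) for res, name in scored if res is not None]
    return min(scored)[1] if scored else None


# Made-up header lines for three hypothetical candidate structures.
candidates = {
    "entry_A": ["REMARK   2 RESOLUTION.    2.50 ANGSTROMS."],
    "entry_B": ["REMARK   2 RESOLUTION.    1.80 ANGSTROMS."],
    "entry_C": ["REMARK   2 RESOLUTION. NOT APPLICABLE."],
}

print(best_resolved(candidates))
```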
2. Cleaning of the selected PDB structure
The typical structure file from the PDB is not suitable for immediate use in molecular modeling calculations. A typical PDB structure file consists only of heavy atoms and may include co-crystallized ligand, water molecules, metal ions, and cofactors. Often, PDB files may have missing atoms, missing residues or incomplete residues which may or may not be a part of site of interest.
Special care must be taken in this case, as many software packages read only the ATOM and HETATM records, not the SEQRES records, and so will not detect missing parts of the structure (such as hydrogens, heavy atoms, residues or loops).
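A quick way to spot missing residues is to compare the residue count declared in the SEQRES records with the residues actually present in the ATOM records. The Python sketch below does this using the fixed column positions of the PDB format; the demo lines are synthetic, and a count mismatch is only a hint that closer inspection is needed.

```python
def missing_residue_count(pdb_lines):
    """Return {chain: declared residues minus residues seen in ATOM records}."""
    declared = {}   # chain -> residue count from SEQRES (columns 14-17)
    observed = {}   # chain -> residue numbers (plus insertion code) seen
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            declared[line[11]] = int(line[13:17])
        elif line.startswith("ATOM"):
            observed.setdefault(line[21], set()).add(line[22:27])
    return {chain: count - len(observed.get(chain, ()))
            for chain, count in declared.items()}


def atom_line(serial, name, res, chain, resseq):
    """Build the leading columns of an ATOM record (demo helper only)."""
    return "ATOM  %5d %-4s %3s %1s%4d" % (serial, name, res, chain, resseq)


# Synthetic example: five residues declared, residues 3 and 4 not observed.
demo = [
    "SEQRES   1 A    5  MET ALA GLY SER LEU",
    atom_line(1, "CA", "MET", "A", 1),
    atom_line(2, "CA", "ALA", "A", 2),
    atom_line(3, "CA", "LEU", "A", 5),
]

print(missing_residue_count(demo))
```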
3. Elimination of water molecules
The inclusion of specific crystallographic water molecules has been reported to improve the accuracy of the cross-docking predictions for some specific complexes.
Removing all the water molecules within the binding pocket, to avoid user-derived bias towards specific bound complexes, can help avoid false positive results.
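Both options, removing all waters or retaining a few specific crystallographic ones, can be sketched as a simple filter over PDB records. The residue names, the helper for building demo lines, and the example coordinates of interest are assumptions made for this illustration.

```python
# HOH is the PDB residue name for water; WAT appears in some modeling output.
WATER_NAMES = {"HOH", "WAT"}


def strip_waters(pdb_lines, keep=()):
    """Drop water records except those listed in `keep`.

    `keep` holds (chain, residue number) pairs for waters to retain.
    """
    keep = set(keep)
    kept_lines = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:20] in WATER_NAMES:
            if (line[21], int(line[22:26])) not in keep:
                continue
        kept_lines.append(line)
    return kept_lines


def record(kind, serial, name, res, chain, resseq):
    """Build the leading columns of an ATOM/HETATM record (demo helper)."""
    return "%-6s%5d %-4s %3s %1s%4d" % (kind, serial, name, res, chain, resseq)


demo = [
    record("ATOM", 1, "CA", "GLY", "A", 10),
    record("HETATM", 2, "O", "HOH", "A", 201),
    record("HETATM", 3, "O", "HOH", "A", 202),
]

print(len(strip_waters(demo)))                     # all waters removed
print(len(strip_waters(demo, keep=[("A", 201)])))  # one water retained
```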
4. Identification and extraction of correct Co-crystal ligand
When enzyme structures are determined by X-ray crystallography or NMR, the resulting structures may or may not include a bound ligand. These ligands are often inhibitors or substrate analogues used mainly to identify the binding site. Depending on the case, the ligand components present can be modified protein residues, ligands, metal ions, or cofactors.
Deleting the unwanted components is necessary before using the structure, to avoid unwanted interactions.
5. Sequence alignment for homology modeling
Sequence alignments are accurate for proteins of high sequence similarity and become unreliable as they approach the so-called ‘twilight zone’, where sequence similarity becomes indistinguishable from random.
It is necessary for the user to examine the alignment carefully to see whether it makes biological sense.
Knowledge of the functional regions of the sequences being aligned can be used to assess the quality of the alignment. The functional regions are often more or less conserved between relatively closely related sequences. Accordingly, few gaps should be present in closely conserved areas, and most of the gaps should fall in less well conserved areas.
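Two simple checks on a pairwise alignment can be scripted: the percent identity over aligned positions, and the number of gaps falling inside a region known to be conserved. The Python sketch below uses made-up sequences; note that conventions for the identity denominator vary, and this one ignores gapped columns.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def gaps_in_region(a, b, start, end):
    """Count alignment columns in [start, end) that contain a gap."""
    return sum(1 for x, y in zip(a[start:end], b[start:end])
               if x == "-" or y == "-")


# Made-up aligned sequences ("-" marks a gap).
target = "MKT-AYIAKQR"
template = "MKTLAYLA-QR"

print(round(percent_identity(target, template), 1))
print(gaps_in_region(target, template, 0, 3))   # a conserved stretch
print(gaps_in_region(target, template, 3, 9))   # a variable stretch
```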
6. Loop modeling
Loop modeling is a problem in protein structure prediction requiring the prediction of the conformations of loop regions in proteins without the use of a structural template.
The knowledge-based approach benefits enormously from the steadily growing PDB, and has been shown to be a strong competitor to ab-initio loop prediction.
The loop conformations extracted from the database are ranked by a combined quality score, that considers sequence similarity, bumps with the rest of the structure, and the fit to the terminal loop anchor points.
7. Geometry Check and Correction for the Protein structure/Homology Model
During rectification of a protein structure, the side chains might not be in the correct geometry and may have bad contacts with surrounding atoms or residues. Addition, modification, or deletion of residues during structure refinement may lead to deviation from the favored geometry. It is necessary to correct the side-chain geometry to a more reasonable form.
8. Ligand structures for docking studies
To give the best docking results, the structures that are docked must be good representations of the actual ligand structures as they would appear in a protein-ligand complex. The ligand must be in a correct 3D format compatible with the program. The ligand structures provided for docking studies should have correct bond lengths and bond angles, because flexible ligand docking modifies only torsional angles.
9. Post-Docking Energy Minimization (calculating Binding Energy)
The free energy of binding is the change in free energy that occurs on binding,
ΔGbinding = Gcomplex – Gseparated
where Gcomplex and Gseparated are the free energies of the complex and of the non-interacting protein and ligand, respectively.
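As a small worked example of the relation above, the Python sketch below takes Gseparated to be the sum of the free energies of the isolated protein and ligand. The numbers are invented purely to show the arithmetic and carry no physical meaning.

```python
def binding_free_energy(g_complex, g_protein, g_ligand):
    """Delta G of binding = G(complex) - (G(protein) + G(ligand))."""
    return g_complex - (g_protein + g_ligand)


# Illustrative values only (same energy unit for all three terms).
dg = binding_free_energy(g_complex=-1250.0, g_protein=-1100.0, g_ligand=-140.5)
print(dg)
```

A negative value means the complex is lower in free energy than the separated partners, that is, binding is favorable.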
The ligand poses generated during docking are rarely exactly at a local minimum, and optimization of the complex allows relaxation of the protein to a certain extent, which can account for the conformational changes that occur in the protein structure on binding of the ligand.
Aggregation of the binding site with the ligand in a complex is a way of estimating the free energy of the complex without affecting the geometry of the ligand during post-docking energy minimization.
10. Application of Docking to virtual screening
Virtual screening (VS) is a computational technique used in drug discovery research. It involves the rapid in silico assessment of large libraries of chemical structures in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme. Docking combined with a scoring function can be used to quickly screen large databases of potential drugs in silico to identify molecules that are likely to bind to protein target of interest.
Applying an ADME filter to the virtual screen during docking helps eliminate ligands that may exhibit poor ADME properties in vivo.
I hope budding modelers will find these precautions helpful in avoiding many unwanted results. Happy modeling!
Scripting in VLifeMDS has been a frequent request from the user community. Though earlier versions had limited scripting abilities, with the version 4.0 refresh we have introduced full support for user scripts. With this, users have the flexibility to easily perform repetitive tasks, connect modules in new ways, or even write entirely new applications based on the VLifeMDS platform.
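One widely used, very coarse pre-filter of this kind is Lipinski's rule of five. The Python sketch below applies it to precomputed descriptors; choosing this particular rule, allowing one violation, and the descriptor values themselves are all assumptions made for illustration, and real ADME filtering is considerably richer.

```python
# Rule-of-five limits: a value above the limit counts as one violation.
RULE_OF_FIVE = (
    ("mol_weight", 500.0),   # molecular weight in Da
    ("logp", 5.0),           # octanol-water partition coefficient
    ("h_donors", 5),         # hydrogen bond donors
    ("h_acceptors", 10),     # hydrogen bond acceptors
)


def violations(descriptors):
    """Count how many rule-of-five limits a ligand exceeds."""
    return sum(1 for key, limit in RULE_OF_FIVE if descriptors[key] > limit)


def passes_filter(descriptors, max_violations=1):
    """Accept ligands with at most `max_violations` violations."""
    return violations(descriptors) <= max_violations


# Made-up descriptor values for two hypothetical ligands.
library = {
    "ligand_1": {"mol_weight": 342.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    "ligand_2": {"mol_weight": 712.9, "logp": 6.3, "h_donors": 7, "h_acceptors": 12},
}

selected = sorted(name for name, d in library.items() if passes_filter(d))
print(selected)
```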
• VLifeMDS uses Python for scripting purposes. Python is a relatively easy language to learn and is used by many in the scientific community. For a beginner's guide to using Python, refer to ‘Byte of Python’ by C. H. Swaroop available at: [External Link: http://www.ibiblio.org/swaroopch/byteofpython/files/120/byteofpython_120.pdf ]
• For more information on Python and using it see: [External Link: http://www.python.org ]
• On both Linux and Windows, scripting is accessible through the interactive scripting console.
• For Linux pure command line support is also available and is accessible using:
runmds.sh <scriptfile.py> <command line arguments to script>
• The current implementation of scripting does not support GUI / graphics manipulations.
The easiest way to try scripting in VLifeMDS is to use the interactive scripting console. This is accessible from Tools -> Scripting Console menu. The scripting console provides an interactive prompt which accepts Python commands.
To use MDS functionality within the scripting console, one is required to use the ‘mds’ package as follows:
>>> import mds
>>> mds.helloMDS()
The above command will print a welcome string. To get the current version of VLifeMDS use:
Following screenshot gives an idea of the interaction with the scripting console on Ubuntu:
Low level functions
The scripting functionality gives a user of VLifeMDS access to many low-level functions. These functions basically deal with manipulating molecule objects. For instance, to read a molecule file one can use the following:
>>> mol = mds.readMolecule("path-to-supported-molecule-format")
In the above code snippet, the variable ‘mol’ captures the molecule object that MDS uses internally. Note that if the file you are trying to read is not supported or has a format problem, an exception is thrown. The good thing about scripting is that, if a file format is not supported, one can write a custom reader and generate a runtime molecule object that can then be used with other MDS functions. More on this in upcoming posts.
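Since a failed read raises an exception, a batch script may want to catch it and carry on with the remaining files. The Python sketch below shows the pattern with a stand-in reader so that it runs without VLifeMDS; the exception type raised by mds.readMolecule is not stated here, so the sketch catches Exception broadly, and fake_reader is purely hypothetical.

```python
def read_all(paths, reader):
    """Try to read every path; collect failures instead of stopping."""
    loaded, failed = {}, {}
    for path in paths:
        try:
            loaded[path] = reader(path)
        except Exception as err:  # exact exception type is an unknown here
            failed[path] = str(err)
    return loaded, failed


def fake_reader(path):
    """Stand-in for mds.readMolecule, used only to make the sketch runnable."""
    if not path.endswith(".mol2"):
        raise ValueError("unsupported format: " + path)
    return {"source": path}


loaded, failed = read_all(["a.mol2", "b.xyz"], fake_reader)
print(sorted(loaded), sorted(failed))
```

Inside VLifeMDS one would pass mds.readMolecule as the reader; the files that end up in the failed dictionary are the candidates for a custom reader.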
Module level functions
High-level module functions allow one to perform common tasks in VLifeMDS without using the GUI. In this way, one gets more flexibility in how these capabilities may be used. Examples include computing energy, energy minimization, conformer generation, 2D to 3D conversion, etc. For instance, to compute the energy of a molecule read previously one may use:
>>> print mds.computeEnergy(mol)
This will print the energy of the molecule computed using default parameters. One can change optional parameters to suit one's needs. Say one needs the energy to be computed using the MMFF force field rather than the default UFF; the following modification to the above script can be used:
>>> print mds.computeEnergy(mol, FF=mds.FF.MMFF)
You can also check out a short technical whitepaper downloadable from here: [Molecular modeling scripting whitepaper]
Scripting VLifeMDS will be a regular column in this blog. In subsequent posts, I will provide varying examples of how you can tap into the power and flexibility offered by writing scripts for VLifeMDS.
Happy scripting and stay tuned!