Selecting the Right Protein Modeling Program: A Comparison of I-TASSER, Phyre 2, and PredictProtein
Non-Technical Introduction:
Proteins are the building blocks of life and the functional downstream products of genes. They are an integral part of the Central Dogma of Molecular Biology (DNA > RNA > Protein) and carry out a vast array of functions that keep cells, and thereby organisms, alive each day, acting as defense compounds, transcriptional regulators, and structural components of cells, among many other roles. Proteins can function alone, interact with one another to form associations and larger biological ‘machines’, or combine to build large structures such as bone, heart muscle, and plant seeds, all of which depend heavily on proteins. Because proteins are ubiquitous throughout life and millions are found in each cell, understanding their fine details is important for many scientists, whether they study genes, proteins, or entire organs. Here I compare three programs that help researchers answer questions about protein structure, and about characteristics related to structure, since structure largely determines whether, and how, a protein functions.
Technical Introduction:
In this brief review I examine I-TASSER, Phyre 2, and PredictProtein, arguably the three most widely used and most accurate protein modeling programs cited across the molecular biology literature between 2016 and 2021. I have identified the strengths, weaknesses, capabilities, and limitations of each, and I report these findings in a comparative format to help readers choose the program best suited to their research questions. All three programs have been cited thousands of times, and their original and follow-up articles have tens of thousands of reads. I do not use total citations or read metrics as a proxy for the accuracy, popularity, or quality of the programs, because the programs have existed for different lengths of time and their developers publish update articles at different intervals, which confounds such a comparison.
In addition to predicting and generating a 3D protein structure (in the case of I-TASSER and Phyre 2), these programs can undertake many function-related analyses, such as identifying synonymous and non-synonymous mutations relative to a known gene-of-interest (GOI) sequence, predicting binding pockets, and predicting biological function, to name just a few. I report which functions each program offers so that readers can select a program based on their investigative needs.
Quick Hits:
- Only I-TASSER and Phyre 2 can create 2D or 3D protein structures/visualizations.
- PredictProtein is the easiest to use for beginners or for a quick overview of a protein, and its output is visually beautiful, but it makes data traceability, detailed data recording, and long-term interpretation of output data difficult.
- I-TASSER has previously been noted as the most accurate homology-based 3D protein modeling program because of its modeling engine and computations, but some functions available in one of the other two programs are not available in I-TASSER.
- Each program possesses its own unique suite of functions, which makes outright comparisons and recommendations difficult; it is best to review the comparison matrix and decide for yourself which program(s) best suit your needs.
Overall Impressions (Discussion):
- Batch processing is only possible with Phyre 2.
- Only I-TASSER and Phyre 2 produce 3D protein prediction models.
- PredictProtein results are kept indefinitely, whereas I-TASSER and Phyre 2 results are kept for 30 days (Phyre 2 offers an additional 30 days upon request).
- I-TASSER’s results page can be saved as a high-quality .pdf, but it produces few downloadable files beyond the protein models; conversely, Phyre 2 and PredictProtein produce many downloadable files. Keep this in mind depending on your data-preservation preferences.
- PredictProtein’s downloadable files are difficult to interpret and glean information from once they are separated from the interactive graphical results webpage on which the results are reported.
- B-value, an important measure of protein flexibility within a structure, is reported only by I-TASSER and PredictProtein; it is unclear whether Phyre 2 accounts for this statistic in its modeling but does not report it, or does not take it into account at all.
- Whether prediction accuracy (confidence) measures are reported depends on the function in question and varies between programs.
- Categorically:
- I-TASSER and Phyre 2 possess the best ‘General Quality of Life Items’ (download ease, data traceability, etc.).
- Phyre 2 offers more functionality in the ‘Protein Modeling’ category; however, I-TASSER is still reputed in the field to be the most accurate program for homology-based 3D protein modeling.
- If the reader wishes to conduct ‘Sequence Investigation’ (BLAST searches, amino acid conservation reporting against related sequences, etc.) in parallel with the protein analysis, PredictProtein or Phyre 2 is recommended.
- In the ‘Structural and Activity Investigation’ category, each program possesses unique functions that the other programs lack; therefore, I cannot recommend one program over the others in this category.
- I recommend PredictProtein as the best program for ‘Biological Function Investigation’ because it reports a subcellular localization prediction, which the other two programs do not. Note, however, that functionality again varies between programs within this category.
- PredictProtein reports DNA, RNA, and protein binding sites, while I-TASSER reports ‘Ligand Binding’ and ‘Enzyme Commission (Active Site)’ sites. Phyre 2 instead detects pockets, a related but distinct feature that is still very relevant to activity. I recommend combining Phyre 2’s pocket detection with either I-TASSER’s or PredictProtein’s binding-site prediction.
- Although PredictProtein’s data traceability and detailed data recovery can be challenging, it is a great program for gaining a sound overall picture of a protein of interest before submitting it to I-TASSER or Phyre 2 for a more nuanced, deeper analysis with 3D protein modeling.
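Several of the points above turn on B-values and on downloadable model files. As a rough illustration of how per-residue flexibility could be summarized from a downloaded model, the sketch below averages the B-factor column over each residue’s atoms. It assumes a standard fixed-column PDB file and that the program wrote its B-value estimate into the tempFactor field (columns 61-66); whether I-TASSER, Phyre 2, or PredictProtein output follows that convention is an assumption to check for each program. The function names and the threshold are hypothetical.

```python
def per_residue_bfactor(pdb_text: str) -> dict[tuple[str, int, str], float]:
    """Mean tempFactor (B-factor) over the atoms of each residue in a PDB-format model.

    Fixed PDB columns (1-based): resName 18-20, chainID 22, resSeq 23-26,
    tempFactor 61-66. Only ATOM records are read; insertion codes are ignored
    (an assumption that is fine for most predicted models).
    """
    sums: dict[tuple[str, int, str], float] = {}
    counts: dict[tuple[str, int, str], int] = {}
    for line in pdb_text.splitlines():
        # Skip headers, TER/END, HETATM records and lines too short to hold a B-factor.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip())
        sums[key] = sums.get(key, 0.0) + float(line[60:66])
        counts[key] = counts.get(key, 0) + 1
    # Dicts keep insertion order, so residues come back in file order.
    return {key: sums[key] / counts[key] for key in sums}


def flexible_residues(bfactors: dict[tuple[str, int, str], float],
                      threshold: float) -> list[tuple[str, int, str]]:
    """Residues whose mean B-factor exceeds a user-chosen (hypothetical) threshold."""
    return [key for key, b in bfactors.items() if b > threshold]
```

To use it, read a downloaded model file into a string and pass it to `per_residue_bfactor`. B-values from different programs may be on different scales, so any threshold passed to `flexible_residues` should be chosen per program rather than reused across them.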
Final Programmatic Recommendation Ranking:
- I-TASSER (with Phyre 2 a close second)
- Phyre 2
- PredictProtein
Important Notes to Remember:
- Results are cleared from each program’s logs 30 days after the query completes (except for PredictProtein, whose results are kept indefinitely).
- All results are predictions of the true protein structure, sites, activities, and so on. They are theoretical, although these programs typically model most proteins with high accuracy by drawing on the structures of other known proteins. If your protein has few similar known structures or is very large, the odds of modeling it correctly decrease, but much can still be learned from modeling.
- In silico protein modeling with these programs is cheap and fast! It’s free and can be completed in minutes or, at worst, days to weeks! Solving the crystal structure of a protein of interest can take months and is far from free.
- If you do not want your protein to become part of the public domain, be sure to select (or deselect) the appropriate option when submitting a job in any of the programs.
Results and Comparison Matrix:
Table 1. Comparison matrix for I-TASSER, Phyre 2, and PredictProtein.

| Category | Feature | I-TASSER | Phyre 2 | PredictProtein |
|---|---|---|---|---|
| General Items | Quality | Best | Good | Good |
| | Typical Run Time | 2 days to 2+ weeks | 2.5 h | 20 min |
| | User Interface | Basic but easy | High quality and easy | Beautiful and easy |
| | Report Access Period | 30 days | 30 days (+30 upon request) | Indefinitely |
| | Batch Processing | No | Yes | No |
| | File Download in Native Format | Little to download besides the protein models | Yes | Yes |
| | File Download in Text, Excel, or Human-Readable Format | See above | Yes | No |
| | Save Webpage as a Quality PDF | Yes | Yes | Yes, but the page is highly interactive and graphically intense, so a static PDF is of limited use |
| Protein Modeling | Protein Modeling | Yes, 2D & 3D | Yes, 2D & 3D | No, 1D sequence-only analysis |
| | Protein Model Confidence and Coverage | Yes and yes | Yes and yes; also provides a ProQ2 quality score and an alignment quality score upon request; graphical, model, and print reporting hybrid | N/A |
| | Atom Clash Reporting | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| | Rotamer Reporting | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| | Ramachandran Analysis (a) | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| | Number of Models Reported | Top 5 | Top 30 | N/A |
| | Models Are Downloadable | Yes | Yes | N/A |
| | Allows Editing of Protein Image/Video Environment | Yes | No | N/A |
| | Reports the Templates Used in Modeling | Yes | Yes | N/A |
| | Image Exporting Made Easy? | Yes | Yes | Yes |
| Sequence Investigation | Conducts an AABLAST of the Protein | No | Yes | Yes |
| | Reports Top AABLAST Alignments | N/A | Yes, print and FASTA file reporting | Yes, graphical interactive reporting |
| | Conservation Reporting (b) | No | Yes, upon request; graphical, model, and print reporting hybrid | Yes, graphical interactive reporting (ConSeq) |
| | AA Composition Breakdown (and Pie Chart) | No | No | Yes, with graphical output |
| Structural and Activity Investigation | Secondary Structure Reporting | Yes, print reporting | Yes, graphical and print reporting hybrid | Yes, graphical interactive reporting (RePROF) |
| | Secondary Structure Confidence Reporting | Yes, print reporting | Yes, graphical and print reporting hybrid | No |
| | B-Value (c) | Yes, graphical and print reporting hybrid | No, but see Disordered Region Reporting below | Yes, graphical interactive reporting (PROFbval) |
| | Disordered Region Reporting (d) | No | Yes, graphical and print reporting hybrid, with a more in-depth graphical, model, and print reporting hybrid upon request | Yes, graphical interactive reporting (Meta-disorder) |
| | Disordered Region Confidence Reporting | N/A | Yes, graphical and print reporting hybrid, with a more in-depth graphical, model, and print reporting hybrid upon request | No |
| | Accessibility Reporting (e) | Yes, print reporting | No | Yes, graphical interactive reporting (RePROF) |
| | Transmembrane Domain (Topology) (f) | No | Yes, a descriptive illustration | Yes, graphical interactive reporting |
| | Protein, DNA, and RNA Binding Site Reporting | See Ligand Binding Site Reporting below | Not exactly; see Pocket Detection Reporting below | Yes; each reported separately. Does not account for the spatial context of AAs relative to one another, as it does not build a 3D model |
| | Ligand Binding Site Reporting | Yes, predicted using the 3D model and the spatial context of AAs relative to one another; print reporting with visualizations | Not exactly; see Pocket Detection Reporting below | See Protein, DNA, and RNA Binding Site Reporting above |
| | Enzyme Commission (Active Site) Prediction | Yes, predicted using the 3D model and the spatial context of AAs relative to one another; print reporting with visualizations | Not exactly; see Pocket Detection Reporting below | No |
| | Pocket Detection Reporting (g) | No | Yes, upon request; only large pockets are reported (i.e., pockets that are strong candidates to contain an active site); graphical, model, and print reporting hybrid | No |
| | Disulfide Bond Reporting | No | No | Yes, graphical interactive reporting (DISULFIND) |
| | Predicted Impact of a Point Mutation at Each AA on Overall Protein Structure | No | Yes, upon request; graphical, model, and print reporting hybrid | Yes, heat-map graphical output and a raw data export file, though the export seems very unwieldy |
| Biological Function Investigation | GO Analysis Built In | Yes | Not exactly, but it reports suspected biological functions in detail for each protein model generated | Yes, interactive with print reporting |
| | CD Search Built In | No | Yes | No |
| | Predicted Function Reported | Partly; within the GO analysis, clicking the various terms gives an idea of biological functions | Yes, for each protein model | Yes, as part of the GO analysis, but only for the top handful of hits on the submitted sequence |
| | Subcellular Localization Prediction | No | No | Yes, graphical output |
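Table 1 lists PredictProtein as the only program with a built-in amino acid composition breakdown. For readers working mainly in I-TASSER or Phyre 2, the sketch below computes the same kind of summary locally. It is a minimal illustration under stated assumptions, not a reproduction of PredictProtein’s own calculation: the function name is hypothetical, and it simply counts the 20 standard one-letter codes, ignoring any other characters.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> dict[str, float]:
    """Percentage of each standard amino acid in a protein sequence.

    Letters are upper-cased; any other character (whitespace, X, B, Z, '*', ...)
    is ignored and not counted toward the total.
    """
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard amino acid codes")
    counts = Counter(residues)
    total = len(residues)
    # Report all 20 amino acids, including those absent (0.0%).
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}
```

For the toy sequence "MKKLA", K comes out at 40.0% and M, L, and A at 20.0% each; the returned dictionary can be passed straight to a plotting library if a pie chart is wanted.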
Definitions and Notes:
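a = Ramachandran analysis rates how plausible each residue’s backbone dihedral angles (phi and psi) are, i.e., whether the bonds between those atoms can allow such rotations without atoms clashing.
b = The program conducts an AABLAST in the background to determine this.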
c = A measure of the flexibility of a stretch of AAs. In an experimental crystal structure, atoms that vibrate or move more have less well-defined positions, so a higher B-value means greater uncertainty about where an atom truly is, or is most often located. Generally speaking, more flexible regions are often associated with binding and sensing sites, whereas rigid regions tend to be those that cannot tolerate movement and still complete their task, such as the catalytic residues of an active site.
d = This gives a similar outcome to B-value, but it predicts the amount of disorder a region may show specifically in a crystal structure, rather than how much flexibility a stretch of AAs possesses innately (although knowledge of the latter is also needed to determine the former).
e = Based on the best model.
f = This topology is indicative of a transmembrane protein.
g = Large pockets are often the locations of active sites.
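Definition a refers to backbone dihedral angles. As a sketch of what a Ramachandran analysis starts from, the code below computes phi and psi angles from backbone atom coordinates using the standard atan2 dihedral formula. The input layout (one (N, CA, C) coordinate triple per residue, in chain order) and the function names are assumptions for illustration; classifying the angles as allowed or disallowed would additionally need reference regions, which are not included here.

```python
import math
from typing import Optional

Vec = tuple[float, float, float]
Residue = tuple[Vec, Vec, Vec]  # (N, CA, C) coordinates of one residue (assumed layout)


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle (degrees, between -180 and 180) of the chain p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: list[Residue]) -> list[tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue; None where the neighbouring residue is missing.

    phi_i = dihedral(C_{i-1}, N_i, CA_i, C_i)
    psi_i = dihedral(N_i, CA_i, C_i, N_{i+1})
    """
    angles: list[tuple[Optional[float], Optional[float]]] = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles
```

The first residue has no phi and the last has no psi, which is why those positions are returned as None rather than as a number.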
Written by TPS graduate student Cullen Dixon