Selecting the Right Protein Modeling Program: A comparison of I-TASSER, Phyre 2, and PredictProtein.

November 22, 2021

Translational Plant Sciences Graduate Program Blog

Non-Technical Introduction:

Proteins are the building blocks of life and the functional downstream products of genes.  They are an integral part of the Central Dogma of Molecular Biology (DNA > RNA > Protein) and carry out a vast array of functions that keep cells, and thereby organisms, alive each day, acting as defense compounds, transcriptional regulators, and structural components of cells.  Proteins can function alone, interact with one another to form associations and larger biological ‘machines’, or combine to help build massive structures like bones, the heart, or a plant seed, all of which contain proteins.  Because proteins are ubiquitous throughout life, with millions found in each cell, understanding their minute details is important to many scientists seeking to understand the genes, proteins, or even organs they study.  Here I compare three programs that help researchers answer questions about protein structure and structure-related characteristics, since structure is a large factor in protein function or lack thereof.

 

Technical Introduction:

In this brief review I compare I-TASSER, Phyre 2, and PredictProtein, arguably the three most widely used and accurate protein modeling programs in the molecular biology literature from 2016 to 2021.  I identify the strengths, weaknesses, and capabilities (or lack thereof) of each and report them in a comparative format to help the reader select the program best suited to a particular analysis and research question.  All three programs have been cited thousands of times, with tens of thousands of reads on their various original and follow-up articles.  I will not use total citations or read counts as a proxy for accuracy, popularity, or quality: the programs have existed for different lengths of time and their stewards publish update articles at different intervals, which confounds this comparative strategy.

These programs can undertake many protein-function-related analyses, such as identifying synonymous/non-synonymous mutations relative to a known gene of interest (GOI) sequence, predicting binding pockets, and predicting biological function, to name a few, in addition to predicting and generating a 3D protein structure (I-TASSER and Phyre 2 only).  I report how each program handles each function to assist the reader with program selection based on their investigative needs.

 

Quick Hits:

  • Only I-TASSER and Phyre 2 can create 2D or 3D protein structures/visualizations.
  • PredictProtein is the easiest to use for a beginner or for a quick overview of your protein and is visually beautiful, but it makes data traceability, detailed data recording, and long-term interpretation of output data difficult.
  • I-TASSER has previously been noted as the most accurate homology-based 3D protein modeling program based on its engine and computations, but some functions available in one of the other two programs may not be available in I-TASSER.
  • Each program possesses its own unique suite of functions, which makes outright comparisons and recommendations hard.  It is best for readers to review the comparison matrix and decide for themselves which program(s) best suit their needs.

 

Overall Impressions (Discussion):

  • Batch processing is only possible with Phyre 2.
  • Only I-TASSER and Phyre 2 produce 3D protein prediction models.
  • PredictProtein results last indefinitely while I-TASSER and Phyre 2 last 30 days.
  • I-TASSER produces great output data that can be saved as a PDF but produces few actual downloadable files; conversely, Phyre 2 and PredictProtein produce many downloadable files.  Keep this in mind depending on your data preservation preferences.
  • PredictProtein’s downloadable files are difficult to interpret and glean information from once they are separated from the graphically interactive results webpage on which the results are reported.
  • B-Value, an important measure of protein flexibility in a structure, is only reported by I-TASSER and PredictProtein.  It is unclear whether Phyre 2 accounts for this statistic in its modeling and simply does not report it, or does not take it into account at all.
  • Prediction accuracy measures may or may not be reported depending upon the function in question in each of the programs.
  • Categorically:
    • I-TASSER and Phyre 2 possess the best ‘General Quality of Life Items’ (download ease, data traceability, etc.).
    • Phyre 2 possesses more functionality in the category of ‘Protein Modeling’; however, I-TASSER is still reputed in the field as the most accurate program for homology-based 3D protein modeling.
    • If the reader wishes to conduct ‘Sequence Investigation’ (BLASTs, nt conservation reporting to a known sequence, etc.) in parallel with the protein analysis, PredictProtein or Phyre 2 is recommended.
    • In the category of ‘Structural and Activity Investigation’, each program possesses unique functions that the other programs lack; therefore, I cannot recommend one program over another in this category.
    • I recommend PredictProtein as the best program for ‘Biological Function Investigation’ as it reports a subcellular localization prediction while the other two programs do not.  Please note that functionality again varies somewhat between programs within this category.
  • PredictProtein reports binding affinities for DNA, RNA, and protein while I-TASSER reports ‘Ligand Binding’ and ‘Enzyme Commission (Active Site)’ sites.  Phyre 2 detects pockets, which is a related but different topic that is still very important to activity.  I recommend combining Phyre 2’s pocket prediction with either I-TASSER’s or PredictProtein’s site detection.
  • While PredictProtein data traceability and detailed data recovery may be challenging, it is a great program for gaining a sound overall picture of the protein of interest prior to submitting to either I-TASSER or Phyre 2 for a more nuanced, deeper analysis with 3D protein modeling.
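
For readers who prefer to narrow the choice programmatically, the selection logic in the points above can be sketched in a few lines of Python.  This is only an illustration: the feature names and the `programs_supporting` helper are hypothetical labels I made up for a handful of rows from Table 1, not identifiers used by any of the three web servers.

```python
# Hypothetical encoding of a few Yes/No rows from Table 1 of this post.
# The keys are illustrative names, not options exposed by the programs.
FEATURES = {
    "I-TASSER": {
        "3d_model": True,
        "batch_processing": False,
        "b_value": True,
        "pocket_detection": False,
        "transmembrane_topology": False,
        "disulfide_bonds": False,
        "subcellular_localization": False,
        "blast_alignments": False,
    },
    "Phyre 2": {
        "3d_model": True,
        "batch_processing": True,
        "b_value": False,
        "pocket_detection": True,
        "transmembrane_topology": True,
        "disulfide_bonds": False,
        "subcellular_localization": False,
        "blast_alignments": True,
    },
    "PredictProtein": {
        "3d_model": False,
        "batch_processing": False,
        "b_value": True,
        "pocket_detection": False,
        "transmembrane_topology": True,
        "disulfide_bonds": True,
        "subcellular_localization": True,
        "blast_alignments": True,
    },
}


def programs_supporting(required):
    """Return the programs (in table order) that offer every required feature."""
    known = set().union(*(feats.keys() for feats in FEATURES.values()))
    unknown = set(required) - known
    if unknown:
        raise KeyError(f"unknown feature(s): {sorted(unknown)}")
    return [
        name
        for name, feats in FEATURES.items()
        if all(feats[feature] for feature in required)
    ]


if __name__ == "__main__":
    print(programs_supporting(["3d_model", "pocket_detection"]))
    print(programs_supporting(["3d_model", "subcellular_localization"]))
```

An empty result, as in the second call, is a hint that no single program covers the analysis and that combining two of them is the practical route.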

 

Final Programmatic Recommendation Ranking:

  1. I-TASSER    (with Phyre 2 a close second)
  2. Phyre 2
  3. PredictProtein

 

Important Notes to Remember:

  • Results are cleared from each program’s logs 30 days after the query completes (except for PredictProtein, which keeps results indefinitely).
  • All results are only projections of the true protein structure, sites, activities, etc.  They are theoretical, but such programs typically offer high-accuracy modeling for most proteins based on the structures of other known proteins.  If your protein is highly unusual or very large, the odds of modeling it correctly decrease, but a lot can still be learned from modeling.
  • In silico protein modeling with these programs is cheap and fast!  It is free and can be completed in anywhere from minutes to, at worst, a couple of weeks (see the Time row in Table 1)!  Determining the structure of a protein of interest by creating a crystal structure can take months and is far from free.
  • If you do not want your protein to become part of the public domain, be sure to deselect or select the appropriate option when submitting a job in any of the programs.
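
Because results disappear on a schedule, it can help to note the expiry date when a job finishes.  The sketch below does that arithmetic with the retention periods reported in this post; the `results_expire_on` function and its `extension_days` parameter are hypothetical conveniences of mine (the only extension I list is Phyre 2's +30 days upon request), not part of any program.

```python
from datetime import date, timedelta

# Retention periods in days as reported in this post.
# None means the results are kept indefinitely.
RETENTION_DAYS = {"I-TASSER": 30, "Phyre 2": 30, "PredictProtein": None}


def results_expire_on(program, completed, extension_days=0):
    """Return the date results are cleared, or None if they never expire."""
    days = RETENTION_DAYS[program]
    if days is None:
        return None
    return completed + timedelta(days=days + extension_days)


if __name__ == "__main__":
    finished = date(2021, 11, 22)
    print(results_expire_on("I-TASSER", finished))                    # 2021-12-22
    print(results_expire_on("Phyre 2", finished, extension_days=30))  # 2022-01-21
    print(results_expire_on("PredictProtein", finished))              # None
```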

Results and Comparison Matrix:

 

Table 1.

| Category | Item | I-TASSER | Phyre 2 | PredictProtein |
|---|---|---|---|---|
| General Items | Quality | Best | Good | Good |
| | Time | 2 days to 2+ weeks | 2.5 h | 20 min |
| | User Interface | Basic but Easy | High Quality and Easy | Beautiful and Easy |
| | Report Access Period | 30 days | 30 days (+30 upon request) | Indefinitely |
| | Batch Processing | No | Yes | No |
| | File Download in Native Format | There’s really nothing to download using this platform besides the protein models | Yes | Yes |
| | File Download in Text, Excel, or Human-Readable Format | See above | Yes | No |
| | Save Webpage as a Quality PDF | Yes | Yes | Yes, but the page is really interactive and graphically intense, so saving a non-interactive PDF isn’t all that useful |
| Protein Modeling | Protein Modeling | Yes, 2D & 3D | Yes, 2D & 3D | No, 1D sequence-only analysis |
| | Protein Model Confidence and Coverage | Yes and Yes | Yes and Yes; also provides a ProQ2 quality score and an alignment quality score upon request; graphical, model, and print reporting hybrid | N/A |
| | Atom Clash Reporting | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| | Rotamer Reporting | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| | Ramachandran Analysis (a) | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| | Number of Models Reported | Top 5 | Top 30 | N/A |
| | Models are Downloadable | Yes | Yes | N/A |
| | Allows Editing of Protein Image/Video Environment | Yes | No | N/A |
| | Reports the Templates Used in the Modeling | Yes | Yes | N/A |
| | Image Exporting Made Easy? | Yes | Yes | Yes |
| Sequence Investigation | Conducts an AABLAST of the Protein | No | Yes | Yes |
| | Reports Top AABLAST Alignments | N/A | Yes, print and fasta file reporting | Yes, graphical interactive reporting |
| | Conservation Reporting (b) | No | Yes, upon request; graphical, model, and print reporting hybrid | Yes, graphical interactive reporting (ConSeq) |
| | AA Composition Breakdown (and Pie Chart) | No | No | Yes and yes, graphical output |
| Structural and Activity Investigation | Secondary Structure Reporting | Yes, print reporting | Yes, graphical and print reporting hybrid | Yes, graphical interactive reporting (RePROF) |
| | Secondary Structure Confidence Reporting | Yes, print reporting | Yes, graphical and print reporting hybrid | No |
| | B-Value (c) | Yes, graphical and print reporting hybrid | No, but see below. | Yes, graphical interactive reporting (PROFbval) |
| | Disordered Region Reporting (d) | No | Yes, graphical and print reporting hybrid, with even more in-depth graphical, model, and print reporting hybrid upon request | Yes, graphical interactive reporting (Meta-disorder) |
| | Disordered Region Confidence Reporting | N/A | Yes, graphical and print reporting hybrid, with even more in-depth graphical, model, and print reporting hybrid upon request | No |
| | Accessibility Reporting (e) | Yes, print reporting | No | Yes, graphical interactive reporting (RePROF) |
| | Transmembrane Domain (Topology) (f) | No | Yes, a descriptive illustration | Yes, graphical interactive reporting |
| | Protein, DNA, and RNA Binding Site Reporting | See below | Not exactly, see Pocket Detection below | Yes; all reported separately.  Does not take into account the 3D model and the context of AA in space relative to one another, as it does not make a 3D model. |
| | Ligand Binding Site Reporting | Yes, predicted by taking into account the 3D model and the context of AA in space relative to one another.  Print reporting with visualizations | Not exactly, see Pocket Detection below | See above |
| | Enzyme Commission (Active Site) Prediction | Yes, predicted by taking into account the 3D model and the context of AA in space relative to one another.  Print reporting with visualizations | Not exactly, see Pocket Detection below | No |
| | Pocket Detection Reporting (g) | No | Yes, upon request; only large pockets reported (i.e., pockets that are strong candidates to contain one or more active sites); graphical, model, and print reporting hybrid | No |
| | Disulfide Bond Reporting | No | No | Yes, graphical interactive reporting (DISULFIND) |
| | Effect of a Theoretical Point Mutation (at Each AA) on the Overall Protein Structure | No | Yes, upon request; graphical, model, and print reporting hybrid | Yes, heat map graphical output and a raw data export file, but it seems very unwieldy |
| Biological Function Investigation | GO Analysis Built In | Yes | Not exactly, but it does report in detail suspected biological functions for each protein model generated | Yes, interactive with print reporting |
| | CD Search Built In | No | Yes | No |
| | Predicted Function Reported | Kind of; as part of the GO analysis you can click on the various terms to get an idea of biological functions | Yes, for each protein model | As part of the GO analysis yes, but only for the topmost handful of hits on the sequence provided |
| | Subcellular Localization Prediction | No | No | Yes, graphical output |

 

Definitions and Notes:

 a = Rates the likelihood of bonds allowing such bends between such atoms

 b = It conducts an AABLAST in the background to determine this

 c = A measure of the flexibility of a stretch of AA.  More vibration leads to higher ambiguity when modeling a protein structure: the more a region moves, the less certain the image capturing device (when making an actual protein crystal) is of where it truly is, or where it is most often.  Generally speaking, more flexible regions typically correspond to catalytic and sensing sites, whereas rigid regions tend to relate to sites that cannot withstand movement and still complete their task, such as an active site.

 d = This is essentially the same outcome as B-Value, but what it actually does is predict the amount of disorder that a crystal structure specifically may experience, not how much flexibility a stretch of AA possesses innately (although knowledge of the latter is necessary to determine this as well).

 e = Based on the best model

 f = This topology is indicative of a transmembrane protein

 g = Large pockets are often locations of active sites

 Written by TPS graduate student Cullen Dixon
