Selecting the Right Protein Modeling Program: A comparison of I-TASSER, Phyre 2, and PredictProtein.

November 22, 2021

Translational Plant Sciences Graduate Program Blog

Non-Technical Introduction:

Proteins are the building blocks of life and the functional downstream product of a gene.  They are an integral part of the Central Dogma of Molecular Biology (DNA>RNA>Protein) and carry out a vast array of functions that keep cells, and thereby organisms, alive each day, acting as defense compounds, transcriptional regulators, and structural components of cells.  Proteins can function alone, interact with one another to form associations and larger biological ‘machines’, or combine to create massive structures like bones, the heart, or a plant seed; all of these are built from proteins.  Because proteins are ubiquitous throughout all of life and millions are found in each cell, understanding their minute details is a very important field of study for scientists seeking to understand the genes, proteins, or even organs they study.  Here I offer a comparison of 3 programs that help researchers resolve questions of protein structure, and of characteristics related to structure, as structure is a large factor in protein function or lack thereof.

 

Technical Introduction:

In this brief review I compare I-TASSER, Phyre 2, and PredictProtein, arguably the three most widely employed and accurate protein modeling programs used across the molecular biology literature from 2016 to 2021.  I have identified the strengths, weaknesses, and capabilities (or lack thereof) of each and report these findings in a comparative format to help the reader select the program best suited to a particular analysis and research question.  All three programs have been cited thousands of times, with tens of thousands of reads on their various original and follow-up articles.  I will not attempt to proxy the accuracy, popularity, or quality of the programs by total citations or read metrics, since the programs have existed for varying lengths of time and their stewards publish update articles at different intervals, which confounds this potential comparative strategy.

These programs can undertake many protein-function-related analyses, such as identification of synonymous/non-synonymous mutations relative to a known gene of interest (GOI) sequence, binding pocket prediction, and biological function prediction, to name a few, in addition to predicting and generating a 3D protein structure.  I report how each program handles each function to assist the reader with program selection based on their investigative needs.

 

Quick Hits:

  • Only I-TASSER and Phyre 2 can create 2D or 3D protein structures/visualizations.
  • PredictProtein is the easiest to use for a beginner or for a quick overview of your protein, and it is visually beautiful, but it makes data traceability, detailed data recording, and long-term interpretation of output data difficult.
  • I-TASSER has previously been noted as the most accurate homology-based 3D protein modeling program based on its engine and computations, but some functions available in one of the other two programs may not be available in I-TASSER.
  • Each program possesses its own unique suite of functions, which makes outright comparisons and recommendations hard; it is best for the reader to review the comparison matrix and decide which program(s) best suit their needs.

 

Overall Impressions (Discussion):

  • Batch processing is only possible with Phyre 2.
  • Only I-TASSER and Phyre 2 produce 3D protein prediction models.
  • PredictProtein results last indefinitely while I-TASSER and Phyre 2 last 30 days.
  • I-TASSER produces great output data that can be saved as a .pdf but produces few actual downloadable files; conversely, Phyre 2 and PredictProtein produce many downloadable files.  Keep this in mind depending on data preservation preferences.
  • PredictProtein’s downloadable files are difficult to interpret and glean information from once they are dissociated from the graphically interactive results webpage in which the results are reported.
  • B-Value, an important measure of protein flexibility in a structure, is only reported in I-TASSER and PredictProtein; it is unclear if Phyre 2 accounts for this important statistic in its modeling function and just does not report it or if it does not take this into account when modeling.
  • Prediction accuracy measures may or may not be reported depending upon the function in question in each of the programs.
  • Categorically:
    • I-TASSER and Phyre 2 possess the best ‘General Quality of Life Items’ (download ease, data traceability, etc.).
    • Phyre 2 possesses more functionality in the category of ‘Protein Modeling’; however, I-TASSER is still reputed in the field as the most accurate program for homology-based 3D protein modeling.
    • If the reader wishes to conduct ‘Sequence Investigation’ (BLASTs, nt conservation reporting to a known sequence, etc.) in parallel with the protein analysis, PredictProtein or Phyre 2 is recommended.
    • In the category of ‘Structural and Activity Investigation’, each program possesses unique functions that the other programs lack; therefore, I cannot recommend one program over another in this category.
    • I recommend PredictProtein as the best program for ‘Biological Function Investigation’ as it reports a subcellular localization prediction while the other two programs do not.  Please note that there is, again, some variability in functionality between programs within this category.
  • PredictProtein reports binding affinities for DNA, RNA, and protein while I-TASSER reports ‘Ligand Binding’ and ‘Enzyme Commission (Active Site)’ sites.  Phyre 2 detects pockets, which is a related but different topic that is still very important to activity.  I recommend combining Phyre 2’s pocket prediction with either I-TASSER’s or PredictProtein’s site detection.
  • While PredictProtein data traceability and detailed data recovery may be challenging, it is a great program for gaining a sound overall picture of the protein of interest prior to submitting to either I-TASSER or Phyre 2 for a more nuanced, deeper analysis with 3D protein modeling.
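
As a rough illustration of how the points above can be turned into a selection aid, the sketch below encodes a handful of rows from the comparison matrix as a Python dictionary and filters the programs by the features a given analysis requires. The feature keys (for example `3d_model` and `b_value`) are hypothetical labels chosen for this example, not identifiers used by any of the three programs, and only a few rows of the matrix are included.

```python
# A few rows of the comparison matrix, encoded by hand from this post.
# The feature keys are illustrative labels, not names used by the programs.
FEATURES = {
    "I-TASSER": {
        "3d_model": True,
        "batch_processing": False,
        "b_value": True,
        "pocket_detection": False,
        "subcellular_localization": False,
        "disulfide_bonds": False,
    },
    "Phyre 2": {
        "3d_model": True,
        "batch_processing": True,
        "b_value": False,
        "pocket_detection": True,
        "subcellular_localization": False,
        "disulfide_bonds": False,
    },
    "PredictProtein": {
        "3d_model": False,
        "batch_processing": False,
        "b_value": True,
        "pocket_detection": False,
        "subcellular_localization": True,
        "disulfide_bonds": True,
    },
}


def programs_supporting(required):
    """Return the programs that offer every feature named in `required`."""
    return [
        name
        for name, feats in FEATURES.items()
        if all(feats.get(feature, False) for feature in required)
    ]


def missing_features(program, required):
    """Return the requested features that `program` does not offer."""
    feats = FEATURES[program]
    return [feature for feature in required if not feats.get(feature, False)]


if __name__ == "__main__":
    wanted = ["3d_model", "pocket_detection"]
    print(programs_supporting(wanted))
    print(missing_features("I-TASSER", wanted))
```

An empty result from `programs_supporting` is a hint that two programs need to be combined, as suggested above for pocket detection and binding site reporting.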

 

Final Programmatic Recommendation Ranking:

  1. I-TASSER    (with Phyre 2 a close second)
  2. Phyre 2
  3. PredictProtein

 

Important Notes to Remember:

  • Results are cleared from each program’s logs 30 days after the query completes (except for PredictProtein).
  • All results are just projections of the true protein structure, sites, activities, etc.  All of the results are theoretical, but such programs typically offer high-accuracy modeling for most proteins based on the structure of other known proteins.  If your protein is highly unusual or very large, the odds of correctly modeling it are decreased, but that is not to say that a lot cannot still be learned from modeling.
  • In silico protein modeling employing these programs is cheap and fast!  It’s free and can be completed in minutes or, at worst, days!  Creating a crystal structure to determine the structure of a protein of interest can take months and is far from free.
  • If you do not want your protein to become part of the public domain, be sure to deselect or select the appropriate option when submitting a job in any of the programs.
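
Because results disappear on a schedule, it can help to note when each job's report will expire. The following is a minimal sketch that uses only the retention periods reported in this post (30 days for I-TASSER and Phyre 2, indefinite for PredictProtein); the function names are hypothetical, and the optional 30-day Phyre 2 extension is not modeled.

```python
from datetime import date, timedelta

# Retention periods in days, as reported in this post.
# None means the results are kept indefinitely.
RETENTION_DAYS = {"I-TASSER": 30, "Phyre 2": 30, "PredictProtein": None}


def results_expire_on(program, completed_on):
    """Return the date a job's results are cleared, or None if they never are."""
    days = RETENTION_DAYS[program]
    if days is None:
        return None
    return completed_on + timedelta(days=days)


def days_left(program, completed_on, today):
    """Return the days remaining before expiry, or None for indefinite retention."""
    expiry = results_expire_on(program, completed_on)
    if expiry is None:
        return None
    return (expiry - today).days


if __name__ == "__main__":
    finished = date(2021, 11, 22)
    for program in RETENTION_DAYS:
        print(program, results_expire_on(program, finished))
```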

Results and Comparison Matrix:

 

Table 1.

| General Items | I-TASSER | Phyre 2 | PredictProtein |
|---|---|---|---|
| Quality | Best | Good | Good |
| Time | 2 days to 2+ weeks | 2.5 h | 20 min |
| User Interface | Basic but Easy | High Quality and Easy | Beautiful and Easy |
| Report Access Period | 30 days | 30 days (+30 upon request) | Indefinitely |
| Batch Processing | No | Yes | No |
| File Download In Native Format | There’s really nothing to download using this platform besides the protein models | Yes | Yes |
| File Download in text, excel, or human readable format | See above | Yes | No |
| Save webpage as a Quality pdf | Yes | Yes | Yes but the page is really interactive and graphically intense so saving a non-interactable pdf isn’t all that useful |

 

   

 

 

 

 

 

 

 

| Protein Modeling | I-TASSER | Phyre 2 | PredictProtein |
|---|---|---|---|
| Protein Modeling | Yes, 2D & 3D | Yes, 2D & 3D | No, 1D sequence only analysis |
| Protein Model Confidence and Coverage | Yes and Yes | Yes and Yes; also provides a ProQ2 quality score and an alignment quality score upon request; graphical, model, and print reporting hybrid | N/A |
| Atom Clash Reporting | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| Rotamer Reporting | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| Ramachandran Analysis (a) | No | Yes, upon request; graphical, model, and print reporting hybrid | N/A |
| Number of Models Reported | Top 5 | Top 30 | N/A |
| Models are Downloadable | Yes | Yes | N/A |
| Allows Editing of Protein Image/Video Environment | Yes | No | N/A |
| Reports the Templates Used in the Modeling | Yes | Yes | N/A |
| Image Exporting Made Easy? | Yes | Yes | Yes |

 

   

 

 

 

 

 

 

 

| Sequence Investigation | I-TASSER | Phyre 2 | PredictProtein |
|---|---|---|---|
| Conducts an AABLAST of the Protein | No | Yes | Yes |
| Reports Top AABLAST Alignments | N/A | Yes, print and fasta file reporting | Yes, graphical interactive reporting |
| Conservation Reporting (b) | No | Yes, upon request; graphical, model, and print reporting hybrid | Yes, graphical interactive reporting (ConSeq) |
| AA Composition Breakdown (and piechart) | No | No | Yes, with graphical output |

 

   

 

 

 

 

 

 

 

| Structural and Activity Investigation | I-TASSER | Phyre 2 | PredictProtein |
|---|---|---|---|
| Secondary Structure Reporting | Yes, print reporting | Yes, graphical and print reporting hybrid | Yes, graphical interactive reporting (RePROF) |
| Secondary Structure Confidence Reporting | Yes, print reporting | Yes, graphical and print reporting hybrid | No |
| B-Value (c) | Yes, graphical and print reporting hybrid | No, but see below. | Yes, graphical interactive reporting (PROFbval) |
| Disordered Region Reporting (d) | No | Yes, graphical and print reporting hybrid with even more in depth graphical, model, and print reporting hybrid upon request | Yes, graphical interactive reporting (Meta-disorder) |
| Disordered Region Confidence Reporting | N/A | Yes, graphical and print reporting hybrid with even more in depth graphical, model, and print reporting hybrid upon request | No |
| Accessibility Reporting (e) | Yes, print reporting | No | Yes, graphical interactive reporting (RePROF) |
| Transmembrane Domain (Topology) (f) | No | Yes, a descriptive illustration | Yes, graphical interactive reporting |
| Protein, DNA, and RNA Binding Site Reporting | See Below | Not exactly, see Pocket Detection below | Yes; all reported separately.  Does not take into account 3D model and context of AA in space relative to one another as it does not make a 3D model. |
| Ligand Binding Site Reporting | Yes, predicted by taking into account the 3D model and context of AA in space relative to one another.  Print reporting with visualizations | Not exactly, see Pocket Detection below | See Above |
| Enzyme Commission (Active Site) Prediction | Yes, predicted by taking into account the 3D model and context of AA in space relative to one another.  Print reporting with visualizations | Not exactly, see Pocket Detection below | No |
| Pocket Detection Reporting (g) | No | Yes, upon request; only large pockets reported (i.e., pockets that are strong candidates to contain an active site(s)); graphical, model, and print reporting hybrid | No |
| Disulfide Bond Reporting | No | No | Yes, graphical interactive reporting (DISULFIND) |
| Effect of a Theoretical Point Mutation (at each AA) on the Overall Protein Structure | No | Yes, upon request; graphical, model, and print reporting hybrid | Yes, heat map graphical output and a raw data export file but it seems very unwieldy |

 

   

 

 

 

 

 

 

 

| Biological Function Investigation | I-TASSER | Phyre 2 | PredictProtein |
|---|---|---|---|
| GO Analysis Built In | Yes | Not exactly, but it does report in detail suspected biological functions for each protein model generated | Yes, interactive with print reporting |
| CD Search Built In | No | Yes | No |
| Predicted Function Reported | Kind of, as part of the GO analysis you can click on the various terms to give you an idea of biological functions | Yes, for each protein model | As part of the GO analysis yes, but only for the topmost handful of hits on the sequence provided |
| Subcellular Localization Prediction | No | No | Yes, graphical output |
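
Of the three programs, only PredictProtein reports an amino acid composition breakdown, but that particular figure is simple to compute locally for any sequence. The sketch below is a generic calculation, not PredictProtein's method; the function name `aa_composition` is hypothetical, and the fractions are taken over the full sequence length, so any non-standard characters count toward the total but are not listed.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid present in `sequence`."""
    # Drop whitespace and line breaks, and normalize to upper case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in STANDARD_AA if counts[aa]}


if __name__ == "__main__":
    # An arbitrary example sequence, not a real protein of interest.
    for aa, fraction in aa_composition("MKKA").items():
        print(aa, f"{fraction:.0%}")
```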

 

Definitions and Notes:

 a = Rates the likelihood of bonds allowing such bends between such atoms

 b = It conducts an AABLAST in the background to determine this

 c = A measure of the flexibility of a stretch of AA; more vibration leads to higher ambiguity when modeling a protein structure, because the more a region moves around, the less certain the image capturing device (when solving an actual protein crystal structure) is of where it truly is or most often is.  Generally speaking, more flexible regions are typically analogous to catalytic and sensing sites, whereas rigid regions tend to relate to sites that cannot withstand movement and still complete their task, such as an active site.

 d = This is essentially the same outcome as B-Value, but what it actually does is predict the amount of disorder a crystal structure specifically may experience, not how much flexibility a stretch of AA possesses innately (although knowledge of the latter is necessary to determine this as well).

 e = Based on the best model

 f = This topology is indicative of a transmembrane protein

 g = Large pockets are often locations of active sites
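
For readers who want to inspect B-values directly in a downloaded model, the sketch below averages the B-factor column per residue. It assumes the model is a standard fixed-width PDB file, in which the temperature factor of each ATOM record sits in columns 61-66; what a given program actually writes into that column for a predicted model is not verified here, and the function name is hypothetical.

```python
def mean_b_factor_per_residue(pdb_text):
    """Average the B-factor column over the atoms of each residue.

    Returns a dict keyed by (chain ID, residue number, residue name).
    """
    sums = {}
    for line in pdb_text.splitlines():
        # Only ATOM records long enough to hold the B-factor field are used.
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        residue_name = line[17:20].strip()   # columns 18-20
        chain_id = line[21]                  # column 22
        residue_number = int(line[22:26])    # columns 23-26
        b_factor = float(line[60:66])        # columns 61-66
        key = (chain_id, residue_number, residue_name)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + b_factor, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}
```

Calling the function on the text of a model file (for example, the contents read from a downloaded `.pdb` file) gives one average per residue, which can then be scanned for unusually flexible or rigid stretches.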

 Written by TPS graduate student Cullen Dixon
