PDB Structure Quality Tools

May 31, 2009

PDB Structure Quality is a website that separates variables that are often correlated during crystallographic data processing and refinement. The paper, Quality of protein crystal structures, is quite good and brings to light a couple of interesting points: 1) the higher the impact factor of the journal, the lower the quality of the structures, and 2) PDB quality has not significantly changed over time. A macromolecular structure at a given resolution deposited in 1997 is comparable to one published in 2007.

The database is current as of Nov 17, 2008, so the most recent structures will not be present.

As a heads up, the paper and website are not the same (as stated on their website):
Note: The number of independent and dependent variables has been expanded in this analysis compared to the published literature. Independent variables describing the crystal have been mined from the available structures. Those variables that were missing have been multiply imputed.

The site also has a PDB Structure Quality prediction tool to calculate the R-factor, R-free, the occupancy-weighted B-value, and the Ramachandran violation percentage. As stated on their site, “This task allows a crystallographer to determine what validation metrics they should obtain at the end of refinement. If the refined validation metrics are not as good as those predicted, then further model building and refinement may be warranted.” Sweet.
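To make the "refined versus predicted" comparison concrete, here is a minimal Python sketch. It is not the site's code: the function names, the dictionary keys, and the numbers in the usage lines are all hypothetical, and the occupancy-weighted B-value is assumed to be the plain occupancy-weighted mean, sum(occ * B) / sum(occ), which the site may define differently. Only the "lower is better" metrics are compared, since a low B-value is not automatically a good sign.

```python
# Hypothetical sketch: compare refined validation metrics with predicted ones.
# None of these names or values come from the PDB Structure Quality site.

# Metrics where a lower value is assumed to be better.
LOWER_IS_BETTER = ("r_factor", "r_free", "rama_violation_pct")


def occupancy_weighted_b(atoms):
    """Occupancy-weighted mean B-value for (occupancy, b_value) pairs.

    Assumed definition: sum(occ * B) / sum(occ).
    """
    total_occupancy = sum(occ for occ, _ in atoms)
    if total_occupancy == 0:
        raise ValueError("total occupancy is zero")
    return sum(occ * b for occ, b in atoms) / total_occupancy


def metrics_worse_than_predicted(refined, predicted, tolerance=0.0):
    """Return the names of metrics whose refined value exceeds the prediction."""
    return [
        name
        for name in LOWER_IS_BETTER
        if refined[name] > predicted[name] + tolerance
    ]


# Made-up placeholder values, for illustration only.
refined = {"r_factor": 0.22, "r_free": 0.29, "rama_violation_pct": 1.0}
predicted = {"r_factor": 0.21, "r_free": 0.26, "rama_violation_pct": 1.5}
flagged = metrics_worse_than_predicted(refined, predicted)
```

Any name that lands in `flagged` is a metric where, by the site's own reasoning, further model building and refinement may be warranted.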

5 Awesome Insights so far | Have Your Say!

  1. Eric Brown
    June 1st, 2009 at 2:30 PM #

    Thanks for the mention. I’d like to point out, though, that we only show that the journals Nature, Science, Cell, and Molecular Cell publish structures that are worse than you’d expect, not that all high impact journals publish poor structures. In fact, there are some high impact journals such as JBC, JMB, Proteins, and Biochemistry that publish better than expected structures.

    BTW, we are working on updating the latest statistics for the latest structures.

  2. Sean
    June 1st, 2009 at 4:44 PM #

    Personally, when I think of high impact, Nature, Science and Cell come to mind, but note taken that higher impact does not necessarily mean worse structures.

    Also thanks for all the work in putting the site together! I really appreciate tools that have such broad appeal to the crystallographic community. I will be looking forward to the addition of the latest structures.

    Do you have any plans to publish a paper giving further details on the expanded independent and dependent variables?

  3. Oliver Clarke
    June 2nd, 2009 at 6:10 PM #

    While I think this is a great tool for assessing overall quality trends in groups of structures, I think it can be a little misleading for individual cases if one only looks at the Q-factor. Partly I think this is because, with a single metric, individually very worrying statistics can be missed if all the other stats are better than average.

    Additionally, the Q-factor does not (as far as I can see) incorporate any assessment of the gap between R and Rfree, which can give an indication of either overfitting (if R and Rfree are too far apart) or a biased validation set (if they are too close).

    As an example, have a look at 1adv. This is a 3.2A structure, with an R/Rfree of 21/34. A gap of 13 points between the R and Rfree indicates clear overfitting, but because R and Rfree apparently are assessed independently wrt the Q-factor (the lower the better), the structure isn’t penalised for this.

    Further, this structure has an average B-factor of 17 (2-10 in the core of the protein), which is clearly a bit odd for a 3.2A structure, but there is no penalty for having a lower B-factor than expected, so the overall Q-factor is excellent: -0.79!

    So overall, I’d say it’s an excellent tool to get a quick summary of the stats, but I’d still want to have a look at the model and map to get a real idea of the quality of any particular structure.
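The R/Rfree gap check described in the comment above can be sketched in a few lines of Python. This is only an illustration: the function names and the `low` and `high` cutoffs are hypothetical choices, not values taken from the site, the paper, or the comment. R and Rfree are given in percent, as in the 1adv example.

```python
# Hypothetical sketch of an R/Rfree gap check; the thresholds are assumptions.

def r_gap(r_work, r_free):
    """Gap between Rfree and R, in percentage points."""
    return r_free - r_work


def classify_gap(r_work, r_free, low=2.0, high=7.0):
    """Label the gap using assumed cutoffs (low and high are illustrative)."""
    gap = r_gap(r_work, r_free)
    if gap > high:
        return "possible overfitting"
    if gap < low:
        return "possibly biased validation set"
    return "unremarkable"


# 1adv as quoted in the comment: R/Rfree of 21/34, a gap of 13 points.
gap_1adv = r_gap(21.0, 34.0)
label_1adv = classify_gap(21.0, 34.0)
```

With these assumed cutoffs, the 13-point gap for 1adv is labeled as possible overfitting, which a score that only rewards low R and Rfree individually would not pick up.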

  4. Eric Brown
    June 4th, 2009 at 2:22 PM #

    Sean: I’m working on a follow-up paper discussing the improvements that have been made in the past year. But I don’t have any money to work on this so my day-job is taking priority right now.

    Oliver: I agree that a single metric cannot express everything we want to know about a structure’s quality. I hope to expand the site to show how the different “orthogonal” features that individually make up the q-value vary, thus letting a user see that their structure might be decent on a global scale (R-free/R-factor) but still have some local error (e.g. Ramachandran problems).

    The process that I’m using now makes it easy to add additional “quality” measures. So I’ll try adding an R-factor / R-free gap and see how those relate. I’m not sure how the gap will correlate with the R-free or R-factor, but one benefit of the method is that it will look for correlations and work with them.

  5. Sean
    June 4th, 2009 at 8:16 PM #

    Eric: When I was getting the ‘Do we need an X-ray Diffraction Image Data Bank?’ post ready, I came across the paper entitled Case-controlled structure validation in Acta D65, 2009, 140-147. I was amazed at how many times your work was mentioned.

    Also @Oliver and Eric – interesting discussion on what is the proper way to treat the R-factor/R-free gap.
