Origin and Orientation

Dec 11, 2009

A tragic week in the crystallographic community (see: 449 Citations maybe Effected by Retracted Structures). The Birmingham News article mentions researchers finding a preponderance of evidence that the structures were incorrect. I do not have any direct proof that the crystallographic data were falsified or fabricated, but let’s take a walk.

How could only one person publish these structures without others knowing?

A lab produces crystals, collects data, but unfortunately is unable to process the data. The grad student is frustrated, the post-doc can’t figure it out and so the data is handed to the PI. The PI works on the data set in their office (over the weekend, at home, etc…) and emerges successful! The paper is written and only one person knows exactly how the structure was solved.

How could the data have been fabricated?

Let us take a look at the one structure that has already been removed from PDB: 1BEF.

The data could have been back-generated from a desired protein structure using a tool like mlfsom.
However, I believe there is a better explanation.
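
To make "back-generated" concrete: diffraction amplitudes can be computed from any model through the structure factor sum F(hkl) = sum_j f_j exp(2 pi i (h x_j + k y_j + l z_j)). The Python sketch below is a minimal illustration under strong assumptions, not a description of how mlfsom works. The function names are hypothetical, each atom gets a constant scattering factor, and symmetry, B-factors, bulk solvent and noise are all ignored.

```python
import cmath

def structure_factor(atoms, hkl):
    """Sum f * exp(2*pi*i*(h*x + k*y + l*z)) over all atoms.

    atoms: list of (f, x, y, z), where x, y, z are fractional coordinates
           and f is a constant scattering factor (a simplifying assumption).
    hkl:   tuple of Miller indices (h, k, l).
    """
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
        for f, x, y, z in atoms
    )

def intensity(atoms, hkl):
    """Idealized, noise-free intensity |F(hkl)|^2."""
    return abs(structure_factor(atoms, hkl)) ** 2

# Toy model: two identical atoms half a cell apart along a.
toy = [(1.0, 0.0, 0.0, 0.0), (1.0, 0.5, 0.0, 0.0)]
for hkl in [(1, 0, 0), (2, 0, 0)]:
    print(hkl, round(intensity(toy, hkl), 6))
```

In the toy model the two atoms scatter exactly out of phase for (1, 0, 0) and in phase for (2, 0, 0), so the first reflection vanishes and the second does not. Data generated this way are only as realistic as the model and the corrections that go into them.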

Another method to fake the data would be to perform an isomorphous replacement using a related protein. A number of residues would be different, and with some help from a homology server you could tweak the structure.

The problem is that reviewers could be experts on those structures and would notice small anomalies. In addition, the protein would need to fold ‘perfectly’ in order to be used for isomorphous replacement so that proper crystal packing is maintained.

To avoid this scenario you could find a structure that is crystallographically unrelated (different space group and unit cell) to the protein of interest and use it as a template.

In order for this hypothesis to be supported, we would need to find the unrelated structure in the PDB.
That is a needle-in-a-haystack problem.

To save you some time, I will tell you which structure was used: 1NS3.
The figure at right shows 1BEF in light green and 1NS3 in aqua green:
[Figure: 1BEF and 1NS3, origin and orientation]
Here is the PyMOL script for those who like to play along at home:
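# Load both entries from the PDB
fetch 1bef
fetch 1ns3
# Keep only chains A and C of 1NS3 for the comparison
select 1ns3_A, 1ns3 and chain A+C
# Show ribbons only, with no alignment step, so both stay in their deposited frames
hide everything
show ribbon, 1bef
show ribbon, 1ns3_A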

The following is the crystal information from the PDB headers:
1BEF:
CRYST1 48.800 62.400 39.600 90.00 96.70 90.00 P 1 21 1

1NS3:
CRYST1 96.960 96.960 167.100 90.00 90.00 120.00 P 63 2 2
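
To put a number on how different the two cells are, the CRYST1 lines can be parsed and the cell volumes compared. The Python sketch below is only an illustration. It assumes whitespace-separated fields exactly as quoted above (a full PDB record is fixed-column and ends with a Z value, which is not handled here), and the function names are hypothetical.

```python
import math

def parse_cryst1(line):
    """Parse a whitespace-separated CRYST1 line as quoted in this post."""
    fields = line.split()
    if fields[0] != "CRYST1":
        raise ValueError("not a CRYST1 line")
    a, b, c, alpha, beta, gamma = (float(v) for v in fields[1:7])
    return {
        "a": a, "b": b, "c": c,
        "alpha": alpha, "beta": beta, "gamma": gamma,
        # Everything after the six cell parameters is taken as the space group.
        "space_group": " ".join(fields[7:]),
    }

def cell_volume(cell):
    """General (triclinic) unit cell volume in cubic Angstroms."""
    ca, cb, cg = (math.cos(math.radians(cell[k])) for k in ("alpha", "beta", "gamma"))
    root = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return cell["a"] * cell["b"] * cell["c"] * root

bef = parse_cryst1("CRYST1 48.800 62.400 39.600 90.00 96.70 90.00 P 1 21 1")
ns3 = parse_cryst1("CRYST1 96.960 96.960 167.100 90.00 90.00 120.00 P 63 2 2")
for name, cell in (("1BEF", bef), ("1NS3", ns3)):
    print(name, cell["space_group"], round(cell_volume(cell)))
```

The volumes come out at roughly 1.2 x 10^5 cubic Angstroms for 1BEF and 1.4 x 10^6 cubic Angstroms for 1NS3, about an order of magnitude apart, in a monoclinic versus a hexagonal setting.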

1NS3 was used as the starting model and with the addition of some water and noise, bingo.

The unit cells and space groups are totally different, and yet the two structures have a nearly identical origin and orientation.

What are the chances of two crystallographically unrelated structures having the same origin and orientation?
Zero.
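
If you would rather check this numerically than by eye, one option is to compare the deposited coordinates directly, with no superposition step. The Python sketch below is an assumption-laden illustration rather than an established metric: it reads C-alpha atoms from fixed-column PDB ATOM records and reports the fraction of them in one structure that have a C-alpha in the other within a cutoff. The function names, the 2 Angstrom cutoff and the file names in the usage comment are all hypothetical.

```python
import math

def ca_coords(pdb_text, chains=None):
    """Return (x, y, z) for C-alpha atoms in fixed-column PDB ATOM records.

    chains: optional string of chain IDs to keep, e.g. "AC".
    """
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        if chains is not None and line[21] not in chains:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))

def fraction_overlapping(a, b, cutoff=2.0):
    """Fraction of points in a with a point of b within cutoff, no fitting applied."""
    hits = sum(1 for p in a if any(math.dist(p, q) <= cutoff for q in b))
    return hits / len(a)

# Hypothetical usage with locally saved copies of the two entries:
# bef = ca_coords(open("1bef.pdb").read())
# ns3 = ca_coords(open("1ns3.pdb").read(), chains="AC")
# print(math.dist(centroid(bef), centroid(ns3)))
# print(fraction_overlapping(bef, ns3))
```

Because nothing is fitted, a high overlap fraction or a small centroid distance can only come from the two coordinate sets already sharing a frame, which is the point of the comparison.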

The structures still don’t look close enough for your liking? Take 1BEF, put it into a homology server like MODELLER, then compare.

Follow up: Covering your Tracks

    9 Awesome Insights so far | Have Your Say!

    1. Eric
      December 11th, 2009 at 5:50 PM #

      Your explanation seems to disagree with the comment from Janssen, Read, Brunger, and Gros.

    2. Sean
      December 11th, 2009 at 6:19 PM #

      I would love to hear another explanation. Could you link to the comments that you are referring to?

    3. Eric
      December 11th, 2009 at 9:54 PM #

      It’s the correspondence to the Nature article.

      http://bit.ly/4Um0KZ

    4. Sean
      December 11th, 2009 at 11:55 PM #

      Hi Eric,

      I don’t have access to Scopus, but found the article that includes the authors you previously mentioned.

      My focus in this post is on the PDB entry 1BEF not 2HR0, which the correspondence addresses.

      The method proposed here, fitting a certain structure (or part of one) and then placing it in a different space group, could result in the 30-40 Angstrom slabs that were mentioned in the correspondence.

      The slabs of missing density would appear as a result of falsifying the space group. This would be the challenging part of fabricating the data – generating reasonable crystal contacts and packing that fit your desired space group.

      The correspondence is also taking a different perspective on the issue. They are saying that we processed the data and have noted inconsistent features. This post is coming from the other side – here is the initial structure that they used to fabricate the data.

    5. Eric
      December 12th, 2009 at 10:12 AM #

      “I don’t have access to Scopus, but found the article that includes the authors you previously mentioned.”

      I figured your university would give you access to Scopus, as mine does. Sorry.

      “My focus in this post is on the PDB entry 1BEF not 2HR0, which the correspondence addresses.”

      I realize that. I referred to 2HR0 for two reasons (which I was too hurried to mention).

      1) 2HR0 seems to be the model that got Murthy caught.
      2) I figure he used similar methods on any falsified models.

      “The correspondence is also taking a different perspective on the issue. They are saying that we processed the data and have noted inconsistent features. This post is coming from the other side – here is the initial structure that they used to fabricate the data.”

      As I read the correspondence, there seems to be a fair bit of implied “This is how he did it” (cf. comments about protein in a vacuum). The last paragraph throws down the gauntlet by demanding to see the original diffraction data. Out of curiosity, does your proposed method account for the other anomalies mentioned by Janssen, et al.? Does it account for the missing bulk solvent and the strange B-factors and R-factors (see correspondence figures)? Lastly, do you have any ideas about what the template for 2HR0 might have been?

      Anyhow, thanks for humoring my ramblings. I’m an intelligent systems guy, not a structural biologist (though I work with one). I’ve picked up a lot, but there’s still a lot I don’t fully understand. ;)

    6. Pinko Punko
      December 12th, 2009 at 10:58 AM #

      I see no reason to assume that Krishna Murthy would use the same approach every time. I hypothesize that there will be a few avenues among his different falsifications.

    7. Artem
      December 13th, 2009 at 8:19 PM #

      ‘Plausible packing’ can be generated using molecular replacement programs that have packing function criteria. Give them a bogus dataset and relax solution criteria – and presto, you can get a ‘properly packed’ solution that can be further tweaked manually.

      Thing is though – why on earth would the fakers leave out 40 Å of empty space when they could have easily filled it in with something…

    8. Sean
      December 15th, 2009 at 12:11 AM #

      @Eric I believe that this idea accounts for all the anomalies seen in the structure. I have yet to find a template for the 2HR0 structure and as Pinko Punko notes there may not be one.

      @Artem I like your style. If you are going to fabricate data at least do a good job of it :)

    9. Pinko Punko
      December 21st, 2009 at 10:16 AM #

      I think the hardest things to fake would be plausible crystal contacts (not that I really know anything here). It does seem that discussions of the faked work have all mentioned that the structures tend to lack strong crystal contacts.
