A tragic week in the crystallographic community (see: 449 Citations May Be Affected by Retracted Structures). The Birmingham News article mentions researchers finding a preponderance of evidence that the structures were incorrect. I do not have any direct proof that the crystallographic data were falsified or fabricated, but let’s take a walk.
How could only one person publish these structures without others knowing?
A lab produces crystals and collects data but is unable to process them. The grad student is frustrated, the postdoc can’t figure it out, and so the data are handed to the PI. The PI works on the data set in the office (over the weekend, at home, etc.) and emerges successful! The paper is written, and only one person knows exactly how the structure was solved.
How could the data have been fabricated?
Let us take a look at the one structure that has already been removed from PDB: 1BEF.
The data could have been back-calculated from a desired protein structure using a diffraction-image simulator like mlfsom.
However, I believe there is a better explanation.
Another way to fake the data would be to perform an isomorphous replacement using a related protein. A number of residues would differ, but with some help from a homology server you could tweak the structure.
The problem is that reviewers could be experts on those structures and would notice small anomalies. In addition, the protein would need to fold ‘perfectly’ in order to be used for isomorphous replacement so that proper crystal packing is maintained.
To avoid this scenario you could find a structure that is crystallographically unrelated (different space group and unit cell) to the protein of interest and use it as a template.
In order for this hypothesis to be supported, we would need to find the unrelated structure in the PDB.
A needle-in-a-haystack type of problem.
To save you some time, we are going to tell you the structure used: 1NS3
The figure at right shows 1BEF in light green and 1NS3 in aqua green:
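Here is the PyMOL script for those who like to play along at home (no alignment is performed; both models are shown in their deposited coordinates):
fetch 1bef
fetch 1ns3
hide everything
select 1ns3_A, 1ns3 and chain A+C
show ribbon, 1bef
show ribbon, 1ns3_A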
The following are the CRYST1 records (unit cell dimensions, angles and space group) from the two PDB headers:
CRYST1 48.800 62.400 39.600 90.00 96.70 90.00 P 1 21 1
CRYST1 96.960 96.960 167.100 90.00 90.00 120.00 P 63 2 2
1NS3 was used as the starting model and, with the addition of some water and noise, bingo.
The unit cells and space groups are totally different, and yet the two structures have nearly identical origins and orientations.
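To put a number on “totally different”, here is a minimal Python sketch that parses the two CRYST1 records quoted above and computes each unit-cell volume with the general triclinic formula. It assumes whitespace-separated records with no trailing Z value, exactly as pasted here; real PDB files use fixed columns and usually append Z, so `parse_cryst1` is a hypothetical helper, not a PDB library call.

```python
import math


def parse_cryst1(record):
    """Split a whitespace-separated CRYST1 record into cell parameters and space group.

    Assumes the record has no trailing Z value, like the two records quoted above.
    """
    fields = record.split()
    if not fields or fields[0] != "CRYST1":
        raise ValueError("not a CRYST1 record: %r" % record)
    a, b, c, alpha, beta, gamma = (float(x) for x in fields[1:7])
    space_group = " ".join(fields[7:])  # e.g. "P 1 21 1"
    return (a, b, c, alpha, beta, gamma), space_group


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms (general triclinic formula, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


if __name__ == "__main__":
    records = [
        "CRYST1 48.800 62.400 39.600 90.00 96.70 90.00 P 1 21 1",
        "CRYST1 96.960 96.960 167.100 90.00 90.00 120.00 P 63 2 2",
    ]
    for rec in records:
        cell, sg = parse_cryst1(rec)
        print(f"{sg:<10} V = {cell_volume(*cell):,.0f} A^3")
```

The monoclinic cell comes out at roughly 1.2 × 10^5 Å³ and the hexagonal one at roughly 1.36 × 10^6 Å³, about an order of magnitude apart.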
What are the chances of two crystallographically unrelated structures having the same origin and orientation?
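One way to quantify “same origin and orientation” is to skip superposition entirely: read the C-alpha coordinates of both models as deposited and measure how far apart matched residues are. A small RMSD without any fitting means the two models already sit on top of each other in space, and comparing the two centroids is an even quicker check of the shared origin. This is a minimal Python sketch; `read_ca`, `raw_rmsd` and the residue `pairs` (which you would take from a sequence alignment of 1BEF against 1NS3) are hypothetical, and insertion codes and alternate locations are ignored.

```python
import math


def read_ca(pdb_lines, chain):
    """Map residue number -> (x, y, z) for the C-alpha atoms of one chain.

    Uses fixed PDB columns. Only ATOM records are read, and the atom name must be
    ' CA ' (C-alpha); calcium ions are named 'CA  ' and live in HETATM records.
    Insertion codes are ignored and a later altloc overwrites an earlier one.
    """
    ca = {}
    for line in pdb_lines:
        if line.startswith("ATOM  ") and line[12:16] == " CA " and line[21] == chain:
            resnum = int(line[22:26])
            ca[resnum] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return ca


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def raw_rmsd(ca1, ca2, pairs):
    """RMSD over matched residues WITHOUT superposition (coordinates used as deposited).

    pairs: list of (residue number in model 1, residue number in model 2).
    """
    squared = [math.dist(ca1[r1], ca2[r2]) ** 2 for r1, r2 in pairs]
    return math.sqrt(sum(squared) / len(squared))
```

The same functions work on lines read straight from the two downloaded PDB files.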
The structures still don’t look close enough for your liking? Take 1BEF, put it through a homology-modeling tool like MODELLER, and then compare.
Follow-up: Covering Your Tracks
The Birmingham News just reported that former researcher H.M. Krishna Murthy may have falsified or fabricated data. The Journal of Biological Chemistry has already retracted the paper in question, which contains PDB entry 1BEF.
If other journals follow suit, the impact will be significant. According to Google Scholar, a total of 449 papers cite the publications in which these structures appear.
The University of Alabama at Birmingham announced that 12 structures were falsified or fabricated. The 12 questionable structures that have been deposited into the PDB are as follows: 1BEF, 1CMW, 1DF9, 2QID, 1G40, 1G44, 1L6L, 2OU1, 1RID, 1Y8E, 2A01, and 2HR0.
The publications involve a wide range of topics including: dengue viruses, serine proteases, Taq DNA polymerase, heparan sulfate proteoglycans, apolipoprotein A-II and A-I, suramin in heparin binding, and complement component 3.
The following table contains links to the structures, PDFs of the journal articles, and citations. The table is worth exploring if you have drawn, or may yet draw, conclusions based on these structures.
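If you want to check your own manuscripts or notes for these entries, a quick text scan is enough. This is a minimal Python sketch using only the 12 IDs listed above; the regular expression is a heuristic for four-character PDB codes (a digit followed by three letters or digits) and may match unrelated tokens, so treat hits as prompts to look closer. `flag_retracted` is a hypothetical helper.

```python
import re

# The 12 PDB entries named in the UAB announcement.
RETRACTED = {"1BEF", "1CMW", "1DF9", "2QID", "1G40", "1G44",
             "1L6L", "2OU1", "1RID", "1Y8E", "2A01", "2HR0"}

# Heuristic PDB ID pattern: a digit plus three alphanumerics, bounded by word breaks
# so that it does not match inside longer tokens.
PDB_ID = re.compile(r"\b[0-9][A-Za-z0-9]{3}\b")


def flag_retracted(text):
    """Return the sorted list of retracted PDB IDs mentioned anywhere in text."""
    found = {m.group(0).upper() for m in PDB_ID.finditer(text)}
    return sorted(found & RETRACTED)


if __name__ == "__main__":
    print(flag_retracted("Models 1bef and 1NS3 were compared; see also 2hr0."))
```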
The process of generating heavy-atom derivatives in macromolecular crystallography can be tedious. Problems range from heavy atoms not being incorporated into your protein to difficulty producing crystals for derivatization trials, which makes each attempt critical.
The Heavy-Atom Database System (HATODAS II) was created to address these problems. The database uses 93 known heavy-atom binding motifs (derived from 3103 heavy-atom binding sites) and can take into account both the amino acid sequence and the crystallization conditions (ref).
Here is an example of a prediction that HATODAS generates for potential heavy-atom reagents:
The following is a list of the suggested motifs that are present in the submitted sequence:
If your protein does not contain a His, Cys or Met, you may be forced to mutate a residue for derivatization, but which one do you choose? HATODAS addresses this question by suggesting point mutations based on multiple sequence alignments of homologous proteins.
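Before submitting a sequence, you can quickly check whether it contains any His, Cys or Met at all. The Python sketch below only locates those three residues; it does not reproduce the HATODAS motif analysis or its crystallization-condition scoring, and `read_fasta_sequence` and `candidate_sites` are hypothetical helpers.

```python
def read_fasta_sequence(text):
    """Join the sequence lines of a single-record FASTA string, skipping '>' headers."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    ).upper()


def candidate_sites(sequence, residues="HCM"):
    """Map each one-letter code in `residues` to its 1-based positions in `sequence`."""
    return {aa: [i + 1 for i, x in enumerate(sequence) if x == aa] for aa in residues}


if __name__ == "__main__":
    seq = read_fasta_sequence(">example (hypothetical)\nMKHAC\nHLLSA\n")
    sites = candidate_sites(seq)
    print(sites)
    if not any(sites.values()):
        print("No His, Cys or Met: consider a point mutation for derivatization.")
```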
Points for creating a database with guts.
I have used PyMOL for years and have written numerous posts about it. However, that may soon come to an end, because Discovery Studio Visualizer 2.5 (DSV) is just better. I am still getting used to the program, but wanted to share three ways to ‘select a region’ to get you pumped up!
1) Highlight amino acid sequence
You can display the amino acid sequence by Sequence -> Show Sequence
When you highlight part of the sequence, the corresponding part of the structure is also highlighted.
The structure can be viewed on the other tab (I really like having tabs instead of separate windows):
Simply draw over the region you would like to be selected.
An atom can be selected with the pointer, and a right click displays pertinent information. In this example, I have double-clicked, which highlights the residue.
Four clicks will select a chain.
Six will select the entire structure.
Once a selection is made you can then easily make adjustments using the display tab (I previously mentioned this here).
DSV has a ligand script that can VERY quickly generate the following figure:
Scripts -> Ligand Interactions -> Show Ligand Interactions with Atoms:
A couple of changes and you can have a publication quality figure in no time.
I know firsthand how cancer changes lives. My mother, aunt, uncle, grandmother and both grandfathers have had cancer.
The cost of cancer research is mind-blowing: billions, at least.
I have complete respect for the people who are raising money for cancer research (Bill and Melinda Gates Foundation, Lance Armstrong Foundation, Susan G. Komen, @drewfromtv). The people and organizations that are willing to fund cancer research are critical, but money alone does not fix the problem, people do. Scientists do.
I am missing a hero in my life. This hero is dedicating their life to curing cancer.
I mean really working on it. Not some buzz word on an intro slide.
Who is your favorite scientist involved in cancer research? Who is the person that will fill Madison Square Garden with a science talk?
TLS stands for Translation Libration Screw-motion (the dash makes it acronym-ically fine). TLS refinement is available in REFMAC5 within the CCP4 suite and in phenix.refine. According to developer Martyn Winn, TLS refinement can be used at almost any resolution.
Why should I use it?
The benefit of using TLS refinement is that it can reduce your Rfree and Rwork values, implying that the resulting structural model is a better representation of the collected data.
How Does it Work?
TLS refinement treats each ‘sequence group’ (a segment of the chain) as a rigid body described by 20 parameters.
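The 20 comes from the three tensors: T (translation) and L (libration) are symmetric 3×3 matrices with 6 independent elements each, and S (screw) is a general 3×3 matrix with 9 elements, of which only 8 can be determined because its trace cannot. The short Python sketch below does that arithmetic and, for comparison, counts displacement parameters for isotropic (1 per atom) and anisotropic (6 per atom) B factors; the atom and group numbers in the example are arbitrary.

```python
# Independent parameters in one TLS group.
T_PARAMS = 6  # translation tensor T: symmetric 3x3
L_PARAMS = 6  # libration tensor L: symmetric 3x3
S_PARAMS = 9 - 1  # screw tensor S: general 3x3, minus 1 because its trace is undetermined
TLS_PER_GROUP = T_PARAMS + L_PARAMS + S_PARAMS  # 20


def displacement_parameter_counts(n_atoms, n_tls_groups):
    """Number of displacement parameters under three ways of modelling atomic motion."""
    return {
        "isotropic": n_atoms,  # one B factor per atom
        "anisotropic": 6 * n_atoms,  # one symmetric 3x3 U tensor per atom
        "tls": TLS_PER_GROUP * n_tls_groups,  # 20 per rigid group
    }


if __name__ == "__main__":
    # Arbitrary example: a 1,500-atom model split into 4 TLS groups.
    print(displacement_parameter_counts(1500, 4))
```

Even with several groups, TLS adds only 20 parameters per group, far fewer than the 6 per atom of full anisotropic refinement.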
How Do I Determine the Groups?
The TLS Motion Determination (TLSMD) server takes your refined model and recommends how to partition each chain into segments (ref). A number of different TLS groupings are possible for the same chain (ref).
How do I actually do this?
1) Run rigid-body refinement followed by ~10 cycles of restrained refinement
2) Submit the refined model to the TLSMD server
3) Use the segments produced by TLSMD as TLS groups and set the B factors to 40 (ref)
Note: the B factor was set to 20 in the literature reference
4) REFMAC needs the following inputs: REFI TLSC 20 (20 cycles of TLS refinement), TLSIN (the TLS group file) and BFAC SET 40 (more details here and here)
5) Perform TLS refinement
6) Perform restrained refinement followed by the addition of ligands, ions and solvent
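Step 4 needs the TLSMD segments written as a TLS group file that REFMAC reads via TLSIN. The Python sketch below turns a list of (chain, first residue, last residue) segments into such a file. The layout (a REFMAC header line, then a TLS line and a RANGE ... ALL line per group) is an assumption modelled on the style of TLSMD’s REFMAC output, and `tls_group_file` and the example segments are hypothetical; compare the result against the REFMAC documentation and your own TLSMD output before using it.

```python
def tls_group_file(segments):
    """Write REFMAC-style TLS group definitions for (chain, first_res, last_res) segments.

    ASSUMPTION: the RANGE field layout 'A   1.' (chain ID, residue number right-justified
    in 4 columns, then '.') mirrors TLSMD output; verify it for your REFMAC version.
    """
    lines = ["REFMAC", ""]
    for i, (chain, first, last) in enumerate(segments, start=1):
        if first > last:
            raise ValueError(f"segment {i}: first residue {first} > last residue {last}")
        lines.append(f"TLS    Group {i} (chain {chain}, residues {first}-{last})")
        lines.append(f"RANGE  '{chain}{first:4d}.' '{chain}{last:4d}.' ALL")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical two-segment partition of chain A, as TLSMD might suggest.
    print(tls_group_file([("A", 1, 63), ("A", 64, 140)]))
```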
How do you know if TLS helped?
A decrease in the Rfree value as well as an improvement in the electron density maps.
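Both R values use the same formula, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|; Rwork is summed over the reflections used in refinement and Rfree over the test set held out from it, which is why a drop in Rfree is the more trustworthy sign that TLS helped. The Python sketch below computes both from plain lists; it assumes |Fcalc| is already on the same scale as |Fobs| (refinement programs also refine a scale factor), and the function names are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), assuming Fc is already scaled to Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(f_obs, f_calc, is_free):
    """Split reflections by their free-set flag and return (Rwork, Rfree)."""
    work_obs, work_calc, free_obs, free_calc = [], [], [], []
    for fo, fc, free in zip(f_obs, f_calc, is_free):
        if free:
            free_obs.append(fo)
            free_calc.append(fc)
        else:
            work_obs.append(fo)
            work_calc.append(fc)
    return r_factor(work_obs, work_calc), r_factor(free_obs, free_calc)
```

Comparing the model before and after TLS refinement only makes sense with the same free-set flags in both runs.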
I have done my best to condense about 100 pages of websites, presentations and literature into 250 words. Please let me know what I need to change/add/remove to make this post more helpful, thanks!
Dear Protein Data Bank,
It’s not you, it’s me.
We’ve been inseparable for what seems like forever; we have been through a lot. Unfortunately, I don’t think that our relationship is going to work out.
I’ve done my best to be patient and even offered suggestions on how we could make things better. I know that you have been improving and even updated your site. I just feel that I need to be better connected to other resources.
Maybe I’m giving up on you too soon.
I’ll miss you,
P.S. I thought you should know that I’ve been seeing PDBsum lately.
I thought the enzyme catalytic mechanisms (ECM) database would be a nice follow-up to yesterday’s post.
The amazing part about this database is not the number of entries (720) but the detail. The ECM has devised a classification of enzymatic reactions, as follows:
R: Basic Reaction
L: Ligand group involved in catalysis
C: Catalysis type
P: Residues/cofactors located on Proteins.
This classification system creates a hierarchy that the user can then search. The hydrolysis classification even has pictures of the general mechanisms, for example a pepsin-like mechanism and a trypsin-like mechanism. If you or your students are learning general mechanisms, a number of these illustrations could serve as excellent real-world examples.
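A four-level code like R/L/C/P is easy to search by prefix: fix the first level to get all mechanisms of one basic reaction, add the second level to narrow further, and so on. The Python sketch below illustrates that idea only; the `Mechanism` class, the entry names and the level labels are made up for illustration and are not real ECM codes or the database’s search interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mechanism:
    entry: str
    reaction: str  # R: basic reaction
    ligand_group: str  # L: ligand group involved in catalysis
    catalysis_type: str  # C: catalysis type
    protein_site: str  # P: residues/cofactors located on the protein

    def path(self):
        """The four classification levels, from most general (R) to most specific (P)."""
        return (self.reaction, self.ligand_group, self.catalysis_type, self.protein_site)


def search(entries, *prefix):
    """Return entries whose R/L/C/P path starts with the given levels (in order)."""
    return [e for e in entries if e.path()[: len(prefix)] == tuple(prefix)]


# Illustrative entries with made-up labels (not real ECM codes).
ENTRIES = [
    Mechanism("pepsin-like", "hydrolysis", "peptide bond", "general acid/base", "Asp, Asp"),
    Mechanism("trypsin-like", "hydrolysis", "peptide bond", "covalent catalysis", "Ser, His, Asp"),
    Mechanism("kinase-like", "transfer", "phosphate", "general acid/base", "Asp, Mg2+"),
]

if __name__ == "__main__":
    print([m.entry for m in search(ENTRIES, "hydrolysis")])
    print([m.entry for m in search(ENTRIES, "hydrolysis", "peptide bond", "covalent catalysis")])
```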
The search page contains a number of unique inputs.
For example, ECM utilizes the KEGG pathway database to generate a reference pathway.
Note: if you do not have a DB code to enter, use the bottom search button, not the top one (above the fold).