Do we need an X-ray Diffraction Image Data Bank?
Jun 4, 2009
I started a poll (to your right) about whether we need to create an X-ray diffraction data bank.
Feel free to share your reasons for your answer in comments below.
Posted by Sean
Rich Apodaca
June 4th, 2009 at 1:36 PM #
Sean, interesting idea. Besides storing images, what else would the databank need to do in order to be helpful to crystallographers?
olchemist
June 4th, 2009 at 2:12 PM #
What is the benefit, in your opinion?
Sean
June 4th, 2009 at 9:18 PM #
I posted this poll because of a thread on the ccp4bb. I highly recommend reading the discussion (it is the first entry if you google: ccp4bb images). From there you can go to the bottom of that particular post and click through to the top of the thread.
I don’t want to copy and paste all the wonderful ideas that were presented without properly crediting the authors.
However, to also address olchemist’s question, a number of benefits were mentioned:
Verify phasing
Errors in symmetry
Twinning
Diffuse scatter
Improvements in integration and refinement programs
Help settle disputes (http://www.nature.com/nature/journal/v448/n7154/full/nature06103.html)
I also believe that it would serve as a wonderful teaching tool. Finally, I have had difficulty obtaining various types of data sets (for example, if I wanted to play around with phasing off sulfurs or test how my MAD anomalous signal compares to a deposited structure).
Oliver Clarke
June 5th, 2009 at 1:10 AM #
Absolutely we need this as soon as practical, for all the reasons Sean outlined above.
Additionally, to aid interpretation of unmodelled density, all known and suspected components of the crystallisation solution should be listed with the deposition (not just the precipitant solution).
Preferably one would also include a direct beam shot or the coordinates of the refined beam centre.
Oliver Clarke
June 5th, 2009 at 1:18 AM #
Unfortunately, I doubt it’s going to solve the issue of synthesised data: if you’ve seen any of James Holton’s simulated diffraction images, they look astonishingly realistic (both visually and with regard to the intensity stats, errors, etc.).
Additionally, that particular structure is unquestionably a fake – one can see that just by looking at the chemically impossible contacts that are apparently perfectly well-defined in the electron density.
The fact that Nature published the (blatantly inadequate) reply by the authors and didn’t request retraction of the paper was a little disappointing, to say the least.
(And if you have a browse through some of the previous structures that Krishna Murthy has published, it would appear not to be an isolated incident…)
Sean
June 5th, 2009 at 9:38 AM #
I agree that an X-ray data bank won’t solve the problem with synthesized data.
My thought is that very few will even submit synthesized images. The reason is that they will eventually be caught: if the structure has been published, then at some point another researcher will attempt it.
This is an interesting point, but one that should not prevent a data bank from being developed if the community would find it useful.
Graeme Winter
June 5th, 2009 at 10:41 AM #
This keeps on coming up, and I keep on saying “yes”: we need to be able to store / share diffraction data. Some criteria should be applied to the amount of metadata available, for example the intention of the data collected and the number of sites / sequence / MR model, so that the structure solution may be repeated.
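A metadata criterion of the kind suggested above could be sketched roughly as follows. This is only an illustration: the class name, field names, and the rule that a deposition needs either a site count or an MR model are assumptions made for this example, not part of any existing deposition standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DepositionMetadata:
    """Hypothetical minimal metadata to accompany deposited images."""
    experiment_intent: str            # why the data were collected, e.g. "SAD" or "native"
    sequence: Optional[str] = None    # protein sequence
    num_sites: Optional[int] = None   # heavy-atom / anomalous sites, if experimentally phased
    mr_model: Optional[str] = None    # search model identifier, if solved by MR


def missing_fields(meta: DepositionMetadata) -> List[str]:
    """List what is still needed before the structure solution could be repeated."""
    missing = []
    if not meta.experiment_intent:
        missing.append("experiment_intent")
    if not meta.sequence:
        missing.append("sequence")
    # Assumption: one of the two phasing routes must be documented.
    if meta.num_sites is None and not meta.mr_model:
        missing.append("num_sites or mr_model")
    return missing


complete = DepositionMetadata("SAD", sequence="MKT", num_sites=4)
incomplete = DepositionMetadata("native")
print(missing_fields(complete))
print(missing_fields(incomplete))
```

A data bank could refuse or flag a deposition whenever this list is non-empty, though where exactly to draw that line would be up to the community.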
David Waterman
June 5th, 2009 at 12:49 PM #
Yes, it would be very useful, for developers interested in integration methodology and detector properties as well as the reasons mentioned above.
James Whisstock
June 5th, 2009 at 6:11 PM #
Check out http://www.tardis.edu.au/, which provides a practical distributed mechanism for deposition of data. Currently it is used mostly by the Australian community; however, the download tools that will be released shortly are designed both to permit ready archiving of data (a useful thing in its own right) and to link these data to the repository.
Cheers
J
Sean
June 5th, 2009 at 8:07 PM #
James, awesome.
I will be looking forward to the tools being released. I presume that the reason the depository has been used generally by the Australian community is due to being developed there?
I did not come across a reason why this could not be expanded into a worldwide depository of diffraction images.
DrNO
June 6th, 2009 at 12:55 PM #
What about space and upload/download time? A data set can be several GB. With 5-10k? structures solved/year, that’s 500 TB. I’d say let’s wait a few years. Let us first have a unified system for images, something that would convert the detector-specific image to a general image format that would be easily recognized by all indexing programs. I hate having to ask HKL2000 every time I use a new detector.
Ezra Peisach
June 7th, 2009 at 10:22 AM #
Such a resource would be useful for archival purposes as well as an aid for software developers.
There are a few major hurdles, including the metadata.
From the point of view of indexing and processing, all the various criteria would be necessary to accurately reproduce the experiment. This would include the spots used for indexing, I/sigma cutoffs, resolution limits in various programs, frames used for indexing, etc. The list goes on. For well-diffracting crystals, none of this is necessary, but for marginal crystals, having this information will be necessary to “reproduce the experiment”.
Unless there is buy-in from equipment manufacturers to export this information from data collection and processing, it will not happen.
Then there is the issue of cost. The Tardis site reports 33 datasets for 16 structures, coming to 106 GB (3.2 GB/dataset). According to the PDB website, 2041 structures were deposited in the first quarter of 2009, so say 8000 per year. At roughly 6.6 GB per structure, this is about 50 terabytes of data per year!!! This is without mirroring, etc. Perhaps there is a solution with regional databases (by country, region, etc.), but this would require government support…
So, I see the value, but there is a hefty cost…
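For anyone who wants to redo the estimate with their own numbers, the arithmetic above can be sketched as below. The inputs are the figures quoted in this comment; the function name and the 1000 GB per TB conversion are assumptions made for the illustration.

```python
# Figures quoted above (Tardis site and PDB website, 2009)
TARDIS_DATASETS = 33
TARDIS_STRUCTURES = 16
TARDIS_TOTAL_GB = 106
PDB_DEPOSITIONS_Q1_2009 = 2041


def annual_storage_tb(total_gb, structures, depositions_per_year, gb_per_tb=1000):
    """Scale the observed storage per structure up to a year of depositions."""
    gb_per_structure = total_gb / structures
    return gb_per_structure * depositions_per_year / gb_per_tb


gb_per_dataset = TARDIS_TOTAL_GB / TARDIS_DATASETS        # about 3.2 GB
gb_per_structure = TARDIS_TOTAL_GB / TARDIS_STRUCTURES    # about 6.6 GB
depositions_extrapolated = PDB_DEPOSITIONS_Q1_2009 * 4    # naive four-quarter extrapolation

# The comment rounds the yearly deposition count down to 8000.
estimate_rounded = annual_storage_tb(TARDIS_TOTAL_GB, TARDIS_STRUCTURES, 8000)
estimate_extrapolated = annual_storage_tb(
    TARDIS_TOTAL_GB, TARDIS_STRUCTURES, depositions_extrapolated
)

print(f"{gb_per_dataset:.1f} GB/dataset, {gb_per_structure:.1f} GB/structure")
print(f"{estimate_rounded:.0f} TB/year at 8000 structures")
print(f"{estimate_extrapolated:.0f} TB/year at {depositions_extrapolated} structures")
```

Either way the result lands a little above 50 TB per year before any mirroring, and it scales linearly with both the deposition rate and the average data volume per structure.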
Jan Dohnalek
June 8th, 2009 at 1:12 AM #
Definitely YES. We have been discussing this in the computational work package of Instruct as well (Integrated infrastructure for structural biology in Europe). Amounts of data, transfer times and standardization are the real bottlenecks. There “might” be personnel dedicated under INSTRUCT in the future.
Jan