
Insights into Structural Biology Research Trends
The structural biology field is evolving rapidly, with advancements in experimental methods, data challenges, and scientific goals. Structural biologists are adapting to new techniques and facing challenges in data management and reproducibility. The community aims to improve data archiving, develop automated pipelines, and enhance quality indicators to deliver reliable results. Users express a preference for easier access to data processing tools and integrated platforms for data management. These insights offer a glimpse into the current landscape of structural biology research.
Presentation Transcript
Towards a Structural Biology Work Bench (Chris Morris, STFC)
Structural biologists are mature computer users: first use of digital computers in the 1940s; combined data rate for European structural experiments > LHC; rate will double with XFEL. [Figure: Protein Data Bank, new entries by year (log scale), 1971-2011.]
New scientific goals. [Survey chart: "My targets include:" (313 responses), 0-100%; categories: protein / nucleic acid, protein / protein complexes, single gene products, membrane proteins, soluble proteins, prokaryotic proteins, eukaryotic proteins, other.] [Figures: new PDB entries (%), by year, 1971-2011; one panel with legend protein-RNA complex, protein-DNA complex, heteromeric protein, homomeric protein; one panel with legend membrane, cytoplasmic, extracellular.]
New experimental methods: mean 3.9 techniques per respondent; biologists, not technique experts; small samples; data noisy and incomplete. [Table: 324 respondents; preferred technique (X-ray diffraction, NMR spectroscopy, electron microscopy, SAXS, modelling) against other techniques used in the past year (X-ray diffraction, NMR spectroscopy, electron microscopy, SAXS, modelling, CD, SPR, calorimetry), with totals.]
New data challenges: improve archiving of data and metadata; improve automated pipelines for MX, create pipelines for other techniques; reproducibility (keywords, version numbers); combined algorithms; deliver results to other life scientists; quality indications.
Users want data processing, not data management. [Survey chart: share of respondents who agree with each statement.] "I would use combined techniques if the software was easier to get and use." "Last year I repeated some work because I could not find the sample or file produced." "If a web portal offers integrated access to data archives and to processing software, I will use it." "Last year I discarded some samples or files because their provenance was not recorded well enough."
Crowdsourcing from the middle tier. Community includes: life scientists who use computers; end user programmers; algorithm developers. Must be easier to compose existing services to make a new web page: Google widgets; semantic web; BioJS.
Structural Biology Work Bench: seamless data transfer between stages; accumulate metadata without user intervention; no installation effort; extensible.
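One way to picture "accumulate metadata without user intervention" is a thin wrapper that runs each processing stage and writes a provenance record next to the output, so the user never fills in a form. The following is only a minimal Python sketch under stated assumptions: the names run_stage and toy_stage, the record fields, and the ".provenance.json" sidecar convention are hypothetical illustrations, not part of the Work Bench described in the talk.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, used to identify it later."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_stage(name, version, func, input_path, output_path):
    """Run one processing stage and write a provenance sidecar (hypothetical format)."""
    func(input_path, output_path)
    record = {
        "stage": name,
        "software_version": version,  # version numbers support reproducibility
        "input": {"file": input_path.name, "sha256": sha256_of(input_path)},
        "output": {"file": output_path.name, "sha256": sha256_of(output_path)},
        "finished_utc": datetime.now(timezone.utc).isoformat(),
    }
    # The sidecar sits beside the output file, e.g. processed.txt.provenance.json
    sidecar = output_path.with_name(output_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return record


def toy_stage(src: Path, dst: Path) -> None:
    """Stand-in for a real processing program: upper-cases the input text."""
    dst.write_text(src.read_text().upper())


# Usage: run the toy stage in a temporary directory.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw.txt"
raw.write_text("reflections")
processed = workdir / "processed.txt"
record = run_stage("toy_stage", "0.1", toy_stage, raw, processed)
```

Because the record stores checksums of both input and output, a later stage could, in principle, trace a file back through the stages that produced it without relying on file names alone.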
Reinvent nothing. Existing best practice includes: WeNMR; PaNData; Diamond (pipelines and archives); Scipion; Data Life Cycle Lab. Integration, not competition.
Developing infrastructures: 1. Understand context of use; 2. Detailed requirements; 3. User experience design; 4. Technical architecture; 5. Develop; 6. Seek feedback. "users need to become much more directly involved in strategy, coordination and innovation in each of the e-Infrastructure components. This implies that users also need to be empowered to drive the direction of e-Infrastructure service. To this end, the funding for service delivery should be channelled through the users, rather than directly to the service delivery organisations." (e-IRG White Paper 2013)
Work packages
o a distributed file system
o a toolkit for making new active web pages that address new scientific questions
o a rigid body docking service that can use a variety of experimental evidence
o an atomistic structure solution service that can use a variety of experimental evidence
o a construct design service
o scientific collaborations, which validate the work in progress by putting it to use
Part of the life of an mmCIF file. [Diagram nodes: PDB, CCP4GUI2, Local Store, Xia2, PiMS, MrBump, BioInf.]
Pilot survey at Instruct AGM: 73% working on eukaryotic rather than prokaryotic systems; 84% working on complexes rather than single gene products; each research team routinely uses three to four different techniques; 83% would use combined SB techniques more often if it was easier to get access to experimental facilities; in 73% of cases, respondents found it hard to combine software tools for different techniques in integrated workflows. [Figure: percentage by year, 1971-2010; legend entries: Eukaryotes, Viri.]
Structural Biology Work Bench. Experimental facilities produce more than a petabyte per year. Community requirements: no installation effort, like the WeNMR portal; scattered file system, like ICAT; accumulate metadata without user intervention. Users want data processing, not data management. We identified the following use cases: I, protein-protein interaction; II, disordered proteins. [Survey chart: agreement with "If a web portal offers integrated access to data archives and to processing software, I will use it".]