Transforming Data Management for Scientific Discovery
The next generation of scientific discovery emphasizes data-driven approaches, demanding a significant shift in data management practices to extract value for economic growth and societal progress. Key aspects include public good preservation, confidentiality, and leveraging public funding for research. Explore insights on high-throughput experimental methods, industrial-scale production, and the importance of preserving data sets for future advancements. Discover the quest for an active data storage solution that seamlessly synchronizes local and network storage for enhanced collaboration and data sharing. Uncover the critical role of research facilitators in coordinating expertise and training to support researchers throughout the entire research life cycle. Embrace the transformative potential of research data services to elevate academic practices and national expertise in managing research data effectively.
Presentation Transcript
Dr Jonathan Tedds Senior Research Liaison Manager IT Services Research Fellow Dept of Physics & Astronomy twitter @jtedds jat26@le.ac.uk
Even the Chancellor says he gets it! The next generation of scientific discovery will be data-driven discovery We need to make sure we capture value from this mass of data both for economic growth and for social advances, such as better health. This requires a transformation in data management Speech by the Chancellor of the Exchequer, Rt Hon George Osborne MP, to the Royal Society 9 Nov 2012
Public good Preservation Discovery Confidentiality First use Recognition Public funding
Research and the long tail High throughput experimental methods Industrial scale Commons based production Public data sets Cherry picked results Preserved GenBank PDB UniProt ChemSpider CATH, SCOP (Protein Structure Classification) Pfam Spreadsheets, Notebooks Local, Lost 2012-02-07 4 Slide: Carole Goble DCC roadshow East Midlands - CC-BY-SA
Active Data Storage: Identifying the Holy Grail? What is needed is a tool to transparently synch local and network storage (Marieke Guy, JISCMRD2 Nottingham) CKAN (Orbital, Lincoln)? Research Data Toolkit (Herts) hybrid solutions DropBox-like functionality a must Usability Technical interoperability aim to help create databases for research data so facilitates collaboration and data sharing enables the subsequent publication of datasets challenge is to ensure data are documented and preserved, and the service is sustainable
UK Research Data Service (UKRDS) pathfinder Key finding of HEFCE funded study 2008/9 at Leicester, Oxford, Bristol, Leeds: research facilitators = missing link need institutional level support Coordinate research life cycle, expertise & training across key institutional stakeholders Researchers in Colleges, depts, groups, projects IT Services embryonic research computing support HPC, active storage Research Support Office funding, legal requirements Information Assurance FoI Library - repository archive, preservation, ethical & training expertise Academic Practice Unit training Coordinate to national expertise including JISC Managing Research Data programme #jiscmrd Digital Curation Centre, UKDA e.g. research data management planning tools RCUK & EU FP6,7 e-science program RIN, OMII, UCISA ...
Portable Antiquities Scheme (British Museum) Place-names (Nottingham) Surnames Genetics IT hosting and GIS Best practice: #JISCMRD, UKRDS, DCC, international http://halogen.le.ac.uk
Halogen as template for research data management #jiscmrd Requirements Analysis must be iterative! Data Management Plan use DMPonline (DCC) Scalable research data management infrastructure pilot phase to nationally available resource LAMP stack IT infrastructure: host research database work with JISC/DCC A model for the long term delivery of a data management service within the institution including support, maintenance, governance & charging policies Include researchers, IT services, research support office, library services etc.
DIRECT BENEFITS New research opportunities Cross database work seed new research samples Scholarly communication/access to national resources Key to English Place Names (Nottingham) Portable Antiquities Scheme (British Museum) Verification, re-purposing, re-use of data Cleaning & enhancing private research datasets for reuse & correlation Increased transparency excellent training for best practice in research data management Increasing research productivity Build in cleaning, annotation, enhancement into normal research workflows research datasets may immediately be reusable and interoperable Impact & Knowledge Transfer Reuse IT infrastructure: EU FP7 Mintweld (industrial engineering) & BRICCS National Health Service/University Trust data sharing. Increasing skills base of researchers/students/staff
INDIRECT BENEFITS (COSTS AVOIDED) No re-creation of data Researchers save valuable time otherwise needed to transcribe external data sources Interdisciplinary research platform available centrally for reuse as a service Lower future preservation costs Reusable Service Level Agreements in place Not dependent on individuals alone Re-purposing data, methodologies for new audiences Internal & national research resources can become nationally reusable e.g. Geneticists learn better spatial correlation analysis techniques Protecting returns on earlier investments research funders: Wellcome Trust, Leverhulme Trust, AHRC, British Museum Institutions: Universities of Leicester, Nottingham, UCL
CHALLENGES interdisciplinary research database ingest each input dataset in form such that sufficient information is carried forward to enable interoperation Cultural differences versioning & provenance for input datasets which software tools, infrastructure, query interface? suitable for multidisciplinary researchers Requirements upon the institution for sustaining the research assets & skills Requirements upon the researchers Annotating Refreshing Maintenance of datasets
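The versioning and provenance challenge can be illustrated with a minimal sketch. It assumes each input dataset arrives as raw bytes from a named source, and it records a checksum and version number at ingest. The names `IngestRecord`, `ingest`, and the "place-names" source label are hypothetical, chosen for illustration only; they do not describe how the Halogen database itself was built.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class IngestRecord:
    """Hypothetical provenance record for one version of an input dataset."""
    source: str       # label of the contributing dataset (assumed naming)
    version: int      # increments each time changed content is ingested
    sha256: str       # checksum of the exact bytes that were ingested
    ingested_at: str  # UTC timestamp in ISO 8601 form


def ingest(source: str, payload: bytes, history: List[IngestRecord]) -> IngestRecord:
    """Record a new version unless the payload matches the latest one for this source."""
    digest = hashlib.sha256(payload).hexdigest()
    previous = [r for r in history if r.source == source]
    if previous and previous[-1].sha256 == digest:
        return previous[-1]  # unchanged content: keep the existing version
    record = IngestRecord(
        source=source,
        version=len(previous) + 1,
        sha256=digest,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )
    history.append(record)
    return record


if __name__ == "__main__":
    history: List[IngestRecord] = []
    first = ingest("place-names", b"extract A", history)
    again = ingest("place-names", b"extract A", history)   # same bytes, no new version
    second = ingest("place-names", b"extract B", history)  # changed bytes, new version
    print(first.version, again.version, second.version, len(history))
```

Keeping the checksum beside the version number lets a later user confirm which exact extract of a source fed a given cross-database result, which is the information that has to be carried forward for interoperation.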
Top Tip: how to get researchers' attention? Research grant pre-award costing (LUCRE - SAP based) Dominates researchers' minds! Enable PIs to build grant application using actual costs of staff, overheads, and the right rules for funder Trigger involvement of IT Research Liaison and wider institutional expertise via flags sensitive research data costing/planning support including curation and preservation over research lifecycle Track institution wide needs via IT Service Desk
Researcher Responses to Contacts Made: Response Received 37%, No Response 63%
ORGANISATIONAL CHALLENGES AND SOLUTIONS Cultural differences Recognise different cultures and mind sets research community and IT specialists in central services different professional language, expectations and working practices management of a research project usually requires a different, iterative methodology than an IT infrastructure project having a more clearly predetermined end point Research Liaison Role strong research background helps! enables effective ways of liaising with research community bridging gaps in understanding Leveraging expertise within and external to the organisation coordinate specialists See Research Fortnight blog piece Feb 2011
Suggested institutional timeline From Whyte & Tedds (2011), DCC Briefing http://www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
Add governance.... Governance JISCMRD policy event, 12-13 March 2012, Leeds University Policy and Strategy Information and Communications Technology Committee Academic Policy Committee Research Policy Committee Information Security Policy Working Group IT Portfolio Board Learning Technologies Management Group IT Management Groups for Corporate Services (10 in total) Research Computing Management Group
As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database. We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable. [p.7] Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science-public-enterprise/report/
#JISCMRD PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences capture the processes and procedures required to publish a scientific dataset ingestion into a data repository formal publication in a data journal address key issues in data publication how to peer-review a dataset? what criteria are needed for a repository to be considered objectively trustworthy? how can datasets and journal publications be effectively cross-linked for the benefit of the wider research community? PREPARDE team includes key expertise in Research academic publishing data management Earth Sciences focus but produce general guidelines applicable to a wide range of scientific disciplines and data publication types
Geoscience Data Journal, Wiley-Blackwell and the Royal Meteorological Society capture and manage workflows required to operate the Geoscience Data Journal from submission of a new data paper and dataset, through review and to publication develop procedures and policies for authors, reviewers and editors allow the Geoscience Data Journal to accept data papers as submissions for publication focus on guidelines for scientific reviewers who will review the datasets incorporate some technical developments at the point of submission data visualisation checks interface improvements enhance the resulting data publications put in place procedures needed for data publication in the California Digital Library
Universities and their users Future? http://www.brisskit.le.ac.uk Advice and requirements response Brokerage and Strategic Support DCC JANET(UK) Others e.g. HPC Service delivery and support Service delivery and support http://www.janetbrokerage.ac.uk/ SLAs Ts and Cs Compliance monitoring http://umfcloudpilot.eduserv.org.uk Cloud Storage Providers Other Cloud Infrastructure XaaS
BRISSKit: Biomedical Research Infrastructure Software Service Kit A vision for cloud-based open source research applications #BRISSKit http://www.brisskit.le.ac.uk
BRISSKit context: The I4Health goal of applying knowledge engineering to close the ICT gap between research and healthcare (Beck, T. et al 2012) Data as a public good & research efficiencies = strategic priority for government, NHS, funders (e.g. MRC, Wellcome, CRUK)
Overview of BRISSKit Developing software as a service data management infrastructure based on open-source applications More efficient & easier for researchers Offers significant savings in research database and IT support costs Development funded by HEFCE University of Leicester in partnership with the University Hospitals Leicester Trust and the Cardiovascular BRU
BRISSkit USPs Integrated support for core research processes Well-established mature open source applications as prototyped in Cardiovascular: fully UK customised A platform for seamless management and integration between applications An API allows integration with existing clinical systems Easy set up, use and administration through browser (including on mobile devices) Capability of being hosted in any compliant cloud provider including UHL (NHS information governance)
BRISSkit components = web services CiviCRM Enables end-to-end contact management for volunteers and research participants, tracking approaches, contact, responses, recruitment, exclusions. CiviCRM was designed for the 'civic sector' and has an object model that reflects community building and non-profit relationships.
OBiBa Onyx Records participant consent, questionnaire data and primary specimen IDs. Web-based, secure data entry by research staff. E.g. used for all patient recruits in LCBRU mobile computing on wards and outpatient clinic in TMF. Await significant new release
caTissue Holds data on primary, derived and aliquot specimen, including linear and 2d barcodes. Storage inventory, order tracking currently over 30,000 LCBRU samples stored and recorded.
i2b2 Data from multiple data sources combined into multiple ontologies for flexible and sophisticated searching, cohort discovery and research.
The semantic bridge Bio-ontology! OBiBa Onyx i2b2 Records participant consent, questionnaire data and primary specimen IDs Cohort selection and data querying ?
www.brisskit.le.ac.uk Email: brisskit@le.ac.uk
Market: who is BRISSkit for? Modular approaches and scalable tools with open source licenses make good investments Individual researchers and associates enterprise-level tools without the IT overheads Research themes and departments stand-alone instances of required tools to accelerate research Research units and centres integrated toolkit with clinical data loading services, or 'jigsaw pieces' to complement existing provision
BRISSkit Sustainability OS engagement OS community engagement standards compliance service vision Cross-enterprise Service Architecture: how to join & use service for new groups & partners Definition of service vision, organisational & service components all relevant standards & tools which Brisskit partners will be expected to use and comply with OS Community Engagement Charter defining engagement with existing & new OS communities including adoption & code commitments
Summary Can't do it all in house! But many disciplines don't have data centres Build coalition of institutional actors Essential to have high level support Take and shape Identify what you do have in-house Access external tools, standards where possible Active storage, collaboration, eprints Propose best of breed for (inter)national reuse Share benefits (and costs) over JANET Sustainability the key challenge