Understanding TOPMed Data Sharing and Analysis Workshop Overview

Explore the TOPMed Analysis Workshop held at the University of Washington, emphasizing genetic analysis, data coordination, and consent types. Accessing the website, data organization on dbGaP, and TOPMed Exchange Area organization are discussed in detail.

  • TOPMed Data
  • Genetic Analysis
  • Data Coordination
  • Research Workshop
  • University of Washington

Presentation Transcript


  1. TOPMed Analysis Workshop, August 7-9, 2017. Genetic Analysis Center, Biostatistics Department, University of Washington (TOPMed Data Coordinating Center). Introduction to TOPMed and Data Sharing. Cathy Laurie (cclaurie@uw.edu)

  4. Consent Types

  5. To access the web site: sign and return the non-disclosure agreement from the TOPMed DCC (gccbiost@uw.edu)

  6. ~50 participating studies currently in TOPMed (not shown)

  7. TOPMed WGS Overview (flow diagram)
     • Entities shown: Study, Study Coordinating Center, Sequencing Center, IRC (Michigan), DCC (UW), NCBI (dbGaP, SRA*), Working Groups (COPD, asthma, atherosclerosis, etc.), study analysis teams (Study A, Study B, etc.), Scientific Community
     • Data flows labeled: DNA samples; sequence data; harmonized sequence data; joint genotype call sets; phenotypes; harmonized phenotypes; phenotypes, genotypes, sequence data
     • Outputs: study-focused publications; cross-study publications; personalized medicine
     *SRA is being replaced with an NIH cloud as repository for aligned sequence data

  8. Data Organization on dbGaP
     • Parent study accessions*: phenotype data; prior SNP array and other non-TOPMed genotype data; some have omics data; most are currently released (available to the general scientific community)
     • TOPMed accessions*
        • Exchange Area (accessible only by authorized TOPMed investigators): whole genome sequence data and genotype call sets; some have phenotype data; some have files contributed by Working Groups for sharing
        • Released accessions: Phase 1 studies currently being released (~18k samples); include SRA/BAM, VCF, annotation, and phenotype files
     *click to find accession numbers

  9. TOPMed Exchange Area Organization and Current Content
     • Common Exchange Area: genotype call sets (cross-study); variant and sample annotation
     • Study-specific Exchange Areas (Study A, Study B, Study C)
     • Study-specific TOPMed EA content:
        • Sample files: sample-subject mapping, subject consent, sample attributes, pedigrees
        • Study-specific phenotype files, submitted specifically for TOPMed (many studies already have their phenotypes in a released parent study accession that is publicly available)
        • Harmonized phenotype files: some from the DCC and others contributed by Working Group members
        • BAM/SRA files
        • Misc. files (e.g., prior SNP array data for some studies)

  10. Sequence Data
     • Joint call sets from IRC
        • Freeze 4 (current version): alignment to build 37; includes all samples from phase 1 studies (except SAFS), ~18k samples
        • Next freeze (August 2017): alignment to build 38; to include all phase 1 samples and a large fraction of phase 2 samples

  11. Phenotype Data
     • Study-specific phenotypes
        • Parent study accessions: some have thousands of phenotypes; see the website document for tips on how to find what you need in released accessions
        • TOPMed accessions: the most current data are in the Exchange Area; phase 1 study releases can be searched like other released accessions
     • Cross-study harmonized phenotypes
        • The DCC is performing harmonization for a limited set of traits based on data in released dbGaP accessions; currently this includes blood cell counts and basic demographics; the harmonized data are in the study-specific Exchange Areas
        • Working Groups are exchanging files through the Exchange Areas for their own harmonization efforts

  12. Other -Omics Data
     • Many studies have some prior (i.e., non-TOPMed) omics data
        • See survey results
        • Heterogeneous platforms
        • Much of this is not currently available on dbGaP
     • TOPMed Omics Pilot (MESA), currently underway: RNASeq; metabolomics; array-based methylation; proteomics
     • TOPMed plans for additional omics data generation and analysis
        • PAR-16-021: Omics Phenotypes of Heart, Lung, and Blood Disorders (X01)
        • RFA-HL-18-020: Integrative Computational Biology for Analysis of NHLBI TOPMed Data (R01)

  13. Access & Use Permissions for Data from TOPMed Exchange Areas
     • Accessing Data
        • Only 6 individuals per study are eligible to apply for access (named by the PI)
        • Data Access Requests (DARs) are submitted to dbGaP using a TOPMed-generic application
        • Successful applicants may share data with others at their institution
        • A group of applicants with coordinated DARs may share data in a cloud environment
     • Using Data
        • Data may NOT be used for any purpose without an APPROVED paper proposal
        • Exception: study investigators may use their own study's data as they wish
        • Paper proposals originate in the TOPMed Working Groups
     • IMPORTANT: Access to TOPMed Exchange Area data does NOT confer permission to use it in analysis
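The distinction between having access and having permission to use the data can be sketched in a few lines of Python. This is an illustration of the rules on slide 13 only; the `Investigator` class, its fields, and the study names are hypothetical and do not correspond to any TOPMed or dbGaP system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Investigator:
    """Hypothetical record of one investigator's standing (illustration only)."""
    name: str
    own_study: str                      # study the investigator belongs to
    has_access: bool = False            # data obtained via an approved DAR or institutional sharing
    approved_proposal_studies: set[str] = field(default_factory=set)


def may_analyze(inv: Investigator, study: str) -> bool:
    """Apply the slide's rule: access alone does not confer permission to analyze."""
    # Exception from the slide: investigators may use their own study's data.
    if study == inv.own_study:
        return True
    # Otherwise both conditions must hold: access to the data AND an
    # approved paper proposal that covers the study in question.
    return inv.has_access and study in inv.approved_proposal_studies
```

For example, an investigator from a hypothetical "StudyA" who has Exchange Area access but no approved proposal would get `True` for "StudyA" and `False` for "StudyB".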

  14. Data Access Mechanisms for Exchange Areas
     1. Each study PI and his/her nominees from other institutions apply to dbGaP for access to multiple TOPMed studies (6 total per study)
     2. NHLBI DAC reviews and approves/disapproves applications
     3. Data are downloaded by each approved investigator to their own institution's IT system
     4. Analysts gain access to cross-study data through the PI/nominee of the study with which they are affiliated
     5. Currently, several PIs/nominees have approved access
     See https://www.nhlbiwgs.org/information, section "Data sharing"

  15. How data sharing via the Exchange Area works (diagram)
     • Uploaders move study phenotypes (Study A, Study B, Study C) from local study storage to the dbGaP Exchange Areas, which also hold the cross-study genotype call set
     • A downloader (Study B in the example) retrieves the data to local study computers for cross-study association analysis
     • Uploading requires study registration
     • Downloading requires a Data Access Request approved by the NHLBI DAC

  16. Other Data-Sharing Mechanisms
     • Sharing Exchange Area data in a cloud environment requires coordinated dbGaP applications and a cloud management plan
     • Study investigators may share their own study's TOPMed data outside of dbGaP
        • Data Transfer Agreements are generally required
        • The DCC's focus is dbGaP sharing, so it is not able to help with these kinds of arrangements

  17. Principles of Data Sharing & Publication in TOPMed
     • PIs and other investigators who obtain dbGaP approval to download TOPMed data are responsible for how it is used, i.e., for making sure that consents and Data Use Limitations are respected by everyone with whom they share the data (generally only within an institution)
     • Investigators obtain access to data through the PI or other senior investigator of the study with which they are affiliated
     • Investigators may begin analyzing TOPMed data only after they have an approved paper proposal
     • Paper proposals must be approved by a TOPMed Working Group prior to submission
     • Each paper proposal must specify which studies' data it intends to use, and the proposers must form a collaboration with investigators from those studies. PI approval of data use is required prior to submitting the proposal. The person submitting the proposal must also select specific study-consent groups as they become available and sign off on their agreement to abide by the Data Use Limitations

  18. Paper Proposal Process. Steps for TOPMed paper proposal development and approval:
     1. Develop the proposal within a TOPMed Working Group (WG), including selection of studies to be analyzed. Approval by this WG is required before proceeding.
     2. Discuss data access mechanisms with leaders of the TOPMed study with which you are affiliated.
     3. Request initial approval from PIs for use of data from selected studies. Approval (or failure to respond within 2 weeks) is required before proceeding.
     4. Submit the paper proposal for scientific review using the online form; this will be reviewed by the TOPMed Publications Committee.
     5. Submit data set selection for review; this will be reviewed by the PIs of the selected data sets.
     https://www.nhlbiwgs.org/paperproposals/about
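The ordering of these steps, and the two-week rule in step 3, can be sketched as follows. This is a minimal illustration, not a TOPMed tool: the names `PIResponse`, `pi_cleared`, and `next_step` are hypothetical, and the code assumes that "failure to respond within 2 weeks" means a PI's silence clears the step once 14 days have passed since the request.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional

# The five steps from the slide, in order (wording shortened).
PROPOSAL_STEPS = [
    "Develop proposal within a TOPMed Working Group and obtain WG approval",
    "Discuss data access mechanisms with leaders of your affiliated study",
    "Request initial approval from PIs of the selected studies",
    "Submit the paper proposal for scientific review",
    "Submit data set selection for review by the PIs",
]

# Assumed interpretation of "failure to respond within 2 weeks".
PI_RESPONSE_WINDOW = timedelta(weeks=2)


class PIResponse(Enum):
    APPROVED = "approved"
    DECLINED = "declined"
    NO_RESPONSE = "no response"


def pi_cleared(response: PIResponse, requested_on: date, today: date) -> bool:
    """Return True if one PI's response (or silence) allows step 3 to proceed."""
    if response is PIResponse.APPROVED:
        return True
    if response is PIResponse.NO_RESPONSE:
        # Silence only counts once the full response window has elapsed.
        return today - requested_on >= PI_RESPONSE_WINDOW
    return False  # an explicit decline blocks the step


def next_step(completed_steps: int) -> Optional[str]:
    """Return the next step to carry out, or None when all five are done."""
    if completed_steps < len(PROPOSAL_STEPS):
        return PROPOSAL_STEPS[completed_steps]
    return None
```

A proposal that names several studies would need `pi_cleared` to hold for every PI involved, for example with `all(...)` over the individual responses.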

  19. Data set selection sample: https://www.nhlbiwgs.org/paperproposals/data-sets

  20. Check your paper proposal to see for which consent groups you have approval. Your manuscript will not be approved for publication if it uses data sets not listed here as approved!
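The check described on slide 20 amounts to a set difference between the data sets a manuscript uses and those approved on the paper proposal. The sketch below assumes each data set is identified by a (study, consent group) pair; the function name and the labels in the example are hypothetical placeholders, not real TOPMed identifiers.

```python
from typing import Iterable, List, Tuple

# A data set is identified here by (study, consent group); this pairing
# is an assumption made for illustration.
DataSet = Tuple[str, str]


def unapproved_data_sets(used: Iterable[DataSet],
                         approved: Iterable[DataSet]) -> List[DataSet]:
    """Return the data sets used in a manuscript that lack approval, sorted."""
    return sorted(set(used) - set(approved))


# Hypothetical example: one of the two data sets used is not approved.
used = [("StudyA", "consent-1"), ("StudyB", "consent-2")]
approved = [("StudyA", "consent-1")]
missing = unapproved_data_sets(used, approved)
if missing:
    print("Not approved for this proposal:", missing)
```

An empty result means every data set in the manuscript appears on the approved list.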

  21. Links to resources
     • Guide for Working Groups
     • Data Sharing through the TOPMed Exchange Areas
     • Paper Proposal Instructions
     • Many other pages on the TOPMed web site
     Questions: contact the DCC program coordinators (gccbiost@uw.edu), who will answer or route your query to the appropriate person

  22. Extras

  23. Paper proposal submission for scientific review: https://www.nhlbiwgs.org/node/add/paper-proposal

  24. Agree to Data Use Limitations (DULs): https://www.nhlbiwgs.org/paperproposals/data-sets. After you select your data sets on this page, you will be asked to agree to the Data Use Limitations (DULs).

  25. Your dashboard: https://www.nhlbiwgs.org/paperproposals/dashboard
