Mutability Sandbox

An analysis and computational model of one plausible mechanism behind the enriched overlap between somatic and germline SNPs. The presentation explores how the distinctiveness of mutation flavors and the tissue specificity of mutation rates shape that overlap, and what this implies for germline contamination rates and mutation propensities across diverse tissues.

  • Genetic
  • Enrichment
  • Mechanisms
  • Computational Models
  • SNPs

Uploaded on Mar 03, 2025

Presentation Transcript


  1. Mutability Sandbox, Will Meyerson

  2. Prospectus Timeline. This Friday: submit prospectus. Dec 8: orally present prospectus.

  3. Germline Contamination Rate Still 0.06%. Last week, I showed that a somatic SNP in a given patient is up to 5x more likely to be a germline SNP in another patient than we would expect by chance. I argued that only 0.06% of these variants are likely to be germline contaminants. One premise of my argument was that there are 200K rare variants per person, a figure I took from the upper bound of the 1000 Genomes range of 40K-200K. However, the good people here pointed out to me that 1000 Genomes used low-pass sequencing and is therefore unreliable for this kind of estimate. So I redid the analysis using data derived from deep (64x) sequencing of NA12878. NA12878 has 180,442 autosomal SNPs that are either absent from gnomAD or present in gnomAD with AF < 0.5%. Fortunately, this is very close to the 200K estimate I used in my calculations, so my estimate of the likely number of germline contaminants in PCAWG remains unchanged. WARNING: this has implications for the VAF-GERP analysis.

  4. Conceptual Model for Overlap Enrichment. Last week, I showed that a somatic SNP in a given patient is up to 5x more likely to be a germline SNP in another patient than we would expect by chance. This week, I present a forward model of one plausible mechanism by which this kind of overlap enrichment can occur. My model assumes that mutations come in different flavors, with distinct relative rates of occurrence. For now, we can think of a flavor as an ordered pair from a reference allele to an alternative allele (e.g. C->T), but the concept is more general and could be extended to include additional features, such as a C->T mutation within a CpG island or within a promoter. The exact relative propensities of the various flavors to mutate may change across tissue types, but my model assumes that each flavor nonetheless has some central tendency toward high or low mutability that is preserved across tissues. My simulations show, quite intuitively, that mutation overlap rates increase with flavor distinctiveness and decrease with tissue distinctiveness. Next time, I plan to use my model to learn the empirical flavor distinctiveness and tissue distinctiveness parameters implied by the overlaps between germline data and somatic data from diverse tissues.

  5. Computational Model for Overlap Enrichment. Assume that 100,000 SNPs are possible on some tiny, fictitious chromosome segment, and that these SNPs fall into 5 equally prevalent flavors. From the PanFlavor_Distribution_of_PanTissue_Mutability_Distributions, randomly draw one PanTissue_Mutability_Distribution for each flavor. From each flavor's PanTissue_Mutability_Distribution, draw a Tissue_Specific_Flavor_Specific mutability for each tissue. Then, in each tissue, draw 10,000 SNPs with likelihoods proportional to the relevant Tissue_Specific_Flavor_Specific mutability of each potential SNP (which depends on that SNP's flavor).

  6. Statistical Model for Overlap Enrichment:
     Flavor_i_Mutability_in_Tissue_J ~ logNormal(meanlog = PanTissue_Mutability_for_Flavor_i, sdlog = Tissue_Mutability_Distinctiveness)
     PanTissue_Mutability_for_Flavor_i ~ logNormal(meanlog = PanFlavor_Distribution_of_PanTissue_Mutability_Distributions_Center, sdlog = Flavor_Mutability_Distinctiveness)
     PanFlavor_Distribution_of_PanTissue_Mutability_Distributions_Center = 0

  7. Two draws from the same tissue are more likely to overlap each other if flavors are highly distinct.

  8. Two draws from different tissues are less likely to overlap each other if tissues are highly distinct.

  9. Next Steps:
     • Learn parameters for the model from the data
     • Use realistic SNP flavors
     • SNP meta-types, tissue meta-types
     • Now inclined to consider germline less special, just another tissue (maybe in its own meta-meta-type)
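
Slide 3 (Germline Contamination Rate Still 0.06%) counts rare variants in NA12878 with a simple filter: autosomal SNPs that are absent from gnomAD or present in gnomAD with AF < 0.5%. Below is a minimal Python sketch of that filter, not the pipeline actually used. It assumes variants have already been parsed and annotated into a hypothetical `Variant` record whose `gnomad_af` field is `None` when the site is absent from gnomAD; the record fields and the handling of `chr` prefixes are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Autosomes 1-22; a leading "chr" prefix is tolerated (assumption about naming).
AUTOSOMES = {str(i) for i in range(1, 23)}


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record (not a real library type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gnomad_af: Optional[float]  # None when the site is absent from gnomAD


def is_snp(v: Variant) -> bool:
    """True for a single-nucleotide substitution."""
    return len(v.ref) == 1 and len(v.alt) == 1 and v.ref != v.alt


def is_autosomal(v: Variant) -> bool:
    chrom = v.chrom[3:] if v.chrom.startswith("chr") else v.chrom
    return chrom in AUTOSOMES


def count_rare_autosomal_snps(variants: Iterable[Variant], af_cutoff: float = 0.005) -> int:
    """Count autosomal SNPs absent from gnomAD or with gnomAD AF below af_cutoff (0.5%)."""
    return sum(
        1
        for v in variants
        if is_snp(v)
        and is_autosomal(v)
        and (v.gnomad_af is None or v.gnomad_af < af_cutoff)
    )


example_calls = [
    Variant("chr1", 1000, "C", "T", None),   # absent from gnomAD: counted
    Variant("chr2", 2000, "A", "G", 0.001),  # AF 0.1%: counted
    Variant("chr3", 3000, "G", "A", 0.2),    # common: not counted
    Variant("chrX", 4000, "C", "T", None),   # not autosomal: not counted
    Variant("chr4", 5000, "AT", "A", None),  # indel, not a SNP: not counted
]
print(count_rare_autosomal_snps(example_calls))  # → 2
```

The AF cutoff is a parameter so the 0.5% threshold can be varied without touching the filter logic.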
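
Slides 3 and 4 measure overlap against what "we would expect by chance". The slides do not say how that baseline was computed. One common choice, assumed here, is the hypergeometric mean: if one SNP set were a uniformly random subset of the n_possible SNP sites, its expected overlap with the other set is |A|·|B|/n_possible. A minimal Python sketch of the resulting enrichment ratio, keying SNPs by any hashable identifier such as a (chrom, pos, ref, alt) tuple:

```python
def overlap_enrichment(a: set, b: set, n_possible: int) -> float:
    """Observed |A & B| divided by |A| * |B| / n_possible.

    Assumed chance model: B is a uniformly random subset of the n_possible
    SNP sites, so the expected overlap is the hypergeometric mean.
    """
    if n_possible <= 0:
        raise ValueError("n_possible must be positive")
    expected = len(a) * len(b) / n_possible
    if expected == 0:
        raise ValueError("expected overlap is zero, so enrichment is undefined")
    return len(a & b) / expected


# Toy example: 20 somatic and 20 germline SNPs sharing 10 sites out of 100 possible.
somatic = {("1", pos, "C", "T") for pos in range(20)}
germline = {("1", pos, "C", "T") for pos in range(10, 30)}
print(overlap_enrichment(somatic, germline, n_possible=100))  # → 2.5
```

Here the expected overlap is 20·20/100 = 4 shared SNPs, so 10 observed shared SNPs give an enrichment of 2.5.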
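
Slide 5 (Computational Model for Overlap Enrichment) describes a generative simulation, and slide 6 supplies its distributions. The following is a minimal NumPy sketch of those steps, not the author's code. It makes assumptions the slides do not state: flavors are assigned round-robin so they are exactly equally prevalent, each tissue's SNPs are drawn without replacement, and the per-tissue lognormal draw is kept on the log scale (equivalent in distribution, and it avoids overflow when mutabilities are extreme). The function name `simulate_tissue_snps`, the number of tissues, and the distinctiveness values are hypothetical.

```python
import numpy as np


def simulate_tissue_snps(
    n_possible=100_000,          # possible SNPs on the fictitious segment
    n_flavors=5,                 # equally prevalent flavors
    n_tissues=3,                 # illustrative; the slides do not fix this number
    n_draws=10_000,              # SNPs drawn per tissue
    flavor_distinctiveness=1.0,  # Flavor_Mutability_Distinctiveness (illustrative value)
    tissue_distinctiveness=0.5,  # Tissue_Mutability_Distinctiveness (illustrative value)
    center=0.0,                  # slide 6 sets the PanFlavor center to 0
    seed=None,
):
    """Return (flavor_of_snp, tissue_snps), with one set of drawn SNP indices per tissue."""
    rng = np.random.default_rng(seed)

    # Assumption: round-robin flavor assignment makes flavors exactly equally prevalent.
    flavor_of_snp = np.arange(n_possible) % n_flavors

    # One PanTissue_Mutability_for_Flavor_i per flavor (slide 6, top level).
    pan_tissue = rng.lognormal(mean=center, sigma=flavor_distinctiveness, size=n_flavors)

    # log of Flavor_i_Mutability_in_Tissue_J, shape (n_tissues, n_flavors): a lognormal
    # draw with meanlog = pan_tissue, kept on the log scale to avoid overflow.
    log_mut = rng.normal(loc=pan_tissue, scale=tissue_distinctiveness,
                         size=(n_tissues, n_flavors))

    tissue_snps = []
    for t in range(n_tissues):
        log_w = log_mut[t, flavor_of_snp]  # each SNP inherits its flavor's mutability
        w = np.exp(log_w - log_w.max())    # rescaling leaves the proportions unchanged
        # Assumption: a tissue's SNPs are distinct sites, so draw without replacement.
        chosen = rng.choice(n_possible, size=n_draws, replace=False, p=w / w.sum())
        tissue_snps.append(set(chosen.tolist()))
    return flavor_of_snp, tissue_snps


flavors, tissue_snps = simulate_tissue_snps(seed=1)
print([len(s) for s in tissue_snps])  # → [10000, 10000, 10000]
```

Sampling without replacement guarantees that each tissue contributes exactly 10,000 distinct sites; drawing with replacement would be a simpler alternative if repeated hits at a site were meaningful.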
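
The slide 6 model can also be written in conventional notation. The symbols here are introduced for readability and do not appear in the slides: c for PanFlavor_Distribution_of_PanTissue_Mutability_Distributions_Center, σ_F for Flavor_Mutability_Distinctiveness, σ_T for Tissue_Mutability_Distinctiveness, μ_i for PanTissue_Mutability_for_Flavor_i, and m_ij for Flavor_i_Mutability_in_Tissue_J. LogNormal is parameterized by (meanlog, sdlog), matching the names used by R's `rlnorm`:

```latex
\begin{aligned}
c &= 0, \\
\mu_i &\sim \operatorname{LogNormal}(\text{meanlog} = c,\ \text{sdlog} = \sigma_F), \\
m_{ij} \mid \mu_i &\sim \operatorname{LogNormal}(\text{meanlog} = \mu_i,\ \text{sdlog} = \sigma_T), \\
\text{equivalently}\quad \log m_{ij} &= \mu_i + \sigma_T\,\varepsilon_{ij},
  \qquad \mu_i = \exp(c + \sigma_F\,\eta_i),
  \qquad \eta_i,\ \varepsilon_{ij} \overset{\text{iid}}{\sim} \mathcal{N}(0, 1).
\end{aligned}
```

Read literally, μ_i is itself a positive lognormal draw that then serves as a meanlog; the equivalent form in the last line keeps that reading.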
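
Slides 7 and 8 compare overlaps between two SNP draws from the same tissue and from different tissues, and slide 4 summarizes the result as overlap increasing with flavor distinctiveness and decreasing with tissue distinctiveness. A hedged Monte Carlo sketch of that comparison under the slide 6 model follows. It assumes round-robin flavor assignment, draws without replacement, and log-scale lognormal draws, and it measures enrichment against n_draws²/n_possible, the overlap expected for two uniformly random SNP sets. All function names are hypothetical and the parameter values are illustrative, not those behind the slides' figures.

```python
import numpy as np


def tissue_log_mutabilities(rng, n_flavors, n_tissues, flavor_sd, tissue_sd, center=0.0):
    """Slide 6 model: log of Flavor_i_Mutability_in_Tissue_J, shape (n_tissues, n_flavors)."""
    pan_tissue = rng.lognormal(mean=center, sigma=flavor_sd, size=n_flavors)
    # A lognormal(meanlog=pan_tissue, sdlog=tissue_sd) draw, kept on the log scale.
    return rng.normal(loc=pan_tissue, scale=tissue_sd, size=(n_tissues, n_flavors))


def draw_snps(rng, log_mut_row, flavor_of_snp, n_draws):
    """Draw n_draws distinct SNP indices with probability proportional to mutability."""
    log_w = log_mut_row[flavor_of_snp]
    w = np.exp(log_w - log_w.max())  # rescale to avoid overflow
    # Assumption: sampling without replacement (distinct sites per draw).
    return rng.choice(flavor_of_snp.size, size=n_draws, replace=False, p=w / w.sum())


def mean_overlap_enrichment(flavor_sd, tissue_sd, same_tissue, n_reps=40,
                            n_possible=100_000, n_flavors=5, n_draws=10_000, seed=0):
    """Mean ratio of the observed overlap between two SNP draws to n_draws**2 / n_possible,
    the overlap expected for two uniformly random sets of n_draws SNPs."""
    rng = np.random.default_rng(seed)
    # Assumption: round-robin assignment gives exactly equally prevalent flavors.
    flavor_of_snp = np.arange(n_possible) % n_flavors
    expected = n_draws * n_draws / n_possible
    enrichments = []
    for _ in range(n_reps):
        # Fresh flavor and tissue mutabilities for each replicate, for two tissues.
        log_mut = tissue_log_mutabilities(rng, n_flavors, 2, flavor_sd, tissue_sd)
        second = 0 if same_tissue else 1
        a = draw_snps(rng, log_mut[0], flavor_of_snp, n_draws)
        b = draw_snps(rng, log_mut[second], flavor_of_snp, n_draws)
        enrichments.append(np.intersect1d(a, b).size / expected)
    return float(np.mean(enrichments))
```

Comparing `mean_overlap_enrichment(0.0, 0.5, same_tissue=True)` with `mean_overlap_enrichment(2.0, 0.5, same_tissue=True)` probes the flavor-distinctiveness effect; comparing `mean_overlap_enrichment(2.0, 0.0, same_tissue=False)` with the same call at a large `tissue_sd` probes the tissue-distinctiveness effect. No output values are shown because they depend on the seed and replicate count. In this sketch, when `flavor_sd` is 0 the two tissues' mutabilities are independent and exchangeable across flavors, so the expected cross-tissue enrichment is 1 regardless of `tissue_sd`. Tissue distinctiveness lowers overlap only by eroding flavor effects that the tissues share.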
