Genetic Variant Positioning Simulation

1 / 7

Embed Share

Explore the process of simulating variant positions in a genome, incorporating binning and clustering techniques for efficient data analysis. Understand the steps involved in shuffling variants to new locations, taking into account genome structure and covariate matrix calculations.

nik_hen Follow

Uploaded on Aug 20, 2025 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

MOATsim A somatic variant simulator Given a set of input variants, shuffle to new locations, taking genome structure into account MOAT-v s permutation step without the p-value calculations 1

MOATsim Read genome coordinates into memory Divide into bins of user-specified size Subtract blacklist regions Use bigWigAverageOverBed to derive covariate values for each bin Generate covariate matrix covar 1 covar 2 covar 3 bin 1 bin 2 bin 3 bin 4 bin 5 2

Blacklist filtering Relevant variables: bin width: The width that represents the local genome context. Variants are shuffled to new locations within their containing bin. min bin width: In the event that an especially small bin is formed, either due to a chromosome end or subtraction of a blacklist region, merge the bin with the nearest full size bin if it s below this width. bin width bin width 3

Blacklist filtering Excludes low mappability regions Results in bins smaller than bin_width Possibly too small to offer many new variant locations Hence, guarantee a minimum bin width min_bin_width user parameter Typically set to half the size of bin_width truncated bin_width 4

Blacklist filtering Bins smaller than min_bin_width are merged with an adjacent neighbor If no adjacent neighbor is available, remove the bin bin_width merged 5

Covariate matrix: row clustering Goal: Find similar bins (i.e. similar covariate vectors) and treat as single block Currently using k-means clustering Found that the optimal cluster number k was around 30 using MutSig covariate file (defined over exome only) Within-sum-of-squares prone to stochastic variation beyond this point 50000 40000 Within groups sum of squares 30000 20000 Rough location for beginning of Stochastic variation in within-ss 10000 6 0 10 20 30 40 50 Number of Clusters

Variant Placement Step Start with many bins Mark which have similar covariate vectors Given input variants Shuffle to new locations within bin cluster 7

Genetic Variant Positioning Simulation

Download Presentation

Presentation Transcript

Related

More Related Content