Novel Contextual Locality Sensitive Hashing Algorithm for SMRT Reads Mapping


Explore a novel contextual Locality Sensitive Hashing algorithm designed to align noisy SMRT reads effectively with the reference genome. This algorithm, conLSH, surpasses rHAT in speed and memory requirements, offering significant improvements for aligning SMRT reads in genomics research.

  • Locality Sensitive Hashing
  • SMRT Reads
  • Genomics Research
  • Algorithm Efficiency
  • Next-Gen Sequencing




Presentation Transcript


  1. conLSH: Context based Locality Sensitive Hashing for mapping of noisy SMRT reads. Angana Chakraborty, Sanghamitra Bandyopadhyay. Computational Biology and Chemistry, Volume 85, April 2020, 107206. Presenter: Yuan-Hsiu Tsai. Date: Sept. 22, 2020

  2. Abstract (1/2). Single Molecule Real-Time (SMRT) sequencing is a recent advancement of Next Gen technology developed by Pacific Biosciences (PacBio). It comes with an explosion of long and noisy reads, demanding cutting-edge research to get the most out of it. To deal with the high error probability of SMRT data, a novel contextual Locality Sensitive Hashing (conLSH) based algorithm is proposed in this article, which can effectively align the noisy SMRT reads to the reference genome. Here, sequences are hashed together based not only on their closeness, but also on similarity of context. The algorithm has an $O(n^{\rho+1})$ space requirement, where $n$ is the number of sequences in the corpus and $\rho$ is a constant.
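To make "closeness plus similarity of context" concrete, here is a minimal Python sketch of one plausible reading: the hash samples a few centre positions and emits the whole window around each centre, not just the single symbol. The function name `context_hash` and the parameters `centres` and `context` are hypothetical, chosen for illustration; this is not the authors' implementation.

```python
def context_hash(seq, centres, context):
    """Concatenate the (2*context + 1)-symbol windows around each centre.

    Assumption: every centre lies at least `context` symbols away from
    both ends of `seq`, so each window is complete.
    """
    windows = [seq[c - context:c + context + 1] for c in centres]
    return "|".join(windows)


reference = "ACGTACGTAC"
# Differs from `reference` only at index 3, next to the sampled centre 2.
noisy = "ACGAACGTAC"
centres = [2, 7]

# With no context, only the sampled symbols are compared, so both collide.
plain_ref = context_hash(reference, centres, context=0)
plain_noisy = context_hash(noisy, centres, context=0)

# With one symbol of context on each side, the mismatch at index 3 is seen.
ctx_ref = context_hash(reference, centres, context=1)
ctx_noisy = context_hash(noisy, centres, context=1)

print(plain_ref == plain_noisy)  # True
print(ctx_ref == ctx_noisy)      # False
```

Widening the context makes a collision stronger evidence of local similarity, at the price of being more easily broken by a sequencing error near a sampled position.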

  3. Abstract (2/2). The indexing time and querying time are bounded by $O\left(\frac{n^{\rho+1} \ln n}{\ln \frac{1}{P_2}}\right)$ and $O(n^{\rho})$ respectively, where $P_2 > 0$ is a probability value. This algorithm is particularly useful for retrieving similar sequences, a widely used task in biology. The proposed conLSH based aligner is compared with rHAT, popularly used for aligning SMRT reads, and is found to comprehensively beat it in speed as well as in memory requirements. In particular, it takes approximately 24.2% less processing time, while saving about 70.3% in peak memory requirement for the H.sapiens PacBio dataset.
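The bounds in the abstract have the shape produced by the standard LSH parameter choice. As a sketch, assume the usual setting, which the slides do not state: the index holds $L$ hash tables, each table concatenates $K$ elementary hash functions, and $P_1 > P_2$ are the collision probabilities for similar and dissimilar pairs. The symbols $K$, $L$ and $P_1$ are introduced here as assumptions. Under that setting:

```latex
\begin{align*}
K &= \frac{\ln n}{\ln \frac{1}{P_2}}, \qquad
\rho = \frac{\ln \frac{1}{P_1}}{\ln \frac{1}{P_2}}, \qquad
L = n^{\rho}, \\
\text{space} &= O(nL) = O\!\left(n^{\rho+1}\right), \\
\text{indexing time} &= O(nLK)
  = O\!\left(\frac{n^{\rho+1}\,\ln n}{\ln \frac{1}{P_2}}\right), \\
\text{query time} &= O(L) = O\!\left(n^{\rho}\right) \text{ table lookups}.
\end{align*}
```

Because $P_1 > P_2$ gives $\rho < 1$, the number of tables probed per query grows sublinearly in $n$.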

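  4. Introduction: LSH. Let S = ACBCBAB, with positions I = 1234567 (index). A function F maps each position of S to a slot of the LSH table according to the character at that position, so F(1) = F(6) = the slot of A in the LSH table. Once the table is built, all positions that hold the same character can be retrieved quickly.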
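The toy example on this slide can be sketched in a few lines of Python. The helper name `build_index` is hypothetical, and a plain dictionary stands in for the LSH table. Real LSH hashes longer subsequences so that similar, not only identical, items share a bucket.

```python
from collections import defaultdict


def build_index(s):
    """Map each character to the 1-based positions where it occurs."""
    table = defaultdict(list)
    for pos, ch in enumerate(s, start=1):
        table[ch].append(pos)
    return dict(table)


index = build_index("ACBCBAB")
print(index["A"])  # [1, 6]
print(index["B"])  # [3, 5, 7]
print(index["C"])  # [2, 4]
```

Positions 1 and 6 land in the same bucket, which is the slide's F(1) = F(6).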

  5. FASTA

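  6. Function / Dataset

     | Dataset | Unit | conLSH | rHAT1 | rHAT2 | BWA |
     |---|---|---|---|---|---|
     | E.coli | MB | 1.38 | 290 | 4317 | 9.7 |
     | E.coli | s | 0.01 | 4 | 49 | 8.7 |
     | H.sapiens | MB | 724.32 | 11755.34 | 15806.77 | 4523 |
     | H.sapiens | s | 49 | 785 | 1166 | 3904 |
     | A.thaliana | MB | 29.87 | 742.84 | 4770.16 | 209 |
     | A.thaliana | s | 1 | 35 | 115 | 114.7 |
     | S.cerevisiae | MB | 3.03 | 316.91 | 4343.47 | 21.3 |
     | S.cerevisiae | s | 0.01 | 3 | 50 | 10.16 |
     | O.sativa | MB | 93.18 | 1744.08 | 5804 | 652.4 |
     | O.sativa | s | 7 | 122 | 253 | 543.3 |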

  7. Experiment

     | Method | Time Taken (Sec) | % of Reads Aligned | Peak Memory Footprint |
     |---|---|---|---|
     | rHAT1 (13) | 210839 | 99.8 | 14562 |
     | rHAT2 (15) | 667280 | 100 | 15746 |
     | conLSH | 159888 | 99.8 | 4327 |
     | BLASR(BWT) | 724304 | 99.7 | 8100 |
     | BLASR(SA) | 848602 | 99.8 | 14700 |
     | BWA-MEM | 372782 | 99.7 | 5200 |
     | BWA-SW | 1145150 | 99.76 | 7100 |
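The headline figures in the abstract (about 24.2% less time and 70.3% less peak memory) can be checked against this table. The sketch below assumes, since the slide does not say so explicitly, that the table reports the H.sapiens PacBio run and that the rHAT1 (13) row is the baseline. The helper name `percent_saving` is hypothetical.

```python
def percent_saving(baseline, candidate):
    """Relative reduction of `candidate` versus `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Values copied from the slide: rHAT1 (13) row versus the conLSH row.
time_saving = percent_saving(210839, 159888)    # time taken, seconds
memory_saving = percent_saving(14562, 4327)     # peak memory footprint

print(f"{time_saving:.1f}% less time, {memory_saving:.1f}% less peak memory")
# 24.2% less time, 70.3% less peak memory
```

Under these assumptions, both values agree with the abstract to one decimal place.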

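  8. Function / Dataset

     | Dataset | Method | Time Taken (Sec) | % of Reads Aligned | Peak Memory Footprint (MB) |
     |---|---|---|---|---|
     | E.coli_sim | rHAT1 | 6 | 94.8 | 294 |
     | E.coli_sim | rHAT2 | 59 | 100 | 4225 |
     | E.coli_sim | conLSH | 15 | 100 | 85 |
     | S.cerevisiae_sim | rHAT1 | 9 | 100 | 326 |
     | S.cerevisiae_sim | rHAT2 | 85 | 100 | 4259 |
     | S.cerevisiae_sim | conLSH | 45 | 100 | 156 |
     | A.thaliana_sim | rHAT1 | 89 | 100 | 849 |
     | A.thaliana_sim | rHAT2 | 372 | 100 | 4781 |
     | A.thaliana_sim | conLSH | 337 | 99.9 | 574 |
     | D.melanogaster_sim | rHAT1 | 139 | 100 | 1054 |
     | D.melanogaster_sim | rHAT2 | 223 | 98 | 4987 |
     | D.melanogaster_sim | conLSH | 395 | 99 | 705 |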

  9. Experiment
