Modular Database Pipeline for 454 Sequencing Variant Identification

Joachim De Schrijver

Explore the possibilities of a modular and database-oriented pipeline for 454 sequencing variant identification. Learn about the Roche/454 GS-FLX sequencing system, standard software options, and the advantages of a modular database approach for efficient data processing and storage. Discover how VIP software is being utilized in various sequencing applications, including amplicon resequencing and de novo shotgun sequencing. Uncover the benefits of coverage improvement, PCR assessment, and handling homopolymers in sequencing experiments.

  • Database Pipeline
  • 454 Sequencing
  • Variant Identification
  • Modular Approach
  • VIP Software

Presentation Transcript


  1. Joachim De Schrijver

  2. Short introduction on 454 sequencing
     • Variant Identification Pipeline
     • Possibilities of a DB-oriented pipeline
     • Examples: coverage, improving PCR, fast Q assessment, homopolymers

  3. Roche/454 GS-FLX sequencing:
     • Pyrosequencing
     • 400,000 reads/run
     • Average length: 200-250 bp
     • Applications:
        • Resequencing: variant identification
        • De novo (genome) sequencing: assembly of new regions, plasmids or entire genomes
     • Standard software:
        • Variants: Amplicon Variant Analyzer (AVA)
        • Assembly: standard 454 assembler

  4. Standard software
     • (+) Easy to use
     • (+) Reproducible results on similar datasets
     • (+) GUI (graphical user interface)
     • (-) No answer for non-standard questions
        • Methylation experiments
        • Different types of experiments grouped together
     • (-) What about hidden information?
        • Homopolymer error rates
        • Quality score ~ length of sequenced read
        • Multirun information

  5. Modular database pipeline: modular and database oriented
     • Modular:
        • Efficient planning
        • Scalable
     • Database (DB):
        • No loss of data
        • Grouping several runs together

  6. Basic idea: data is processed and stored in the DB. Results (reports) are calculated on the fly using the DB data.
     • Fast and efficient
     • Calculations only happen once
     • Everybody can access the database without risk of data modification
     • Reporting is independent of the data processing
     • Paper: De Schrijver et al. 2009. Analysing 454 sequences with a modular and database oriented Variant Identification Pipeline
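The split between processing and reporting described on this slide could be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not the actual VIP implementation: the `variant` table, its columns, the sample rows, and the function names are all hypothetical. The processing step writes results once; the reporting step opens the same file read-only, so a report cannot modify the stored data.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_db(path, variants):
    """Processing step (hypothetical schema): store variant calls once."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE variant (run TEXT, amplicon TEXT, position INTEGER, "
        "ref TEXT, alt TEXT, n_reads INTEGER)"
    )
    con.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?, ?)", variants)
    con.commit()
    con.close()


def open_readonly(path):
    """Open the database read-only through an SQLite URI (mode=ro)."""
    return sqlite3.connect(Path(path).as_uri() + "?mode=ro", uri=True)


def report_variants(path, min_reads):
    """Reporting step: computed on the fly from the stored data."""
    con = open_readonly(path)
    try:
        return con.execute(
            "SELECT run, amplicon, position, ref, alt, n_reads FROM variant "
            "WHERE n_reads >= ? ORDER BY run, amplicon, position",
            (min_reads,),
        ).fetchall()
    finally:
        con.close()


def try_write_readonly(path):
    """Return True if a write through the read-only connection succeeds."""
    con = open_readonly(path)
    try:
        con.execute("DELETE FROM variant")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        con.close()


# Made-up example data, for illustration only.
DB_PATH = os.path.join(tempfile.mkdtemp(), "vip_sketch.db")
build_db(DB_PATH, [
    ("run1", "ampA", 101, "A", "G", 42),
    ("run1", "ampA", 150, "C", "T", 3),
    ("run2", "ampB", 77, "G", "A", 18),
])
REPORT = report_variants(DB_PATH, min_reads=10)
```

Changing the report (for example a different `min_reads` threshold) only means running another query; nothing has to be reprocessed.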

  7. VIP was originally developed for variant identification. It is now being used in:
     • Amplicon resequencing
     • De novo shotgun
     • Methylation ~ Solexa experiments
     Hidden data can be extracted using intelligent querying strategies. Results are available per lane, multiplex MID, or run.

  8. Coverage can be calculated per:
     • Lane
     • MID
     • Amplicon
     • Base position
     Assessment of errors (PCR dropouts vs. human errors).
     [Figure: frequency (unmapped) per MID, for MIDs 1 to 12]
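Once reads are stored in a database, coverage at each of these levels comes down to a grouped query. The sketch below assumes a hypothetical `reads` table (one row per read, with `amplicon` left NULL for unmapped reads) and made-up data; the real VIP schema is not shown in the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE reads (lane INTEGER, mid INTEGER, amplicon TEXT, "
    "start_pos INTEGER, end_pos INTEGER)"
)
con.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "ampA", 1, 200),
    (1, 1, "ampA", 1, 120),
    (1, 2, "ampA", 50, 200),
    (1, 2, None, None, None),  # unmapped read
    (2, 1, "ampB", 1, 180),
])

ALLOWED_LEVELS = ("lane", "mid", "amplicon")


def coverage_per_group(con, levels):
    """Count mapped reads per combination of lane / MID / amplicon."""
    if not levels or any(level not in ALLOWED_LEVELS for level in levels):
        raise ValueError("levels must be chosen from %r" % (ALLOWED_LEVELS,))
    cols = ", ".join(levels)  # safe: validated against the whitelist above
    return con.execute(
        f"SELECT {cols}, COUNT(*) FROM reads WHERE amplicon IS NOT NULL "
        f"GROUP BY {cols} ORDER BY {cols}"
    ).fetchall()


def coverage_at(con, amplicon, position):
    """Count reads of one amplicon that span a given base position."""
    return con.execute(
        "SELECT COUNT(*) FROM reads "
        "WHERE amplicon = ? AND start_pos <= ? AND end_pos >= ?",
        (amplicon, position, position),
    ).fetchone()[0]


def unmapped_fraction_per_mid(con):
    """Fraction of unmapped reads per MID, as in the figure on this slide."""
    return con.execute(
        "SELECT mid, AVG(amplicon IS NULL) FROM reads GROUP BY mid ORDER BY mid"
    ).fetchall()
```

For example, `coverage_per_group(con, ["lane", "mid"])` gives read counts per lane and MID, while `coverage_at(con, "ampA", 150)` gives the coverage at a single base position.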

  9. Amplicon resequencing experiment
     • Goal: variant identification
     • Length distributions: mapped, unmapped, short mapped
     • Additional length separation + improved PCR
     • Result: improved efficiency

  10. Can the length of a homopolymer be assessed using the Q score? Yes, when the homopolymer length is < 6 bp.

  11. Fast assessment of the quality of a run: Q value ~ position
     [Figure: Q value versus read position for two runs, one labeled "Lab work OK" and one labeled "Errors in lab work"]
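A Q value versus position profile like the one plotted here can be computed by averaging quality scores per read position. The sketch below is one possible way to do it in plain Python; the function name and the input format (one list of integer Q values per read) are assumptions, and the slide does not state what threshold separates a good run from a problematic one.

```python
def mean_quality_by_position(quality_lists):
    """Average Q value at each read position over reads of varying length."""
    totals = []  # sum of Q values seen at each position
    counts = []  # number of reads long enough to reach each position
    for quals in quality_lists:
        for i, q in enumerate(quals):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Made-up quality scores for three reads of different lengths.
PROFILE = mean_quality_by_position([
    [40, 38, 30],
    [40, 36],
    [38, 34, 20, 10],
])
```

Plotting `PROFILE` against position for each run gives the kind of curve shown on the slide, which can then be compared between runs.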

  12. Biobix: Wim Van Criekinge, Tim De Meyer, Geert Trooskens, Tom Vandekerkhove, Leander Van Neste, Gerben Mensschaert
     CMG UZ Gent: Jo Vandesompele, Jan Hellemans, Filip Pattyn, Steve Lefever, Kim Deleeneer, Jean-Pierre Renard
     NXT-GNT: Paul Coucke, Sofie Bekaert, Filip Van Nieuwerburgh, Dieter Deforce, Wim Van Criekinge, Jo Vandesompele

  13. Questions? Joachim.deschrijver@ugent.be
