Unlocking the Potential of OBITOOLS in DNA Metabarcoding


Discover how OBITOOLS revolutionizes DNA metabarcoding by enabling efficient species identification and taxonomy-based sequence sorting. Learn about the software's benefits, functionalities, and applications in analyzing NGS data for environmental DNA barcoding.

  • DNA Metabarcoding
  • OBITOOLS
  • Species Identification
  • NGS Data Analysis
  • Taxonomy Sorting


Presentation Transcript


  1. OBITOOLS. Reporter: Liang Linyan. Date: February 10, 2023

  2. CONTENTS 01 Introduction 02 Instructions

  3. PART 01 Introduction

  4. DNA metabarcoding. A DNA barcode generally refers to a DNA sequence with a length of about 100-200 bp. It has been widely used as a method for species identification and discovery of new species. At present, different amplification fragments have been determined for DNA barcodes of different types of organisms. For example, the COI gene has been widely used as a standard barcode to identify animal species; the 16S ribosomal RNA gene is often used to identify bacteria; the ITS region is used as the standard barcode for fungi. Environmental DNA barcode technology is to amplify and high-throughput sequence the DNA isolated from environmental samples (such as soil, water, dung, etc.) by metabarcoding, so as to identify multiple species (or higher-level taxa) in environmental samples at the same time.

  5. OBITOOLS: a UNIX-inspired software package for DNA metabarcoding. As a method of species identification and discovery of new species, DNA metabarcoding relies on second-generation sequencing, so it needs the ability to process large sequence data sets. The OBITOOLS software package meets this requirement. It is a UNIX-inspired software package: a set of programs specially designed to analyze NGS data in the context of DNA metabarcoding.

  6. List of OBITOOLS programs
       • OBICONVERT: converts sequence files to different output formats
       • OBITAB: converts a sequence file to a tabular file
       • ECOTAG: assigns sequences to taxa
       • OBICLEAN: tags a set of sequences for PCR/sequencing error identification
       • OBICUT: trims sequences
       • OBIEXTRACT: extracts samples from a data set
       • OBIUNIQ: groups and dereplicates sequences
     Categories named on the slide: sequence annotations; sequence sampling and filtering.

  7. Advantages of OBITOOLS. The innovation of the OBITOOLS is their ability to take taxonomic annotations into account, ultimately allowing sorting and filtering of sequence records based on the taxonomy. The main difference from classical UNIX programs is that text files are not analysed line by line but sequence record by sequence record. OBITOOLS allows users to set up versatile data analysis pipelines, adjustable to the broad range of DNA metabarcoding applications.
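To make the "sequence record by sequence record" idea concrete, here is a minimal Python sketch. It is not OBITools code. It assumes FASTA input whose title line carries `key=value;` annotations after the identifier; the function names and the two toy records are made up for illustration.

```python
import io

def read_records(handle):
    """Yield (identifier, attributes, sequence) for each FASTA record.

    Assumes title lines of the form '>id key=value; key=value; free text'.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield _build(header, chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield _build(header, chunks)

def _build(header, chunks):
    """Split a title line into identifier and key=value attributes."""
    identifier, _, rest = header.partition(" ")
    attributes = {}
    for field in rest.split(";"):
        key, sep, value = field.partition("=")
        if sep:  # skip free text that has no '='
            attributes[key.strip()] = value.strip()
    return identifier, attributes, "".join(chunks)

# Hypothetical input: the sequence of seq1 spans two text lines,
# yet it is handled as a single record.
example = io.StringIO(
    ">seq1 count=12; sample=A; \nACGT\nTTGA\n"
    ">seq2 count=3; sample=B; \nGGCC\n"
)
records = list(read_records(example))
for identifier, attributes, sequence in records:
    print(identifier, attributes["count"], len(sequence))
```

A line-oriented tool would see five lines here; the record-oriented reader sees two records, each with its annotations attached.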

  8. PART 02 Instructions

  9. Q1: How to install OBITOOLS
     1. Availability of the OBITools. The OBITools are open source and protected by the CeCILL 2.1 license (http://www.cecill.info/licences/Licence_CeCILL_V2.1-en.html). The OBITools are deposited on the Python Package Index (PyPI: https://pypi.python.org/pypi/obitools) and all the sources can be downloaded from our subversion server (http://www.grenoble.prabi.fr/public-svn/OBISofts/OBITools).
     2. Prerequisites. To install the OBITools, the following software must be installed on your system:
       • Python 2.7 (installed by default on most Unix systems, available from the Python website)
       • gcc (installed by default on most Unix systems, available from the GNU sites dedicated to GCC and GMake)

  10. 3. On a Linux system. You have to make sure that the Python-dev packages are installed.
      4. On MacOSX. The C compiler and all the other compilation tools are included in the XCode application, which is not installed by default. The Python included in the system is not suitable for running the OBITools. You have to install a complete distribution of Python, which you can download as a MacOSX package from the Python website.

  11. 5. Downloading and installing the OBITools. The OBITools are downloaded and installed using the get-obitools.py script. This is a user-level installation that does not need administrator privileges. Once downloaded, move the file get-obitools.py into the directory where you want to install the OBITools. From a Unix terminal, run the command:
        > python get-obitools.py
      The script creates a new directory, in the place where you run it, in which all the OBITools will be installed. No system privileges are required, and your system will not be altered in any way by the OBITools installation.

  12. The newly created directory is named OBITools-VERSION, where VERSION is replaced by the latest version number available. All the OBITools are installed inside this directory. Next to it there is a shell script named obitools. Running this script activates the OBITools by reconfiguring your Unix environment:
        > ./obitools
      Once activated, you can deactivate the OBITools by typing the command exit:
        > exit
        OBITools are no more activated, Bye...
        ======================================

  13. 6. System level installation. To install the OBITools at the system level, you have two options. The first is to copy the obitools script into a usual directory for installed programs, such as /usr/local/bin, but never move the OBITools directory itself after it has been installed by get-obitools.py. The other is to add the export/bin directory located in the OBITools directory to the PATH environment variable.
      7. Retrieving the sources of the OBITools. If you want to compile the OBITools yourself, you will need the same prerequisites, together with the following packages:
        > pip install -U virtualenv
        > pip install -U sphinx
        > pip install -U cython

  14. Moreover, you need to install a subversion client (a list of clients is available from Wikipedia). Then you can download the sources:
        > svn co http://www.grenoble.prabi.fr/public-svn/OBISofts/OBITools/branches/OBITools-1.00/ OBITools
      This command creates a new directory called OBITools.
      8. Compiling and installing the OBITools. From the directory where you retrieved the sources, execute the following commands:
        > cd OBITools
        > python setup.py --serenity install
      Once installed, you can test your installation by running the commands of the tutorials.

  15. Q2: How to analyze DNA metabarcoding / eDNA data produced on Illumina sequencers using the OBITools? Data acquisition. The data needed to run the tutorial are the following:
       • The fastq files resulting from a GA IIx (Illumina) paired-end (2 x 108 bp) sequencing assay of DNA extracted and amplified from four wolf faeces: wolf_F.fastq and wolf_R.fastq
       • The file describing the primers and tags used for all samples sequenced: wolf_diet_ngsfilter.txt. The tags correspond to short and specific sequences added on the 5' end of each primer to distinguish the different samples.

  16. Data acquisition (continued)
       • The file containing the reference database in fasta format: db_v05_r117.fasta. This reference database has been extracted from release 117 of EMBL using ecoPCR.
       • The NCBI taxonomy formatted in the ecoPCR format: embl_r117.ndx, embl_r117.rdx, embl_r117.tdx

  17. 1. Micro-assembly of paired-end sequences with illuminapairedend. Recover full sequence reads from forward and reverse partial reads:
        > illuminapairedend --score-min=40 -r wolf_R.fastq wolf_F.fastq > wolf.fastq
      2. Remove unaligned sequence records with obigrep. Unaligned sequences (mode=joined) cannot be used. The following command removes them from the dataset:
        > obigrep -p 'mode!="joined"' wolf.fastq > wolf.ali.fastq
      The predicate mode!="joined" means that a sequence record is kept if the value of its mode attribute is different from joined. The first sequence record of wolf.ali.fastq can be obtained using the following command line:
        > obihead --without-progress-bar -n 1 wolf.ali.fastq

  18. [Slide shows the resulting sequence record.]

  19. 3. Assign each sequence record to the corresponding sample/marker combination with ngsfilter:
        > ngsfilter -t wolf_diet_ngsfilter.txt -u unidentified.fastq wolf.ali.fastq > \
            wolf.ali.assigned.fastq
      [Slide shows the first sequence record of wolf.ali.assigned.fastq.]
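Setting the slide output aside, the idea behind this assignment step can be pictured with a minimal Python sketch. It is a deliberately simplified stand-in, not the ngsfilter algorithm: it only looks at the forward end of the read and only accepts exact matches. The tags and sample names are hypothetical; the primer is the forward primer that appears in the ecoPCR command of this tutorial.

```python
def assign_sample(read, forward_primer, tag_to_sample):
    """Return (sample, trimmed_read), or (None, read) when nothing matches.

    Simplification: exact matches only, forward end of the read only.
    """
    for tag, sample in tag_to_sample.items():
        prefix = tag + forward_primer
        if read.startswith(prefix):
            # Strip tag and primer so that only the barcode remains.
            return sample, read[len(prefix):]
    return None, read

primer = "TTAGATACCCCACTATGC"
tags = {"AATTAAC": "sample_1", "GAAGTAG": "sample_2"}  # hypothetical tags

reads = [
    "AATTAAC" + primer + "ACGTACGT",  # carries the tag of sample_1
    "CCCCCCC" + primer + "ACGT",      # unknown tag
]
results = [assign_sample(read, primer, tags) for read in reads]
print(results)
```

Reads that return `None` play the role of the records that the `-u unidentified.fastq` option sets aside.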

  20. 4. Dereplicate reads into unique sequences with obiuniq. For dereplication, we use the obiuniq command with the -m sample option:
        > obiuniq -m sample wolf.ali.assigned.fastq > wolf.ali.assigned.uniq.fasta
      [Slide shows the first sequence record of wolf.ali.assigned.uniq.fasta.]
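Independently of the slide output, dereplication can be pictured with a short Python sketch: identical sequences are merged into one record that keeps a total count and a per-sample breakdown. This is an illustration of the idea, not obiuniq itself; the attribute names `count` and `merged_sample` follow the ones used later in this tutorial, and the reads are toy data.

```python
from collections import Counter, defaultdict

def dereplicate(reads):
    """Group identical sequences; reads is an iterable of (sequence, sample).

    Returns {sequence: {"count": total, "merged_sample": {sample: n}}}.
    """
    per_sequence = defaultdict(Counter)
    for sequence, sample in reads:
        per_sequence[sequence][sample] += 1
    return {
        sequence: {"count": sum(samples.values()), "merged_sample": dict(samples)}
        for sequence, samples in per_sequence.items()
    }

# Toy reads: (sequence, sample name)
reads = [("ACGT", "s1"), ("ACGT", "s1"), ("ACGT", "s2"), ("TTGA", "s2")]
unique = dereplicate(reads)
print(unique)
```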

  21. 5. Limit the amount of information with obiannotate. To keep only the count and merged_sample key=value attributes, we can use the obiannotate command. The output is written to a temporary file named after the shell's process ID ($$) and then moved over the original file:
        > obiannotate -k count -k merged_sample \
            wolf.ali.assigned.uniq.fasta > $$ ; mv $$ wolf.ali.assigned.uniq.fasta
      [Slide shows the first five sequence records of wolf.ali.assigned.uniq.fasta.]

  22. 6. Filter sequences by count and length with obigrep. Based on the previous observation, we set the cut-off for keeping sequences for further analysis to a count of 10; the -l 80 option additionally keeps only sequences that are at least 80 bp long:
        > obigrep -l 80 -p 'count>=10' wolf.ali.assigned.uniq.fasta \
            > wolf.ali.assigned.uniq.c10.l80.fasta
      [Slide shows the first sequence record of wolf.ali.assigned.uniq.c10.l80.fasta.]
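Independently of the slide output, the combined length and count filter can be written as a small Python predicate. This is a sketch of the selection logic only, applied to toy (sequence, attributes) pairs; the function name `keep` is made up for illustration.

```python
def keep(record, min_length=80, min_count=10):
    """Same selection as `obigrep -l 80 -p 'count>=10'` on one record.

    record is a (sequence, attributes) pair; attributes["count"] is an int.
    """
    sequence, attributes = record
    return len(sequence) >= min_length and attributes["count"] >= min_count

# Toy records: only the first one passes both conditions.
records = [
    ("A" * 100, {"count": 25}),  # long enough and abundant enough
    ("A" * 100, {"count": 2}),   # too rare
    ("A" * 50, {"count": 40}),   # too short
]
kept = [record for record in records if keep(record)]
print(len(kept))
```

Both conditions must hold for a record to survive, which is why an abundant but short sequence is still discarded.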

  23. 7. Clean the sequences for PCR/sequencing errors (sequence variants) with obiclean. As a final denoising step, using the obiclean program, we keep the head sequences (-H option), that is, sequences that have no variants with a count greater than 5% of their own count (-r 0.05 option):
        > obiclean -s merged_sample -r 0.05 -H \
            wolf.ali.assigned.uniq.c10.l80.fasta > wolf.ali.assigned.uniq.c10.l80.clean.fasta
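The denoising idea can be sketched in Python under strong simplifying assumptions. This is one reading of the rule, not the obiclean implementation: a sequence is treated as an erroneous variant when a sequence of the same length differs from it at exactly one position and is so much more abundant that the variant's count is at most 5% of it. Only substitutions are considered, a single sample is assumed, and all counts are toy values.

```python
def hamming(a, b):
    """Number of mismatches between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def head_sequences(counts, ratio=0.05):
    """Return sequences that are not a low-abundance variant of another one.

    counts maps sequence -> read count (single sample). A sequence is a
    variant of another when both have the same length, differ at exactly
    one position, and count(variant) <= ratio * count(other).
    """
    heads = []
    for seq, n in counts.items():
        is_variant = any(
            len(other) == len(seq)
            and hamming(seq, other) == 1
            and n <= ratio * m
            for other, m in counts.items()
            if other != seq
        )
        if not is_variant:
            heads.append(seq)
    return heads

counts = {
    "ACGTACGT": 1000,  # abundant sequence
    "ACGTACGA": 20,    # one mismatch away, 2% of the abundant one: dropped
    "ACGTACGC": 400,   # one mismatch away but 40%: kept
    "TTTTGGGG": 15,    # rare but unrelated to any other sequence: kept
}
print(head_sequences(counts))
```

The last toy record shows why this step is not a plain abundance filter: a rare sequence is kept when nothing abundant sits next to it.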

  24. 8. Taxonomic assignment of sequences with NCBI BLAST+ blastn Once denoising has been done, the next step in diet analysis is to assign the barcodes to the corresponding species in order to get the complete list of species associated to each sample. Taxonomic assignment of sequences requires a reference database compiling all possible species to be identified in the sample. Assignment is then done based on sequence comparison between sample sequences and reference sequences.
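Assignment by sequence comparison can be pictured with a short Python sketch. It is only an illustration of the principle: it scores similarity with the standard library's `difflib.SequenceMatcher` instead of a proper sequence alignment, and the taxon names, reference sequences and the 0.9 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

def assign(query, references, min_similarity=0.9):
    """Return (taxon, similarity) for the best-matching reference.

    references maps taxon name -> reference sequence. When even the best
    match is below min_similarity, the taxon is reported as None.
    """
    best_taxon, best_score = None, 0.0
    for taxon, sequence in references.items():
        score = SequenceMatcher(None, query, sequence).ratio()
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_similarity:
        return None, best_score
    return best_taxon, best_score

# Hypothetical reference database with two taxa.
references = {"taxon_A": "ACGTACGTACGT", "taxon_B": "TTGACCATGGCA"}

print(assign("ACGTACGTACGT", references))  # identical to the taxon_A reference
print(assign("GGGGGGGGGGGG", references))  # resembles neither reference
```

The second call shows why the reference database matters: a barcode from a species that is absent from the database has no good match and stays unassigned.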

  25. 9. Build a reference database. The full list of steps for building this reference database would then be:
       1. Download the whole set of EMBL sequences (available from: ftp://ftp.ebi.ac.uk/pub/databases/embl/release/)
       2. Download the NCBI taxonomy (available from: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz)
       3. Format them into the ecoPCR format (see obiconvert for how you can produce ecoPCR compatible files)
       4. Use ecoPCR to simulate amplification and build a reference database based on putatively amplified barcodes together with their recorded taxonomic information

  26. 10. Download the sequences
        > mkdir EMBL
        > cd EMBL
        > wget -nH --cut-dirs=4 -Arel_std_\*.dat.gz -m ftp://ftp.ebi.ac.uk/pub/databases/embl/release/
        > cd ..
      11. Download the taxonomy
        > mkdir TAXO
        > cd TAXO
        > wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
        > tar -zxvf taxdump.tar.gz
        > cd ..

  27. 12. Format the data
        > obiconvert --embl -t ./TAXO --ecopcrDB-output=embl_last ./EMBL/*.dat
      13. Use ecoPCR to simulate an in silico PCR
        > ecoPCR -d ./ECODB/embl_last -e 3 -l 50 -L 150 \
            TTAGATACCCCACTATGC TAGAACAGGCTCCTCTAG > v05.ecopcr
      Note that the primers must be in the same order both in wolf_diet_ngsfilter.txt and in the ecoPCR command.
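What an in silico PCR does can be sketched in a few lines of Python. This is a simplified illustration, not ecoPCR: it searches one strand only, counts substitutions only, and measures the amplicon length without the primers (an assumption of this sketch). The defaults mirror the -e 3, -l 50 and -L 150 values of the command above, the primers are the two from that command, and the template is artificial.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_sites(template, primer, max_errors):
    """Start positions where primer matches with at most max_errors mismatches."""
    n = len(primer)
    return [i for i in range(len(template) - n + 1)
            if mismatches(template[i:i + n], primer) <= max_errors]

def in_silico_pcr(template, forward, reverse,
                  max_errors=3, min_len=50, max_len=150):
    """Return the barcodes (primers excluded) amplified from this strand."""
    reverse_site = reverse_complement(reverse)
    amplicons = []
    for start in find_sites(template, forward, max_errors):
        begin = start + len(forward)
        for end in find_sites(template, reverse_site, max_errors):
            if min_len <= end - begin <= max_len:
                amplicons.append(template[begin:end])
    return amplicons

forward = "TTAGATACCCCACTATGC"
reverse = "TAGAACAGGCTCCTCTAG"
barcode = "ACGT" * 20  # artificial 80 bp insert
template = "GG" + forward + barcode + reverse_complement(reverse) + "CC"
amplicons = in_silico_pcr(template, forward, reverse)
print(amplicons)
```

Swapping the two primers in the call would search for a different pair of sites on this strand, which is the practical reason behind the note on primer order.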

  28. 14. Clean the database
       1. Filter sequences so that they have a good taxonomic description at the species, genus, and family levels (first obigrep command below).
       2. Remove redundant sequences (obiuniq command below).
       3. Ensure that the dereplicated sequences have a taxid at the family level (second obigrep command below).
       4. Ensure that sequences each have a unique identification (obiannotate command below).
        > obigrep -d embl_last --require-rank=species \
            --require-rank=genus --require-rank=family v05.ecopcr > v05_clean.fasta
        > obiuniq -d embl_last \
            v05_clean.fasta > v05_clean_uniq.fasta
        > obigrep -d embl_last --require-rank=family \
            v05_clean_uniq.fasta > v05_clean_uniq_clean.fasta
        > obiannotate --uniq-id v05_clean_uniq_clean.fasta > db_v05.fasta

  29. 15. Assign each sequence to a taxon with ecotag. Once the reference database is built, taxonomic assignment can be carried out using the ecotag command:
        > ecotag -d embl_r117 -R db_v05_r117.fasta wolf.ali.assigned.uniq.c10.l80.clean.fasta > \
            wolf.ali.assigned.uniq.c10.l80.clean.tag.fasta
      [Slide shows the first sequence record of wolf.ali.assigned.uniq.c10.l80.clean.tag.fasta.]

  30. 16. Generate the final result table. Some attributes that are no longer useful can be removed at this stage:
        > obiannotate --delete-tag=scientific_name_by_db --delete-tag=obiclean_samplecount \
            --delete-tag=obiclean_count --delete-tag=obiclean_singletoncount \
            --delete-tag=obiclean_cluster --delete-tag=obiclean_internalcount \
            --delete-tag=obiclean_head --delete-tag=taxid_by_db --delete-tag=obiclean_headcount \
            --delete-tag=id_status --delete-tag=rank_by_db --delete-tag=order_name \
            --delete-tag=order wolf.ali.assigned.uniq.c10.l80.clean.tag.fasta > \
            wolf.ali.assigned.uniq.c10.l80.clean.tag.ann.fasta
      [Slide shows the first sequence record of wolf.ali.assigned.uniq.c10.l80.clean.tag.ann.fasta.]

  31. Finally, a tab-delimited file that can be opened in Excel or R is generated. The input file name carries a .sort suffix that none of the commands shown above produces; it presumably comes from a sorting step (such as obisort) that is not shown on the slides.
        > obitab -o wolf.ali.assigned.uniq.c10.l80.clean.tag.ann.sort.fasta > \
            wolf.ali.assigned.uniq.c10.l80.clean.tag.ann.sort.tab
      This file contains 26 sequences. You can deduce the diet of each sample:
       • 13a_F730603: Cervus elaphus
       • 15a_F730814: Capreolus capreolus
       • 26a_F040644: Marmota sp. (according to the location, it is Marmota marmota)
       • 29a_F260619: Capreolus capreolus
      Note that we also obtained a few wolf sequences although a wolf-blocking oligonucleotide was used.
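The tab-delimited export can be pictured with a short Python sketch that uses the standard `csv` module. It only illustrates the target format and is not obitab; the identifiers and counts are made-up toy values, and only the two species names are taken from the results listed above.

```python
import csv
import io

def to_table(records, columns):
    """Write records (a list of dicts) as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Toy records: ids and counts are hypothetical.
records = [
    {"id": "seq_1", "count": 120, "scientific_name": "Cervus elaphus"},
    {"id": "seq_2", "count": 45, "scientific_name": "Capreolus capreolus"},
]
table = to_table(records, ["id", "count", "scientific_name"])
print(table)
```

A file in this layout can be loaded in R with `read.delim` or opened directly in a spreadsheet program.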

  32. Thank you for listening!
