Microbiome Data Analysis with QIIME: Processing and Data Transformation


Learn about preparing secondary analysis in microbiome data processing, transforming data formats, and utilizing scripts for automation. Explore the steps involved in converting DADA2 sequence tables for use with QIIME and understand the significance of data wrangling in secondary analysis.

jveron

Uploaded on Apr 03, 2025


Presentation Transcript

Lecture #8: Processing Microbiome Data: Secondary Data Analysis with QIIME (Part 1)

No Lecture Next Week
Reminder: there is no lecture scheduled for Tuesday, March 28th.
The HDC is reserved for a conference.
We will resume with Lecture #9 on Tuesday, April 4th.

Preparing Secondary Analysis
At the end of the primary data analysis steps, we have a chimera-filtered sequence table from DADA2.
We will convert this table to the format needed for use with QIIME.
Several data manipulations are needed to transform the data from the DADA2 format into the files QIIME requires.
A script was developed to automate this process:
> dada2_to_qiime.py -i seqtab.nochim.csv

Recall our seqtab.nochim.csv

Data Wrangling
Transformation of data from one format to another.
Usually, there is no loss of information.
Depending on the formats and the data needed, there can be some loss.

Transforming the Data
The seqtab.nochim.csv has samples in the rows and sequence variants in the columns.
QIIME's biom table format is the transpose of this, with samples in the columns and OTUs in the rows.
QIIME also requires a representative set file containing the sequence that represents each OTU.
Although the sequence variants are not OTUs per se, they can be thought of conceptually in the same way and treated as OTUs for secondary analysis.

DADA2 to QIIME Script
> dada2_to_qiime.py -i seqtab.nochim.csv
This will create:
seqtab.nochim.otutable: a transposed version of the seqtab.nochim.csv with each sequence variant replaced by a node number
seqtab.nochim.repset: a mapping between the node number and the actual full-length sequence variants
dada_to_qiime.sh: a shell script that we will run next
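The transposition and node renaming described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual dada2_to_qiime.py: the CSV layout (sequences in the header row, sample IDs in the first column), the node naming scheme (node0, node1, ...) and the function name transpose_seqtab are all assumptions made for the example.

```python
import csv
import io


def transpose_seqtab(csv_text):
    """Transpose a samples-by-variants table into variants-by-samples rows.

    Returns (otu_rows, repset), where repset maps node IDs to sequences.
    Assumes the header row holds the sequences and column 0 holds sample IDs.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    sequences = header[1:]  # skip the label above the sample ID column
    sample_ids = [row[0] for row in data]

    # Hypothetical naming scheme: one short node ID per sequence variant.
    node_ids = ["node%d" % i for i in range(len(sequences))]
    repset = dict(zip(node_ids, sequences))

    # One output row per node, one column per sample.
    otu_rows = [["#OTU ID"] + sample_ids]
    for j, node in enumerate(node_ids):
        otu_rows.append([node] + [row[j + 1] for row in data])
    return otu_rows, repset


# Toy input: two samples, two (very short) sequence variants.
example = "sample,ACGT,TTGA\nS1,10,0\nS2,3,7\n"
otu_rows, repset = transpose_seqtab(example)

for row in otu_rows:
    print("\t".join(row))
for node, seq in repset.items():
    print(">%s\n%s" % (node, seq))  # FASTA-style representative set
```

Replacing each full-length sequence with a short node ID keeps the table readable, while the representative set preserves the mapping back to the actual sequence.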

dada_to_qiime.sh
> ./dada_to_qiime.sh
This shell script will:
Build a phylogenetic tree from the sequence variants
Perform taxonomy assignment for the sequence variants
Create a biom file out of the otutable from the previous step
Insert a new level of taxa (level 8) representing the sequence variants (sub-species level); we call this the node level

dada_to_qiime.sh code
> parallel_align_seqs_pynast.py -i seqtab.nochim.csv.repset -o pynast_aligned -O 42
> filter_alignment.py -i pynast_aligned/seqtab.nochim.csv_aligned.fasta -o pynast_aligned/
> make_phylogeny.py -i pynast_aligned/seqtab.nochim.csv_aligned.fasta -o tree.tre

dada_to_qiime.sh code
> parallel_assign_taxonomy_rdp.py -i seqtab.nochim.csv.repset -o rdp_assigned_taxonomy/ -O 32
> biom convert -i seqtab.nochim.csv.otutable --to-json -o otus.json --table-type "OTU table"

dada_to_qiime.sh code
> biom add-metadata -i otus.json -o otus.taxa.json --observation-metadata-fp rdp_assigned_taxonomy/seqtab.nochim.csv_tax_assignments.txt --observation-header OTUID,taxonomy,confidence,method --sc-separated taxonomy
> biom convert -i otus.taxa.json --to-json -o otus.json --table-type "OTU table"
> noderize_med_biom.py -i otus.json -o otus.biom

Secondary Analysis
The final products of the dada_to_qiime.sh transition that we need are the otus.biom and tree.tre files.
The otus.biom file is our QIIME-formatted OTU table; it doesn't contain OTUs but instead has nodes.
The tree.tre file is a phylogenetic tree built from our DADA2 sequence variants.
We also need the mapping file that contains the metadata for our study.
Recall what that looks like.

Mapping Information
61 total samples: 20 CHOW, 20 HFD, 18 RS, 1 CHOWD, 1 HFDD, 1 RSD

Parameters file
Lastly, we need to create a parameters file for QIIME:
> summarize_taxa:level 1,2,3,4,5,6,7,8
> plot_taxa_summary:labels Kingdom,Phylum,Class,Order,Family,Genus,Species,Node
> alpha_diversity:metrics shannon,simpson,PD_whole_tree,chao1,observed_species
> multiple_rarefactions:min ???
> multiple_rarefactions:max ???
> multiple_rarefactions:step ???
> beta_diversity:metrics bray_curtis,unweighted_unifrac,weighted_unifrac
> beta_diversity_through_plots:seqs_per_sample ???
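To make the alpha_diversity:metrics entry more concrete, here is a minimal sketch of three of the listed metrics computed on a toy vector of per-node counts. It is not QIIME's implementation. The function names are made up for the example, the base-2 logarithm for Shannon and the "1 minus sum of squared proportions" form of Simpson are assumptions about the conventions in use, and PD_whole_tree and chao1 are left out because they need a tree or a bias-corrected estimator.

```python
import math


def shannon(counts, base=2):
    """Shannon entropy of the count vector (base 2 is an assumption)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def simpson(counts):
    """Simpson index, taken here as 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def observed(counts):
    """Number of nodes with at least one read (observed_species)."""
    return sum(1 for c in counts if c > 0)


# Toy sample: four nodes with equal counts, plus one absent node.
toy = [10, 10, 10, 10, 0]
print("shannon:", shannon(toy))
print("simpson:", simpson(toy))
print("observed:", observed(toy))
```

A perfectly even community maximizes both Shannon and Simpson for a given number of nodes, which is why these metrics are read together with the simple observed count.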

Rarefaction Level
In any microbiome sequencing experiment, several samples are sequenced.
We aim for uniform read depth in each sample.
In reality, samples get sequenced at different depths.
To compare the microbial communities fairly across these samples, a fixed number of reads is randomly drawn from each sample.
This number is called the rarefaction level and is chosen based on the distribution of read counts across the samples.
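The random subsampling described above can be sketched as follows. This is an illustrative stand-in, not the QIIME code: the function name rarefy, the dictionary input and the fixed seed are assumptions, and reads are drawn without replacement.

```python
import random


def rarefy(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a dict of variant counts.

    Returns None when the sample has fewer than `depth` reads, mirroring
    the fact that samples below the rarefaction level are dropped.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one label per read, then subsample.
    reads = [variant for variant, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    picked = rng.sample(reads, depth)
    rarefied = {variant: 0 for variant in counts}
    for variant in picked:
        rarefied[variant] += 1
    return rarefied


sample = {"node0": 50, "node1": 30, "node2": 20}
rarefied = rarefy(sample, 40)
print(rarefied, "total:", sum(rarefied.values()))
print(rarefy({"node0": 5}, 10))  # too shallow for this depth
```

After rarefaction every retained sample has exactly the same total, so differences between samples reflect composition rather than sequencing depth.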

Summarize Biom Table
> biom summarize-table -i otus.biom -o table_summary
Num samples: 61
Num observations: 803
Total count: 1145975
Table density (fraction of non-zero values): 0.165
Counts/sample summary:
Min: 5180.0
Max: 51789.0
Median: 17372.000
Mean: 18786.475
Std. dev.: 9384.608
Sample Metadata Categories: None provided
Observation Metadata Categories: taxonomy; confidence; method

Summarize Biom Table
Counts/sample detail:
BKF32: 5180.0
BKF28: 5840.0
BKCM04: 7093.0
BKCM07: 7170.0
BKF31: 7282.0
BKF22: 7920.0
BKF16: 8157.0
BKF06: 8289.0
BKCM32: 8666.0
BKF14: 8973.0
BKCM21: 9199.0
BKF40: 9341.0
BKCM36: 10613.0
BKF24: 10936.0
BKF29: 11837.0
BKF03: 12186.0
BKCM34: 12633.0
BKF25: 12920.0
BKF27: 13345.0
BKCM06: 13955.0
BKCM13: 14238.0
BKF13: 14565.0
BKF07: 14611.0
BKCM02: 14656.0
BKCM03: 15545.0
BKCM05: 15658.0
BKF05: 15674.0
BKCM01: 15824.0
BKCM27: 16337.0
BKF01: 16502.0
BKF04: 17372.0
BKF11: 17754.0
BKF02: 17861.0
BKF12: 18101.0
BKCM33: 18327.0
BKF10: 18673.0
BKF09: 19123.0
BKF33: 19476.0
BKF35: 19519.0
BKF34: 21032.0
BKCM15: 21407.0
BKCHOW: 21577.0
BKCM40: 21805.0
BKCM35: 22188.0
BKCM25: 22663.0
BKF15: 25085.0
BKCM24: 25625.0
BKCM12: 25647.0
BKCM22: 26280.0
BKCM29: 26633.0
BKCM31: 27575.0
BKF21: 28347.0
BKCM28: 28512.0
BKF36: 29231.0
BKCM09: 29637.0
BKCM11: 30852.0
BKCM14: 32746.0
BKHFD: 36383.0
BKCM16: 39225.0
BKRS: 40385.0
BKCM10: 51789.0
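One way to use the per-sample counts when choosing a rarefaction level is to check how many samples survive at each candidate depth. The sketch below does this for the five shallowest samples from the listing. The helper name samples_retained and the candidate depths are illustrative assumptions, and only a subset of the 61 samples is included to keep the example short.

```python
def samples_retained(counts, depth):
    """Return the sample IDs that have at least `depth` reads."""
    return sorted(s for s, n in counts.items() if n >= depth)


# The five lowest read counts from the summary (subset of the 61 samples).
lowest = {
    "BKF32": 5180,
    "BKF28": 5840,
    "BKCM04": 7093,
    "BKCM07": 7170,
    "BKF31": 7282,
}

for depth in (4000, 5100, 6000):
    kept = samples_retained(lowest, depth)
    print("depth %d: %d of %d kept" % (depth, len(kept), len(lowest)))
```

Any depth at or below the minimum count of 5180 keeps every sample, while a depth of 6000 would already drop BKF32 and BKF28.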

Parameters file
Now we can fill in the parameters file for QIIME:
> summarize_taxa:level 1,2,3,4,5,6,7,8
> plot_taxa_summary:labels Kingdom,Phylum,Class,Order,Family,Genus,Species,Node
> alpha_diversity:metrics shannon,simpson,PD_whole_tree,chao1,observed_species
> multiple_rarefactions:min 100
> multiple_rarefactions:max 5100
> multiple_rarefactions:step 1000
> beta_diversity:metrics bray_curtis,unweighted_unifrac,weighted_unifrac
> beta_diversity_through_plots:seqs_per_sample 4000
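Reading the four values on this slide in order as min 100, max 5100, step 1000 and seqs_per_sample 4000, the rarefaction depths that the min/max/step settings imply can be listed with a short sketch. It assumes the maximum is inclusive, which is my understanding of how multiple_rarefactions treats it; the helper name rarefaction_depths is made up for the example.

```python
def rarefaction_depths(min_depth, max_depth, step):
    """List the depths from min to max in increments of step (max inclusive)."""
    return list(range(min_depth, max_depth + 1, step))


depths = rarefaction_depths(100, 5100, 1000)
print(depths)

# The shallowest sample has 5180 reads, so every depth keeps all 61 samples.
min_count = 5180
print(all(d <= min_count for d in depths))
```

Keeping the maximum depth just under the shallowest sample's read count means the rarefaction curves can be drawn for every sample in the study.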
