
Automated Read-based Metagenomic Analysis Pipeline Overview
Discover how ARMAP provides automated read-based metagenomic analysis, including quality evaluation, filtering, duplicate removal, data splitting, and taxonomic classification. See results of quality evaluation, read summary after filtering, and community structure analysis.
Automated Read-based Metagenomic Analysis Pipeline (ARMAP)
Pipeline steps (input: raw data), with the tool the slide names for each:

1. Quality evaluation (FastQC): calculate the quality profile; check quality, duplicates, and contamination.
2. Quality filtering (NGS QC Toolkit): trim bases; filter reads with poor average score and ambiguous bases.
3. Duplicate removal (CD-HIT): remove duplication.
4. Read statistics: table of raw read and base numbers, duplicate rate, HQ read and base numbers, and data usage.
5. Format conversion (NGS QC Toolkit): convert to FASTA.
6. Data splitting (fastasplitn): break down data.
7. DIAMOND search (DIAMOND): search against NR.
8. MEGAN classification (MEGAN6 (UE)): taxonomic and functional classification.
9. Table integration: comparison of multiple samples in taxonomy and function.

ARMAP includes a wrapper script with 1100 lines of Perl code and 8 command files. Current version: 1.4.1
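The data-splitting step ("break down data") can be illustrated with a minimal Python sketch. This is not how fastasplitn or the ARMAP wrapper is implemented; it only shows the general idea of dividing FASTA records into a fixed number of parts so that each part can be searched separately. The function names `read_fasta` and `split_records` and the round-robin strategy are assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[Record], n_parts: int) -> List[List[Record]]:
    """Distribute records round-robin into n_parts lists (hypothetical strategy)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


# Toy input, not taken from the slides.
example = [">r1", "ACGT", ">r2", "GG", "CC", ">r3", "TTTT"]
parts = split_records(read_fasta(example), 2)
for idx, part in enumerate(parts, start=1):
    print(idx, [header for header, _ in part])
```

Round-robin assignment keeps the parts close to equal in record count, which is one reasonable choice when the parts are later searched in parallel.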
Log file e.g.
Results of quality evaluation
-rw-r--r--. 1 tianrm ieg    0 Jan 15 19:43 1_done
-rw-r--r--. 1 tianrm ieg 264K Jan 15 19:43 1_R1_fastqc.html
-rw-r--r--. 1 tianrm ieg 274K Jan 15 19:43 1_R1_fastqc.zip
-rw-r--r--. 1 tianrm ieg 249K Jan 15 19:43 1_R2_fastqc.html
-rw-r--r--. 1 tianrm ieg 261K Jan 15 19:43 1_R2_fastqc.zip
-rw-r--r--. 1 tianrm ieg  425 Jan 15 19:43 fastqc_1.sh
-rw-r--r--. 1 tianrm ieg  184 Jan 15 19:43 log.err
-rw-r--r--. 1 tianrm ieg   60 Jan 15 19:43 log.out
Results of read summary after quality filtering (example)

| Sample ID | Raw reads | Raw bases | Duplicated reads | Duplication rate | Read1: No. of HQ reads | Read1: No. of bases | Read1: No. of HQ bases | Read1: Q20 (percentage of HQ bases) | Read2: No. of HQ reads | Read2: No. of bases | Read2: No. of HQ bases | Read2: Q20 (percentage of HQ bases) | Total bases (HQ reads) | Data usage rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5000 | 755000 | 4918 | 1.6% | 2432 | 364800 | 359928 | 98.7% | 2432 | 364800 | 356990 | 97.9% | 729600 | 96.6% |
| 2 | 5000 | 755000 | 4936 | 1.3% | 2452 | 367800 | 362769 | 98.6% | 2452 | 367800 | 360225 | 97.9% | 735600 | 97.4% |
| 3 | 5000 | 755000 | 4936 | 1.3% | 2452 | 367800 | 362769 | 98.6% | 2452 | 367800 | 360225 | 97.9% | 735600 | 97.4% |
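The derived columns in the read summary can be checked with a few lines of arithmetic. The sketch below uses the sample 1 values and reproduces the reported percentages under assumptions inferred from the numbers, not stated in the slides: the data usage rate is taken as total bases of HQ reads divided by raw bases, Q20 as HQ bases divided by bases, and the 4918 figure is read as the number of reads remaining after duplicate removal (that reading matches the 1.6% rate, but it is an inference). The helper `percent` is a hypothetical name.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# Values copied from the sample 1 row of the read summary.
raw_reads = 5000
raw_bases = 755000
reads_after_dedup = 4918  # ASSUMPTION: reads remaining after duplicate removal
read1_bases, read1_hq_bases = 364800, 359928
read2_bases, read2_hq_bases = 364800, 356990

duplication_rate = percent(raw_reads - reads_after_dedup, raw_reads)
q20_read1 = percent(read1_hq_bases, read1_bases)
q20_read2 = percent(read2_hq_bases, read2_bases)
total_hq_read_bases = read1_bases + read2_bases
data_usage_rate = percent(total_hq_read_bases, raw_bases)

print(duplication_rate, q20_read1, q20_read2, total_hq_read_bases, data_usage_rate)
# prints: 1.6 98.7 97.9 729600 96.6
```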
Results of community structure (example: phylum level)

| Phylum | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|
| Candidatus Rokubacteria | 0.12% | 0.10% | 0.10% |
| Acidobacteria | 3.45% | 3.22% | 3.22% |
| candidate division NC10 | 0.00% | 0.06% | 0.06% |
| Candidatus Tectomicrobia | 0.04% | 0.08% | 0.08% |
| Crenarchaeota | 0.04% | 0.00% | 0.00% |
| Streptophyta | 0.00% | 0.04% | 0.04% |
| Candidatus Doudnabacteria | 0.00% | 0.04% | 0.04% |
| Glomeromycota | 0.06% | 0.12% | 0.12% |
| Candidatus Dependentiae | 0.04% | 0.00% | 0.00% |
| Proteobacteria | 11.10% | 10.95% | 10.95% |
| Armatimonadetes | 0.04% | 0.06% | 0.06% |
| Euryarchaeota | 0.45% | 0.49% | 0.49% |
| Chlamydiae | 0.06% | 0.00% | 0.00% |
| Candidatus Campbellbacteria | 0.02% | 0.00% | 0.00% |
| Cyanobacteria | 0.10% | 0.20% | 0.20% |
| Ascomycota | 0.19% | 0.16% | 0.16% |
| Gemmatimonadetes | 0.43% | 0.22% | 0.22% |
| Candidatus Eisenbacteria | 0.00% | 0.04% | 0.04% |
| Amoebozoa | 0.04% | 0.00% | 0.00% |
| Lentisphaerae | 0.00% | 0.02% | 0.02% |
| Candidatus Omnitrophica | 0.02% | 0.02% | 0.02% |
| Chordata | 0.04% | 0.00% | 0.00% |
| Verrucomicrobia | 0.62% | 0.51% | 0.51% |
| Firmicutes | 0.39% | 0.27% | 0.27% |
| environmental samples <Bacteria> | 0.33% | 0.08% | 0.08% |
| Candidatus Saccharibacteria | 0.04% | 0.04% | 0.04% |
| Actinobacteria <phylum> | 8.51% | 8.10% | 8.10% |
| Microgenomates group | 0.04% | 0.00% | 0.00% |
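A multi-sample table of this kind can be assembled from per-sample assignment counts. The following Python sketch shows one way to do that; it is an illustration of the table-integration idea, not ARMAP's code. The function name `integrate_tables`, the choice of total reads per sample as the denominator, and the toy counts are all assumptions (the counts are invented for the example and are not from the slides).

```python
from typing import Dict, List, Tuple


def integrate_tables(
    per_sample_counts: Dict[str, Dict[str, int]],
    total_reads: Dict[str, int],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Merge per-sample taxon counts into one table of percentages.

    Returns the sample order and a mapping taxon -> percentages in that order.
    Taxa missing from a sample are reported as 0.0.
    """
    samples = sorted(per_sample_counts)
    taxa = sorted({t for counts in per_sample_counts.values() for t in counts})
    table: Dict[str, List[float]] = {}
    for taxon in taxa:
        row = []
        for sample in samples:
            count = per_sample_counts[sample].get(taxon, 0)
            # ASSUMPTION: percentages are relative to all reads in the sample.
            row.append(round(100.0 * count / total_reads[sample], 2))
        table[taxon] = row
    return samples, table


# Toy counts, invented for illustration only.
counts = {
    "1": {"Proteobacteria": 111, "Acidobacteria": 35},
    "2": {"Proteobacteria": 110, "Firmicutes": 4},
}
totals = {"1": 1000, "2": 1000}

samples, table = integrate_tables(counts, totals)
print("Phylum", *samples)
for taxon, row in table.items():
    print(taxon, *row)
```

Dividing by all reads, not only the assigned ones, would explain why the phylum percentages in a table like the one above need not sum to 100%, but the denominator ARMAP uses is not stated in the slides.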
Results of functional profiling - SEED (example: level 1)

| SEED category | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|
| SEED;Motility and Chemotaxis | 0.31% | 0.29% | 0.29% |
| SEED;Phages, Prophages, Transposable elements | 0.04% | 0.06% | 0.06% |
| SEED;Sulfur Metabolism | 0.54% | 0.22% | 0.22% |
| SEED;Cell Division and Cell Cycle | 0.39% | 0.29% | 0.29% |
| SEED;Cofactors, Vitamins, Prosthetic Groups, Pigments | 2.22% | 2.35% | 2.35% |
| SEED;Iron acquisition and metabolism | 0.06% | 0.10% | 0.10% |
| SEED;Transcriptional regulation | 0.06% | 0.08% | 0.08% |
| SEED;Plant cell walls and outer surfaces | 0.12% | 0.04% | 0.04% |
| SEED;Stress Response | 0.62% | 0.71% | 0.71% |
| SEED;Plant Glucosinolates | 0.06% | 0.00% | 0.00% |
| SEED;DNA Metabolism | 0.93% | 0.92% | 0.92% |
| SEED;Fatty Acids, Lipids, and Isoprenoids | 1.01% | 1.04% | 1.04% |
| SEED;Regulation and Cell signaling | 0.25% | 0.41% | 0.41% |
| SEED;Carbohydrates | 2.57% | 2.53% | 2.53% |
| SEED;Metabolism of Aromatic Compounds | 0.60% | 0.51% | 0.51% |
| SEED;Virulence | 0.70% | 1.04% | 1.04% |
| SEED;Nitrogen Metabolism | 0.43% | 0.25% | 0.25% |
| SEED;RNA Metabolism | 0.68% | 0.67% | 0.67% |
| SEED;Protein Metabolism | 1.40% | 1.37% | 1.37% |
| SEED;Phosphorus Metabolism | 0.21% | 0.16% | 0.16% |
| SEED;Dormancy and Sporulation | 0.08% | 0.04% | 0.04% |
| SEED;Phages, Prophages, Transposable elements, Plasmids | 0.12% | 0.29% | 0.29% |
| SEED;Potassium metabolism | 0.02% | 0.29% | 0.29% |
Results of functional profiling - SEED (example: level 3)

| SEED function | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|
| SEED;Carbohydrates;Lactate utilization temp;Succinate-semialdehyde dehydrogenase [NADP+] (EC 1.2.1.79) | 0.00062 | 0 | 0 |
| SEED;Carbohydrates;Lactate utilization;Predicted L-lactate dehydrogenase, Fe-S oxidoreductase subunit YkgE | 0.00041 | 0 | 0 |
| SEED;Carbohydrates;Lactose and Galactose Uptake and Utilization;Beta-galactosidase (EC 3.2.1.23) | 0 | 0.0002 | 0.0002 |
| SEED;Carbohydrates;Maltose and Maltodextrin Utilization;Alpha-amylase (EC 3.2.1.1) | 0 | 0.00041 | 0.00041 |
| SEED;Carbohydrates;Maltose and Maltodextrin Utilization;Maltodextrin glucosidase (EC 3.2.1.20) | 0.00021 | 0 | 0 |
| SEED;Carbohydrates;Propionyl-CoA to Succinyl-CoA Module;Methylmalonyl-CoA mutase (EC 5.4.99.2) | 0.00041 | 0.00041 | 0.00041 |
| SEED;Carbohydrates;Propionyl-CoA to Succinyl-CoA Module;Propionyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.3) | 0 | 0.00041 | 0.00041 |
| SEED;Carbohydrates;Propionyl-CoA to Succinyl-CoA Module;Propionyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.3) | 0 | 0.0002 | 0.0002 |
| SEED;Carbohydrates;Pyruvate metabolism II: acetyl-CoA, acetogenesis from pyruvate;NAD-dependent protein deacetylase of SIR2 family | 0.00041 | 0 | 0 |
| SEED;Carbohydrates;Sugar utilization in Thermotogales;Beta-xylosidase (EC 3.2.1.37) | 0.00041 | 0 | 0 |
| SEED;Carbohydrates;Sugar utilization in Thermotogales;Xylulose kinase (EC 2.7.1.17) | 0.00041 | 0 | 0 |
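Because each level 3 entry carries its full semicolon-separated path, the values can be rolled up to a coarser level by grouping on a path prefix. The Python sketch below does this for four of the rows listed above. It assumes the raw text form seen in the original slide (a path ending in `;`, followed by one whitespace-separated value per sample), and it assumes that summing the per-function values is a meaningful aggregate, which the slides do not confirm. The names `parse_line` and `roll_up` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_line(line: str) -> Tuple[List[str], List[float]]:
    """Split 'A;B;C;name; v1 v2 v3' into path levels and per-sample values."""
    path, sep, values = line.rpartition(";")
    if not sep:
        raise ValueError("expected a ';'-terminated path before the values")
    return path.split(";"), [float(v) for v in values.split()]


def roll_up(lines: Iterable[str], depth: int) -> Dict[str, List[float]]:
    """Sum per-sample values over all entries sharing the first `depth` levels."""
    totals: Dict[str, List[float]] = defaultdict(list)
    for line in lines:
        levels, values = parse_line(line)
        key = ";".join(levels[:depth])
        if not totals[key]:
            totals[key] = [0.0] * len(values)
        totals[key] = [a + b for a, b in zip(totals[key], values)]
    return dict(totals)


rows = [
    "SEED;Carbohydrates;Propionyl-CoA to Succinyl-CoA Module;Methylmalonyl-CoA mutase (EC 5.4.99.2); 0.00041 0.00041 0.00041",
    "SEED;Carbohydrates;Propionyl-CoA to Succinyl-CoA Module;Propionyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.3); 0 0.00041 0.00041",
    "SEED;Carbohydrates;Propionyl-CoA to Succinyl-CoA Module;Propionyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.3); 0 0.0002 0.0002",
    "SEED;Carbohydrates;Sugar utilization in Thermotogales;Beta-xylosidase (EC 3.2.1.37); 0.00041 0 0",
]

# depth=3 keeps 'SEED;Carbohydrates;<subsystem>'.
rolled = roll_up(rows, depth=3)
for key, values in rolled.items():
    print(key, [round(v, 5) for v in values])
```

Passing `depth=2` instead would collapse all four rows into a single `SEED;Carbohydrates` entry.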
Results of functional profiling - KEGG (example: level 2)

| KEGG category | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|
| KEGG;Cellular Processes;Cell growth and death | 0.21% | 0.20% | 0.20% |
| KEGG;Cellular Processes;Cell motility | 0.56% | 0.33% | 0.33% |
| KEGG;Cellular Processes;Transport and catabolism | 0.64% | 0.67% | 0.67% |
| KEGG;Environmental Information Processing;Membrane transport | 2.36% | 2.12% | 2.12% |
| KEGG;Environmental Information Processing;Signal transduction | 0.70% | 0.78% | 0.78% |
| KEGG;Environmental Information Processing;Signaling molecules and interaction | 0.00% | 0.04% | 0.04% |
| KEGG;Genetic Information Processing;Folding, sorting and degradation | 0.64% | 0.69% | 0.69% |
| KEGG;Genetic Information Processing;Replication and repair | 0.51% | 0.71% | 0.71% |
| KEGG;Genetic Information Processing;Transcription | 2.43% | 2.79% | 2.79% |
| KEGG;Genetic Information Processing;Translation | 0.82% | 1.02% | 1.02% |
| KEGG;Human Diseases;Antimicrobial resistance | 0.08% | 0.29% | 0.29% |
Results of functional profiling - GO (example: level 2)

| InterPro2GO category | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|
| InterPro2GO;Unclassified;IPR026523 Paraneoplastic antigen Ma | 0.00% | 0.02% | 0.02% |
| InterPro2GO;GO:0008150 biological_process;GO:0048870 cell motility | 0.04% | 0.08% | 0.08% |
| InterPro2GO;Unclassified;IPR016046 Transcription initiation Spt4-like | 0.04% | 0.00% | 0.00% |
| InterPro2GO;GO:0003674 molecular_function;GO:0016787 hydrolase activity | 2.63% | 2.10% | 2.10% |
| InterPro2GO;GO:0008150 biological_process;GO:0006935 chemotaxis | 0.08% | 0.18% | 0.18% |
| InterPro2GO;Unclassified;IPR020599 Translation elongation factor P/YeiP | 0.00% | 0.04% | 0.04% |
| InterPro2GO;GO:0008150 biological_process;GO:0006810 transport | 2.59% | 2.39% | 2.39% |
| InterPro2GO;Unclassified;IPR023970 Methylthiotransferase/radical SAM-type protein | 0.16% | 0.02% | 0.02% |
| InterPro2GO;GO:0008150 biological_process;GO:0016043 cellular component organization | 0.45% | 0.22% | 0.22% |
| InterPro2GO;GO:0008150 biological_process;GO:0009405 pathogenesis | 0.00% | 0.04% | 0.04% |
| InterPro2GO;GO:0003674 molecular_function;GO:0005215 transporter activity | 1.23% | 1.61% | 1.61% |
| InterPro2GO;GO:0008150 biological_process;GO:0006950 response to stress | 0.66% | 0.69% | 0.69% |
| InterPro2GO;GO:0008150 biological_process;GO:0007154 cell communication | 0.04% | 0.10% | 0.10% |
| InterPro2GO;Unclassified;IPR000362 Fumarate lyase family | 0.12% | 0.08% | 0.08% |
| InterPro2GO;Unclassified;IPR030664 Succinate dehydrogenase/fumarate reductase, alpha/adenylylsulphate reductase subunit | 0.00% | 0.08% | 0.08% |
| InterPro2GO;Unclassified;IPR000801 Putative esterase | 0.00% | 0.04% | 0.04% |
| InterPro2GO;Unclassified;IPR006533 Type VI secretion system, RhsGE-associated Vgr protein | 0.04% | 0.02% | 0.02% |
| InterPro2GO;GO:0003674 molecular_function;GO:0004872 receptor activity | 0.04% | 0.02% | 0.02% |
| InterPro2GO;GO:0003674 molecular_function;GO:0016740 transferase activity | 2.82% | 2.22% | 2.22% |
| InterPro2GO;GO:0003674 molecular_function;GO:0000156 phosphorelay response regulator activity | 0.00% | 0.02% | 0.02% |