Marine Metagenomics Data Architecture


Marine metagenomics data architecture covers three-tier database curation, data storage, data transfers, and pipelines for analyzing marine genetic information. The three tiers are MarRef (Tier 1) for complete reference genomes, MarDB (Tier 2) for all marine genome projects, and MarCat (Tier 3), a gene catalogue built from the annotation of assembled marine metagenomics and metatranscriptomics reads. The storage architecture combines a reference database, Spark, HDFS and related tools, and data is transferred from ENA to Tromsø. The EMG/MGP and META-pipe pipelines are being ported to cloud environments and Apache Spark, alongside work on tool benchmarking and data standards.

  • Marine Metagenomics
  • Data Architecture
  • Database Curation
  • Data Storage
  • Pipelines

Uploaded on Mar 16, 2025



Presentation Transcript


  1. WP6: Marine metagenomics. Inge Alexander Raknes, Giacomo Tartari (ELIXIR-NO). ELIXIR All Hands, 8-9 March 2016, Barcelona, Spain. ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559. www.elixir-europe.org/excelerate

  2. Use case architecture

  3. Database. Tier 1 - MarRef: gold standard, built upon the complete marine prokaryotic, eukaryotic and virus genomes available in the UniProt proteome database; manually curated. Tier 2 - MarDB: includes all marine prokaryotic, eukaryotic and virus genomes, whether or not they are complete; manually curated at first, with standards to be introduced later to avoid manual curation. Tier 3 - MarCat: based on the annotation of assembled marine metagenomics and metatranscriptomics reads.
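The tier rules above can be sketched as a candidate-selection function. This is a minimal illustration assuming a hypothetical `GenomeRecord` with boolean `marine`, `complete` and `in_uniprot_proteome` fields; the slides define no such schema. Because MarDB takes every marine genome regardless of completeness, a genome that qualifies for MarRef also lands in MarDB. The function only picks candidates: both tiers are manually curated, at least initially, and Tier 3 is built from reads rather than genomes, so it is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class GenomeRecord:
    # Hypothetical record layout, for illustration only.
    accession: str
    marine: bool
    complete: bool
    in_uniprot_proteome: bool


def candidate_tiers(rec: GenomeRecord) -> set[int]:
    """Return the database tiers (1 = MarRef, 2 = MarDB) a genome is a candidate for."""
    tiers: set[int] = set()
    if not rec.marine:
        return tiers  # non-marine genomes belong to neither tier
    tiers.add(2)  # MarDB: all marine genomes, complete or not
    if rec.complete and rec.in_uniprot_proteome:
        tiers.add(1)  # MarRef: complete genomes with a UniProt proteome
    return tiers
```

For example, a complete marine genome with a UniProt proteome yields `{1, 2}`, while a draft marine genome yields `{2}`.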

  4. Tier 1: MarRef (gold standard complete genomes). Inputs: ENA/GenBank/DDBJ and RefSeq. Processing: manual curation and enrichment. Outputs: MarRef, with MarRef Nucleotide and MarRef Protein.

  5. Tier 2: MarDB marine genome database. Inputs: genome projects from ENA/GenBank/DDBJ. Central database (labelled MarineDb on the slide), with outputs MarDB Nucleotide and MarDB Protein.

  6. Tier 3: Marine gene catalogue (MarCat). Inputs: marine metagenomics reads (EBI Metagenomics, ENA), marine metatranscriptome reads (ENA), the Tier 1 database and the Tier 2 database, processed with META-pipe. Outputs: MarCat Nucleotide, the MarCat gene catalogue and MarCat Protein.

  7. Data storage architecture. Components: reference DB; Spark and HDFS for big data; SQL for metadata; web GUI; REST API; gridFTP; NorStore backup. User roles: curator, scientist, admin.
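The storage slide splits bulk data (Spark/HDFS) from metadata (SQL). A minimal sketch of that split, using an in-memory SQLite database: the `reference_entry` table, its columns, the placeholder accessions and the `hdfs:///refdb/...` paths are all hypothetical, chosen only to show metadata rows pointing at sequence files that a Spark job could then read.

```python
import sqlite3

# Hypothetical metadata schema: one row per reference entry, with the
# sequence data itself assumed to live on HDFS at hdfs_path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reference_entry (
        accession TEXT PRIMARY KEY,
        tier      INTEGER NOT NULL CHECK (tier IN (1, 2, 3)),
        curated   INTEGER NOT NULL DEFAULT 0,
        hdfs_path TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO reference_entry VALUES (?, ?, ?, ?)",
    [
        ("GENOME_A", 1, 1, "hdfs:///refdb/marref/GENOME_A.fasta"),
        ("GENOME_B", 2, 1, "hdfs:///refdb/mardb/GENOME_B.fasta"),
        ("GENOME_C", 2, 0, "hdfs:///refdb/mardb/GENOME_C.fasta"),
    ],
)


def curated_paths(conn: sqlite3.Connection, tier: int) -> list[str]:
    """Return HDFS paths of curated entries in a tier, sorted by accession."""
    cur = conn.execute(
        "SELECT hdfs_path FROM reference_entry "
        "WHERE tier = ? AND curated = 1 ORDER BY accession",
        (tier,),
    )
    return [row[0] for row in cur]
```

Keeping curation status in SQL lets a curator-facing web GUI or REST API update metadata without rewriting the large sequence files.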

  8. Data transfers. Transferred 36 projects/studies from ENA to Tromsø. Data was temporarily parked in the NorStore staging area. Thanks to Tony Wildish and Thierry Toutain. Transfer speed was lower than expected; investigation in progress.

  9. Pipelines. EMG/MGP: porting to the cloud (Embassy Cloud or Amazon EC2). META-pipe: adapting to Apache Spark. Defining a set of tools for benchmarking. Defining data standards.

  10. Meta-pipe architecture. Execution environments: execution managers at Stallo, CSC, ICE-2 and possibly elsewhere. Clients: web front-end and CLI tool. Public API. Auth (ELIXIR AAI): tokens, authentication events. Storage: inputs/uploads, outputs/downloads. Job service: job queue, execution status.

  11. Spark Meta-pipe. Currently there is a set of tools that are individually submitted to Torque. Plan: implement the workflow execution of Meta-pipe in Spark. Most of the Meta-pipe codebase is already written in Scala.
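Moving from individually submitted Torque jobs to a single workflow engine means the engine must know the dependencies between tools. The plain-Python sketch below, using the standard-library `graphlib` module (Python 3.9+), groups steps into waves whose members do not depend on each other and could therefore run in parallel. It is not Spark code, and the step names in `WORKFLOW` are hypothetical; the slides do not list Meta-pipe's tools.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW: dict[str, set[str]] = {
    "filter_reads": set(),
    "assemble": {"filter_reads"},
    "predict_genes": {"assemble"},
    "annotate_rna": {"assemble"},
    "annotate_proteins": {"predict_genes"},
    "merge_results": {"annotate_proteins", "annotate_rna"},
}


def execution_waves(workflow: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; all steps in a wave have their dependencies met."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # also raises CycleError if the workflow has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as completed
    return waves
```

For this example workflow, `annotate_rna` and `predict_genes` share a wave because both depend only on `assemble`.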

  12. Cloud deployment. Use cPouta as a computational backend for Meta-pipe; other environments could be Amazon, etc. Looking into technologies such as AppImage to make Meta-pipe easier to deploy.

  13. Tasks. March-June: AAI integration, Spark backend for Meta-pipe, cPouta evaluation, tool benchmarking. June-December: prototype database.

  14. Conclusions. Design document for the database: https://github.com/uit-no/elixir-excelerate/blob/master/reference-database.md
