Efficient Data Management and Processing with EGI DataHub

An overview of EGI DataHub for data management and processing, presented by Lukasz Dutka. The talk covers the architecture, data ingestion, data access and processing, and access control. It introduces Onedata distributed data management and two strategies for ingesting data into spaces, shows how data is accessed and processed through a transparent data layer, and describes Oneclient's direct access mode for efficient storage access. It closes with data access control for service providers and data owners within the EGI ecosystem.

  • Data Management
  • Data Processing
  • EGI DataHub
  • Data Ingestion
  • Access Control

Presentation Transcript


  1. EGI DATAHUB FOR DATA MANAGEMENT AND PROCESSING Presented by: Lukasz Dutka

  2. AGENDA
       01 Architecture
       02 Initial Data Ingestion
       03 Access and processing data
       04 Authentication and Authorization
       05 Data Migration and Replication

  3. ONEDATA DISTRIBUTED DATA MANAGEMENT
     [Architecture diagram. Labels: Cloud; Cloud / HPC; Data Processing Centre; HPC Data Processing; Worker Nodes (POSIX, S3, WebDAV); Local Storage (Ceph, Lustre, S3, NFS); Direct Access When Possible; Oneprovider; datahub.egi.eu]

  4. DATA INGESTION INTO SPACES
     STRATEGY 1 - PUSH
     Pushing files (registration and data upload in one step):
       • Option 1. WebGUI
       • Option 2. REST API, CDMI for files and metadata management
       • Option 3. OnedataFS, Python overlay for virtual file system
       • Option 4. Oneclient FUSE driver
     STRATEGY 2 - Selective PULL
     Registration and Selective Data Transfers:
       • Use Gateway Provider
       • Configure external storage: S3, XRootD, HTTP, WebDAV, SWIFT, POSIX
       • File registration:
           • Option 1: Enable Scanning, or
           • Option 2: Register Files via API
       • Data transfer:
           • Option 1: Transfer Jobs, or
           • Option 2: Define QoS Rules on a directory or files
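The push strategy through the Oneclient FUSE driver can be pictured as an ordinary POSIX copy into a mounted space. The following is a minimal sketch under stated assumptions: the mount path, the space name `my-space` and the helper `push_file` are hypothetical, and temporary directories stand in for a real Oneclient mount so the example runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def push_file(source: Path, space_root: Path, relative_target: str) -> Path:
    """Copy a local file into a space directory using plain POSIX operations.

    `space_root` is assumed to be a directory exposed by a mounted file
    system; nothing here is specific to Onedata.
    """
    target = space_root / relative_target
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


# Temporary directories stand in for the local disk and the mount point.
with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "local" / "run-001.csv"
    local.parent.mkdir()
    local.write_text("id,value\n1,42\n")

    # Hypothetical mount layout: <mount point>/<space name>
    space_root = Path(tmp) / "mnt" / "oneclient" / "my-space"
    space_root.mkdir(parents=True)

    pushed = push_file(local, space_root, "raw/run-001.csv")
    pushed_name = pushed.name
    pushed_text = pushed.read_text()
```

With a real mount, writing the file is the whole ingestion step, which is what the slide means by registration and upload happening together.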

  5. HOW TO ACCESS AND PROCESS DATA?
     UNIFORM MANAGED BY TRANSPARENT DATA LAYER
       • File system mounted on worker nodes using Oneclient FUSE
       • Python applications using the OnedataFS Python file system overlay (for JupyterHub)
       • Data can be replicated to the processing centre:
           • as a replication job before processing
           • on the fly when it is needed
     LOCAL FILE SYSTEMS SYNCED TO/FROM ONEDATA
       • Onedata is syncing local storage in and out
       • Data can be replicated to the processing centre:
           • as a replication job before processing
       • Job-related tagging for data modified during the processing session
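The slide does not say how data modified during a processing session is tagged. One way to picture the idea is to compare a snapshot of the working directory taken before the job with one taken after it. The sketch below is an illustration only, not Onedata's mechanism; `snapshot` and `touched_files` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file path (relative to root) to a SHA-256 digest of its content."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def touched_files(before: dict, after: dict) -> list:
    """Return files that are new or whose content changed between two snapshots."""
    return sorted(name for name, digest in after.items() if before.get(name) != digest)


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "input.txt").write_text("raw data")
    (workdir / "notes.txt").write_text("draft")

    before = snapshot(workdir)

    # Stand-in for a processing job: one file is changed, one is created.
    (workdir / "notes.txt").write_text("final")
    (workdir / "results").mkdir()
    (workdir / "results" / "out.txt").write_text("42")

    touched = touched_files(before, snapshot(workdir))
```

The resulting list is the set of files a later step could tag or hand to a migration task.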

  6. ONECLIENT DIRECT ACCESS MODE
     [Diagram. Labels: P2P Storage Access; P2P Direct Access when possible; Lustre, Ceph, S3; Parallel Processing Nodes using POSIX Oneclient FUSE]

  7. HOW TO: DATA ACCESS CONTROL
     SERVICE PROVIDERS
       • System administrators working with POSIX-based backend storages should define user mapping rules where it matters: global user → local POSIX UID
     DATA OWNERS
       • Users are authenticated by EGI Check-in
       • EGI DataHub inherits user group memberships
       • Data owners can control access to spaces based on:
           • Individual invitations
           • Group invitations (only for group managers)
       • Users define POSIX or ACL access modes for data
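The global user to local POSIX UID mapping can be pictured as a lookup table maintained by the storage administrator. This is a minimal sketch, assuming a static table and made-up identifiers; the names `LocalAccount`, `USER_MAP` and `map_user`, and the choice to reject unmapped users, are illustrative and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalAccount:
    """Local POSIX identity on the backend storage."""
    uid: int
    gid: int


# Hypothetical mapping table: global user ID -> local POSIX account.
USER_MAP = {
    "global-user-alice": LocalAccount(uid=1001, gid=1001),
    "global-user-bob": LocalAccount(uid=1002, gid=2000),
}


def map_user(global_id: str, table: dict) -> LocalAccount:
    """Resolve a global user to a local account, rejecting unmapped users."""
    try:
        return table[global_id]
    except KeyError:
        raise PermissionError(f"no local mapping for {global_id!r}") from None


alice = map_user("global-user-alice", USER_MAP)
```

Rejecting unmapped users is only one possible policy; an administrator might instead map them to a shared low-privilege account.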

  8. USEFUL FEATURES: DATA MIGRATION AND QUALITY OF SERVICE
     QoS rules-driven data migrations:
       • A flexible way of managing expectations about data location in a distributed environment
       • Support for a hierarchy of QoS rules
       • Many metrics: geographical data location, storage durability, storage performance, etc.
       • An efficient way of validating whether a particular file or directory fulfils the expected rules
     Data migration tasks are able to move data:
       • Based on data structure, as a transfer task
       • Based on filtered data related to metadata
       • Based on filtered data tagged (touched) by processing jobs
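Validating whether a file or directory fulfils a QoS rule can be sketched as matching storage attributes against the rule's requirements. The model below is a simplified illustration and does not use Onedata's actual QoS expression syntax; the classes, attribute names (`country`, `type`) and storage names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Storage:
    name: str
    attributes: dict[str, str]


@dataclass
class QosRule:
    required: dict[str, str]  # attributes a storage must have to count
    replicas: int = 1         # how many matching storages must hold the file


def matches(storage: Storage, rule: QosRule) -> bool:
    """A storage matches when it carries every required attribute value."""
    return all(storage.attributes.get(k) == v for k, v in rule.required.items())


def file_fulfils(rule: QosRule, replica_storages: list[Storage]) -> bool:
    """A file fulfils the rule when enough matching storages hold a replica."""
    return sum(1 for s in replica_storages if matches(s, rule)) >= rule.replicas


def pending_files(rule: QosRule, files: dict[str, list[Storage]]) -> list[str]:
    """Files in a directory that would still need a transfer to satisfy the rule."""
    return sorted(name for name, reps in files.items() if not file_fulfils(rule, reps))


krakow = Storage("krakow-disk", {"country": "PL", "type": "disk"})
amsterdam = Storage("amsterdam-disk", {"country": "NL", "type": "disk"})

rule = QosRule(required={"country": "PL", "type": "disk"}, replicas=1)
directory = {"a.dat": [krakow, amsterdam], "b.dat": [amsterdam]}

pending = pending_files(rule, directory)
directory_ok = not pending
```

In this model a directory fulfils a rule only when every file in it does, and the pending list is what a migration task would have to transfer.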

  9. QUESTIONS?
