Presentation Transcript


  1. RAL Disk Storage for IRIS
     Tom Byrne, Senior Data Storage Systems Architect
     October 2023

  2. Introduction
     • The SCD Data Services Group provides disk storage to numerous services and projects; almost all of it is provisioned using the software-defined storage technology Ceph.
     • We use the community version of Ceph, so we pay no licensing fees, and we receive injections of capital for hardware.
     • Ceph is an open-source, distributed storage technology with a large community of users and developers.

  3. Data replication or erasure coding?
     • Traditionally, data integrity is ensured by replicating an object.
     • Ceph originally only supported replication, with 3 replicas being the common configuration.
     • An object must be written to 3 disks before the client is told the write is complete.
     • Modifications must be made on all copies (e.g. 3 disks).
     • Reads only need to read from one disk.
     • Replication advantages: simple data integrity considerations and consistent performance.
     • Erasure coding support, added in 2014, provided a cheaper data storage option, but it comes with a few trade-offs to consider (see the sketch below).
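As a rough illustration of the write-path difference, the sketch below compares how many bytes hit disk for a single object under 3-way replication versus an 8+3 erasure-coded layout. The object size and the k/m values are illustrative assumptions, not figures from the slides, and real Ceph also writes metadata, journals and padding that are not modelled here.

```python
# Rough comparison of bytes written to disk for one object:
# 3-way replication stores three full copies, while k+m erasure
# coding stores the object once plus m/k worth of parity.

def replicated_bytes(object_size: int, replicas: int = 3) -> int:
    """Total bytes written across OSDs for a replicated pool."""
    return object_size * replicas


def ec_bytes(object_size: int, k: int = 8, m: int = 3) -> int:
    """Total bytes written across OSDs for a k+m erasure-coded pool."""
    return object_size * (k + m) // k


if __name__ == "__main__":
    size = 1_000_000_000  # a 1 GB object (illustrative)
    print(f"3x replication: {replicated_bytes(size):>13,} bytes on disk")
    print(f"EC 8+3        : {ec_bytes(size):>13,} bytes on disk")
    # replication writes 3.0x the object size; EC 8+3 writes ~1.375x
```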

  4. Erasure coding considerations
     • Writes: an object's data is spread out (rather than replicated) over multiple disks, so there is less total data to write. However, parity calculations are done by a disk server before the writes to disk, meaning more CPU is required compared to replication.
     • Modification: any modification requires parity recalculation, so performance varies and is highly workload dependent. EC pools add significant complexity for partial object updates, which were not always supported.
     • Reads: no EC calculations are required, so read performance is comparable. There are more drives to spread data retrieval over, but more network hops.
     • EC can be significantly more space efficient in general: e.g. 13PB of raw storage gives around 9PB usable with EC 8+3 (the ratio is checked in the sketch below), but only 4.3PB with replication. However…
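The usable-capacity figures on this slide follow from a simple ratio: an EC k+m pool keeps k/(k+m) of the raw capacity, while n-way replication keeps 1/n. A minimal check of the 13PB example, ignoring real-world overheads such as full ratios, metadata and padding (the ratio gives roughly 9.5PB, consistent with the slide's approximate 9PB figure):

```python
def ec_usable(raw_pb: float, k: int, m: int) -> float:
    """Usable capacity of a k+m erasure-coded pool: k/(k+m) of raw."""
    return raw_pb * k / (k + m)


def replicated_usable(raw_pb: float, replicas: int) -> float:
    """Usable capacity of an n-way replicated pool: 1/n of raw."""
    return raw_pb / replicas


raw = 13.0  # PB of raw storage, as in the slide
print(f"EC 8+3        : {ec_usable(raw, 8, 3):.1f} PB usable")       # ~9.5 PB
print(f"3x replication: {replicated_usable(raw, 3):.1f} PB usable")  # ~4.3 PB
```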

  5. Erasure coding at the RADOS object level
     • An object being written is processed in stripe_width-sized stripes on the primary OSD.
     • stripe_width is a per-pool setting, and is usually in the range of a few kilobytes to a few tens of kilobytes.
     • Example object: dataset/file1.root.0000000000000000
     • Each stripe is divided into k data chunks, and m parity chunks are calculated (k=8, m=3 in this example; a minimal sketch of this step follows below).
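A minimal sketch of the striping step described above: one stripe_width-sized stripe is cut into k equal data chunks, and m parity chunks are derived from them. Real Ceph computes parity with an erasure-code plugin (e.g. jerasure Reed-Solomon); the XOR "parity" below is only a stand-in so the sketch runs without dependencies, and the chunk sizes are the common defaults rather than values from the slides.

```python
from typing import List, Tuple


def xor_chunks(chunks: List[bytes]) -> bytes:
    """XOR all chunks together (placeholder for a real parity calculation)."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def encode_stripe(stripe: bytes, k: int = 8, m: int = 3) -> Tuple[List[bytes], List[bytes]]:
    """Cut one stripe into k data chunks and derive m placeholder parity chunks."""
    chunk_size = len(stripe) // k            # stripe_width = k * chunk_size
    data = [stripe[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]
    parity = [xor_chunks(data) for _ in range(m)]  # real EC parity chunks would each differ
    return data, parity


stripe_width = 8 * 4096                      # 32 KB stripe for an 8+3 pool (4 KB chunks)
stripe = bytes(range(256)) * (stripe_width // 256)
data, parity = encode_stripe(stripe)
print(f"{len(data)} data + {len(parity)} parity chunks of {len(data[0])} bytes each")
```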

  6. Erasure coding at the RADOS object level (2)
     • Each stripe chunk is sent to a different OSD in the placement group and stored on disk (in the slide's diagram, the 8+3 chunks are spread across OSDs 0-10).
     • This is repeated for every stripe of the object, appending to the on-disk object on each OSD.
     • The resulting on-disk objects are not contiguous blocks of the original file: each holds stripe_width/k-sized segments of the file, separated by gaps of roughly stripe_width (the offset arithmetic is sketched below).
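To make the "non-contiguous segments" point concrete, the sketch below lists which byte ranges of the original object end up on the OSD holding data chunk j. This is only the offset arithmetic under an assumed 4 KB stripe unit; the actual chunk-to-OSD placement is decided by CRUSH and the placement group's acting set.

```python
def segments_on_chunk(object_size: int, j: int, k: int = 8, stripe_unit: int = 4096):
    """Byte ranges of the original object held by the OSD storing data chunk j."""
    stripe_width = k * stripe_unit
    offset = j * stripe_unit
    ranges = []
    while offset < object_size:
        ranges.append((offset, min(offset + stripe_unit, object_size)))
        offset += stripe_width           # next stripe: skip the other k-1 chunks
    return ranges


# First few segments of a 1 MB object that land on the OSD holding data chunk 2:
for start, end in segments_on_chunk(1_000_000, j=2)[:4]:
    print(f"bytes {start:>7,}..{end:>7,}")
# Each segment is stripe_unit (4 KB) long; consecutive segments are
# stripe_width (32 KB) apart in the original object.
```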

  7. EC small-file overhead
     • In general, a larger stripe width means more efficient bulk EC operations.
     • The default is k * 4KB, e.g. 32KB for an 8+3 pool.
     • Objects are padded to a multiple of the stripe width, e.g. a 1KB object is padded to 32KB, and a 33KB object to 64KB. This padding has to be stored on disk (see the padding sketch below).
     • The overhead is negligible for large files (where object size >> stripe width), but for smaller files it can become significant.
     • If small files make up a significant fraction of the stored data volume, this can eat into the storage-efficiency benefit of EC over replication.
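The padding examples above can be checked with a couple of lines of arithmetic. The sketch below rounds an object size up to the next stripe-width multiple and reports the overhead; the 32 KB stripe width matches the 8+3 default quoted on the slide, and the sample sizes are illustrative.

```python
import math


def padded_size(object_size: int, stripe_width: int = 32 * 1024) -> int:
    """Stored size once an object is padded up to a stripe-width multiple."""
    return math.ceil(object_size / stripe_width) * stripe_width


for size in (1 * 1024, 33 * 1024, 100_000_000):
    stored = padded_size(size)
    overhead = 100 * (stored - size) / size
    print(f"{size:>11,} B object -> {stored:>11,} B stored ({overhead:6.1f}% padding)")
# 1 KB pads to 32 KB, 33 KB pads to 64 KB, while a ~100 MB object
# gains well under 0.1% from padding.
```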

  8. Sirius
     • Sirius is the Ceph cluster providing block storage to the STFC Cloud.
     • It was originally an HDD-based cluster providing all block device storage for the cloud.
     • Growing cloud usage brought higher IOPS requirements, so Sirius was moved to all-flash storage in 2019/2020.
     • Growth and changes in cloud usage required a shift to ephemeral storage as the default for VM block devices.
     • Sirius is currently used for image storage, and for persistent volumes only when requested. The plan is to move VM image storage to Echo.
     • Sirius has always been 3-replica due to the random IO requirements.

  9. Echo
     • Echo provides large-scale object storage, which is publicly accessible. The main use case is LHC experiment data storage.
     • All data pools are EC 8+3.
     • 65PB of usable storage, which will increase to 80PB.
     • Storage specification changes in the latest generations provide improved metadata performance for S3/Swift.
     • Echo currently provides Swift storage via OpenStack, available to users via a web interface or the command line: the storage layer is provided by Echo and the storage access layer by the OpenStack Swift client.
     • Storage quotas are managed via projects in OpenStack. This is a more cost-effective storage solution than the alternatives, so users can get larger quotas of object storage.
     • We are in the process of enabling S3 API access for OpenStack users: EC2 credentials will be usable directly against the Echo S3 endpoint to access your project's data (see the sketch below).
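Once S3 access is enabled, clients would talk to the Echo S3 endpoint with standard S3 tooling. The sketch below uses boto3 with EC2-style credentials; the endpoint URL, keys and bucket name are placeholders rather than values from the slides, so substitute whatever is issued for your project.

```python
# Hypothetical example of using EC2 credentials against an S3 endpoint
# with boto3. Endpoint, keys, and bucket below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.ac.uk",   # placeholder Echo S3 endpoint
    aws_access_key_id="EC2_ACCESS_KEY",        # EC2 credentials from OpenStack
    aws_secret_access_key="EC2_SECRET_KEY",
)

# List objects in a project bucket and download one of them.
for obj in s3.list_objects_v2(Bucket="my-project-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])

s3.download_file("my-project-bucket", "dataset/file1.root", "/tmp/file1.root")
```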

  10. Deneb
     • Deneb is a CephFS cluster. It started life as a 3-way replicated pool and is now an 8+3 EC cluster, as we at RAL built up expertise and confidence in Ceph EC.
     • Different CephFS use cases over the past 3 years were tested with both replication and EC before settling on EC.
     • CephFS is mounted on Linux servers running at RAL. Access is not managed by OpenStack, but a significant amount of access is via mounts on OpenStack cloud VMs.
     • Useful for large amounts of shared filesystem; the users control access and permissions.
     • Example use cases include: scratch space for analysis data; user home directories and experimental data for DAaaS use cases for CLF and ISIS.

  11. Arided
     • Arided is an all-SSD CephFS cluster in early access. It is currently 3-way replicated while we gather information on workloads and file sizes.
     • It provides the OpenStack Manila service (Shared File Systems as a Service): users can self-provision native CephFS shares and mount them on their VMs.
     • Designed to provide a high-performance shared file system across the cloud; OpenStack users can request a quota from the cloud team.
     • Users will be expected to move data off this storage when not actively using it.
     • The STFC Cloud supports a wide variety of use cases, and ensuring the hardware spec and software configuration are fit for all purposes is the priority.

  12. RAL disk storage service capabilities
     Echo: Large-scale object storage, which stores data as unstructured objects rather than in a hierarchical structure.
       • Purpose: Web-accessible storage to underpin services and infrastructure.
       • Example use case: Experimental data uploaded via S3, accessed by analysis workflows on the STFC Cloud.
       • Approx. cost per TB: 60
     Deneb: Large-scale CephFS file storage. Supports shared access to a traditional hierarchical directory structure.
       • Purpose: Providing local user groups (including STFC Cloud users) with shared filesystem storage.
       • Example use case: Shared filesystem mounted on analysis VMs for user home directories and experimental data caches.
       • Approx. cost per TB: 200
     Arided: Self-service, high-performance CephFS file storage for STFC Cloud users.
       • Purpose: Provisioning temporary shared file storage for experimental data caches and scratch space.
       • Example use case: Persistent volumes for analysis workflow working directories in Kubernetes clusters.
       • Approx. cost per TB: 900
     Sirius: Block device storage for the STFC Cloud.
       • Purpose: Persistent volumes for critical VM data.
       • Example use case: Supporting a VM that may need to be migrated between hypervisors.
       • Approx. cost per TB: 1000

