
Optimal Enhancements for OwnCloud Protocol in Scientific Environments
Explores whether the OwnCloud protocol needs revising for efficient synchronization in scientific settings, focusing on bundling, delta syncing, compression, and chunk size adjustment, based on a study of CERNBox usage presented at CS3 Zurich in January 2016.
Presentation Transcript
A study of delta sync and other optimisations in HTTP/WebDAV synchronisation protocols
Do we need changes in OwnCloud protocol?
Wojciech Jarosz
AGH University of Science and Technology / CERN

Introduction
Owncloud protocol, CERNBox service
Enhancing current protocol
Investigation of following enhancements:
o Bundling
o Delta-syncing
o Compression
o Chunk size adjustment
Context: scientific environment at CERN
CS3 Zurich, January 2016

Introduction
Proposed implementation
Analysis
Simulation
Decision
Data from CERNBox FS and network logs

CERNBox
Distinguished features:
o Integrated with 80PB of physics data
o Future: easy and effective to share experiment results
o Future: focus on scientific usage
o Currently: a mix of scientific and personal use

CERNBox as of Oct 15
~ 31 TB of data
~ 3700 users
~ 24 million files in ~ 3 million directories
Average file size: ~ 1.3 MB, median file size < 100 kB
200k file uploads / downloads per day

Filesizes
[Figure: Files by size. File counts (y-axis up to 8 x 10^6) per size bin: 0, 1 - 9 b, 10 - 99 b, 100 - 1000 b, 1 kb - 10 kb, 10 kb - 100 kb, 100 kb - 1 mb, 1 mb - 10 mb, 10 mb - 100 mb, 100 mb - 1 gb, over 1 gb]

Files count and size
[Figure: file count and total size (GB) per category, y-axis up to 7 M; the only category label that survives is "No extension"]

Where are the transfers coming from?
[Figure: Transfers by origin: CERN, Universities / Institutions, Others]

Downloads vs Uploads
[Figure: GETs vs PUTs, with shares of 44% and 56%; legend: PUT, GET]

Protocol - chunking
Could be used for:
o partial upload
o delta-sync
o deduplication
Is the chunk size chosen correctly?
o Most of the files are small
o Modern protocols should use network-aware chunking
Currently only ~0.15% of all PUTs are chunked
Is dynamic chunking a viable option?
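
One way to picture network-aware (dynamic) chunking is to size each chunk from the throughput observed for the previous one. The sketch below is only an illustration of that idea, not the OwnCloud client's behaviour: the function name, the 2-second target, and the 1 MiB / 100 MiB bounds are all assumptions made for the example.

```python
MIB = 1024 * 1024


def next_chunk_size(prev_size, elapsed_s, target_s=2.0,
                    min_size=1 * MIB, max_size=100 * MIB):
    """Pick the next chunk size so one chunk takes roughly target_s seconds.

    prev_size  -- size in bytes of the chunk that was just uploaded
    elapsed_s  -- wall-clock seconds that upload took
    All defaults are hypothetical values chosen for illustration.
    """
    if elapsed_s <= 0:
        # No usable measurement: keep the previous size.
        return prev_size
    throughput = prev_size / elapsed_s          # observed bytes per second
    proposed = int(throughput * target_s)       # bytes that fit in target_s
    return max(min_size, min(max_size, proposed))


if __name__ == "__main__":
    # A 10 MiB chunk that took 1 s suggests a 20 MiB chunk next.
    print(next_chunk_size(10 * MIB, 1.0))
    # A very slow link falls back to the minimum chunk size.
    print(next_chunk_size(10 * MIB, 100.0))
```

Clamping to a minimum and maximum keeps a single noisy measurement from producing a degenerate chunk size; whether this pays off at all depends on how many transfers are large enough to be chunked in the first place.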

Enhancements to the current OwnCloud protocol
Focus on bundling, delta-sync and compression

Bundling
Typically users are active only a few days a month
[Figure: Sample user transfers count, 2/15/2015 to 10/23/2015, y-axis up to 200000]

Bundling
Even power users work in cycles
[Figure: Power user file transfers, 3/1/2015 to 9/17/2015, y-axis up to 25000]

Bundling
Typically users are active only a few days a month
Often over 2000 requests in 10 minutes
Small file size
[Diagram: tar, untar]
Implementation?
o Simple bundling: TARBall?
o Choose the right bundle size
o Send chunks in parallel
o Error reporting
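
The tar / untar idea can be sketched with Python's standard `tarfile` module: many small files are grouped into a few in-memory archives of bounded size, and the receiving side unpacks them. This is a minimal sketch under stated assumptions, not a proposed wire format; `make_bundles`, `unpack`, and the 10 MiB default bundle size are hypothetical names and values, and error reporting and parallel sending are left out.

```python
import io
import tarfile


def _pack(entries):
    """Write (name, data) pairs into one uncompressed in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_bundles(files, max_bundle_bytes=10 * 1024 * 1024):
    """Group (name, data) pairs into tar bundles of bounded payload size.

    A file larger than max_bundle_bytes still gets a bundle of its own.
    """
    bundles, current, current_size = [], [], 0
    for name, data in files:
        if current and current_size + len(data) > max_bundle_bytes:
            bundles.append(_pack(current))
            current, current_size = [], 0
        current.append((name, data))
        current_size += len(data)
    if current:
        bundles.append(_pack(current))
    return bundles


def unpack(bundle):
    """Receiver side: return the (name, data) pairs stored in one bundle."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as tar:
        return [(m.name, tar.extractfile(m).read()) for m in tar.getmembers()]


if __name__ == "__main__":
    small_files = [("a.txt", b"x" * 600), ("b.txt", b"y" * 600), ("c.txt", b"z" * 100)]
    bundles = make_bundles(small_files, max_bundle_bytes=1000)
    print(len(bundles), [name for b in bundles for name, _ in unpack(b)])
```

With a 1000-byte limit the three sample files end up in two bundles, so three requests become two; at the scale of thousands of small files in ten minutes, the bundle size is the parameter that would need tuning.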

Bundling
DROPBOX [1]        Before bundling   After bundling
Median flow size   16.2 kB           42.4 kB
Throughput PUT     358 kbit/s        552.92 kbit/s
Throughput GET     783 kbit/s        1294 kbit/s

CERNBOX*           Before bundling   After bundling
Throughput PUT     ~3600 kbit/s      Up to 400 Mbit/s ?
Throughput GET     ~7653 kbit/s      Up to 500 Mbit/s ?

Reduce TCP slow-start effect

[1] I. Drago, M. Mellia, M. M. Munafò, A. Sperotto, R. Sadre, and A. Pras. Inside Dropbox: Understanding Personal Cloud Storage Services. In Proceedings of the 12th ACM Internet Measurement Conference, IMC '12, pages 481-494, 2012.
* Based on users inside CERN and affiliated institutions
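
Why bundling reduces the slow-start effect can be illustrated with a textbook idealisation: on a fresh connection the sender's window starts small and doubles every round trip, so a short flow finishes before the window has grown. The model below is an assumption-laden sketch (no loss, no receiver limit, a 1460-byte segment size and an initial window of 10 segments are assumed), not a measurement of CERNBox or Dropbox; the two sample inputs are the Dropbox median flow sizes quoted on this slide.

```python
def slow_start_rounds(nbytes, mss=1460, initial_window=10):
    """Round trips needed to send nbytes in idealised TCP slow start.

    Assumes the window starts at initial_window segments and doubles every
    round trip, with no loss and no other limit. mss and initial_window are
    illustrative assumptions.
    """
    segments = -(-nbytes // mss)   # ceiling division
    rounds, window, sent = 0, initial_window, 0
    while sent < segments:
        sent += window
        window *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    for size in (16_200, 42_400, 1_000_000):
        rounds = slow_start_rounds(size)
        print(f"{size} bytes: {rounds} round trips, "
              f"{size / rounds:.0f} bytes per round trip")
```

Under these assumptions a 16.2 kB flow and a 42.4 kB flow both need two round trips, so the larger flow moves more than twice as many bytes per round trip, and a 1 MB flow needs only seven. Connection reuse changes the picture, so this is an upper bound on the benefit rather than a prediction.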

Extensions and filesizes
[Figure: file count and total size (GB) per extension, y-axis up to 7 M. Extensions shown: root, null, jpg, mp4, pdf, mov, enc, avi, zip, mp3, gz, img, epio, pptx, 1, wav, txt, png, iso, nef, ?]

Delta-sync
About 7.8% of the files are versions
Typically files are modified the same day
Usually small files
[Figure: count (x10^5) and size (x100 GB) of versioned files per extension. Extensions shown: root, mov, pptx, pdf, mp4, zip, h5, key, bz2, jpg, tc, null, vdi, gz, epio, pxp, hep, tgz, f4v]

ROOT files
Scientific software framework
Complex file structure
Already compressed
Small changes scattered throughout the file

Delta-sync
Possible implementations
o Chunk-based
o Byte-range request
More data and simulation needed
It might not be worth implementing
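
A chunk-based delta-sync can be sketched as follows: both versions of a file are cut into fixed-size chunks, each chunk is hashed, and only chunks whose hash differs need to be transferred. This is a minimal illustration under assumptions, not the OwnCloud protocol: the function names, the SHA-256 choice, and the tiny 4-byte chunk size used in the demo are all hypothetical.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def changed_chunks(old, new, chunk_size=4):
    """Indices of chunks in `new` that are absent from or differ in `old`.

    Chunks are compared position by position, so this only detects
    in-place modifications and appended data.
    """
    old_d = chunk_digests(old, chunk_size)
    new_d = chunk_digests(new, chunk_size)
    return [i for i, digest in enumerate(new_d)
            if i >= len(old_d) or old_d[i] != digest]


if __name__ == "__main__":
    old = b"aaaabbbbcccc"
    new = b"aaaaBBBBccccdd"          # chunk 1 modified, chunk 3 appended
    print(changed_chunks(old, new))
```

Fixed-offset chunks are the simplest option but have a known weakness: inserting a single byte shifts every later chunk, so all of them look changed (rolling-checksum schemes such as rsync's address this). Files with small changes scattered throughout, as described for ROOT files, are also a poor fit, because many chunks differ even when few bytes do.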

Compression
From TOP20 extensions (sizewise) only .txt will compress well
Compression can be slow, but almost all requests are executed from desktop clients
[Figure: file count and total size (GB) per extension, y-axis up to 8 M. Extensions shown: root, null, jpg, mp4, pdf, mov, enc, avi, zip, mp3, gz, img, epio, pptx, 1, wav, txt, png, iso, nef]
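
Whether a file is worth compressing can be estimated cheaply by compressing a sample and looking at the ratio. The sketch below uses Python's standard `zlib`; the function names and the 0.9 threshold are assumptions for illustration, and random bytes stand in for already-compressed formats rather than real jpg, mp4, zip, or ROOT data.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Compressed size divided by original size (lower means more gain)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


def worth_compressing(sample, threshold=0.9):
    """Hypothetical policy: compress only if the sample shrinks below threshold."""
    return compression_ratio(sample) < threshold


if __name__ == "__main__":
    text_like = b"run=1 event=42 energy=13TeV\n" * 1000
    already_compressed = os.urandom(100_000)   # stand-in for jpg/mp4/zip payloads
    print("text-like:", round(compression_ratio(text_like), 3),
          worth_compressing(text_like))
    print("random   :", round(compression_ratio(already_compressed), 3),
          worth_compressing(already_compressed))
```

Repetitive text shrinks to a small fraction of its size, while incompressible input stays at roughly its original size and only costs CPU time, which matches the slide's point that few of the top extensions would benefit.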

Future - service
CERNBox fully exposed to a very large scientific repository (ATLAS, LHCb, CMS)
Fuse-mount to underlying CERNBox storage available everywhere at CERN
Will users use CERNBox in new ways?

Conclusion
Owncloud protocol is simple, but is it enough?
Understand before implementation
o Proposed implementation
o Analysis
o Simulation
o Decision
Work in progress! MSc at AGH

Conclusion
Bundling looks like the most viable enhancement
Further research is needed for delta-sync and dynamic chunking
Compression is less likely to enhance current protocol

Opinions / questions most welcome!
How does the usage compare to your system?
How to implement the new features?
Feedback, ideas, comments
Contact details
Wojciech Jarosz
Wojciech.Jarosz@cern.ch
+41 22 76 75970