
Photon and Neutron RIs Data Management Workshop Use Cases
This project, funded by the EU's Horizon 2020 programme, focuses on data management workshop findings related to Photon and Neutron Research Infrastructures (RIs). It delves into various scenarios involving data archiving, remote data analysis, service integration, and more within the realm of scientific research facilities.
Presentation Transcript
Data Management Workshop
Photon and Neutron RIs use cases
3rd July, 2019
Author: J-F. Perrin (ILL)
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823852
Who are we?
o 50,000 users: biology, medicine, materials, chemistry, nuclear physics, particle physics, cultural heritage, geology and industrial applications.
o State-of-the-art large-scale facilities: 5 ESFRIs + 25 national RIs (PaNs)
o Data policies implementing FAIR principles (PaNdata data policy)
o Tens of petabytes of scientific data, curated and archived for 5-10+ years
o PaNs manage and provide access to data from experiments across Europe
o Working together: PaNOSC + ExPaNDS. PaNOSC is a 4-year project that started in Dec 2018.
Current typical PaN RI data architecture
1) Remote archive of a facility's experimental data
Story: This is a relatively simple use case: a PaNOSC RI wants a cold archive of the experimental data produced on its premises. The initial transfer is on the order of 600 TB, and the annual volume to be transferred is on the order of 300 TB (it could reach up to 10 PB for some facilities). The facility needs to be able to retrieve data on demand (in case its internal/primary storage fails).
Actors: The user facility and the archive provider (a second, more complex scenario could extend this list to the facility's users and other service providers).
Potential constraints: GridFTP support?
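To give a feel for what these volumes imply, here is a minimal sizing sketch. It uses the 600 TB and 300 TB figures from the slide; the decimal unit convention (1 TB = 10^12 bytes), the 30-day transfer window and the function names are assumptions made for illustration, not part of the original presentation.

```python
# Sizing sketch for the cold-archive use case.
# Assumption: decimal units, 1 TB = 10**12 bytes.
TB = 10**12


def sustained_gbit_per_s(volume_tb: float, days: float) -> float:
    """Average network rate (Gbit/s) needed to move volume_tb within `days`."""
    seconds = days * 24 * 3600
    bits = volume_tb * TB * 8
    return bits / seconds / 1e9


def archive_size_tb(years: int, initial_tb: float = 600, annual_tb: float = 300) -> float:
    """Archive size after `years` yearly increments, assuming nothing is deleted."""
    return initial_tb + annual_tb * years


if __name__ == "__main__":
    # Hypothetical 30-day window for the initial 600 TB transfer.
    print(f"initial transfer: {sustained_gbit_per_s(600, 30):.2f} Gbit/s")
    # Annual increment spread evenly over a year.
    print(f"yearly increment: {sustained_gbit_per_s(300, 365):.3f} Gbit/s")
    print(f"archive after 5 years: {archive_size_tb(5):.0f} TB")
```

Under these assumptions the initial transfer needs about 1.85 Gbit/s sustained for a month, while the yearly increment alone is far less demanding. Real throughput would also depend on protocol overhead and on how bursty the data production is.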
2) Remote data analysis using the Jupyter notebook services of another provider (EGI in this particular case)
Story: A scientist wants to analyse data sets obtained during an experiment at one of the PaNOSC RIs. The compute infrastructure and the data infrastructure are at different sites. The scientist should be able to fetch the data from the RI and store the results back at the same facility. The datasets are still under embargo, which means that the data at the facility can only be accessed by a user with sufficient authorization (the user has to be part of the proposal team). The RI offers a service to host the results of the analyses, but the user may decide to host the data elsewhere under their own responsibility.
Actors: The user facility holding the data, the user performing the analysis, the Jupyter service provider.
Potential constraints: Authorization is currently managed on the RI side; security of data access; different AAIs.
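The embargo rule in this story can be sketched as a small access check. The class names, fields and the rule that data becomes readable by anyone once the embargo date has passed are assumptions made for illustration; the slide only states that embargoed data is restricted to the proposal team.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Dataset:
    proposal_id: str
    embargo_until: date  # last day of the embargo (hypothetical field)


@dataclass
class Proposal:
    proposal_id: str
    team: set[str] = field(default_factory=set)  # user IDs known to the RI


def can_read(user_id: str, dataset: Dataset,
             proposals: dict[str, Proposal], today: date) -> bool:
    """Return True if user_id may read the dataset on the given day."""
    # Assumption: once the embargo has ended, the data is open to everyone.
    if today > dataset.embargo_until:
        return True
    # Under embargo: only members of the proposal team may read.
    proposal = proposals.get(dataset.proposal_id)
    return proposal is not None and user_id in proposal.team


if __name__ == "__main__":
    proposals = {"exp-001": Proposal("exp-001", {"alice", "bob"})}
    ds = Dataset("exp-001", embargo_until=date(2022, 7, 3))
    print(can_read("alice", ds, proposals, date(2019, 7, 3)))    # team member
    print(can_read("mallory", ds, proposals, date(2019, 7, 3)))  # outsider
```

In the remote-analysis scenario, a check of this kind still has to run on the RI side, which is why the external Jupyter provider needs a way to present the user's identity to the facility.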
Service and data integration: simple case
o Jupyter notebook service hosted by a single facility
o Translation to local ID + authorization
o Data archives at RI
Service and data integration: more complex case
o Jupyter notebook service hosted by another provider
o Translation to local ID + authorization
o Data archives at RI
Aim:
o Keep it simple, at least for users
Potential leads:
o Users transfer the data themselves
   o What are the limits (size, costs, usability) of this model?
o Integrate third-party services as part of our federation (or accept similar solutions, i.e. AARC2 development)
   o Proxy authentication needed.
   o N-N relation problem: what are the limits?
o Move from a local authorization model to a community one
   o Authorization could be described at the UmbrellaID level (notion of groups and resources)
   o No need for local ID translation before authZ.
   o Trust the service providers to enforce this authZ?
o Focus on Open Data
   o Lift the local authentication requirements for accessing data?
   o Move the auth requirement from local to community authentication?
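The "community authorization" lead can be illustrated with a minimal sketch in which access is decided from group memberships attached to a community identity, so no translation to a local ID is needed. All names, the group naming scheme and the access table are hypothetical; this does not describe how UmbrellaID or any AARC component actually exposes groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommunityIdentity:
    subject: str            # community-wide identifier, no local account involved
    groups: frozenset[str]  # memberships asserted by the community AAI (assumed)


# Hypothetical table: resource -> groups allowed to read it.
RESOURCE_GROUPS = {
    "dataset:exp-001": frozenset({"proposal:exp-001"}),
    "dataset:open-042": frozenset({"public"}),
}


def is_authorized(identity: CommunityIdentity, resource: str,
                  acl: dict[str, frozenset[str]] = RESOURCE_GROUPS) -> bool:
    """Grant access if the resource is public or shares a group with the user."""
    allowed = acl.get(resource, frozenset())  # unknown resources are denied
    return "public" in allowed or bool(identity.groups & allowed)


if __name__ == "__main__":
    alice = CommunityIdentity("alice@community", frozenset({"proposal:exp-001"}))
    print(is_authorized(alice, "dataset:exp-001"))
    print(is_authorized(alice, "dataset:open-042"))
```

In such a model the service provider evaluates the group claims itself, which is exactly the trust question raised in the list above.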
3) User's data transfer to their home organisation/home computer
Story: A facility user wants to transfer their data to their home organisation or home computer. The volume of data is large (up to 100 TB) and the user has difficulty completing the transfer in one go. The user needs a simple tool that ensures the complete transfer without having to resume it manually.
Actors: The RI holding the data, the user performing the transfer.
Potential constraints: Has to be extremely simple, at least at the user level.
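The requirement of a transfer that completes without manual resumption can be sketched as a resume-and-verify loop. This is only an illustration: local files stand in for the remote source and destination, and the function names, chunk size and retry policy are assumptions. A real deployment would more likely rely on one of the dedicated transfer tools discussed in the workshop.

```python
import hashlib
import os
import tempfile
import time

CHUNK = 1024 * 1024  # 1 MiB read size (arbitrary choice)


def sha256_of(path: str) -> str:
    """Checksum a file without loading it fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def resume_copy(src: str, dst: str) -> None:
    """Append to dst the part of src it is still missing."""
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)  # skip what a previous attempt already copied
        for block in iter(lambda: fin.read(CHUNK), b""):
            fout.write(block)


def transfer(src: str, dst: str, max_attempts: int = 5, wait_s: float = 1.0) -> int:
    """Copy src to dst, resuming after failures; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            resume_copy(src, dst)
            if sha256_of(src) == sha256_of(dst):
                return attempt
            os.remove(dst)  # corrupted copy: start again from scratch
        except OSError:
            pass  # e.g. an interrupted read or write; retry from current offset
        if attempt < max_attempts:
            time.sleep(wait_s)
    raise RuntimeError(f"transfer of {src} failed after {max_attempts} attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src.bin"), os.path.join(d, "dst.bin")
        data = os.urandom(3 * CHUNK + 17)
        with open(src, "wb") as f:
            f.write(data)
        with open(dst, "wb") as f:
            f.write(data[:CHUNK])  # simulate an interrupted earlier transfer
        print("completed on attempt", transfer(src, dst, wait_s=0))
```

The final checksum comparison is what lets the user trust the result without inspecting it; for 100 TB volumes, per-file or per-chunk checksums would be preferable to hashing everything at the end.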
Other issues that will need to be addressed/clarified, possibly depending on the use case:
o Which data transfer solutions (OneData, FTS3, Globus, ...)
   o RI datacentre <-> Service
   o RI datacentre <-> Archive facility
   o RI datacentre <-> Users' labs/computers
o User support:
   o How to ensure that users get proper answers
   o How to manage interfaces between service/data providers
o SLA, service monitoring
o Usage statistics
o Security and liability
Keep in touch: wp6@panosc.eu
https://github.com/panosc-eu/panosc/tree/master/Work%20Packages/WP6%20EOSC%20Integration