Data Management Policies and Practical Solutions for Research Institutions

Explore the implementation and enforcement of policy-based data management strategies in academic settings, focusing on actionable policies, automated processes, and collaborative consortiums.

  • Data Management
  • Policy Solutions
  • Academic Research
  • Automation
  • Consortiums

Presentation Transcript


  1. Reagan Moore, PI; Mary Whitton, Project Manager
     National Science Foundation Cooperative Agreement: OCI-0940841

  2. Policy Topics
     • Policy-based Data Management
     • Practical Policy Working Group outcomes
     • Data Center policies
     • Applications
     • DataNet Federation Consortium analyzed 175 policies for:
         - Data sharing (research collaborations)
         - SILS Digital library (personal collections)
         - RDA Practical Policy (data centers)
         - UNC-CH Protected data (secure medical workspace)
         - Odum/Dataverse (archive)
         - NSF data management plans (publication)
         - Science Observatory Network (real-time sensor data)
         - PECE/RPI (anthropology)
         - NOAA NCDC (archive)

  3. Policy-based Data Management

  4. Summary of the Problem
     • Computer actionable policies are used to:
         - enforce data management
         - automate administrative tasks
         - validate compliance with assessment criteria
         - automate scientific data processing and analyses
     • Practical Policy: an assertion or assurance that is enforced about a (data) collection (data set, digital object, file) by the creators of the collection
     • Users are motivated by issues related to scale and distribution
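
To make "computer actionable policy" concrete, the following is a minimal sketch in Python, not the rule language of any particular data grid. It treats a policy as a named assertion that can be evaluated against every object in a collection, so that compliance can be validated automatically. The `Policy` class, the `validate` function, and the record fields are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    """An assertion about each object in a collection (hypothetical structure)."""
    name: str
    assertion: Callable[[Dict], bool]  # returns True when the object complies


def validate(collection: List[Dict], policies: List[Policy]) -> Dict[str, List[str]]:
    """Map each policy name to the names of the objects that violate it."""
    return {
        p.name: [obj["name"] for obj in collection if not p.assertion(obj)]
        for p in policies
    }


# Illustrative records; the field names are assumptions, not a real schema.
collection = [
    {"name": "a.dat", "checksum": "9f2c", "replicas": 2},
    {"name": "b.dat", "checksum": None, "replicas": 1},
]
policies = [
    Policy("has-checksum", lambda o: o["checksum"] is not None),
    Policy("two-replicas", lambda o: o["replicas"] >= 2),
]
report = validate(collection, policies)
print(report)
```

Keeping the assertion separate from the code that walks the collection means the same validation loop can serve every policy, which is one way to read the slide's distinction between enforcing a policy and validating compliance with it.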

  5. Practical Policy Working Group

  6. Policy Templates
     • Practical Policy members represented:
         - 11 types of data management systems
         - 30 institutions
         - 2 testbeds
             iRODS: Renaissance Computing Institute, DataNet Federation Consortium (DFC)
             GPFS: Institute of Physics of the Academy of Sciences, CESNET; Garching Computing Centre (RZG)
     • Published two documents:
         - Moore, R., R. Stotzka, C. Cacciari, P. Benedikt, Practical Policy Templates, February 2015, http://dx.doi.org/10.15497/83E1B3F9-7E17-484A-A466-B3E5775121CC
         - Moore, R., R. Stotzka, C. Cacciari, P. Benedikt, Practical Policy Implementations, February 2015, http://dx.doi.org/10.15497/83E1B3F9-7E17-484A-A466-B3E5775121CC

  7. Data Center Policies (INLS 624)
     • Contextual metadata extraction: automate extraction of metadata from files
     • Data access control: automate application of appropriate access controls
     • Data backup: automate creation of replicas
     • Data format control: automate identification of data format
     • Data retention: apply a retention period
     • Disposition: apply a disposition policy at end of retention period
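
The retention and disposition policies above can be illustrated with a short sketch. It assumes that a retention period is expressed in days from the ingestion date and that disposition is a single named action. Both are simplifying assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta


def retention_expiry(ingested: date, retention_days: int) -> date:
    """Date on which the retention period ends."""
    return ingested + timedelta(days=retention_days)


def disposition(ingested: date, retention_days: int, today: date,
                action: str = "delete") -> str:
    """Return 'retain' until the retention period ends, then the disposition action."""
    if today >= retention_expiry(ingested, retention_days):
        return action
    return "retain"


print(disposition(date(2015, 1, 1), 365, date(2015, 6, 1)))             # still retained
print(disposition(date(2015, 1, 1), 365, date(2016, 1, 1), "archive"))  # period has ended
```

A periodic job could call `disposition` for every file and act on the result, which is the sense in which such a policy is automated rather than applied by hand.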

  8. Data Center Policies
     • Integrity (including replication): verify integrity and replace bad copies
     • Notification: manage events about changes to the collection
     • Restricted searching: manage searches on collection
     • Storage cost reports: generate cost report
     • Use agreements: manage use agreements before data are retrieved
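
As an illustration of the integrity policy ("verify integrity and replace bad copies"), here is a minimal sketch that works on ordinary local files. It assumes SHA-256 as the checksum and that at least one replica still matches the recorded value. The helper names are hypothetical, and a production data grid would use its own replica catalog rather than a list of paths.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def repair_replicas(replicas: List[Path], expected: str) -> List[Path]:
    """Overwrite every replica whose checksum differs from `expected`.

    Returns the list of replicas that were replaced.
    """
    good = [p for p in replicas if sha256_of(p) == expected]
    if not good:
        raise RuntimeError("no valid replica left to copy from")
    bad = [p for p in replicas if p not in good]
    for p in bad:
        shutil.copyfile(good[0], p)  # replace the bad copy with a verified one
    return bad


# Demonstration with two replicas, one of them deliberately corrupted.
workdir = Path(tempfile.mkdtemp())
r1, r2 = workdir / "replica1", workdir / "replica2"
r1.write_bytes(b"observations")
r2.write_bytes(b"observatiXns")
expected = sha256_of(r1)  # stands in for the checksum recorded at ingestion
repaired = repair_replicas([r1, r2], expected)
print([p.name for p in repaired])
```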

  9. Digital Library Management

  10. LifeTime Library Policies
     • Requirements:
         - Enable students to create a personal digital collection
         - Provide pedagogy mechanisms for experimenting with:
             Naming: file names
             Arrangement: organization in collections
             Description: tags and metadata
             Access controls: sharing and publication
             Ingestion: controlled loading of data
             Distribution: storage locations

  11. Student Experiences
     • Students invariably:
         - Changed their minds about the purpose of the collection
         - Changed their minds about the description (term definitions tended to drift over the semester)
         - Changed their minds about the arrangement
         - Added new collections for additional types of data
     • Resulting collections had:
         - 1,000 to 10,000 files
         - 2 gigabytes to 150 gigabytes in size
         - 4 to 10 metadata attributes per file

  12. Protected Data

  13. Protected Data Management
     • UNC-CH has published an administrator's guide for the management of protected data. This includes:
         - PII: Personally Identifiable Information
         - PHI: Protected Health Information
         - PCI: Payment Card Industry information
     • The question is whether each of the tasks specified in the guide can be automated as policies enforced by the data grid.
     • See Chapter 6 of the Policy Examples Workbook. This specifies 51 tasks that should be managed by the administrator.

  14. Protected Data Tasks
     1 Check for presence of PII on ingestion
     2 Check for viruses on ingestion
     3 Check passwords for required attributes
     4 Encrypt data on ingestion
     5 Encrypt data transfers
     6 Federation - control data copies (access control)
     7 Federation - manage remote data grid interactions (update rule base)
     8 Federation - periodically copy data
     9 Federation - manage data retrieval (update access controls)
     10 Generate checksum on ingestion
     11 Generate report of corrections to data sets or access controls
     12 Generate report for cost (time) required to audit events
     13 Generate report of types of protected assets present within a collection
     14 Generate report of all security and corruption events
     15 Generate report of the policies that are applied to the collections
     16 List all storage systems being used
     17 List persons who can access a collection

  15. Protected Data Tasks
     18 List staff by position and required training courses
     19 List versions of technology that are being used
     20 Maintain document on independent assessment of software
     21 Maintain log of all software changes, OS upgrades
     22 Maintain log of disclosures
     23 Maintain password history on user name
     24 Parse event trail for all accessed systems
     25 Parse event trail for all persons accessing collection
     26 Parse event trail for all unsuccessful attempts to access data
     27 Parse event trail for changes to policies
     28 Parse event trail for inactivity
     29 Parse event trail for updates to rule bases
     30 Parse event trail to correlate data accesses with client actions
     31 Provide test environment to verify policies on new systems
     32 Provide test system for evaluating a recovery procedure
     33 Provide training courses for users
     34 Replicate data sets on ingestion

  16. Protected Data Tasks
     35 Replicate iCAT periodically
     36 Set access approval flag
     37 Set access controls
     38 Set access restriction until approval flag is set
     39 Set approval flag per collection for enabling bulk download
     40 Set asset protection classifier for data sets based on type of PII
     41 Set flag for whether tickets can be used on files in a collection
     42 Set lockout flag and period on user name - counting number of tries
     43 Set password update flag on user name
     44 Set retention period for data reviews
     45 Set retention period on ingestion
     46 Track systems by type (server, laptop, router, ...)
     47 Verify approval flags within a collection
     48 Verify files have not been corrupted
     49 Verify presence of required replicas
     50 Verify that no controlled data collections have public or anonymous access
     51 Verify that protected assets have been encrypted
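
Task 1 in the list above (check for presence of PII on ingestion) is one of the easier tasks to picture as an automated policy. The sketch below is an illustration only: the presentation does not say how PII is detected, so the two regular expressions (a US Social Security number layout and a simple e-mail pattern) and the function names are assumptions. Real PII screening needs far broader coverage.

```python
import re
from typing import Dict, List

# Hypothetical patterns for illustration; not an exhaustive PII definition.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def scan_for_pii(text: str) -> Dict[str, List[str]]:
    """Return the matches found for each pattern, omitting patterns with no hits."""
    hits = {label: pattern.findall(text) for label, pattern in PII_PATTERNS.items()}
    return {label: found for label, found in hits.items() if found}


def ingest_allowed(text: str) -> bool:
    """Policy decision: allow ingestion only when no pattern matches."""
    return not scan_for_pii(text)


sample = "Patient 123-45-6789, contact jane@example.org"
print(scan_for_pii(sample))
print(ingest_allowed("temperature 21.5 C"))
```

Instead of refusing ingestion, the same scan result could set the asset protection classifier of task 40. That choice is a policy decision the sketch leaves open.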

  17. Task Automation
     • There are some unifying requirements across tasks:
         - Checking material for PII, viruses
         - Management of passwords
         - Generation of log files for all actions done
         - Creation of state information to track processes
         - Management of encryption
         - Management of access controls
         - Generation of audit trails
         - Parsing of events to demonstrate compliance over time
         - Verification that processes were correctly applied
     • Many of these requirements can also be applied to digital libraries and research collaborations
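
Several of the unifying requirements come down to parsing an event trail. The following sketch assumes a hypothetical event format (a list of dictionaries with `user`, `action`, and `outcome` keys) and shows how unsuccessful access attempts might be counted (task 26) and compared against a lockout threshold (task 42). The threshold of three tries is an arbitrary example value.

```python
from collections import Counter
from typing import Dict, Iterable, List


def failed_access_counts(events: Iterable[Dict[str, str]]) -> Counter:
    """Count denied access attempts per user."""
    return Counter(
        e["user"] for e in events
        if e["action"] == "access" and e["outcome"] == "denied"
    )


def users_to_lock(events: Iterable[Dict[str, str]], max_tries: int = 3) -> List[str]:
    """Users whose denied attempts reach the lockout threshold, sorted by name."""
    counts = failed_access_counts(events)
    return sorted(user for user, n in counts.items() if n >= max_tries)


# Hypothetical audit events; a real trail would also carry timestamps and object paths.
events = [
    {"user": "eve", "action": "access", "outcome": "denied"},
    {"user": "eve", "action": "access", "outcome": "denied"},
    {"user": "bob", "action": "access", "outcome": "denied"},
    {"user": "bob", "action": "access", "outcome": "granted"},
    {"user": "eve", "action": "access", "outcome": "denied"},
]
print(failed_access_counts(events))
print(users_to_lock(events))
```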

  18. Preservation

  19. Cross-Disciplinary Data Discovery and Geographically Distributed Preservation (DFC, April 2013 NSF Review)

  20. Archive Policies
     • The Dataverse network has about 800 gigabytes of data that may contain protected information.
     • An archive is needed with independent management of the material to ensure recovery in the case of a disaster.
     • Digital objects and provenance metadata must be reloadable into Dataverse.
     • Assessment criteria need to be evaluated to verify integrity.
     • Access controls must be enforced on restricted data.
     • Dataverse naming convention must be retained.
     • Approach is to replicate the data holdings into an iRODS data grid.

  21. Policies
     • See Chapter 5 of the Policy Examples Workbook: Odum preservation policies
     • Preservation tasks include:
         - Staging files between Dataverse and iRODS
         - Checking data for presence of protected information
         - Periodic verification of integrity and replicas
         - Verification of access controls
         - Reports on usage statistics

  22. NSF Data Management Plans

  23. NSF Data Management Plans
     • The National Science Foundation has mandated that every project provide a 2-page description of how data will be managed.
     • Each NSF directorate published guidelines on what the data management plan should include.
     • An analysis of 12 sets of requirements identified 38 data management tasks that could be automated.
     • See Chapter 7 of Policy Template Workbook

  24. NSF DMP Requirements (tasks 1 to 18)
     • Task groups: Collection; Analysis
     • DMP tasks, in slide order: Managers & staff; Costs; Collection plans; Instrument types; Event log; Collection report; Required data policies; Data category; Use of existing data; Quality control; Analysis plans; Data sharing during analysis; Who; Data dictionary / glossary; Naming includes; Data format type; DOI for data sets; Metadata standard; Metadata export as
     • Variables, in slide order: Roles; Budget; How, what; Type; Event; Event; Products; Type; Source; Plans; Plans; Type; Attributes; Type; Type; Type; Type

  25. NSF DMP Requirements (tasks 19 to 38)
     • Task groups: Collection; Publication; Archive
     • DMP tasks, in slide order: Size; Make original data public; Make data products public; When; Re-use; Re-distribution; Access restrictions; IPR; Web access through; Data sharing system; Code distribution system; Retention period; Curation; Archive; Number of replicas; Backup frequency; Integrity check frequency; Technology evolution; Catalog; Transformative migration
     • Variables, in slide order: Storage; Location; Size; When; Policies; Community; Privacy; Type; How; Type; Type; Period; Plans; Location; #; Policies; Policies; Plans; Metadata; Formats

  26. Science Observatory Network

  27. Real-Time Sensor Data
     • Harvest sensor data from the Antelope Real Time Sensor orb
         - Manages environmental, oceanic, seismic data
         - More than 3,000 sensors across the US
     • Register each sensor as an independent collection
     • Retrieve the most recent sensor data
     • Harvest sensor data periodically
     • Transform to JSON, netCDF
     • Provide access to archived data
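
Two of the steps above, retrieving the most recent reading per sensor and transforming it to JSON, can be sketched with the Python standard library alone. The packet layout used here (`sensor`, `epoch`, `value`, `units`) is a hypothetical stand-in; it is not the Antelope orb packet format, and no Antelope API is called.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable


def latest_per_sensor(packets: Iterable[Dict]) -> Dict[str, Dict]:
    """Keep only the most recent packet for each sensor id."""
    latest: Dict[str, Dict] = {}
    for p in packets:
        current = latest.get(p["sensor"])
        if current is None or p["epoch"] > current["epoch"]:
            latest[p["sensor"]] = p
    return latest


def packet_to_json(packet: Dict) -> str:
    """Serialize one packet as JSON with an ISO 8601 UTC timestamp."""
    record = {
        "sensor": packet["sensor"],
        "time": datetime.fromtimestamp(packet["epoch"], tz=timezone.utc).isoformat(),
        "value": packet["value"],
        "units": packet["units"],
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical harvested packets.
packets = [
    {"sensor": "S1", "epoch": 0, "value": 1.5, "units": "m/s"},
    {"sensor": "S1", "epoch": 60, "value": 1.7, "units": "m/s"},
    {"sensor": "S2", "epoch": 30, "value": 12.0, "units": "degC"},
]
for sensor_id, packet in sorted(latest_per_sensor(packets).items()):
    print(sensor_id, packet_to_json(packet))
```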

  28. PECE / RPI

  29. Collection Management Policies
     • Contextual metadata extraction
     • Data access control
     • Data backup
     • Data format control
     • Data retention
     • Disposition
     • Integrity (including replication)
     • Notification
     • Restricted searching
     • Storage cost reports
     • Use agreements

  30. NOAA NCDC

  31. NOAA Climatic Data Center
     • Manages an archive of climate data records received from multiple sources
     • Uses a staging area to:
         - Check input data for viruses
         - Manage ingestion into a tape archive
     • Challenges:
         - Needed a way to improve security: eliminate direct access to storage within the NOAA firewall
         - Needed a way to automate management of each file: verify archival storage before file is deleted
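
The last requirement, verifying archival storage before a file is deleted, can be expressed as a small guard around the delete. This is a sketch on local files under the assumption that a matching SHA-256 checksum is sufficient evidence of a good archive copy. The function names are hypothetical and do not describe NCDC's actual procedure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def delete_if_archived(staged: Path, archived: Path) -> bool:
    """Delete the staged file only when the archive copy exists and matches it."""
    if archived.exists() and sha256_of(archived) == sha256_of(staged):
        staged.unlink()
        return True
    return False


# Demonstration: the staged file survives until a matching archive copy exists.
workdir = Path(tempfile.mkdtemp())
staged, archived = workdir / "staged.dat", workdir / "archived.dat"
staged.write_bytes(b"climate record")

deleted_before_archive = delete_if_archived(staged, archived)  # no archive copy yet
archived.write_bytes(b"climate record")
deleted_after_archive = delete_if_archived(staged, archived)   # copy verified
print(deleted_before_archive, deleted_after_archive)
```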

  32. iRODS Secure Ingest
     • iRODS is:
         - Secure authentication
         - Security via Obscurity (one to bind them)
         - Uses a pull mechanism to move data into NCDC grid
         - A virtual management tool (clean-up)
         - Scope is entire grid
     • [Diagram labels: External Providers (FTP/FTPS, FTP push/pull); NCDC External Firewall; FTP Load Balance; DMZ Landing Zone, open for data delivery; iRODS DMZ Grid; DMZ Firewall; NCDC Internal Network; iRODS NCDC Grid; Disk Cache; HDSS; Tape; hosts ftp1 to ftp5, ingest1, ingest2; collections /DMZ, /NCDC, /Ingest, /Archive, /NR2, /NR3]

  33. Resources
     • www.datafed.org
     • www.irods.org
     • Policy Examples Workbook
     • Policy Templates Workbook
