
Innovations in Big Data Analytics at RDA Plenary 5 Session - San Diego 2015
The RDA Plenary 5 Session held in San Diego, California, in March 2015, showcased advancements in Big Data analytics. The agenda included discussions on smart data analytics, use cases in various domains, issues in data curation, and the evolving landscape of Big Data. Presentations highlighted topics such as system orchestration, information value chain, data processing, and security. The event also addressed the creation of working groups and coordinated efforts to drive outcomes and deliverables in the Big Data domain.
Presentation Transcript
10 March 2015 Paradise Point San Diego, California RDA PLENARY 5 BIG DATA (ANALYTICS) IG SESSION
NEXT JOINT SESSION CANCELED The planned breakout joint session with Reproducibility IG has been canceled: 10 March, 16:00-17:30 Sunset Ballroom Salon V Responsible co-chairs from both IGs are unable to attend this Plenary. 2 RDA P5, San Diego, California 3/10/15
AGENDA (FINAL VERSION, PROMISE!)
Time | Presenter/Moderator | Presentation/Discussion
14:00-14:05 | Kuo | Session Introduction (Actually we are waiting for people to come in)
14:05-14:25 | Markus Götz | Smart Data Analytics: 3 use cases in different domains
14:25-14:45 | Line Pouchard | Issues in Big Data Curation
14:45-15:05 | Peter Baumann | Use Case Advances Report: ND Arrays, Spatiotemporal Earth Data
14:45-15:00 | Kuo | The Case for IG Name Change: Big Data Analytics IG
15:00-15:30 | Members and Participants | BD IG Outcome and Deliverable Discussion; WG Creation and Coordination Discussion
PRESENTATIONS START
DRAFT NBD-PWG REFERENCE ARCHITECTURE
[Figure: reference architecture diagram, framed by an Information Value Chain axis and an IT Value Chain axis.
Roles: System Orchestrator; Data Provider; Big Data Application Provider (Collection, Preparation / Curation, Analytics, Visualization, Access); Data Consumer; Big Data Framework Provider.
Big Data Framework Provider: Processing: Computing and Analytic (Batch, Interactive, Streaming); Platforms: Data Organization and Distribution (Indexed Storage, File Systems); Infrastructures: Networking, Computing, Storage (Virtual Resources, Physical Resources); Messaging/Communications; Resource Management.
Surrounding functions: Security & Privacy; Management.
Key: DATA = Big Data Information Flow; SW = Software Tools and Algorithms Transfer; Service Use.]
MISSION
The ultimate goal of the RDA Big Data Interest Group is to produce a set of recommendation documents to advise diverse research communities with respect to:
- How to select an appropriate Big Data solution for a particular science application with optimal value.
- What are the best practices in dealing with various data and computing issues associated with such a solution.
Important: Need to connect with various science/research domains!
OBJECTIVES
- Clarifying, and sometimes defining, terminologies related to Big Data, leveraging:
  - ISO/IEC JTC 1 Terms and Definitions,
  - NIST Big Data PWG (NBD-PWG) Definitions and Taxonomies documents, and
  - RDA Terminologies WG.
- Characterizing leading Big Data technologies. Example characteristics include: performance, resource utilization, scalability, usability, flexibility, extensibility, propensity in supporting scientific collaborations, etc.
  Important: Need to collaborate with relevant RDA IGs and initiate Working Groups.
- Collaborating with external entities through IG member involvement, including: ISO, NIST, INCITS, OGC, NBD-PWG, EarthCube, EarthServer, etc.
- Producing a set of recommendation documents based on results obtained from activities in attaining the above objectives, including:
  - A systematic classification of algorithms pertinent to the characterization of Big Data technologies,
  - Characterizations of the Big Data technologies investigated, especially their value characteristics in each category of use cases,
  - Frequency of each class of algorithms and/or queries used by workflows in various use cases, delineated by science domains/subdomains, and
  - Feasible combinations of analysis algorithms, analytical tools, data and resource characteristics, and scientific queries.
PARTICIPATION
- Domain scientists wishing to utilize Big Data solutions for their research and/or applications,
- Data specialists with experience in data production, curation, analysis, and management, especially involving large volumes and varieties of data,
- Computational scientists or software engineers with special interests in data analysis techniques and algorithm analysis, especially pertaining to Big Data-relevant technologies and tools,
- Experts, or aspiring experts, in various Big Data technologies and tools,
- Computational infrastructure and architecture experts in fields such as distributed computing, high-performance computing, and database systems,
- Data scientists with a blended interest involving some subset of the activities mentioned above, in particular the sharing, use, and reuse of open scientific datasets, and
- Managers involved in any combination of the activities mentioned above.
INTERACTION MECHANISM
- Monthly teleconference with a planned agenda to discuss specific issues.
  - Proposing 10 AM US Eastern Time (4 PM Central European Time) every 1st or 2nd Thursday of each month.
  - We will use GoToMeeting instead of the default RDA means for teleconferencing.
  - Agenda should be available 1 week before the meeting.
  - Meeting minutes should be available within 1 week after the meeting.
- Asynchronous collaboration using RDA Wiki, Google Docs, and email lists.
- Semiannual RDA Plenary meetings to hold sessions for progress reports and face-to-face interactions amongst interested parties.
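The proposed teleconference slot can be worked out programmatically. The following is only a minimal sketch, not part of the presentation: the helper names `nth_thursday` and `call_times` are hypothetical, `Europe/Berlin` is assumed as a representative Central European location, and the standard-library `zoneinfo` module (Python 3.9+, with time zone data installed) is assumed to be available.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
CENTRAL_EUROPE = ZoneInfo("Europe/Berlin")  # assumption: stands in for "Central European Time"


def nth_thursday(year: int, month: int, n: int) -> date:
    """Return the n-th Thursday (n = 1 or 2 for this schedule) of a month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, so Thursday is 3.
    days_to_thursday = (3 - first.weekday()) % 7
    return first + timedelta(days=days_to_thursday + 7 * (n - 1))


def call_times(year: int, month: int, n: int = 1):
    """Return the 10:00 US Eastern call start and the same instant in Central Europe."""
    day = nth_thursday(year, month, n)
    eastern = datetime.combine(day, time(10, 0), tzinfo=EASTERN)
    return eastern, eastern.astimezone(CENTRAL_EUROPE)


if __name__ == "__main__":
    # List the first-Thursday slot for a few months of 2015.
    for month in range(4, 7):
        us, eu = call_times(2015, month)
        print(us.strftime("%a %d %b %Y %H:%M %Z"), "->", eu.strftime("%H:%M %Z"))
```

One caveat the sketch makes visible: the six-hour offset between the two zones does not hold all year. In weeks when US and European daylight saving changes are out of step, such as Thursday 12 March 2015, 10 AM US Eastern corresponds to 3 PM in Central Europe rather than 4 PM.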
SCHEDULE
Year | Qr. | Task
2015 | Q1 | Revise BDA IG (original group name) charter to suit the broadened scope of proposed IG name change to Big Data Interest Group. Prepare RDA 5th Plenary.
2015 | Q2 | Start the planning and organization of studies into the characterization of various popular Big Data technologies. Evaluation of potential WG spinoffs based on characterization work.
2015 | Q3 | Progress reports on characterization studies. Prepare RDA 6th Plenary. Created Spinoffs WGs on detailed big data studies.
2015 | Q4 | Progress reports on characterization studies.
2016 | Q1 | Produce a report on characterization studies. Prepare RDA 7th Plenary. Initial results of Spinoff WGs and their findings.