Analyzing Reliability in Hybrid Compute Units at IEEE CIC 2015

This presentation, given at IEEE CIC 2015, analyzes reliability in hybrid compute units, where reliability is the ability of a system to function correctly over a specified period of time under predefined conditions. It covers reliability models for individual units and collectives, a reliability analysis framework, a prototype implementation with experiments, and future work on hybrid computing systems.

  • Reliability Analysis
  • Hybrid Computing Systems
  • IEEE CIC
  • Compute Units
  • System Improvements

Uploaded on Mar 05, 2025 | 0 Views



Presentation Transcript


  1. Analyzing Reliability in Hybrid Compute Units. Muhammad Candra, Hong-Linh Truong, Schahram Dustdar, Distributed Systems Group, TU Wien. IEEE International Conference on Collaboration and Internet Computing (IEEE CIC 2015), October 28 - October 30, 2015, Hangzhou, China.

  2. Outline
     • Background: Introduction to Hybrid Computing System; Introduction to Reliability Analysis; Motivation
     • Models
     • Reliability Analysis Framework
     • Implementation and Experiments
     • Conclusions and Future Works
     Analyzing Reliability in Hybrid Compute Units, IEEE CIC 2015, October 28 - 30, 2015, Hangzhou.

  3. [Background] Hybrid Computing System
     • Applications (diagram labels): cloud-based services composition, workflows with human tasks, crowdsourcing applications, IoT applications
     • Human-based compute units (diagram labels): crowdsourcing platforms, social networks of experts, on-premise experts
     • Software-based services and human-based compute units combine into Hybrid Compute Units
     • Quality metrics? Reliability

  4. [Background] Reliability Analysis
     • What is reliability? The ability of a system to function correctly over a specified period of time, mostly under predefined conditions.
     • Why do we need it? System improvements: for the designer, for the resource provider, and for the task owner.
     • How to measure it? Stochastic analysis.

  5. [Background] Reliability Analysis in HCS
     • Problems for reliability analysis in hybrid computing systems (HCS): non-continuous time space; more ad-hoc inter-dependency; resource provisioning on the cloud
     • Our goal: to provide a set of tools for modeling and analyzing reliability for hybrid computing systems.

  6. [Background] Motivating Scenario
     • Diagram labels: Infrastructures; Infrastructure Maintenance Platform; Sensing; Human-Based Computing Platform; HCU Collective
     • Resources pool (diagram labels): Sensors, N/W, Stream Analytic, Dedicated Inspectors, Citizens on the Cloud

  7. [Models] Reliability of Individual Units
     • The R(t) formula: $R(t)$ is the probability of failure-free operation in $[0, t]$, with $R(t) = 1 - F(t)$ and $F(t) = \int_0^t f(\tau)\,d\tau$, where $f(t)$ is the probability density function. This model requires continuous operation and does not fit human-based units.
     • The R(k) formula: a discrete reliability model based on task execution $k$, with $f(k) = \Pr\{\text{task}_k \text{ fails} \mid \text{task}_1, \text{task}_2, \ldots, \text{task}_{k-1} \text{ succeed}\}$.
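
The two models on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the trapezoidal integration, and the sample numbers are assumptions. The discrete case reads f(k) as the conditional failure probability defined on the slide, so by the chain rule the probability that tasks 1..k all succeed is the product of (1 - f(i)).

```python
import math
from typing import Callable, Sequence


def reliability_continuous(pdf: Callable[[float], float], t: float,
                           steps: int = 10_000) -> float:
    """R(t) = 1 - F(t), with F(t) approximated by the trapezoidal rule on [0, t]."""
    if t <= 0:
        return 1.0
    h = t / steps
    area = 0.5 * (pdf(0.0) + pdf(t))
    for i in range(1, steps):
        area += pdf(i * h)
    return 1.0 - area * h


def reliability_discrete(cond_fail: Sequence[float]) -> float:
    """R(k) as the product of (1 - f(i)) for i = 1..k.

    cond_fail[i - 1] is f(i): the probability that task i fails given that
    tasks 1..i-1 succeeded.
    """
    r = 1.0
    for f_i in cond_fail:
        r *= 1.0 - f_i
    return r


# Hypothetical numbers, for illustration only.
rate = 0.5  # assumed failure rate of a machine-based unit


def exp_pdf(tau: float) -> float:
    """Exponential failure density, chosen only as an example pdf."""
    return rate * math.exp(-rate * tau)


r_machine = reliability_continuous(exp_pdf, t=2.0)
r_human = reliability_discrete([0.1, 0.1, 0.1])
```

For the exponential density used here, the numerical result can be compared against the closed form exp(-rate * t).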

  8. [Models] Collective Dependencies
     • Reliability analysis (RA) requires information on inter-dependencies between components.
     • Units (diagram labels): Sensor devices, Sensor Network, Stream Analytic Server, Human-based Computing Platform, Infrastructure Management Platform, Citizens on the Cloud, Inspectors
     • Roles (diagram labels): Sensor, Comm. Provider, Stream Analyzer, Human Comp. Platform, Infrastructure Manager, Collector, Assessor
     • Activities (diagram labels): Hardware Sensing, Stream Analytics, Collecting Data, Assessing Data, Coordinating People Sensors, Infrastructure Management
     • Legend: Dependency, Alternate Dependency, Assignment, Alternate Assignment, with cardinalities (1) and (n)

  9. [Reliability Analysis Framework] System Overview (diagram of three layers)
     • Task Layer: activities Act1 to Act4, assigned to RoleA, RoleB, and RoleC with requirements Req A, Req B, and Req C. Assignment: task execution (according to Collective Dependency).
     • HCU Collective Layer: the Active Collective (RoleA, RoleB, RoleC). Composition draws on Standby Resources (resources discovered suitable for fulfilling a role) and Static Resources (static sets of resources): VSUA for Req A, VSUB for Req B, and StaticSetC for Req C, where VSU stands for Virtual Standby Unit.
     • Resources Layer: discovery over a pool of resources (people, machines, and software services).

  10. [Reliability Analysis Framework] Reliability Calculation (1)
     • Input: the individual reliability profile for each unit; the collective dependency
     • Outcome: the reliability for executing a set of K tasks
     • Steps: (1) obtain individual reliability on time t or on execution k; (2) calculate the reliability for each role; (3) calculate the reliability of the task executions

  11. [Reliability Analysis Framework] Reliability Calculation (2)
     • Step 1: obtain individual reliability, either (continuous) on time t for machine-based units or (discrete) on execution k for human-based units
     • Domain-specific individual reliability model
     • For example (for human units), a geometric distribution over independent task executions with per-task failure probability $p$: $f(k) = (1 - p)^{k-1} p$ and $R(k) = (1 - p)^k$
     • How to get p?
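
A small Python sketch of this example model follows. The slide leaves "How to get p?" open; the estimator below, the fraction of failed tasks in a unit's observed history, is only one simple possibility and is an assumption, as are all function names and sample values.

```python
from typing import Sequence


def failure_pmf(k: int, p: float) -> float:
    """f(k) = (1 - p)^(k - 1) * p: first failure occurs exactly at task k."""
    return (1.0 - p) ** (k - 1) * p


def reliability(k: int, p: float) -> float:
    """R(k) = (1 - p)^k: all of the first k tasks succeed."""
    return (1.0 - p) ** k


def estimate_p(outcomes: Sequence[bool]) -> float:
    """Assumed estimator: share of failed tasks in a history (True = success)."""
    if not outcomes:
        raise ValueError("need at least one observed task outcome")
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes)


# Hypothetical task history of one human-based unit.
history = [True, True, False, True]
p_hat = estimate_p(history)
r_two_tasks = reliability(2, p_hat)
```

The two formulas are consistent with each other: summing f(i) for i = 1..k gives 1 - R(k).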

  12. [Reliability Analysis Framework] Reliability Calculation (3)
     • Step 2: calculate the reliability for each role
     • Reliability of static sets of units: simplex; parallel / serial structure; static and dynamic redundancy
     • Reliability of Virtual Standby Units (VSU): similar to M-of-N redundancy
     • Diagram (RoleA): detection and reconfiguration between the active unit (in the HCU Collective) and VSUA, with the labels "detect failure", "reconfigure", and "select standby"
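
The structures named on this slide have standard textbook formulas, sketched below in Python under the assumption of independent units. The M-of-N function assumes identical units and is shown only because the slide describes VSU reliability as "similar to M-of-N redundancy"; it is not the authors' VSU formula, and the function names are hypothetical.

```python
from math import comb, prod
from typing import Sequence


def series(rs: Sequence[float]) -> float:
    """Serial structure: every unit must work."""
    return prod(rs)


def parallel(rs: Sequence[float]) -> float:
    """Parallel structure: at least one unit must work."""
    return 1.0 - prod(1.0 - r for r in rs)


def m_of_n(m: int, n: int, r: float) -> float:
    """At least m of n identical, independent units with reliability r work."""
    return sum(comb(n, i) * r ** i * (1.0 - r) ** (n - i) for i in range(m, n + 1))


# Hypothetical unit reliability, for illustration only.
r_unit = 0.9
r_serial = series([r_unit, r_unit])
r_parallel = parallel([r_unit, r_unit])
r_2_of_3 = m_of_n(2, 3, r_unit)
```

A simplex (single unit) is the special case of a one-element list, and 1-of-N redundancy coincides with the parallel structure.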

  13. [Reliability Analysis Framework] Reliability Calculation (4)
     • Step 3: calculate the reliability of the task executions using Execution Spanning Trees (ESTs)
     • Diagram: the collective dependency graph from slide 8, with units abbreviated as Infrastructure Management Platform (IMP), Human-based Computing Platform (HCP), Stream Analytic Server (SAS), Sensor Network (SN), sensor devices (VSUSe), Citizens on the Cloud (VSUCzColl, VSUCzAsses), and Inspectors (VSUInColl, VSUInAsses)
     • ESTs:
       IMP, SAS, VSUSe, SN
       IMP, HCP, VSUCzColl, VSUCzAsses
       IMP, HCP, VSUCzColl, VSUInAsses
       IMP, HCP, VSUInColl, VSUCzAsses
       IMP, HCP, VSUInColl, VSUInAsses

  14. [Reliability Analysis Framework] Reliability Calculation (5)
     • Step 3 (continued): calculate the reliability of the task executions using Execution Spanning Trees (ESTs)
     • Given $S_t$ as a set of ESTs, e.g.:
       IMP, SAS, VSUSe, SN
       IMP, HCP, VSUCzColl, VSUCzAsses
       IMP, HCP, VSUCzColl, VSUInAsses
       IMP, HCP, VSUInColl, VSUCzAsses
       IMP, HCP, VSUInColl, VSUInAsses
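
The transcript does not preserve the formula that combines the ESTs, so the Python sketch below rests on a loudly stated assumption: the task execution succeeds if every unit of at least one EST works, and units fail independently. Under that reading, exact reliability can be obtained by enumerating unit states, which also handles units shared between ESTs (such as IMP). The function name and the toy numbers are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable


def est_reliability(unit_rel: Dict[str, float],
                    ests: Iterable[Iterable[str]]) -> float:
    """Probability that at least one EST has all of its units working.

    ASSUMPTION (not from the slides): ESTs are alternatives, and units are
    independent. Enumerates all 2^n unit states, so it suits small n only.
    """
    est_sets = [frozenset(e) for e in ests]
    units = sorted(set().union(*est_sets))
    total = 0.0
    for states in product([True, False], repeat=len(units)):
        working = {u for u, up in zip(units, states) if up}
        if any(e <= working for e in est_sets):
            prob = 1.0
            for u, up in zip(units, states):
                prob *= unit_rel[u] if up else 1.0 - unit_rel[u]
            total += prob
    return total


# Toy example with made-up reliabilities: unit A is shared by both ESTs.
toy_rel = {"A": 0.9, "B": 0.8, "C": 0.7}
toy_ests = [{"A", "B"}, {"A", "C"}]
r_toy = est_reliability(toy_rel, toy_ests)
```

In the toy example the shared unit A must work together with B or C, so the result equals 0.9 * (1 - 0.2 * 0.3). Simply multiplying or adding per-EST reliabilities would double-count A.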

  15. [Implementation & Experiments] Prototype Implementation
     • Runtime and Analytics for Hybrid Computing Systems (RAHYMS), based on the GridSim toolkit
     • Features: simulates a pool of resources (machine-based and human-based units); simulates task request generation; strategies for HCU formation; reliability analysis tool

  16. [Implementation & Experiments] Experiment Setup
     • Focus on VSUs
     • Sensors: $R(t) = e^{-\lambda t}$
     • Humans (Citizens and Inspectors): $R(k) = (1 - p)^k$, with $t = k / 30$
     • Assumed static: Infrastructure Management Platform (IMP), Human-based Computing Platform (HCP), Sensors Network (SN)
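
The setup can be sketched in Python so that sensor and human reliabilities are evaluated on a common task-count axis. Reading t = k / 30 as a conversion from the number of executed tasks k to elapsed time t is an interpretation of the slide. The failure rate and failure probability values are placeholders, not the values used in the experiments.

```python
import math

TASKS_PER_TIME_UNIT = 30  # from the slide's relation t = k / 30


def task_to_time(k: int) -> float:
    """Map a task count k to time t (assumed meaning of t = k / 30)."""
    return k / TASKS_PER_TIME_UNIT


def sensor_reliability(k: int, lam: float) -> float:
    """Continuous model R(t) = exp(-lam * t), evaluated at t = k / 30."""
    return math.exp(-lam * task_to_time(k))


def human_reliability(k: int, p: float) -> float:
    """Discrete model R(k) = (1 - p)^k for citizens and inspectors."""
    return (1.0 - p) ** k


# Placeholder parameters, for illustration only.
lam_sensor = 0.1
p_citizen = 0.01
curve = [(k, sensor_reliability(k, lam_sensor), human_reliability(k, p_citizen))
         for k in range(0, 91, 30)]
```

Each entry of `curve` holds the task count together with the sensor and human reliabilities at that point, which is the kind of series a reliability plot over k would use.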

  17. [Implementation & Experiments] Experiment 1

  18. [Implementation & Experiments] Experiment 2

  19. [Implementation & Experiments] Experiment 3

  20. [Conclusions & Future Works]
     • Conclusion, models: individual reliability (continuous and discrete); collective dependency (collaboration for known structure)
     • Conclusion, framework: tools for reliability analysis
     • Experiments show how the reliability analysis can be used to obtain insights for system improvements.
     • Future works: dependable hybrid human-machine computing; dependability metrics (availability, performance, quality of results)

  21. Thank you. Acknowledgments: The first author of this paper is financially supported by the Vienna PhD School of Informatics (http://www.informatik.tuwien.ac.at/teaching/phdschool). The work mentioned in this paper is partially supported by the EU FP7 FET SmartSociety project (http://www.smart-society-project.eu/).
