

Presentation Transcript


  1. Simplifying the Recovery Model of User-Level Failure Mitigation
     Wesley Bland
     ExaMPI '14, New Orleans, LA, USA, November 17, 2014

  2. Why Process Fault Tolerance?
     • Two (main) kinds of resilience: data resilience (full backups and silent errors) and process resilience
     • Data resilience is handled outside of MPI: Checkpoint/Restart, FTI, SCR, GVR, Containment Domains, etc.
     • Process fault tolerance must be at least partially handled within MPI: communication channels must be recovered
     ExaMPI '14, Wesley Bland <wbland@anl.gov>

  3. What has been done before?
     • Checkpoint/Restart: requires restarting the entire job and waiting in the job queue
     • FT-MPI (University of Tennessee): automatic communicator repair with no customization; support was dropped and it was never presented to the MPI Forum
     • FA-MPI (University of Alabama at Birmingham): try/catch semantics to simulate transactions; detects errors via timeouts
     • Run-Through Stabilization (MPI Forum): similar to ULFM, but too big not to fail

  4. User-Level Failure Mitigation (ULFM)
     • Handles (fail-stop) process failures
     • General resilience framework designed to support a wide range of recovery models
     • Encourages libraries to build more user-friendly resilience on top of ULFM
     • Failure notification via return codes and/or MPI_Errhandlers
     • Local failure notification to maintain performance

  5. ULFM Continued
     • Failure propagation happens manually, when necessary, via MPI_COMM_REVOKE: all communication operations on the communicator are interrupted
     • Failure recovery via MPI_COMM_SHRINK: a new function call creates a replacement communicator without the failed processes
     • The group of failed processes is available via API calls
     [Diagram: failure notification (a receive from a failed rank returns MPI_ERR_PROC_FAILED), failure propagation (revoking the communicator interrupts the remaining ranks' pending sends and receives), and failure recovery (MPI_COMM_SHRINK) across ranks 0-3]

  6. FT Agreement
     • Collective over a communicator: allows processes to determine the status of the entire communicator in one call
     • Bitwise AND over an integer value
     • Ignores (acknowledged) failed processes; provides notification for unacknowledged failures
     • Works on a revoked communicator
     • Useful for validating previous work, communicator creation, and application iteration(s)

  7. ULFM in MPICH
     • Experimentally available in MPICH 3.2a2 and planned for the full MPICH 3.2 release; still some known bugs (www.mpich.org)
     • Un-optimized implementation
     • Some settings must be enabled for complete usage: the configure flag --enable-error-checking and the environment variable MPIR_CVAR_ENABLE_FT
     • Also available in an Open MPI branch from UTK (www.fault-tolerance.org)

  8. Why is ULFM challenging?
     • Feedback from users and developers: it is challenging to track the state of an entire legacy application, and checking all return codes or using error handlers requires many code changes
     • Much of this is not wrong: ULFM has been designed to be flexible, not simple
     • Eventually, applications should use libraries for fault tolerance rather than ULFM itself, similar to how MPI was originally envisioned
     • However, all is not lost

  9. BSP Applications
     • A very common type of application, typically involving some sort of iterative method
     • Iterations end with a collective operation that decides what to do next
     • Commonly already have checkpoint/restart built in for current-generation systems: data is already protected and functions exist to regenerate communicators

  10. ULFM for BSP
      • Take advantage of existing synchrony: perform an agreement on the result of the ending collective
      • If there's a problem, repair the communicator and data and re-execute
      • Avoid restarting the job: sitting in the queue, restarting, re-reading from disk, etc.
      Application code:
        while (<still working>) {
          rc = MPI_Allreduce(value)
          if (rc == MPI_SUCCESS) success = 1
          rc = MPI_Comm_agree(success)
          if (!success || !rc) repair()
        }

  11. Recovery Model
      • Communication recovery with dynamic processes (+ batch support): shrink + spawn
      • Communication recovery without dynamic processes: start the job with extra processes and rearrange after a failure
      • Data recovery via full checkpoint/restart: already built into many apps, but high recovery and checkpointing overhead
      • Data recovery via application-level checkpointing: low overhead, but possible code modifications required
      [Diagram: a four-rank communicator (0 1 2 3) loses a rank; shrink followed by spawn, or rank rearrangement onto spares, restores a full communicator]

  12. Code Intrusiveness
      • Add 2-3 lines to check for failures: perform an agreement to determine status
      • Add a function to repair the application: if the agreement says there's a failure, call the repair function to repair the data and communicator(s)

  13. Monte Carlo Communication Kernel (MCCK)
      • Mini-app from the Center for Exascale Simulation of Advanced Reactors (CESAR)
      • Monte Carlo, domain-decomposition stencil code
      • Investigates the communication costs of particle tracking

  14. ULFM Modifications for MCCK
      • Application-level checkpointing: checkpoint particle positions to disk; on failure, restart the job (no job queue to introduce delay) and all processes restore data from disk together
      • User-Level Failure Mitigation: particle positions are exchanged with neighbors; on failure, the repair function shrinks the communicator, spawns a replacement process, and restores the missing data from a neighbor (no disk contention)

  15. Performance
      • Used the Fusion cluster at Argonne: 2 Xeons per node (8 cores), 36 GB of memory, QDR InfiniBand
      • Experiments up to 1024 processes with 100,000 particles per process
      • Checkpoints taken every 1, 3, and 5 iterations

  16. Overall Runtime

  17. Overhead

  18. What's next for ULFM?
      • Fault tolerance libraries: incorporate data and communication recovery, abstracting details from user applications
      • Working on standardization in the MPI Forum

  19. MPICH BoF: Tuesday 5:30 PM, Room 386-87
      MPI Forum BoF: Wednesday 5:30 PM, Room 293
      Questions?
