The Art of Fleet-Wide Kubernetes Observability

Slide Note

Core strategies for monitoring Kubernetes at scale, from identifying key metrics to implementing fleet-wide observability. Gain insight into actionable alerts, correlation techniques, and the future of observability.

Uploaded by jveron on Apr 03, 2025

Presentation Transcript

The Art of Fleet-Wide Kubernetes Observability: 3 Core Strategies
FOSDEM 2025, Monitoring and Observability Track
Pratik Panda, Site Reliability Engineer, Red Hat
Mitali Bhalla, Site Reliability Engineer, Red Hat

Agenda
- What is Fleet-Wide Observability?
- The Observability Challenge at Scale
- Looking at the 3 Strategies
  - Metrics: Identifying What Matters
  - Alerts: From Noise to Actionable Signals
  - Correlation: Connecting the Dots
- Implementing Fleet-Wide Observability
- Looking Ahead: The Future of Observability
- Q&A

What is Fleet-Wide Observability?

The Observability Challenge at Scale

Metrics: Identifying What Matters

The Metrics That Matter in Kubernetes

What Makes a Metric Useful?

Things That Mislead and How to Avoid Them
- Overcollection: Focus on high-signal, high-value metrics.
- Lack of Standardization: Adopt a consistent monitoring framework.
- Ignoring Cardinality: Be selective with label usage.
- Reactive Monitoring: Prioritize proactive, SLO-driven metrics.
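To see why label choice matters, note that every label multiplies the worst-case number of time series a single metric can produce by its count of distinct values. The sketch below is a minimal illustration of that arithmetic, not something taken from the talk; the label names and value counts (`cluster`, `namespace`, `status_code`, `pod`) are hypothetical.

```python
from math import prod


def worst_case_series(label_values: dict[str, int]) -> int:
    """Upper bound on the number of time series for one metric name.

    Each label multiplies the bound by its number of distinct values;
    a metric with no labels is a single series (prod of nothing is 1).
    """
    return prod(label_values.values())


# Hypothetical label sets for an HTTP request counter across a fleet.
bounded = {"cluster": 50, "namespace": 20, "status_code": 5}
with_pod = {**bounded, "pod": 200}  # adding a high-churn, per-pod label

print(worst_case_series(bounded))   # 5000
print(worst_case_series(with_pod))  # 1000000
```

Adding one unbounded label turns a few thousand series into a million, which is why per-pod or per-request identifiers are usually better kept in logs or traces than in metric labels.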

Alerts: From Noise to Actionable Signals

What Makes an Alert Actionable?

Journey to Alerting Effectiveness

Correlation: Connecting the Dots

Logs and Traces: The Context and The Journey

The Complete Fleet Observability Picture

Implementing Fleet-Wide Observability
Leveraging the Concepts of SLIs, SLOs, and SLAs

Setting SLOs for key services
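As a concrete reference point for setting an SLO, the sketch below computes a request-based availability SLI (good events divided by total events) and checks it against a target, expressing the error budget as the number of bad events the target permits. It is a minimal illustration under assumed inputs, not the speakers' implementation; `SloReport`, `evaluate_slo`, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # fraction of events that were good
    allowed_bad: float       # bad events the SLO permits in this window (the error budget)
    budget_remaining: float  # unspent fraction of the error budget; negative if overspent
    met: bool                # whether the SLI meets the target


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Evaluate a request-based availability SLI against an SLO target such as 0.999."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    sli = good / total
    allowed_bad = (1 - target) * total  # error budget expressed in events
    bad = total - good
    return SloReport(
        sli=sli,
        allowed_bad=allowed_bad,
        budget_remaining=1 - bad / allowed_bad,
        met=sli >= target,
    )


# Hypothetical 30-day window for one service with a 99.9% availability SLO.
report = evaluate_slo(good=999_400, total=1_000_000, target=0.999)
print(f"SLI={report.sli:.4%}, budget left={report.budget_remaining:.0%}, met={report.met}")
```

With 600 failed requests against a budget of 1,000, the service meets its SLO with roughly 40% of the budget left, which gives teams a shared, numeric basis for deciding between feature work and reliability work.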

Benefits of an SLO-Driven Approach
- Proactive Issue Management
- Scalable Reliability Management
- Consistency Across Services
- Error Budget Framework
- Alerting and Insights
- Enhanced User Experience
- Aligned Goals Across Teams
- Key Focus on Customer Impact
- Unified Metrics
- Reduced Downtime
- Prioritized Efforts
- Continuous Improvement
- Business-Driven Observability
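One common way to connect an error budget framework to alerting is multi-window burn-rate alerting, as described in the Google SRE Workbook; the talk does not say this is the method the speakers use, so treat the sketch below as a swapped-in illustration. Burn rate is the observed error rate divided by the rate the SLO allows, so 1.0 means the budget will be exactly used up by the end of the SLO window. The 14.4 threshold (about 2% of a 30-day budget spent in one hour) follows the workbook's example; `burn_rate`, `should_page`, and the request counts are hypothetical.

```python
def burn_rate(bad: int, total: int, slo_target: float) -> float:
    """Error-budget consumption speed: 1.0 means on pace to spend exactly
    the whole budget by the end of the SLO window."""
    if total == 0:
        return 0.0  # no traffic in the window, so no budget consumed
    return (bad / total) / (1 - slo_target)


def should_page(
    long_window: tuple[int, int],   # (bad, total) over e.g. the last 1h
    short_window: tuple[int, int],  # (bad, total) over e.g. the last 5m
    slo_target: float = 0.999,
    threshold: float = 14.4,
) -> bool:
    """Fire only if both windows burn faster than the threshold: the long
    window shows sustained budget loss, and the short window lets the alert
    stop firing soon after the problem is fixed."""
    long_rate = burn_rate(*long_window, slo_target)
    short_rate = burn_rate(*short_window, slo_target)
    return long_rate > threshold and short_rate > threshold


# 2% errors over the hour and 3% in the last 5 minutes: burn rates of about 20 and 30.
print(should_page((2_000, 100_000), (300, 10_000)))
# Elevated errors earlier in the hour, but the last 5 minutes are clean.
print(should_page((2_000, 100_000), (0, 10_000)))
```

Requiring both windows trades a few minutes of detection latency for far fewer pages on brief spikes, which is the kind of noise-to-signal reduction the alerting section argues for.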

Looking Ahead: The Future of Observability
- AI-Driven Insights
- Cloud-Native and Cross-Cloud Provider
- Automated Remediation
- Open Standards & Interoperability