
Effective Strategies for Debugging Microservices
Learn how to prevent, detect, and fix incidents in microservices efficiently. Explore challenges, observability, infrastructure monitoring, and deployment strategies for seamless operations.
Presentation Transcript
DEBUGGING MICROSERVICES
Unimportant slide. Work: Cisco ET&I SRE Engineering Technical Lead; small teams running lots of different applications. OSS fan (see you at Hacktoberfest). GitHub/Twitter: sagikazarmark. https://sagikazarmark.hu
The whole story: Prevent, Detect, Fix. Prevent incidents before they start. Detect incidents as quickly as possible and notify the responsible parties. Find and fix problems to resolve the incident ASAP.
Agenda: Challenges; Observability; How it all works; Incident debugging.
Challenges: Deployment history; Request context; Network; Skills; Dynamic infrastructure (Kubernetes).
OBSERVABILITY: Data, data, data.
Infrastructure / platform: Monitoring; Logging; (Distributed) Tracing; (Continuous) Profiling.
Application: Instrumentation; Context propagation (request context); Correlation; Annotations.
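Context propagation and correlation can be illustrated with a small sketch. It assumes a request ID carried in an `X-Request-ID` header (a common convention, not something the slides specify) and uses only the Python standard library. The names `request_id_var`, `RequestIdFilter`, `build_logger`, and `handle_request` are hypothetical.

```python
import contextvars
import logging
import uuid

# Hypothetical name: holds the ID of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copies the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(name="app"):
    """Returns a logger whose lines always include the request ID."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def handle_request(logger, headers):
    """Handles one request and returns the headers to send downstream."""
    # Reuse the caller's ID if present so logs from several services line up.
    # The header name is an assumption made for this sketch.
    request_id = headers.get("X-Request-ID") or uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        logger.info("handling request")
        # Pass the same ID on to any downstream call.
        return {"X-Request-ID": request_id}
    finally:
        # Restore the previous value so the ID does not leak to other work.
        request_id_var.reset(token)
```

Because every log line carries the same ID that is forwarded downstream, logs from different services can be joined on that one field during an incident.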
Prevent incidents: Deployment strategies (Rolling release; Blue/green deployment; Automatic rollback; Canary releases); Alerting + automations.
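As a rough illustration of how a canary release and automatic rollback might fit together, the sketch below compares the canary's error rate with the baseline's. The thresholds (`min_requests`, `max_extra_error_rate`) and all names are assumptions chosen for the example, not values from the talk. Real rollout tooling typically looks at more signals than error rate.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Request counters for one version of a service (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline, canary, min_requests=100, max_extra_error_rate=0.01):
    """Returns "wait", "promote", or "rollback" for a canary release."""
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    if canary.error_rate > baseline.error_rate + max_extra_error_rate:
        return "rollback"  # canary is clearly worse than the baseline
    return "promote"
```

A deployment automation could call `canary_decision` on a timer and trigger the rollback without a human in the loop.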
Detect early: SLI / SLO. Paging: SLO violations. Mission-critical, user-facing services first. Alerting != paging (alert fatigue). Iterate!
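The relationship between an SLI, an SLO, and paging can be sketched with a simple availability example. It assumes the SLI is the fraction of successful requests in a window and that a page fires only when the SLI drops below the SLO target. The function names and the 99.9% target used in the usage note are illustrative.

```python
def availability_sli(good, total):
    """SLI: fraction of requests that succeeded in the window."""
    return good / total if total else 1.0


def error_budget_remaining(good, total, slo_target):
    """Fraction of the error budget left (1.0 = untouched, <= 0 = exhausted)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total == 0:
        return 1.0
    allowed_bad = (1.0 - slo_target) * total  # failures the SLO tolerates
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def should_page(good, total, slo_target):
    """Page a human only on an SLO violation; lesser signals stay alerts."""
    return availability_sli(good, total) < slo_target
```

With a 99.9% target and 100,000 requests, 50 failures use about half the budget and do not page, while 200 failures violate the SLO and do.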
Everything is on fire: Find what caused the problem; Revert it (if possible); Investigate the problem; Fix and deploy.
Incident debugging: Monitoring. Dashboards. Use different aggregations (median, XY percentile). Correlate information. Can't correlate individual requests (e.g. latency vs payload size).
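To see why different aggregations matter, the sketch below compares the median with a high percentile on made-up latency samples. `percentile` is a hypothetical helper using the nearest-rank method. Monitoring systems often use other estimation methods, so treat it as an illustration only.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up request latencies in milliseconds with one slow outlier.
latencies_ms = [12, 11, 13, 12, 14, 11, 12, 13, 950, 12]

median_ms = statistics.median(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
```

Here the median stays at 12 ms while the 99th percentile is 950 ms, so a dashboard showing only the median would hide the slow request entirely.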
Incident debugging: Tracing. Individual requests. Visualize latency between components. Correlate timing and events. Sampling strategy (rate, error, latency).
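One possible reading of a sampling strategy that combines rate, error, and latency is sketched below: always keep traces with errors or high latency, and keep a small random share of the rest. It assumes the decision is made after the trace has finished, so its duration and error status are known. `TraceSummary`, `keep_trace`, and the default thresholds are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceSummary:
    """Minimal facts about a finished trace (hypothetical shape)."""
    duration_ms: float
    has_error: bool


def keep_trace(trace, sample_rate=0.01, slow_threshold_ms=500.0, rng=random.random):
    """Decides whether a finished trace should be stored."""
    if trace.has_error:
        return True  # error-based: failed requests are always interesting
    if trace.duration_ms >= slow_threshold_ms:
        return True  # latency-based: keep slow requests
    return rng() < sample_rate  # rate-based: random share of normal traffic
```

Passing `rng` as a parameter keeps the function deterministic under test; in production the default `random.random` is used.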
Bonus: postmortems. Best way to learn from our mistakes. Focus on learning (not blaming). Blameless Postmortem; Atlassian handbook.
Recap: Prevent, Detect, Fix. Prevent incidents before they start. Detect incidents as quickly as possible and notify the responsible parties. Find and fix problems to resolve the incident ASAP.
Thanks! Any questions? @sagikazarmark https://sagikazarmark.hu