SLO Alerting Strategies
Lately I've been getting back into observability and SRE, and it is interesting, really interesting.
Reference URL: https://www.datadoghq.com/blog/monitor-service-performance-with-slo-alerts/
It is a series of three posts, and all of them are very useful.
1. Screenshots of the settings in Datadog
2. Two approaches to SLO alerting
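The two approaches are error budget alerts and burn rate alerts. The first is typically a lower-severity signal, while the second may call for higher-severity attention.
Burn rate alerting in turn involves multiple windows and multiple burn rates. Those are the key concepts.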
Error budget alerts can give you time to bring your service’s performance into SLO compliance, but they only notify you after the defined portion of the budget is gone. For an even more proactive approach to SLO monitoring, you can create a burn rate alert to notify you if you’re consuming your error budget more quickly than expected.
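To make the error budget idea concrete, here is a minimal sketch of how the consumed share of a budget could be computed for a request-based SLO. It is not Datadog's implementation: the function names, the 80% threshold, and the assumption that `good` and `total` are request counts over the whole SLO window are all mine, for illustration only.

```python
def error_budget_consumed(good: int, total: int, target: float) -> float:
    """Return the fraction of the error budget already used (1.0 = fully spent).

    Assumption: `good` and `total` are request counts over the SLO window,
    and `target` is the SLO target as a fraction, e.g. 0.999.
    """
    if total == 0:
        return 0.0
    allowed_bad = (1.0 - target) * total  # bad requests the SLO tolerates
    actual_bad = total - good
    return actual_bad / allowed_bad


def should_alert_on_budget(good: int, total: int, target: float,
                           threshold: float = 0.8) -> bool:
    """Fire once the consumed share reaches the (hypothetical) threshold."""
    return error_budget_consumed(good, total, target) >= threshold
```

With a 99.9% target and 100,000 requests, the budget is about 100 bad requests, so 50 failures use roughly half of it and stay below the 80% threshold, while 90 failures cross it.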
Your burn rate fluctuates according to changes in your service’s performance. Datadog automatically [calculates your service’s burn rate](https://docs.datadoghq.com/monitors/service_level_objectives/burn_rate/#how-burn-rate-alerts-work) across a subset of your SLO’s time window, known as an **alerting window**. A burn rate alert uses a long alerting window (to help prevent flapping and alert fatigue) and a short alerting window (to allow the alert to recover quickly when the burn rate falls back below the threshold). The alert will trigger when the burn rate across both windows is above your alert’s threshold.
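The two-window rule described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not Datadog's code: burn rate is taken as the observed error rate divided by the error rate the SLO allows, each window is a hypothetical `(bad, total)` pair of request counts, and the threshold of 10 is an arbitrary example value.

```python
from typing import Tuple

Window = Tuple[int, int]  # (bad requests, total requests) in one alerting window


def burn_rate(bad: int, total: int, target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget would last exactly the full SLO window.
    """
    if total == 0:
        return 0.0
    return (bad / total) / (1.0 - target)


def burn_rate_alert(long_window: Window, short_window: Window,
                    target: float, threshold: float) -> bool:
    """Trigger only when BOTH windows burn faster than the threshold."""
    long_bad, long_total = long_window
    short_bad, short_total = short_window
    return (burn_rate(long_bad, long_total, target) > threshold
            and burn_rate(short_bad, short_total, target) > threshold)


# Long window is still hot, but the short window has recovered: no alert.
print(burn_rate_alert((200, 10_000), (1, 1_000), target=0.999, threshold=10))
# Both windows are burning fast: alert.
print(burn_rate_alert((200, 10_000), (30, 1_000), target=0.999, threshold=10))
```

The first call shows why the short window matters: once recent traffic is healthy again, the alert clears even though the long window still carries the earlier errors.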