my alertmanager memory use % alert really needs some hysteresis: helios is perpetually bouncing around 90% used, generating a lot of useless warning/resolved notifications
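For anyone unfamiliar with the term, hysteresis here means firing at one threshold and only resolving at a lower one, so a value hovering around the limit doesn't flap. A minimal Python sketch of the idea follows. The 90/85 thresholds, the sample values, and the `HysteresisAlert` name are made up for illustration and are not the actual alert rule.

```python
from dataclasses import dataclass


@dataclass
class HysteresisAlert:
    # Hypothetical thresholds: fire above 90% used, resolve only below 85%.
    fire_above: float = 90.0
    clear_below: float = 85.0
    firing: bool = False

    def update(self, value: float) -> bool:
        """Feed one sample; return True if a notification would be sent."""
        if not self.firing and value > self.fire_above:
            self.firing = True
            return True  # "warning" notification
        if self.firing and value < self.clear_below:
            self.firing = False
            return True  # "resolved" notification
        return False


def count_notifications(samples, alert):
    """Count how many fire/resolve notifications a series of samples produces."""
    return sum(alert.update(v) for v in samples)


# Memory use bouncing around 90%, then finally dropping.
samples = [89, 91, 89, 91, 89, 91, 84]

# Same threshold for firing and clearing: every bounce notifies.
single_threshold = count_notifications(samples, HysteresisAlert(90, 90))
# Separate clear threshold: one warning, one resolved.
with_hysteresis = count_notifications(samples, HysteresisAlert(90, 85))
print(single_threshold, with_hysteresis)
```

With these samples the single-threshold version sends six notifications and the hysteresis version sends two.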

@f0x I like the recommendation of alerting on symptoms instead of causes. There are thousands of things that could possibly cause issues, and each of them has a false positive rate. But there are usually only two external symptoms of all of those issues (error rate and latency).
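As a rough sketch of what that looks like, the check below only looks at the two symptoms over a window of requests, whatever the underlying cause was. The `Request` type, the `symptoms` function, and every threshold are hypothetical, not taken from any particular monitoring setup.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_s: float   # time to serve the request, in seconds


def symptoms(requests, max_error_rate=0.01, slow_s=0.5, max_slow_fraction=0.05):
    """Return the names of the user-visible symptoms present in this window.

    All thresholds are illustrative assumptions.
    """
    total = len(requests)
    if total == 0:
        return []
    errors = sum(r.status >= 500 for r in requests)
    slow = sum(r.latency_s > slow_s for r in requests)
    firing = []
    if errors / total > max_error_rate:
        firing.append("error_rate")
    if slow / total > max_slow_fraction:
        firing.append("latency")
    return firing


# 97 fast successful requests and 3 server errors in one window.
window = [Request(200, 0.05)] * 97 + [Request(500, 0.05)] * 3
print(symptoms(window))
```

Here only the error rate symptom fires, since 3% of requests failed and none were slow.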

@f0x (Also you might like the memory pressure metric, which measures how much time was wasted freeing up memory. It's a lot less noisy.)
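Assuming the metric meant here is Linux pressure stall information (PSI), the kernel exposes it in `/proc/pressure/memory` on kernels built with PSI support. The `some` line covers time when at least one task was stalled on memory, and the `full` line covers time when all non-idle tasks were stalled. A small Python sketch for reading it is below. The sample values are invented, and `parse_pressure` and `read_memory_pressure` are illustrative names.

```python
def parse_pressure(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {key: float(value)
                        for key, value in (field.split("=") for field in fields)}
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read the live values; only works on Linux kernels with PSI enabled."""
    with open(path) as f:
        return parse_pressure(f.read())


# Invented sample in the kernel's format: avg10/avg60/avg300 are the
# percentage of time stalled over the last 10/60/300 seconds, and total
# is cumulative stall time in microseconds.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.10 total=65432\n"
)

pressure = parse_pressure(SAMPLE)
print(pressure["some"]["avg60"], pressure["full"]["avg60"])
```

Alerting on something like the 60 second or 300 second average instead of raw memory use % would also smooth out short spikes.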
