My Alertmanager memory use % alert really needs some hysteresis: helios is perpetually bouncing around 90% used, generating a lot of useless warning/resolved notifications.
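For illustration, here is a minimal sketch of what hysteresis means for a threshold alert. It is not Alertmanager or Prometheus configuration. The function names, the 90%/80% thresholds, and the sample values are all made up for the example: the alert fires above the upper threshold and only resolves once usage drops below a separate, lower one.

```python
def hysteresis_states(samples, fire_above=90.0, clear_below=80.0):
    """Return the firing state after each sample, using two thresholds.

    The alert starts firing when a sample exceeds fire_above and keeps
    firing until a sample drops below clear_below (hypothetical values).
    """
    firing = False
    states = []
    for value in samples:
        if not firing and value > fire_above:
            firing = True
        elif firing and value < clear_below:
            firing = False
        states.append(firing)
    return states


def count_transitions(states, initial=False):
    """Count state flips; each flip would be one firing/resolved notification."""
    flips = 0
    previous = initial
    for state in states:
        if state != previous:
            flips += 1
        previous = state
    return flips


# Made-up memory-use percentages hovering around 90%.
samples = [88, 91, 89, 92, 89, 91, 79, 85]

naive = [value > 90 for value in samples]   # single threshold at 90%
damped = hysteresis_states(samples)         # fire above 90%, clear below 80%

print(count_transitions(naive), count_transitions(damped))
```

With these made-up samples the single threshold flips state six times, while the two-threshold version flips only twice: once when usage first passes 90% and once when it finally falls below 80%.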
@f0x (Also, you might like the memory pressure metric, which measures how much time was wasted freeing up memory; it's a lot less noisy.)
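Assuming the metric meant here is Linux pressure stall information (PSI), which the post does not say explicitly, the raw numbers come from `/proc/pressure/memory`. A rough sketch of parsing that format follows. The `parse_psi` helper and the sample values are hypothetical; only the `some`/`full` line layout follows the kernel's format.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    avg* are percentages of time stalled; total is cumulative microseconds.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


# Made-up sample in the /proc/pressure/memory format.
SAMPLE = """some avg10=1.50 avg60=0.75 avg300=0.20 total=123456
full avg10=0.50 avg60=0.25 avg300=0.05 total=45678
"""

psi = parse_psi(SAMPLE)
print(psi["some"]["avg60"], psi["full"]["avg60"])
```

On a Linux kernel with PSI enabled, the same function could be fed `open("/proc/pressure/memory").read()` instead of the sample string. The averaged windows are part of why this signal tends to be smoother than an instantaneous "percent used" reading.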
@f0x I like the recommendation of alerting on symptoms instead of causes, because there are thousands of things that could cause issues and each of them has a false positive rate. But there are usually only two external symptoms of all of those issues: error rate and latency.