Speaker
Mr Basil Lalli (NERSC - LBNL)
Description
Last year at NERSC we retired our long-standing Nagios-based HPSS monitoring deployment in favor of VictoriaMetrics, Loki and Alertmanager. We would like to share our experience and the lessons we learned along the way.
- Motivation for making this transition
  - Limitations of Nagios-style monitoring
  - How does VictoriaMetrics address these?
- General overview of our monitoring deployment
  - Third-party exporters
  - Custom exporters/"plugins"
- Demonstration of some of the dashboards we use and the alerts we generate
- Future areas of improvement
  - Standardizing our HPSS-specific data collection
  - Service discovery
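As a rough illustration of what a custom exporter (one of the "plugins" above) can look like, here is a minimal Python sketch using only the standard library. It is not NERSC's code: the metric name `hpss_mover_up`, the `mover` labels, the `collect_metrics()` stub and port 9700 are all hypothetical. It serves values in the Prometheus text exposition format, which VictoriaMetrics (for example via vmagent) can scrape.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_metrics():
    """Hypothetical stub: a real exporter would query HPSS or the host here.

    Returns a mapping of metric name -> (help text, type, [(labels, value), ...]).
    """
    return {
        "hpss_mover_up": (
            "Whether an HPSS mover answered the last probe (1) or not (0).",
            "gauge",
            [({"mover": "mover01"}, 1), ({"mover": "mover02"}, 0)],
        ),
    }


def escape_label_value(value):
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def escape_help(text):
    # HELP text must escape backslash and newline.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def render(metrics):
    """Serialize metrics into the Prometheus text exposition format."""
    lines = []
    for name, (help_text, mtype, samples) in metrics.items():
        lines.append(f"# HELP {name} {escape_help(help_text)}")
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in samples:
            label_str = ",".join(
                f'{k}="{escape_label_value(str(v))}"' for k, v in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the conventional /metrics path is served.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(collect_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=9700):
    # Port 9700 is an arbitrary, hypothetical choice; blocks until interrupted.
    HTTPServer((host, port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scrape job at `http://<host>:9700/metrics` would ingest the gauge; an alerting rule (for example one evaluated by vmalert) could then fire on `hpss_mover_up == 0` and route the notification through Alertmanager.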
Timeslot: 15 min
Primary author
Mr Basil Lalli (NERSC - LBNL)