What's Observability?
In distributed systems, observability is the ability to collect data about programs' execution, modules' internal states, and the communication among components.
To improve observability, software engineers use a wide range of logging and tracing techniques to gather telemetry information, and tools to analyze and use it.
Observability is foundational to site reliability engineering, as it is the first step in triaging a service outage.[1]
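As a minimal illustration of the logging side of this, the sketch below emits structured (JSON) log lines that share one trace ID per request, so every line produced while handling a request can later be correlated. It uses only Python's standard library; the service name, event names, and field names are hypothetical, and production systems usually rely on a dedicated tracing library rather than a hand-rolled ID.

```python
import json
import logging
import uuid

# Hypothetical service name, used only for this illustration.
SERVICE = "checkout-service"

logger = logging.getLogger(SERVICE)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def log_event(trace_id: str, event: str, **fields) -> str:
    """Emit one structured log line tagged with a trace ID and return it."""
    record = {"service": SERVICE, "trace_id": trace_id, "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


def handle_request(user_id: int) -> str:
    # One trace ID per request lets all of its log lines be grouped later.
    trace_id = uuid.uuid4().hex
    log_event(trace_id, "request_received", user_id=user_id)
    log_event(trace_id, "request_completed", user_id=user_id, status=200)
    return trace_id
```

Searching the collected logs for a single `trace_id` value then returns the full story of one request, which is the kind of question observability tooling is meant to answer.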
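What's monitoring? How is it related to observability?
Google: "Monitoring is one of the primary means by which service owners keep track of a system’s health and availability".
Monitoring builds on observability: it consumes the collected telemetry (metrics, logs, traces) to continuously check a system's health and availability, typically through dashboards and alerts.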
What types of monitoring outputs are you familiar with and/or have used in the past?
- Alerts
- Tickets
- Logging
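One common way to tell these outputs apart (following the framing in Google's SRE book) is by whether a human needs to act and how soon: alerts need immediate action, tickets need action but not right away, and logging is recorded for later diagnosis. The sketch below encodes that as a small decision function; the `needs_human` and `urgent` flags are a hypothetical simplification of real alerting rules.

```python
from enum import Enum


class Output(Enum):
    ALERT = "alert"    # a human must act immediately
    TICKET = "ticket"  # a human must act, but not immediately
    LOG = "log"        # recorded for diagnosis; nobody needs to act now


def classify(needs_human: bool, urgent: bool) -> Output:
    """Pick a monitoring output for a detected condition (hypothetical rules)."""
    if needs_human and urgent:
        return Output.ALERT
    if needs_human:
        return Output.TICKET
    return Output.LOG


print(classify(needs_human=True, urgent=True).value)    # alert
print(classify(needs_human=True, urgent=False).value)   # ticket
print(classify(needs_human=False, urgent=False).value)  # log
```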
What types of things are often monitored in the IT industry?
- Hardware (CPU, RAM, ...)
- Infrastructure (Disk capacity, Network latency, ...)
- App (Status code, Errors in logs, ...)
Explain "Time Series" data
Time series data is a sequence of measurements of a certain parameter, ordered by time.
An example would be CPU utilization (%), sampled every hour:
08:00 17
09:00 22
10:00 91
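A minimal way to hold the example above in Python is a list of `(timestamp, value)` pairs kept in time order. The helper names `is_time_ordered` and `peak` are hypothetical, chosen only for this sketch.

```python
from datetime import time

# The hourly CPU utilization samples (%) from the example, as (timestamp, value) pairs.
cpu_utilization = [
    (time(8, 0), 17),
    (time(9, 0), 22),
    (time(10, 0), 91),
]


def is_time_ordered(series):
    """Check that each sample's timestamp is strictly later than the previous one."""
    return all(prev[0] < cur[0] for prev, cur in zip(series, series[1:]))


def peak(series):
    """Return the (timestamp, value) sample with the highest value."""
    return max(series, key=lambda sample: sample[1])
```

Here `peak(cpu_utilization)` returns the 10:00 sample, the kind of spike a monitoring system would typically flag.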
Explain data aggregation
In monitoring, aggregating data means combining a collection of values into a single value or a smaller set of values. It can be done in different ways, such as taking the average of multiple values, their sum, or the count of how many times they appear in the collection; which methods make sense mainly depends on the type of the collection (time series being one such type).
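A minimal sketch of these aggregations using only Python's standard library. The CPU values reuse the time-series example; the status codes and the `aggregate_by_window` helper, which rolls raw `(unix_timestamp, value)` samples up into fixed time windows (one common way to aggregate time series), are hypothetical illustrations.

```python
from collections import Counter
from statistics import mean

cpu_samples = [17, 22, 91]  # hourly CPU utilization (%) from the time-series example

# Reduce the whole collection to single summary values.
summary = {
    "avg": mean(cpu_samples),
    "sum": sum(cpu_samples),
    "max": max(cpu_samples),
    "count": len(cpu_samples),
}

# Count how many times each value appears, e.g. HTTP status codes seen in logs.
status_codes = [200, 200, 500, 404, 200, 500]
occurrences = Counter(status_codes)


def aggregate_by_window(samples, window_seconds, func=mean):
    """Group (unix_ts, value) samples into fixed windows and aggregate each window."""
    buckets = {}
    for ts, value in samples:
        window_start = ts - ts % window_seconds
        buckets.setdefault(window_start, []).append(value)
    return {start: func(values) for start, values in sorted(buckets.items())}
```

For example, `aggregate_by_window([(0, 10), (30, 20), (60, 30), (90, 50)], 60)` averages each one-minute window and returns `{0: 15, 60: 40}`; passing `func=max` or `func=sum` swaps in a different aggregation.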
What is Application Performance Management?
- IT metrics translated into business insights
- Practices for monitoring application insights so we can improve performance, reduce issues, and improve the overall user experience
Name three aspects of a project you can monitor with APM (e.g. backend)
- Frontend
- Backend
- Infra
- ...
What can be collected/monitored to perform APM monitoring?
- Metrics
- Logs
- Events
- Traces
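To make the "Metrics" item concrete, here is a minimal sketch of what an APM agent does at the code level: wrap a function, measure how long each call takes, and count the calls that fail. It is a hand-rolled illustration; `instrument`, `get_order`, and the in-memory stores are hypothetical, and real APM tools ship their own agents that export this data to a backend instead of keeping it in process memory.

```python
import functools
import time
from collections import defaultdict

# In-memory metric stores: function name -> call durations (seconds) / error count.
latencies = defaultdict(list)
errors = defaultdict(int)


def instrument(func):
    """Record the duration of every call and count the calls that raise."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            errors[func.__name__] += 1
            raise
        finally:
            # Runs for successful and failed calls alike.
            latencies[func.__name__].append(time.perf_counter() - start)
    return wrapper


@instrument
def get_order(order_id: int) -> dict:
    # Hypothetical request handler standing in for real application code.
    if order_id < 0:
        raise ValueError("invalid order id")
    return {"id": order_id, "status": "shipped"}
```

After calling `get_order(1)` once and `get_order(-1)` once (catching the `ValueError`), `latencies["get_order"]` holds two durations and `errors["get_order"]` is 1; from these, an APM backend would derive figures such as average latency and error rate.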