Last modified: January 24, 2026
This article is written in: 🇺🇸
In modern distributed systems, the performance and reliability of communication channels, APIs, and network infrastructure are critical factors that determine user experience. Metrics and analysis offer insights into system behavior under varying loads, help identify bottlenecks, and guide capacity planning. Capturing the right metrics and interpreting them correctly leads to robust, scalable architectures. This document explores the core metrics for communication, API usage, and network layers, alongside common formulas, analysis techniques, and best practices in monitoring and diagnostics.
Data-driven decisions about scaling, optimization, and resource allocation rely on the proper collection and interpretation of metrics. Without measurable indicators, developers and operators are left guessing about system performance and reliability. Metrics also support troubleshooting, capacity planning, alerting, and verification of service-level objectives (SLOs).
Latency is the time it takes for a request to travel from a client to a server and for the server to respond, often including both network travel time and server processing. It is commonly reported as an average or median together with high percentiles such as p95 and p99.
A simplified latency equation for an HTTP request might be:
Total_Latency = RTT_network + Server_Processing_Time + (Possible_Queueing_Delay)
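As a quick illustration, the sketch below times a single HTTP request from the client side using only the Python standard library; the URL is a placeholder, and from the client only the total latency is observable.

```python
import time
import urllib.request

URL = "https://example.com/api/health"  # hypothetical endpoint

start = time.perf_counter()
with urllib.request.urlopen(URL, timeout=5) as response:
    response.read()  # include the time needed to receive the body
total_latency_ms = (time.perf_counter() - start) * 1000

# Splitting the total into RTT_network and Server_Processing_Time requires
# server-side timing (e.g., server logs or a Server-Timing header).
print(f"Total latency: {total_latency_ms:.1f} ms")
```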
Throughput indicates how many requests or messages a system can handle over a given time, often expressed as Requests per Second (RPS) or Queries per Second (QPS). If N_req is the total number of requests within a measurement window T, throughput can be approximated as:
Throughput = N_req / T
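A minimal closed-loop measurement of this formula might look like the sketch below. It drives a single caller in a loop, so it illustrates the arithmetic rather than the capacity of a real system; proper load tests use many concurrent clients.

```python
import time

def measure_throughput(handler, duration_s: float = 10.0) -> float:
    """Call `handler` repeatedly for `duration_s` seconds and return requests/second."""
    n_req = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        handler()
        n_req += 1
    elapsed = time.perf_counter() - start
    return n_req / elapsed  # Throughput = N_req / T

# Example with a trivial stand-in handler:
print(f"{measure_throughput(lambda: None, duration_s=1.0):,.0f} requests/second")
```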
Concurrency measures how many requests or connections a system is serving simultaneously. A system that supports large concurrency can handle many in-flight requests at once, but each active request may consume memory and CPU resources.
Common ways to measure errors start with the error rate:
Error_Rate = Number_of_Error_Responses / Total_Requests
A closely related reliability measure is availability, often expressed as the percentage of time the service is fully operational:
Availability (%) = 100 * (Uptime / Total_Time)
For critical services, SLOs might require 99.9% or 99.99% availability, corresponding to allowable downtime of minutes or seconds per month.
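The arithmetic behind those downtime budgets is straightforward; a quick sketch, assuming a 30-day month:

```python
# Translate availability targets into monthly downtime budgets (30-day month).
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

for target in (0.999, 0.9999):
    allowed_downtime_min = MINUTES_PER_MONTH * (1 - target)
    print(f"{target:.2%} availability -> {allowed_downtime_min:.1f} minutes "
          f"({allowed_downtime_min * 60:.0f} seconds) of downtime per month")
```

For 99.9% this works out to roughly 43 minutes per month, and for 99.99% to roughly 4.3 minutes.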
In lower-level network contexts (TCP, UDP), metrics like packet loss, retransmissions, and bandwidth utilization are key:
Packet_Loss_Rate = (Packets_Lost / Packets_Sent) * 100
Excessive packet loss degrades application performance, especially for real-time or streaming protocols.
For RESTful services, typical metrics include request rate, latency percentiles per endpoint, error rates broken down by HTTP status class (4xx vs. 5xx), and request/response payload sizes.
A representation of data flow with potential metric collection points:
Client (Browser/App) ----> [Load Balancer] ----> [API Server(s)] ----> [Database/Cache]
         |                        |                     |                     |
         | metrics                | metrics             | metrics             | metrics
         v                        v                     v                     v
      Logging & Monitoring Infrastructure (e.g., Prometheus, Grafana, ELK stack)
At each stage, logs and performance counters gather metrics on request durations, error rates, and resource usage (CPU/memory).
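One lightweight way to gather such per-request data is a timing decorator around each handler. The sketch below logs duration and outcome; the `get_user` handler is purely hypothetical.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("api-metrics")

def record_metrics(handler):
    """Log the duration and outcome of every call to an API handler."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "error"
        try:
            result = handler(*args, **kwargs)
            outcome = "success"
            return result
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            log.info("handler=%s outcome=%s duration_ms=%.1f",
                     handler.__name__, outcome, duration_ms)
    return wrapper

@record_metrics
def get_user(user_id):
    # Placeholder handler; a real one would hit the database/cache tier.
    return {"id": user_id, "name": "example"}

get_user(42)
```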
GraphQL APIs are similar to REST in this respect, but queries can be far more dynamic. Important metrics include overall query execution time, per-resolver latency, query depth or complexity, and error rates per operation.
Because gRPC uses HTTP/2 and Protobuf, typical metrics include per-RPC latency, request and response message sizes, the number of active streams, and status codes such as OK, DEADLINE_EXCEEDED, or UNAVAILABLE.
Bandwidth is the theoretical max data rate, while utilization measures how much of that bandwidth is in use. Monitoring helps avoid saturation. If you see throughput near the link’s capacity, you risk increased latency and packet drops.
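A rough utilization check can be derived from interface byte counters. The sketch below assumes the third-party psutil package is installed and that LINK_CAPACITY_BPS matches the actual link speed.

```python
import time

import psutil  # third-party: pip install psutil

LINK_CAPACITY_BPS = 1_000_000_000  # assume a 1 Gbit/s link; adjust to your environment

before = psutil.net_io_counters()
time.sleep(1.0)
after = psutil.net_io_counters()

# Bytes moved in both directions during the one-second window, converted to bits.
bits_transferred = (after.bytes_sent - before.bytes_sent
                    + after.bytes_recv - before.bytes_recv) * 8
utilization = bits_transferred / LINK_CAPACITY_BPS

print(f"Approximate link utilization over 1 s: {utilization:.1%}")
```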
Systems like Prometheus, Graphite, InfluxDB, or DataDog store metrics. Tools like Grafana or Kibana help create real-time dashboards. A typical setup might ingest counters and histograms from applications, store them in a time-series database, and visualize them in charts.
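As an illustration of that ingestion path, the sketch below uses the official Prometheus Python client to record a latency histogram and an error counter and to expose them for scraping; the endpoint label, port, and simulated traffic are placeholders.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server  # pip install prometheus-client

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "Request duration in seconds",
    ["endpoint"],
)
REQUEST_ERRORS = Counter(
    "http_request_errors_total",
    "Total number of failed requests",
    ["endpoint"],
)

def handle_request(endpoint: str) -> None:
    """Simulated handler that records a duration sample and the occasional error."""
    with REQUEST_LATENCY.labels(endpoint=endpoint).time():
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
        if random.random() < 0.02:             # ~2% simulated error rate
            REQUEST_ERRORS.labels(endpoint=endpoint).inc()

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<host>:8000/metrics
    while True:
        handle_request("/api/items")
```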
Relying on averages can be misleading—some users might experience extreme delays while the average remains fine. Histograms reveal distribution across multiple buckets, giving better insight into tail latencies (p95, p99).
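The effect is easy to demonstrate with raw samples: in the sketch below a single slow outlier barely moves the mean, while p99 jumps to roughly two seconds. (Prometheus and similar systems estimate these percentiles from histogram buckets rather than from raw samples.)

```python
import statistics

# 99 fast requests plus one pathological outlier (latencies in milliseconds).
samples = [20] * 99 + [2000]

mean = statistics.mean(samples)
cuts = statistics.quantiles(samples, n=100)  # 99 cut points: cuts[94] ~ p95, cuts[98] ~ p99
p95, p99 = cuts[94], cuts[98]

# The mean stays near 40 ms, yet 1% of users waited close to 2 s.
print(f"mean={mean:.0f} ms  p95={p95:.0f} ms  p99={p99:.0f} ms")
```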
Example formula for an SLO around error rate:
Error_Rate_SLI = (Number_of_Error_Requests / Total_Requests)
Target: Error_Rate_SLI <= 0.1%
If the error rate goes beyond 0.1%, you exceed your error budget.
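A worked example of how this budget plays out over a window, with hypothetical traffic and failure counts:

```python
total_requests = 10_000_000  # requests served during the SLO window (hypothetical)
slo_error_rate = 0.001       # the 0.1% target above

error_budget = total_requests * slo_error_rate  # allowed failures in the window
observed_errors = 7_500                         # hypothetical measured failures

print(f"Error budget: {error_budget:,.0f} failed requests")
print(f"Budget consumed: {observed_errors / error_budget:.0%}")
print(f"Observed error rate: {observed_errors / total_requests:.3%}")
```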
Performance analysis relies on simulating realistic traffic, typically through load, stress, spike, and soak tests.
Tools like Apache JMeter, Locust, or k6 let you define test scripts that emulate real client behavior. Metrics from these tests guide capacity planning and highlight scaling bottlenecks.
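A minimal Locust scenario might look like the sketch below; the endpoints and task weights are assumptions about the system under test, and the file name and host in the launch command (`locust -f loadtest.py --host https://staging.example.com`) are placeholders.

```python
from locust import HttpUser, task, between  # pip install locust

class ApiUser(HttpUser):
    # Each simulated user pauses 1-3 seconds between tasks.
    wait_time = between(1, 3)

    @task(3)  # weighted: reads run three times as often as writes
    def list_items(self):
        self.client.get("/api/items")

    @task(1)
    def create_item(self):
        self.client.post("/api/items", json={"name": "example"})
```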
In microservice architectures, a single request can span multiple services. Distributed tracing with solutions like Jaeger or Zipkin tracks how requests hop between services. The system collects timestamps and metadata at each node:
[Service A] -- calls --> [Service B] -- calls --> [Service C]
     |                        |                        |
     v                        v                        v
 (Trace A)                (Trace B)                (Trace C)
      Traces aggregated and visualized in a central UI
This reveals which segments of a request path consume the most time or fail often. Tracing complements standard metrics by delivering a request-centric timeline rather than aggregated counters.
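One common way to emit such spans is the OpenTelemetry SDK, which can export to backends such as Jaeger or Zipkin. The sketch below keeps everything in one process and prints spans to the console, so the nested spans merely stand in for real cross-service calls with propagated context.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire up a tracer that prints finished spans; a real deployment would plug in
# a Jaeger, Zipkin, or OTLP exporter here instead of the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("service-a")

# Nested spans model the "Service A -> Service B -> Service C" call chain.
with tracer.start_as_current_span("handle-request"):
    with tracer.start_as_current_span("call-service-b"):
        with tracer.start_as_current_span("call-service-c"):
            pass  # downstream work would happen here
```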
Below is a conceptual diagram of how metrics might flow:
+---------------------+
|   Various Clients   | -- Make Requests --> [Load Balancer] -> [API/Services]
+---------------------+                                               |
                                                                      | Generate logs
                                                                      | and metrics
                                                                      v
                                                          +-----------------------+
                                                          |    Metrics Exporter   |
                                                          |  (Prometheus, StatsD) |
                                                          +-----------+-----------+
                                                                      |
                                                                      v
                                                          +-----------------------+
                                                          |     Time-Series DB    |
                                                          | (Prometheus, InfluxDB)|
                                                          +-----------+-----------+
                                                                      |
                                                                      v
                                                          +-----------------------+
                                                          |     Visualization     |
                                                          |   (Grafana, Kibana)   |
                                                          +-----------------------+
The steps in this pipeline: clients send requests through the load balancer to the API services; each component emits logs and metrics; an exporter such as Prometheus or StatsD collects them; a time-series database stores them; and dashboards in Grafana or Kibana visualize the results for operators and alerting.