Last modified: October 10, 2024
Performance Monitoring
Performance monitoring helps you identify bottlenecks or issues that may be affecting your system's performance. We'll now explore some tools and techniques available for monitoring performance and explain some usage statistics, such as CPU and RAM usage.
Understanding Usage Statistics
Usage statistics provide insights into how your system resources are being utilized. These statistics include CPU usage, RAM usage, and disk usage. An increase in these statistics may be due to various factors, such as running resource-intensive applications, insufficient system resources, or a misconfiguration in your system settings.
- CPU Usage indicates how much processing power is being utilized by your system. When CPU usage is high, applications may become sluggish, and the system can become unresponsive.
- RAM Usage shows how much of your system's memory is currently in use. If the system runs out of RAM, it starts using swap space, which can significantly slow down performance.
- Disk Usage displays how much of your system's storage is being consumed. High disk usage can negatively affect overall system performance and responsiveness.
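As a rough illustration of the disk usage statistic, the sketch below pulls the percentage used out of POSIX-format df output. The function name disk_used_pct is hypothetical, and the sketch assumes the input comes from df -P for a single mount point, so the filesystem row is the second line.

```shell
# Hypothetical helper: read `df -P <mountpoint>` output on stdin and print
# the "Capacity" column of the first filesystem row as a bare number.
disk_used_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage sketch (not run here): df -P / | disk_used_pct
```

Reading from standard input keeps the parsing separate from the command that produces the data, so the same function can be tried on saved output.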
Top
The top command is a fundamental tool for real-time system monitoring. It offers a dynamic view of the system's running processes, allowing you to see which processes are consuming the most resources. top is particularly useful for diagnosing load and performance issues on a server or a local machine.
To start top, simply enter the following in the terminal:
top
This command opens the top interface, which refreshes every few seconds to provide an up-to-date view of the system's state.
When you run top, the output is divided into two sections:
- System Summary is displayed at the top, showing key metrics such as CPU usage, memory usage, swap usage, load average, and system uptime.
- Process List appears below the summary, listing individual processes, typically sorted by CPU usage, with the most resource-intensive processes displayed at the top by default.
Main idea:
- Dynamic Update ensures that the display refreshes in real-time, providing an up-to-date snapshot of system performance.
- Sorting Options allow processes to be sorted by CPU usage by default, but pressing Shift + M enables sorting by memory usage instead.
- Process Monitoring can be done by using the -p flag followed by the process ID (PID), allowing you to monitor a specific process closely.
top -p 1234
An example of top output might look like this:
top - 15:00:02 up 1 day, 4:03, 2 users, load average: 0.42, 0.35, 0.30
Tasks: 180 total, 2 running, 178 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.1 us, 2.2 sy, 0.0 ni, 92.1 id, 0.4 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 8026792 total, 123456 free, 2345678 used, 5460658 buff/cache
KiB Swap: 2048000 total, 1755000 free, 293000 used, 1234567 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 user1 20 0 162956 2212 1124 R 25.0 0.3 0:15.03 my_process
5678 user2 20 0 161256 2024 1028 S 12.5 0.2 1:20.03 another_process
Understanding the output:
- The current time is 15:00:02.
- The system has been running for 1 day, 4 hours, and 3 minutes.
- There are 2 users logged in.
- The load average for the past 1 minute is 0.42, for the past 5 minutes is 0.35, and for the past 15 minutes is 0.30.
- There are 180 total tasks.
- 2 tasks are currently running.
- 178 tasks are sleeping.
- 0 tasks are stopped.
- 0 tasks are zombie processes.
- The CPU is 5.1% in user mode, 2.2% in system mode, 0.0% running niced (reprioritized) processes, 92.1% idle, 0.4% waiting for I/O, 0.0% servicing hardware interrupts, 0.2% servicing software interrupts, and 0.0% stolen from this machine by the hypervisor (relevant only on virtual machines).
- The total memory is 8026792 KiB.
- 123456 KiB of memory is free.
- 2345678 KiB of memory is used.
- 5460658 KiB of memory is used for buffers and cache.
- The total swap memory is 2048000 KiB.
- 1755000 KiB of swap is free.
- 293000 KiB of swap is used.
- 1234567 KiB of memory is available.
- Process with PID 1234 is run by user1 and is using 25% of the CPU and 0.3% of memory.
- Process with PID 5678 is run by user2 and is using 12.5% of the CPU and 0.2% of memory.
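The CPU line of the summary can also be read by a script. The following is a minimal sketch, assuming the procps version of top and a locale that uses a dot as the decimal separator; the function name cpu_busy_pct is hypothetical. It computes the busy share as 100 minus the idle (id) value.

```shell
# Hypothetical helper: read top's summary on stdin and print the
# percentage of CPU time that was not idle (100 - id).
cpu_busy_pct() {
  awk '/^%Cpu/ {
    gsub(/[,:]/, " ")   # separate fields that top prints fused, e.g. "ni,100.0"
    for (i = 2; i <= NF; i++)
      if ($i == "id") { printf "%.1f\n", 100 - $(i - 1); exit }
  }'
}

# Usage sketch (not run here): top -b -n 1 | cpu_busy_pct
# -b runs top in batch mode and -n 1 limits it to a single iteration.
```

For the example line above, the idle value is 92.1, so the function would report 7.9.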
Bottom Section lists individual processes with the following columns:
Field | Description |
PID | Process ID. |
USER | User running the process. |
PR | Priority of the process. |
NI | Nice value - a user-space concept to tune the scheduling priority. |
VIRT | Virtual memory size of the process. |
RES | Resident size - the non-swapped physical memory the process is using. |
SHR | Shared memory size. |
S | Process status (e.g., running, sleeping). |
%CPU | Percentage of the CPU used by this process. |
%MEM | Percentage of physical memory used. |
TIME+ | Total CPU time used since the process started. |
COMMAND | Command that started this process. |
Tips for using top:
- By default, processes are sorted by CPU usage. Press Shift + M to sort by memory usage.
- Press k followed by the PID to kill a process.
- Press r to change the priority (nice value) of a process.
- The display updates automatically. Press q to quit top.
Htop
htop is an interactive system monitor and process viewer for Linux. It is a more advanced and user-friendly alternative to the traditional top command. htop provides a colorful and visually appealing interface, along with various features that enhance process management and system monitoring.
htop is not pre-installed on most Linux distributions, but it can be easily installed through package managers.
I. Debian/Ubuntu:
sudo apt install htop
II. CentOS/RHEL (htop is provided by the EPEL repository, which must be enabled first):
sudo yum install htop
III. Fedora:
sudo dnf install htop
To start htop, simply type:
htop
Main idea:
- Displays all running processes. Like top, it updates in real time; unlike top, it uses color to convey additional information and lets you scroll through the full process list.
- Shows CPU, memory, and swap usage along with load average.
- Allows filtering processes by user or text and searching for specific processes.
- Offers an optional tree view to see parent-child relationships among processes.
- You can interact with processes (e.g., kill, renice) directly in the interface.
An example output of htop might look like this:
1 [||||||||||| 34.5%] Tasks: 65, 132 thr; 2 running
2 [|||||||||| 28.7%] Load average: 1.23 0.97 0.88
Mem[|||||||||||||||1.45G/3.84G]
Swp[| 0K/512M]
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
1287 root 20 0 256M 4980 3192 R 28.6 0.1 0:03.41 /usr/bin/Xorg
2905 user1 20 0 517M 3720 2012 S 14.0 0.1 1:13.69 gnome-terminal
Understanding the output:
- CPU 1 is using 34.5% of its capacity.
- There are 65 tasks and 132 threads in total.
- 2 tasks are running.
- CPU 2 is using 28.7% of its capacity.
- The load average over the last 1 minute is 1.23, over 5 minutes is 0.97, and over 15 minutes is 0.88.
- 1.45 GB of the 3.84 GB of memory is being used.
- 0 KB of the 512 MB of swap space is being used.
- Process with PID 1287, run by root, is using 28.6% of the CPU and 0.1% of the memory. It is running /usr/bin/Xorg.
- Process with PID 2905, run by user1, is using 14.0% of the CPU and 0.1% of the memory. It is running gnome-terminal.
Swap space
Swap space is an area on your hard drive used as virtual memory when your system runs out of physical RAM. It allows your system to continue running even when it has exhausted all available RAM, but it can negatively impact performance, as accessing data from the hard drive is slower than accessing it from RAM.
To view the amount of swap space available on your system, use the free command:
free -h
Here's an example of what the output might look like:
total used free shared buff/cache available
Mem: 8G 3.2G 2.1G 101M 2.7G 4.4G
Swap: 2G 1.2G 800M
Understanding the output:
- The system has 8 GB of total RAM, out of which 3.2 GB is currently in use.
- A significant portion of RAM (2.7 GB) is dedicated to buffer/cache, which helps in speeding up processes by holding data in RAM for quick access.
- The swap space is relatively small compared to the total RAM, and a significant portion (1.2 GB) is in use, which could indicate heavy memory usage or potential memory pressure on the system.
- The 'available' memory (4.4 GB) is a more relevant indicator than 'free' memory for understanding how much memory is readily available for new applications. This is because Linux tends to use free memory for buffers and cache.
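Because 'available' is the more useful figure, it can help to express it as a share of total RAM. This is a minimal sketch, assuming the column layout shown above (available is the seventh field of the Mem: row) and numeric input such as that produced by free -k rather than free -h; the function name mem_available_pct is hypothetical.

```shell
# Hypothetical helper: read `free -k` output on stdin and print
# available memory as a percentage of total memory.
mem_available_pct() {
  awk '$1 == "Mem:" { printf "%.1f\n", $7 / $2 * 100 }'
}

# Usage sketch (not run here): free -k | mem_available_pct
```

The -k option is used instead of -h because human-readable values with unit suffixes cannot be divided directly.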
I. Mem (Memory) Section
Field | Description |
total | Total physical RAM in the system. In this example, it's 8 gigabytes. |
used | Amount of RAM currently being used. Here, 3.2 gigabytes are in use. |
free | Amount of RAM that is not being used. This example shows 2.1 gigabytes of free memory. |
shared | Memory used (mostly) by tmpfs (temporary file storage) and interprocess communication. In the example, it's 101 megabytes. |
buff/cache | Memory used by the kernel for buffers and caching. In this case, it's 2.7 gigabytes. |
available | An estimate of how much memory is available for starting new applications, without swapping. Here, it's about 4.4 gigabytes. |
II. Swap Section
Field | Description |
total | Total swap space available. In this example, it's 2 gigabytes. |
used | Amount of swap space currently in use. This example shows 1.2 gigabytes being used. |
free | Amount of swap space not currently in use. In this case, it's 800 megabytes. |
Monitor RAM usage
Monitoring RAM usage is essential for managing system resources efficiently. To do this on Linux systems, the free -h command is commonly used. It provides information on the total amount of RAM, along with how much is used and free. If the used memory approaches the total amount of RAM, this could signal a need for more RAM or optimization of current usage.
I. Resident Set Size (RSS)
- RSS is the amount of physical memory (RAM) a process currently occupies.
- It excludes swapped-out memory but includes all resident stack and heap memory.
- Memory from shared libraries is counted, but only if the pages are physically present in memory.
- Shared pages are counted in every process that maps them, so the sum of RSS values across processes can exceed the actual RAM.
II. Virtual Set Size (VSZ)
- VSZ is the total virtual address space a process has mapped, whether or not those pages are currently resident in RAM.
- It encompasses memory that is swapped out, allocated but never touched, or mapped from shared libraries without being loaded.
- This is a broader measure of a process's memory footprint, and it is normally larger than RSS.
Example: Calculating RSS and VSZ
Consider a process with these details:
- Resident in RAM: 450K (binary code), 800K (shared libraries), 120K (stack and heap).
- Mapped in total: 600K (binary code), 2200K (shared libraries), 150K (stack and heap).
Calculations:
I. RSS: Total physical memory usage.
- RSS = Resident Binary Code + Resident Shared Libraries + Resident Stack/Heap
- RSS = 450K + 800K + 120K = 1370K
II. VSZ: Total virtual memory mapped.
- VSZ = Mapped Binary Code + Mapped Shared Libraries + Mapped Stack/Heap
- VSZ = 600K + 2200K + 150K = 2950K
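The two sums can be checked with shell arithmetic. The sketch below uses a hypothetical helper, sum_kib, that adds up any number of sizes given in kilobytes; the figures are the ones from the example, not measurements from a real process.

```shell
# Hypothetical helper: add up sizes given in kilobytes.
sum_kib() {
  local total=0 size
  for size in "$@"; do
    total=$((total + size))
  done
  echo "$total"
}

# RSS from the example: sum_kib 450 800 120
# VSZ from the example: sum_kib 600 2200 150
# For a real process on Linux, the kernel reports both values directly:
#   grep -E 'VmRSS|VmSize' /proc/self/status
```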
Identifying Top Memory-Consuming Processes
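To list the 10 processes with the largest virtual memory size (VSZ), you can use the command:
ps -e -o pid,vsz,comm --sort=-vsz | head -n 11
Here --sort=-vsz orders the processes by VSZ in descending order, and head -n 11 keeps the header line plus 10 processes. To rank by physical RAM instead, replace vsz with rss in both places.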
Example Output of Command:
PID VSZ COMMAND
1234 2048000 java
5678 1800000 mysqld
9101 1600000 apache2
1213 1500000 postgres
3141 1400000 python
2718 1300000 node
1928 1200000 nginx
2930 1100000 redis-server
3435 1000000 sshd
4756 900000 systemd
Understanding the output:
- The output displays the top 10 processes sorted by virtual memory size (VSZ) in descending order.
- PID is the process identifier.
- VSZ shows the virtual memory size in kilobytes (KB) used by each process.
- COMMAND represents the name of the command or process that is running.
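On systems where ps lacks a built-in sort option, the same ranking can be done with sort while keeping the header line in place. This is a minimal sketch; the function name top_n_by_vsz is hypothetical, and it assumes the input has one header line followed by rows whose second column is the VSZ.

```shell
# Hypothetical helper: read `ps -e -o pid,vsz,comm` output on stdin,
# print the header unchanged, then the N rows with the largest
# second column (default N = 10).
top_n_by_vsz() {
  local n="${1:-10}"
  local header
  IFS= read -r header          # consume only the header line
  printf '%s\n' "$header"
  sort -k2,2nr | head -n "$n"  # numeric, descending, on column 2
}

# Usage sketch (not run here): ps -e -o pid,vsz,comm | top_n_by_vsz 10
```

Handling the header separately avoids the common problem of the column titles being sorted in among the data rows.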
Finding RAM Usage of a Specific Process
In addition to monitoring overall RAM usage, it's often necessary to track the memory usage of a specific process.
For example, to check the memory usage of a process named nginx:
ps -o %mem,rss,vsize,cmd -C nginx
Example Output of Command:
%MEM RSS VSZ CMD
2.3 12000 250000 nginx: master process /usr/sbin/nginx
1.2 6000 150000 nginx: worker process
1.2 6000 150000 nginx: worker process
Understanding the output:
- The first process listed is the nginx master process, while the others are worker processes.
- The master nginx process uses 2.3% of the system memory, while each worker process uses 1.2%.
- The nginx master process is using 12000 KB of RAM, while each worker process is using 6000 KB.
- The nginx master process is using 250000 KB of virtual memory, while each worker process is using 150000 KB.
The output consists of the following columns:
Field | Description |
%MEM | The percentage of the system's physical memory (RAM) used by the process. |
RSS | The resident set size, which is the amount of physical memory (in kilobytes) the process is currently using. |
VSZ | The virtual memory size (in kilobytes) allocated to the process. |
CMD | The command that started the process, along with any arguments (e.g., the nginx processes). |
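When a service runs as several processes, a single total is often handier than one row per process. The sketch below sums the RSS column; the function name total_rss_kib is hypothetical. Because pages shared between the processes are counted once per process, the total should be read as an upper bound rather than an exact figure.

```shell
# Hypothetical helper: read one RSS value (in kilobytes) per line on
# stdin and print their sum.
total_rss_kib() {
  awk '{ sum += $1 } END { print sum + 0 }'
}

# Usage sketch (not run here): ps -o rss= -C nginx | total_rss_kib
# "rss=" asks ps for the RSS column with an empty header, so only
# numbers reach the function.
```

For the three nginx processes in the example output, the total would be 24000 KB.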
Vmstat
vmstat displays information about processes, memory, swap, block I/O, and CPU activity.
Run without any arguments, vmstat prints a single report whose rate columns are averages since the last boot. To sample current activity, use vmstat [interval] [count], where interval is the time in seconds between reports and count is the number of reports to print. The first report still shows averages since boot; each later report covers the preceding interval. The process and memory columns are instantaneous in every report.
vmstat 5 3
This command will display 3 snapshots at 5-second intervals.
Example output:
$ vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 2723288 844288 5670316 0 0 14 42 49 39 7 5 88 0 0
2 0 0 2729716 844296 5670332 0 0 0 387 8888 12065 3 6 90 0 0
1 0 0 2735688 844304 5670364 0 0 0 436 9379 13069 4 6 90 0 0
Understanding the output (values from the first row):
- 1 process is running or waiting to run (r), and 0 processes are blocked (b).
- No swap is in use (swpd = 0 KB).
- 2723288 KB of memory is free.
- 844288 KB of memory is used for buffers.
- 5670316 KB of memory is used for cache.
- There is no swap-in (si = 0) or swap-out (so = 0) activity.
- 14 blocks were read from disk per second (bi).
- 42 blocks were written to disk per second (bo).
- 49 interrupts occurred per second (in).
- 39 context switches occurred per second (cs).
- 7% of CPU time was spent in user mode.
- 5% of CPU time was spent in system mode.
- 88% of CPU time was idle.
- 0% of CPU time was waiting for I/O (wa).
- 0% of CPU time was stolen by the hypervisor (st).
The output consists of the following columns:
Field | Description |
r | Number of processes waiting for runtime (running or runnable). |
b | Number of processes in uninterruptible sleep (blocked). |
swpd | Amount of virtual memory used (swap space) in kilobytes. |
free | Amount of idle/free memory in kilobytes. |
buff | Amount of memory used for buffers in kilobytes. |
cache | Amount of memory used as cache in kilobytes. |
si | Amount of memory swapped in from disk (swap in) per second in kilobytes. |
so | Amount of memory swapped out to disk (swap out) per second in kilobytes. |
bi | Blocks received from a block device (blocks in) per second. |
bo | Blocks sent to a block device (blocks out) per second. |
in | Number of interrupts per second. |
cs | Number of context switches per second. |
us | Percentage of CPU time spent in user mode. |
sy | Percentage of CPU time spent in system (kernel) mode. |
id | Percentage of CPU time spent idle. |
wa | Percentage of CPU time spent waiting for I/O. |
st | Percentage of CPU time stolen from the VM by the hypervisor (in virtualized environments). |
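To turn several vmstat samples into one number, the id column can be averaged. This is a minimal sketch with a hypothetical function name, vmstat_avg_idle. It locates the id column from the header line and skips the first data row, since that row reports averages since boot rather than current activity. It assumes a short run in which vmstat prints its two header lines only once.

```shell
# Hypothetical helper: read `vmstat <interval> <count>` output on stdin
# and print the mean of the "id" column, ignoring the first sample.
vmstat_avg_idle() {
  awk '
    NR == 2 { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
    NR > 3 && col { sum += $col; n++ }
    END { if (n) printf "%.1f\n", sum / n }
  '
}

# Usage sketch (not run here): vmstat 5 3 | vmstat_avg_idle
```

Looking the column up by name rather than by position keeps the sketch working if a vmstat version adds columns after st.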
Challenges
- Run top during peak load times and identify any processes consistently using over 50% CPU. Document these processes and research ways to optimize or replace them for better performance.
- Use iotop to monitor disk I/O. Select an application you suspect is causing high I/O, run it, and document its I/O usage pattern. Determine if the usage is justifiable or if it needs optimization.
- Identify a running service or application suspected of a memory leak. Use valgrind or similar tools to trace its memory usage over time. Provide a report with findings and potential solutions.
- Monitor a specific service using nethogs for real-time network bandwidth usage. Analyze its traffic patterns and propose optimizations to reduce unnecessary network load.
- Write a bash script that alerts when disk usage goes beyond 80%. The script should identify the top five directories contributing to disk usage.
- When system load average exceeds 1.0, use a combination of uptime, vmstat, and dmesg to diagnose the root cause. Document the methodology and findings.
- Create a cron job script to gather CPU, memory, and disk usage statistics every hour. Store this data in a log file for a week and analyze it to identify any patterns or anomalies.
- Set up Nagios to monitor a server. Configure it to send an email alert for critical conditions like CPU usage > 90%, disk space < 10%, and RAM usage > 90%.