Last modified: June 06, 2026
Performance monitoring is the process of observing how a system uses its resources.
The goal is to understand whether the system is healthy, overloaded, or waiting on a specific bottleneck.
A bottleneck is the resource that limits performance.
Common bottlenecks include:
- CPU
- memory
- disk I/O
- network
A system may feel slow for many reasons. Performance monitoring helps avoid guessing.
Instead of saying:
The server is slow.
we want to answer:
A Linux system runs many processes. Those processes compete for CPU, memory, disk, and network resources.
+-------------------+
|   Applications    |
| nginx, database,  |
| browser, scripts  |
+---------+---------+
          |
          v
+-------------------+
|   Linux Kernel    |
| scheduler, memory |
| filesystem, I/O   |
+---------+---------+
          |
          v
+-------------------+
|     Hardware      |
| CPU, RAM, disk,   |
| network card      |
+-------------------+
Monitoring tools observe these layers and show how busy they are.
The most common system usage statistics are:
- CPU usage
- memory (RAM) usage
- swap usage
- disk usage and disk I/O
- load average
Each statistic tells a different part of the story.
CPU usage shows how much processing work the system is doing.
High CPU usage can mean:
CPU usage is not automatically bad. A busy CPU may be normal if the system is doing useful work.
The important question is:
Is the CPU busy because of expected work,
or is one process consuming CPU unexpectedly?
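top, htop, and vmstat derive their CPU percentages from cumulative tick counters in /proc/stat. As a minimal sketch (cpu_busy_pct is a hypothetical helper name, and the interval is assumed to be a whole number of seconds), the same idea can be reproduced by sampling the aggregate cpu line twice and taking the share of ticks that were neither idle nor iowait:

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate overall CPU busy percentage over an interval.
# The first line of /proc/stat is the aggregate "cpu" line with cumulative ticks:
#   user nice system idle iowait irq softirq steal guest guest_nice
# guest time is already included in user, so it is not added again.
cpu_busy_pct() {
  local interval=${1:-1}
  local user nice system idle iowait irq softirq steal rest
  local total1 idle1 total2 idle2 dt

  read -r _ user nice system idle iowait irq softirq steal rest < /proc/stat
  total1=$((user + nice + system + idle + iowait + irq + softirq + steal))
  idle1=$((idle + iowait))

  sleep "$interval"

  read -r _ user nice system idle iowait irq softirq steal rest < /proc/stat
  total2=$((user + nice + system + idle + iowait + irq + softirq + steal))
  idle2=$((idle + iowait))

  # Busy share = ticks that were not idle or iowait, as an integer percentage.
  dt=$((total2 - total1))
  if [ "$dt" -le 0 ]; then
    echo 0
    return
  fi
  echo $((100 * (dt - (idle2 - idle1)) / dt))
}
```

Usage: cpu_busy_pct 5 prints one integer percentage averaged over five seconds.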
RAM is fast working memory.
Linux uses RAM for:
Linux often uses available RAM for cache. This is usually good.
A system can show little “free” memory and still be healthy because cached memory can be reclaimed when applications need it.
The better field to watch is usually:
available memory
not just:
free memory
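free takes both values from /proc/meminfo: MemFree for the free column and MemAvailable, the kernel's estimate of how much memory is available for starting new applications without swapping, for the available column. A minimal sketch that prints them side by side (mem_report is a hypothetical name):

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare MemFree with MemAvailable.
# Values in /proc/meminfo are in KiB; they are printed here in MiB.
mem_report() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemFree:/      { free  = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      printf "total:     %8d MiB\n", total / 1024
      printf "free:      %8d MiB\n", free / 1024
      printf "available: %8d MiB (%d%% of total)\n", avail / 1024, 100 * avail / total
    }' /proc/meminfo
}
```

On a machine with a warm page cache, mem_report will often show a small free value next to a much larger available value, which is the healthy pattern described above.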
Swap is disk space used as overflow memory.
Swap helps prevent immediate crashes when RAM is full, but it is much slower than RAM.
Heavy swap usage can make a system feel extremely slow.
RAM is fast.
Swap is much slower because it uses disk.
Some swap usage is not necessarily a problem. Continuous swap-in and swap-out activity is.
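The kernel keeps cumulative counts of pages swapped in and out in /proc/vmstat (pswpin and pswpout). Comparing two samples separates ongoing swap activity from swap space that was filled once and then left alone. A sketch, with hypothetical helper names swap_counters and swap_activity:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: measure swap activity over an interval.
# pswpin/pswpout in /proc/vmstat are cumulative page counts since boot.
swap_counters() {
  awk '/^pswpin / { i = $2 } /^pswpout / { o = $2 } END { print i + 0, o + 0 }' /proc/vmstat
}

swap_activity() {
  local interval=${1:-5} in1 out1 in2 out2
  read -r in1 out1 < <(swap_counters)
  sleep "$interval"
  read -r in2 out2 < <(swap_counters)
  echo "pages swapped in: $((in2 - in1)), out: $((out2 - out1)) in ${interval}s"
}
```

Non-zero counts on every run point to the continuous activity that hurts performance; zero counts with a large "used" swap figure usually do not.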
Disk usage and disk I/O are different.
Disk usage means how much storage space is filled.
Example:
The filesystem is 95% full.
Disk I/O means how actively the disk is reading and writing.
Example:
The disk is writing 300 MB/s and is 100% busy.
A disk can be almost full but not busy.
A disk can have plenty of free space but still be overloaded with reads and writes.
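df answers the space question, while the I/O question comes from kernel counters in /proc/diskstats. The sketch below assumes that field 13 of each line is the cumulative time in milliseconds the device spent doing I/O, the counter iostat -x uses for %util; disk_busy_pct is a hypothetical helper name and the interval must be a whole number of seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate a block device's %util from /proc/diskstats.
# Columns: major minor name ... and field 13 = milliseconds spent doing I/O.
# Usage: disk_busy_pct DEVICE [INTERVAL_SECONDS]   e.g. disk_busy_pct sda 2
disk_busy_pct() {
  local dev=$1 interval=${2:-1} t1 t2
  t1=$(awk -v d="$dev" '$3 == d { print $13 }' /proc/diskstats 2>/dev/null)
  if [ -z "$t1" ]; then
    echo "device '$dev' not found in /proc/diskstats" >&2
    return 1
  fi
  sleep "$interval"
  t2=$(awk -v d="$dev" '$3 == d { print $13 }' /proc/diskstats)
  # Busy milliseconds divided by elapsed milliseconds, as a percentage.
  echo $(( (t2 - t1) * 100 / (interval * 1000) ))
}

# Space used on the filesystem holding / (what df measures).
df -P / | awk 'NR == 2 { print "space used on /:", $5 }'
```

A filesystem can report 95% space used while disk_busy_pct for its device stays near 0, and the reverse.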
Load average shows how many processes are running or waiting to run.
It is shown over three time periods: the last 1, 5, and 15 minutes.
Example:
load average: 0.42, 0.35, 0.30
On a single-core system, a load of 1.00 roughly means the CPU is fully occupied.
On a four-core system, a load of 4.00 may be normal under full CPU use.
However, on Linux the load average also counts processes in uninterruptible sleep (state D), which are usually waiting on disk I/O, so it can rise even when the CPU is not busy.
So high load means:
There is work waiting.
It does not always mean:
The CPU is the bottleneck.
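The comparison between load and CPU count can be scripted. A sketch that reads /proc/loadavg (the first three fields are the 1-, 5-, and 15-minute averages and the fourth is runnable/total scheduling entities) and divides the 1-minute value by nproc; per_cpu_load and load_report are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: relate the 1-minute load average to the number of CPUs.
# /proc/loadavg holds: load1 load5 load15 runnable/total last_pid

per_cpu_load() {
  # $1 = load value, $2 = CPU count; prints load per CPU with two decimals.
  awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

load_report() {
  local load1 load5 load15 sched cpus
  read -r load1 load5 load15 sched _ < /proc/loadavg
  cpus=$(nproc)
  echo "load: $load1 $load5 $load15 | CPUs: $cpus | 1-min load per CPU: $(per_cpu_load "$load1" "$cpus") | runnable/total: $sched"
}
```

A per-CPU value around 1.00 matches the fully occupied case described above. Because D-state tasks also count toward load, confirm with vmstat (b and wa) before concluding that the CPU is the bottleneck.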
A good performance investigation follows a structured path.
Useful starting commands:
uptime
top
free -h
vmstat 1
iostat -xz 1
df -h
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
top
The top command provides a live view of system activity.
Run:
top
It shows two main sections:
The system summary shows CPU, memory, swap, load average, task count, and uptime.
The process list shows running processes, usually sorted by CPU usage.
top Output
top - 15:00:02 up 1 day, 4:03, 2 users, load average: 0.42, 0.35, 0.30
Tasks: 180 total, 2 running, 178 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.1 us, 2.2 sy, 0.0 ni, 92.1 id, 0.4 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 8026792 total, 123456 free, 2345678 used, 5460658 buff/cache
KiB Swap: 2048000 total, 1755000 free, 293000 used, 1234567 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 user1 20 0 162956 2212 1124 R 25.0 0.3 0:15.03 my_process
5678 user2 20 0 161256 2024 1028 S 12.5 0.2 1:20.03 another_process
Example CPU line from top:
%Cpu(s): 5.1 us, 2.2 sy, 92.1 id, 0.4 wa
Interpretation:
Important process states shown by top:
top Keys
To monitor one process:
top -p 1234
htop
htop is an interactive and more user-friendly alternative to top.
It shows CPU bars, memory bars, process lists, searching, filtering, tree view, and easier process management.
Install it on Debian or Ubuntu:
sudo apt install htop
On Red Hat or CentOS:
sudo yum install htop
On Fedora:
sudo dnf install htop
Run:
htop
htop View
1 [||||||||||| 34.5%] Tasks: 65, 132 thr; 2 running
2 [|||||||||| 28.7%] Load average: 1.23 0.97 0.88
Mem[|||||||||||||||1.45G/3.84G]
Swp[| 0K/512M]
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
1287 root 20 0 256M 4980 3192 R 28.6 0.1 0:03.41 /usr/bin/Xorg
2905 user1 20 0 517M 3720 2012 S 14.0 0.1 1:13.69 gnome-terminal
Interpretation:
htop is useful when you want to interactively inspect and manage processes.
free
The free command shows memory and swap usage.
Run:
free -h
The -h option shows human-readable units.
Example output:
total used free shared buff/cache available
Mem: 8G 3.2G 2.1G 101M 2.7G 4.4G
Swap: 2G 1.2G 800M
Important memory fields in free -h:
Important swap fields:
Interpretation of the example:
The most important field for practical memory pressure is usually:
available
If available is low and swap activity is high, the system may be under memory pressure.
Linux process memory can be confusing because there are multiple memory measurements.
Two important fields are:
- RSS
- VSZ
RSS means Resident Set Size.
It is the amount of physical RAM currently used by the process.
RSS is usually more useful than VSZ when asking:
How much real RAM is this process using right now?
However, RSS includes shared memory pages, so adding RSS values for many processes can overcount total RAM.
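The overcounting is easy to see by summing RSS across all processes of one program. A sketch (sum_rss_kib is a hypothetical name) that reads VmRSS from /proc/PID/status for every process whose Name matches; note that Name holds at most 15 characters of the command name and only its first word is compared here:

```bash
#!/usr/bin/env bash
# Hypothetical helper: add up VmRSS (KiB) for every process named $1.
# Shared pages (libraries, shared memory) are counted once per process,
# so the sum can be larger than the RAM the processes occupy together.
sum_rss_kib() {
  local name=$1 total=0 rss f
  for f in /proc/[0-9]*/status; do
    # Name: comes before VmRSS: in the file; kernel threads have no VmRSS line.
    rss=$(awk -v n="$name" '/^Name:/ { m = ($2 == n) } /^VmRSS:/ && m { print $2 }' "$f" 2>/dev/null)
    total=$((total + ${rss:-0}))
  done
  echo "$total"
}
```

For example, sum_rss_kib nginx adds the master and all workers together, even though they share much of their code.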
VSZ means virtual memory size: the total amount of virtual memory the process has mapped, reported by ps in KiB.
It includes memory that may be:
VSZ can look large even when actual RAM use is modest.
A common mistake is to treat VSZ as real RAM usage. For physical RAM pressure, check RSS and %MEM.
Suppose a process currently uses:
RSS is:
450K + 800K + 120K = 1370K
Suppose the process has virtually allocated:
VSZ is:
600K + 2200K + 150K = 2950K
The process has a larger virtual memory footprint than physical resident memory.
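The same two numbers for a real process can be read from /proc/PID/status, where VmRSS roughly corresponds to the RSS column of ps and VmSize to VSZ, both in KiB. A minimal sketch using a hypothetical helper rss_vsz_kib, applied to the current shell:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "RSS VSZ" in KiB for one PID (default: this shell).
rss_vsz_kib() {
  local pid=${1:-$$}
  awk '/^VmRSS:/ { r = $2 } /^VmSize:/ { v = $2 } END { print r + 0, v + 0 }' "/proc/$pid/status"
}

read -r rss vsz < <(rss_vsz_kib "$$")
echo "this shell: RSS=${rss} KiB, VSZ=${vsz} KiB"
```

As in the arithmetic above, VSZ is normally much larger than RSS.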
To show processes sorted by real physical memory percentage:
ps aux --sort=-%mem | head -n 10
Example output:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mysql 5678 12.0 18.5 2540000 1500000 ? Sl 10:00 3:20 mysqld
java 1234 25.0 15.0 4096000 1210000 ? Sl 09:50 8:10 java
postgres 1213 5.0 8.0 1500000 650000 ? Sl 09:55 2:30 postgres
Interpretation:
To sort by VSZ instead:
ps -e -o pid,vsz,rss,comm --sort=-vsz | head -n 10
Important note:
Example for nginx:
ps -o %mem,rss,vsz,cmd -C nginx
Example output:
%MEM RSS VSZ CMD
2.3 12000 250000 nginx: master process /usr/sbin/nginx
1.2 6000 150000 nginx: worker process
1.2 6000 150000 nginx: worker process
Interpretation:
vmstat
vmstat shows process, memory, swap, disk I/O, system, and CPU statistics.
Run a single snapshot:
vmstat
Run updates every second:
vmstat 1
Run three samples five seconds apart:
vmstat 5 3
vmstat Output
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 2723288 844288 5670316 0 0 14 42 49 39 7 5 88 0 0
2 0 0 2729716 844296 5670332 0 0 0 387 8888 12065 3 6 90 0 0
1 0 0 2735688 844304 5670364 0 0 0 436 9379 13069 4 6 90 0 0
Important fields:
Interpretation of this example:
uptime
uptime is a quick way to check how long the system has been running and what the load average is.
Run:
uptime
Example output:
15:00:02 up 1 day, 4:03, 2 users, load average: 0.42, 0.35, 0.30
Interpretation:
iostat
iostat reports CPU and disk I/O statistics.
Install it through sysstat if needed:
sudo apt install sysstat
Run:
iostat -xz 1
Important disk fields:
Example output:
Device r/s w/s rkB/s wkB/s await aqu-sz %util
sda 1.00 2.00 50.00 100.00 2.20 0.01 0.15
Interpretation:
iotop
iotop shows disk I/O by process.
Install:
sudo apt install iotop
Run:
sudo iotop -o
The -o option shows only processes currently doing I/O.
Example output:
Total DISK READ: 100.00 K/s | Total DISK WRITE: 50.00 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
7890 be/4 user 50.00 K/s 25.00 K/s 0.00 % 10.00 % process_a
5678 be/4 user 50.00 K/s 25.00 K/s 0.00 % 5.00 % process_b
Interpretation:
iotop is useful when you know the disk is busy and want to know which process is responsible.
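Without iotop, per-process I/O counters can be read from /proc/PID/io, where read_bytes and write_bytes are cumulative bytes the process caused to be fetched from or sent to the storage layer. A sketch, assuming the kernel has task I/O accounting enabled; io_counters and proc_io_rate are hypothetical names, and reading another user's process usually requires sudo:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: per-process disk I/O rate from /proc/PID/io.
io_counters() {
  awk '/^read_bytes:/ { r = $2 } /^write_bytes:/ { w = $2 } END { print r + 0, w + 0 }' "/proc/$1/io"
}

proc_io_rate() {
  local pid=$1 interval=${2:-1} r1 w1 r2 w2
  if [ ! -r "/proc/$pid/io" ]; then
    echo "cannot read /proc/$pid/io (check the PID or use sudo)" >&2
    return 1
  fi
  read -r r1 w1 < <(io_counters "$pid")
  sleep "$interval"
  read -r r2 w2 < <(io_counters "$pid")
  echo "PID $pid: read $(( (r2 - r1) / interval )) B/s, write $(( (w2 - w1) / interval )) B/s"
}
```

For example, proc_io_rate 7890 5 reports the average rates for one PID over five seconds.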
df and du
df shows filesystem space usage.
Run:
df -h
Example:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 100G 92G 8.0G 92% /
Interpretation:
du shows directory usage.
Example:
sudo du -h --max-depth=1 /var | sort -h
Example output:
100M /var/tmp
2.0G /var/log
12G /var/lib
15G /var
Interpretation:
Create high CPU usage and verify it with top, htop, and vmstat.
Install stress-ng if needed:
sudo apt install stress-ng
Run a CPU stress test:
stress-ng --cpu 4 --timeout 60s
This starts four CPU workers for 60 seconds.
Check with top:
top
Example output:
%Cpu(s): 96.0 us, 3.0 sy, 0.0 ni, 1.0 id, 0.0 wa
PID USER PR NI VIRT RES SHR S %CPU %MEM COMMAND
4321 user 20 0 50000 8000 2000 R 399.0 0.1 stress-ng-cpu
Interpretation:
- CPU user time is very high.
- Idle time is almost zero.
- stress-ng is using about four CPU cores.
- I/O wait is zero, so this is not a disk bottleneck.
Check with vmstat:
vmstat 1
Example output:
r b swpd free buff cache si so bi bo in cs us sy id wa st
5 0 0 800000 20000 500000 0 0 0 1 3000 6000 95 4 1 0 0
Interpretation:
Example:
nice -n 10 command
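To confirm that nice took effect, note that coreutils nice with no arguments prints the niceness it is running at, so nesting it shows the adjusted value:

```bash
#!/usr/bin/env bash
# Compare the current niceness with the niceness a command gets under nice -n 10.
# With no arguments, coreutils nice prints the niceness it is running at.
echo "current niceness: $(nice)"
echo "under nice -n 10: $(nice -n 10 nice)"
```

Niceness is capped at 19, so the second value is the first plus 10, up to that limit.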
Create memory pressure and observe it with free, top, and vmstat.
Run:
stress-ng --vm 2 --vm-bytes 70% --timeout 60s
This starts two memory workers that repeatedly map, write to, and unmap memory for 60 seconds.
Check with free:
free -h
Example output:
total used free shared buff/cache available
Mem: 8.0G 6.9G 250M 120M 850M 600M
Swap: 2.0G 100M 1.9G
Interpretation:
- Used memory is high.
- Available memory is low.
- Swap has started to be used.
- The system is under memory pressure.
Check with vmstat:
vmstat 1
Example output:
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 1 200000 100000 12000 200000 100 300 800 1500 2500 7000 30 15 45 10 0
Interpretation:
Show how heavy swap activity can slow a system.
Use a stronger memory test only on a lab system:
stress-ng --vm 4 --vm-bytes 90% --timeout 60s
Check with vmstat:
vmstat 1
Example output:
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 6 1500000 50000 8000 90000 5000 7000 12000 18000 5000 15000 15 20 20 45 0
Interpretation:
- swpd is high.
- si and so are very high.
- b is high, meaning many processes are blocked.
- wa is high, meaning the CPU waits on disk.
- This is swap thrashing.
The system may feel frozen because it is constantly moving memory pages between RAM and disk.
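These signs can be checked automatically. A sketch (vmstat_flags is a hypothetical name, and the 20% iowait default is an arbitrary assumption) that reads vmstat output on stdin, assumes the standard column order shown above, and prints the rows that show swapping, blocked processes, or high I/O wait:

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag vmstat rows with swapping (si/so > 0),
# blocked processes (b > 0), or iowait at or above a limit (default 20).
# Assumed column order: r b swpd free buff cache si so bi bo in cs us sy id wa st
vmstat_flags() {
  awk -v wa_limit="${1:-20}" '
    $1 !~ /^[0-9]+$/ { next }   # skip the header lines
    {
      msg = ""
      if ($7 + $8 > 0)     msg = msg " swapping(si=" $7 ",so=" $8 ")"
      if ($2 > 0)          msg = msg " blocked(b=" $2 ")"
      if ($16 >= wa_limit) msg = msg " iowait(wa=" $16 ")"
      if (msg != "") print "line " NR ":" msg
    }'
}
```

Usage: vmstat 1 10 | vmstat_flags 20. Keep in mind that the first data row of vmstat is an average since boot.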
Create heavy disk writes and verify them with iostat, iotop, and vmstat.
Install tools:
sudo apt install fio sysstat iotop
Run a safe file-based write test:
mkdir -p ~/perf-lab
fio --name=write-test \
--directory="$HOME/perf-lab" \
--size=1G \
--rw=write \
--bs=1M \
--direct=1 \
--runtime=60 \
--time_based
Check with iostat:
iostat -xz 1
Example output:
Device r/s w/s rkB/s wkB/s await aqu-sz %util
sda 0.00 350.00 0.00 350000.0 32.50 10.20 99.60
Interpretation:
- w/s and wkB/s are high.
- await is elevated.
- aqu-sz shows queueing.
- %util is close to 100%.
- The disk is saturated by writes.
Check with iotop:
sudo iotop -o
Example output:
Total DISK WRITE: 340.00 M/s
TID PRIO USER DISK READ DISK WRITE IO> COMMAND
5221 be/4 user 0.00 B/s 338.00 M/s 92% fio --name=write-test
Interpretation:
Example:
ionice -c3 backup-command
Create a nearly full filesystem in a safe test directory and diagnose it.
Create a large test file:
mkdir -p ~/perf-lab
fallocate -l 1G ~/perf-lab/bigfile.img
Check disk usage:
du -sh ~/perf-lab
Example output:
1.1G /home/user/perf-lab
Check filesystem space:
df -h ~
Example output:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 20G 18G 2.0G 90% /
Interpretation:
- The filesystem is 90% full.
- The test directory contributes about 1.1 GB.
- On a production system, writes (including log writes) could soon start failing.
du -h --max-depth=1 ~ | sort -h
Example output:
100M /home/user/Documents
500M /home/user/Downloads
1.1G /home/user/perf-lab
2.0G /home/user
Interpretation:
perf-lab is one of the largest directories under the home directory.
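du ranks directories; to go one step further and rank individual files inside the largest one, find can print file sizes directly. A sketch relying on GNU find's -printf (largest_files is a hypothetical name), staying on one filesystem like du -x:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the largest regular files under a directory.
# Output: size in bytes, a TAB, then the path; largest file last.
largest_files() {
  local dir=${1:-.} count=${2:-10}
  find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -n | tail -n "$count"
}
```

For example, largest_files ~/perf-lab 5 would list bigfile.img at the bottom.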
rm -rf ~/perf-lab
Understand load average when CPU is the bottleneck.
stress-ng --cpu 4 --timeout 120s
uptime
Example output:
15:30:00 up 2 days, 1 user, load average: 4.20, 2.10, 1.00
Check CPU count:
nproc
Example output:
4
Interpretation:
- The 1-minute load is about 4.20.
- The system has 4 CPUs.
- This indicates the CPU is near full utilization.
Confirm with top:
High us, low id, low wa = CPU-bound load.
Show that high load can come from I/O wait, not just CPU work.
Run a disk-heavy workload:
mkdir -p ~/perf-lab
fio --name=randwrite-test \
--directory="$HOME/perf-lab" \
--size=1G \
--rw=randwrite \
--bs=4k \
--numjobs=4 \
--iodepth=32 \
--direct=1 \
--runtime=60 \
--time_based
uptime
vmstat 1
iostat -xz 1
Example vmstat output:
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 8 0 500000 20000 700000 0 0 0 75000 3000 9000 5 8 15 72 0
Example iostat output:
Device r/s w/s rkB/s wkB/s await aqu-sz %util
sda 0.00 5200.00 0.00 20800.0 48.30 25.60 99.90
Interpretation:
- b is high, meaning many processes are blocked.
- wa is high, meaning the CPU is waiting for I/O.
- Disk %util is near 100%.
- This high load is caused by disk I/O wait, not CPU computation.
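The rule of thumb from this lab and the CPU lab can be condensed into a small function. A sketch (classify_cpu is a hypothetical name; the 20% wa and 80%/10% thresholds are arbitrary assumptions, not standard values) that takes us, sy, id, and wa as integers, in vmstat's units:

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn vmstat-style CPU percentages into a rough label.
# Thresholds (wa >= 20; us+sy >= 80 with id <= 10) are assumptions for this sketch.
classify_cpu() {
  local us=$1 sy=$2 id=$3 wa=$4
  if [ "$wa" -ge 20 ]; then
    echo "I/O wait"
  elif [ $((us + sy)) -ge 80 ] && [ "$id" -le 10 ]; then
    echo "CPU-bound"
  else
    echo "no clear CPU-side bottleneck"
  fi
}
```

To feed it the last vmstat sample (columns 13 to 16):
classify_cpu $(vmstat 1 2 | tail -n 1 | awk '{print $13, $14, $15, $16}')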
Find which process is consuming RAM.
Start a memory workload:
stress-ng --vm 1 --vm-bytes 1G --timeout 120s
Check with ps:
ps aux --sort=-%mem | head -n 10
Example output:
USER PID %CPU %MEM VSZ RSS COMMAND
user 7001 80.0 12.5 1200000 1024000 stress-ng-vm
mysql 5678 10.0 8.0 2500000 650000 mysqld
Interpretation:
- stress-ng-vm is using the most physical RAM.
- RSS is about 1 GB.
- This process is responsible for most of the memory pressure.
ps -o pid,%mem,rss,vsz,cmd -p 7001
Example:
PID %MEM RSS VSZ CMD
7001 12.5 1024000 1200000 stress-ng-vm
Understand zombie processes and how to identify them.
A zombie process has finished running but still has an entry in the process table because its parent has not collected its exit status.
Create a file:
cat > /tmp/make-zombie.py <<'EOF'
import os
import time
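pid = os.fork()
if pid == 0:
    # Child: exit immediately; nobody collects its exit status yet.
    os._exit(0)
else:
    # Parent: sleep without calling os.wait(), so the child stays a zombie.
    time.sleep(60)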
EOF
Run it:
python3 /tmp/make-zombie.py
In another terminal:
ps -eo pid,ppid,state,cmd | grep ' Z '
Example output:
8123 8122 Z [python3] <defunct>
Interpretation:
- State Z means zombie.
- The child process exited.
- The parent process has not collected it yet.
- A few short-lived zombies are usually harmless.
- Many zombies may indicate a broken parent process.
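To clear zombies, fix or restart the parent process. A zombie has already exited, so sending it a signal has no effect; once its parent exits, the zombie is reparented to init, which collects it.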
In this simulation, wait 60 seconds or stop the parent script.
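To find which parent needs fixing, the state and parent PID can be read straight from /proc/PID/stat, where the field after the command name is the state and the next one is the PPID. A sketch that groups zombies by parent; zombie_parents is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: group zombie processes by parent PID.
# /proc/PID/stat looks like "PID (comm) STATE PPID ...", and comm may contain spaces.
zombie_parents() {
  local f stat state ppid
  for f in /proc/[0-9]*/stat; do
    stat=$(cat "$f" 2>/dev/null) || continue   # the process may have exited
    stat=${stat##*') '}                        # keep the text after the last ") "
    read -r state ppid _ <<< "$stat"
    if [ "$state" = "Z" ]; then
      echo "$ppid"
    fi
  done | sort | uniq -c | while read -r count ppid; do
    echo "parent $ppid ($(cat "/proc/$ppid/comm" 2>/dev/null)) has $count zombie child(ren)"
  done
}
```

While make-zombie.py is sleeping, zombie_parents should name its python3 process as the parent.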
Create a simple script that warns when a filesystem is too full and lists the largest directories.
cat > ~/check-disk-usage.sh <<'EOF'
#!/bin/bash
THRESHOLD=80
TARGET="/"
USAGE=$(df -P "$TARGET" | awk 'NR==2 {gsub("%","",$5); print $5}')
if [ "$USAGE" -ge "$THRESHOLD" ]; then
echo "WARNING: $TARGET is ${USAGE}% full"
echo
echo "Top directories under $TARGET:"
sudo du -xhd1 "$TARGET" 2>/dev/null | sort -h | tail -n 5
else
echo "OK: $TARGET is ${USAGE}% full"
fi
EOF
chmod +x ~/check-disk-usage.sh
Run:
~/check-disk-usage.sh
Example output:
WARNING: / is 87% full
Top directories under /:
1.2G /opt
2.5G /home
4.0G /var
8.0G /usr
18G /
Interpretation:
- The root filesystem is above the threshold.
- The largest top-level directories are listed.
- Investigate /var, /usr, or /home depending on what is unexpectedly large.
Collect simple performance statistics over time.
cat > ~/perf-snapshot.sh <<'EOF'
#!/bin/bash
LOG="$HOME/perf-history.log"
{
echo "===== $(date) ====="
echo "--- uptime ---"
uptime
echo "--- memory ---"
free -h
echo "--- disk space ---"
df -h /
echo "--- top CPU processes ---"
ps aux --sort=-%cpu | head -n 6
echo "--- top memory processes ---"
ps aux --sort=-%mem | head -n 6
echo
} >> "$LOG"
EOF
chmod +x ~/perf-snapshot.sh
Run manually:
~/perf-snapshot.sh
Add to cron:
crontab -e
Add:
0 * * * * /home/user/perf-snapshot.sh
Interpretation:
- The script records a basic hourly snapshot.
- After several days, compare timestamps to identify peak usage times.
Use this guide to interpret common patterns.
Example:
top: us = 95%, id = 1%, wa = 0%
Likely cause:
CPU-bound workload
Check:
ps aux --sort=-%cpu | head
Example:
top: wa = 70%
vmstat: b is high
iostat: %util is 99%
Likely cause:
disk I/O bottleneck
Check:
iostat -xz 1
sudo iotop -o
Example:
free: available memory is low
vmstat: si and so are high
Likely cause:
memory pressure or memory leak
Check:
ps aux --sort=-%mem | head
Example:
uptime: load average high
top: CPU mostly idle
vmstat: b high, wa high
Likely cause:
processes blocked on I/O
Check:
vmstat 1
iostat -xz 1
ps -eo pid,stat,cmd | awk '$2 ~ /D/ {print}'
Example:
df -h: Use% above 90%
Likely cause:
logs, cache, backups, database files, or user data consuming space
Check:
sudo du -xhd1 / | sort -h
General:
uptime
top
htop
vmstat 1
free -h
CPU:
ps aux --sort=-%cpu | head
top -p PID
Memory:
free -h
ps aux --sort=-%mem | head
ps -o pid,%mem,rss,vsz,cmd -p PID
Disk space:
df -h
du -h --max-depth=1 DIRECTORY | sort -h
Disk I/O:
iostat -xz 1
sudo iotop -o
vmstat 1
Process states:
ps -eo pid,ppid,state,cmd
ps -eo pid,stat,cmd | awk '$2 ~ /D/ {print}'
Stress testing in labs:
stress-ng --cpu 4 --timeout 60s
stress-ng --vm 2 --vm-bytes 70% --timeout 60s
fio --name=write-test --directory="$HOME/perf-lab" --size=1G --rw=write --bs=1M --direct=1 --runtime=60 --time_based
Before simulating bottlenecks:
Clean up test data:
rm -rf ~/perf-lab
- Run top during normal system use. Identify the top CPU-consuming process and explain whether its usage is expected.
- Open htop and sort by memory usage. Compare the top memory process with the output of ps aux --sort=-%mem.
- Use free -h to record total, used, free, buff/cache, available, and swap usage. Explain why available is more useful than free.
- Run vmstat 1 during normal use and during a CPU stress test. Compare r, us, sy, id, and wa.
- Create memory pressure with stress-ng and observe free -h and vmstat 1.
- Generate disk writes with fio and observe iostat -xz 1 and iotop.
- Use df -h and du to identify the largest directories on a test filesystem.
- Write a script that warns when / is above 80% usage.