Hardware

Last modified: October 18, 2018

This article is written in: 🇺🇸

Hardware

Linux is a known for its ability to run on a broad range of hardware, from desktops and servers to embedded systems and IoT devices. Its modular kernel design allows efficient hardware management, enabling Linux to support various processors, GPUs, storage devices, and peripherals. With a vast collection of drivers, Linux can interface with both cutting-edge and older hardware, making it a popular choice for developers and organizations seeking flexibility, performance, and cost-effectiveness in managing diverse hardware environments.

Hardware Compatibility

Linux is renowned for its extensive hardware compatibility, supporting a vast array of devices ranging from modern desktops and laptops to servers, embedded systems, and even legacy hardware. This broad compatibility is largely due to the collaborative efforts of the global open-source community, which actively develops and maintains drivers for numerous hardware components.

However, there may be instances where Linux encounters challenges with certain hardware, particularly proprietary or newly released devices for which manufacturers have not provided drivers or specifications. In such cases, open-source developers may reverse-engineer drivers, but this can lead to delayed or incomplete support.

To mitigate compatibility issues:

Many hardware manufacturers now provide official Linux drivers or contribute directly to open-source drivers.
Linux distributions often include HCLs and tools to detect and configure hardware automatically.
Forums and community documentation can offer workarounds and solutions for unsupported hardware.

Hardware Architecture Support

One of Linux's significant strengths is its support for multiple hardware architectures, making it a versatile operating system suitable for various environments.

Supported Architectures Include:

Architecture	Description
x86 and x86_64 (AMD64)	Commonly used in personal computers and servers.
ARM and ARM64	Used in mobile devices, single-board computers (e.g., Raspberry Pi), and embedded systems.
PowerPC (PPC)	Employed in some enterprise servers and older Apple hardware.
MIPS	Found in routers and networking equipment.
SPARC	Used in high-end servers and workstations.
RISC-V	An emerging open-source hardware architecture gaining popularity.

Benefits:

Ability to deploy Linux across diverse hardware platforms.
Suitable for both resource-constrained devices and high-performance computing systems.
Encourages development on new and emerging architectures.

Challenges:

Variations in hardware may require architecture-specific optimizations.
Some architectures may have less community support or documentation.

Accessing Hardware

In Linux, hardware devices are represented as files within the /dev directory, adhering to the Unix philosophy that "everything is a file." This abstraction simplifies hardware interaction and allows for consistent access methods across different device types.

Device Files in /dev:

Device Type	Description	Examples
Block Devices	Represent devices that transfer data in fixed-size blocks (e.g., hard drives, SSDs).	`/dev/sda` (first SATA drive), `/dev/nvme0n1` (first NVMe SSD)
Character Devices	Represent devices that transfer data character by character (e.g., keyboards, serial ports).	`/dev/ttyS0` (first serial port), `/dev/input/event0` (first input device)
Special Devices	Virtual devices providing specific functionalities.	`/dev/null` (data sink), `/dev/zero` (infinite zero bytes), `/dev/random` (random data generator)

Access Control and Permissions:

Device files have standard Linux permissions (read, write, execute) that control user and group access.
udev Device Manager manages dynamic creation and removal of device files, handles permissions, and maintains persistent device naming.

Interacting with Device Files:

Programs use standard system calls (open, read, write, ioctl) to interact with devices.
Utilities like dd, cat, or specialized tools can read from or write to device files for testing or configuration.

Example: Reading from a Device File

# Read the first 512 bytes from a disk
sudo dd if=/dev/sda of=boot_sector.bin bs=512 count=1

Managing Hardware

Efficient hardware management ensures optimal system performance and stability. Linux provides a comprehensive set of tools and commands to manage hardware effectively.

Gathering Hardware Information

Internal Hardware Information

To obtain detailed information about the system's hardware components:

I. lspci lists all PCI devices and details.

lspci -vvv

Options:

-v increases verbosity (up to -vvv for maximum detail).
-k shows kernel drivers and modules handling each device.

Example Output:

00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
Subsystem: Dell Device 1234
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information: Len=10 <? >

00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07) (prog-if 00 [VGA controller])
Subsystem: Dell Device 5678
Flags: bus master, fast devsel, latency 0, IRQ 127
Memory at 00000000 (64-bit, non-prefetchable) [size=16M]
Capabilities: [40] Vendor Specific Information: Len=0c <? >

Each line represents a PCI device, identified by its bus address (e.g., 00:02.0).
The description provides the device type, manufacturer, and model.
Details like Flags, Subsystem, and Memory offer insights into device features and configurations.
Useful for identifying hardware components and verifying driver installations.

II. lsusb displays information about USB buses and connected devices.

lsusb -v

Options:

-v makes output verbose.

Example Output:

Bus 002 Device 003: ID 0bda:5689 Realtek Semiconductor Corp. Integrated Webcam
Device Descriptor:
bLength                18
bDescriptorType         1
bcdUSB               2.00
bDeviceClass          239 Miscellaneous Device
bDeviceSubClass         2
bDeviceProtocol         1
iManufacturer           1 Realtek
iProduct                2 Integrated Webcam

Lists USB devices with details like Bus and Device numbers.
ID shows the Vendor ID and Product ID, helpful for identifying the exact device model.
iManufacturer and iProduct provide human-readable names.
Use this to verify connected USB devices and troubleshoot recognition issues.

III. lscpu shows CPU architecture information.

lscpu

Example Output:

Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Byte Order:              Little Endian
CPU(s):                  8
On-line CPU(s) list:     0-7
Thread(s) per core:      2
Core(s) per socket:      4
Socket(s):               1
Vendor ID:               GenuineIntel
Model name:              Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
CPU MHz:                 1992.000
L1d cache:               32K
L1i cache:               32K
L2 cache:                256K
L3 cache:                8192K

Provides detailed CPU information, including architecture and capabilities.
CPU(s): Total number of logical processors (cores × threads).
Model name: Specific CPU model installed.
Cache sizes impact performance; larger caches can improve speed for certain tasks.

IV. lsblk lists block devices (storage devices) and their mount points.

lsblk -a

Options:

-a includes empty devices.

Example Output:

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 238.5G  0 disk 
├─sda1   8:1    0   512M  0 part /boot/efi
├─sda2   8:2    0     1G  0 part /boot
└─sda3   8:3    0 237G    0 part /
sr0     11:0    1  1024M  0 rom

Displays a tree of storage devices (sda, sr0) and their partitions (sda1, sda2, sda3).
TYPE indicates if it's a disk, partition, or optical drive (rom).
MOUNTPOINT shows where partitions are mounted in the filesystem.
Useful for understanding disk layout and managing storage.

V. lshw provides comprehensive hardware details (CPU, memory, disks, network, etc.).

sudo lshw -short

Example Output:

H/W path        Device      Class          Description
======================================================
        system         Latitude 5480
/0                           bus           0C7KXG
/0/0                         memory        64KiB BIOS
/0/4                         processor     Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
/0/4/5                       memory        256KiB L1 cache
/0/4/6                       memory        1MiB L2 cache
/0/4/7                       memory        3MiB L3 cache
/0/1c                        memory        8GiB System Memory
/0/1c/0                      memory        8GiB SODIMM DDR4 Synchronous 2133 MHz
/0/100                       bridge        Sunrise Point-LP PCI Express Root Port
/0/100/14                    bus           Sunrise Point-LP USB 3.0 xHCI Controller
/0/100/14/0      usb1        bus           xHCI Host Controller
/0/100/14/1      usb2        bus           xHCI Host Controller
/0/100/1d                    bridge        Sunrise Point-LP PCI Express Root Port
/0/100/1d/0      wlp2s0      network       Wireless 8265 / 8275

Shows a hierarchical listing of hardware components.
Class indicates the type of hardware (e.g., processor, memory, network).
Use this for a quick overview of the system's hardware configuration.

Plugged Devices Information

Monitoring connected devices and system events:

I. dmesg prints kernel ring buffer messages, useful for viewing system events and hardware-related messages.

dmesg | tail -50

Options:

-T shows human-readable timestamps.

Example Output:

[ 456.789123] usb 1-1: new high-speed USB device number 3 using xhci_hcd
[ 456.928756] usb 1-1: New USB device found, idVendor=0781, idProduct=5591
[ 456.928763] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 456.928767] usb 1-1: Product: Ultra USB 3.0
[ 456.928771] usb 1-1: Manufacturer: SanDisk
[ 456.928774] usb 1-1: SerialNumber: 4C530001260920112145
[ 456.929456] usb-storage 1-1:1.0: USB Mass Storage device detected
[ 456.929678] scsi host6: usb-storage 1-1:1.0
[ 457.930123] scsi 6:0:0:0: Direct-Access     SanDisk  Ultra USB 3.0   1.00 PQ: 0 ANSI: 6
[ 457.931456] sd 6:0:0:0: Attached scsi generic sg2 type 0
[ 457.932345] sd 6:0:0:0: [sdb] 60062500 512-byte logical blocks: (30.7 GB/28.6 GiB)
[ 457.932567] sd 6:0:0:0: [sdb] Write Protect is off
[ 457.932570] sd 6:0:0:0: [sdb] Mode Sense: 43 00 00 00
[ 457.932789] sd 6:0:0:0: [sdb] No Caching mode page found
[ 457.932793] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[ 457.935678]  sdb: sdb1
[ 457.936789] sd 6:0:0:0: [sdb] Attached SCSI removable disk

Displays recent kernel messages, particularly useful after plugging in a device.
Shows device detection steps, including driver assignments and storage allocations.
Use this to troubleshoot hardware recognition and driver issues.

II. udevadm monitor monitors udev events for real-time hardware changes.

sudo udevadm monitor --environment --udev

Options:

--kernel monitors kernel events.
--udev monitors udev events.
--environment prints the environment for each event.

Example Output:

UDEV  [456.789123] add /devices/pci0000:00/0000:00:14.0/usb1/1-1 (usb)
ACTION=add
DEVNAME=/dev/bus/usb/001/003
DEVNUM=003
DEVPATH=/devices/pci0000:00/0000:00:14.0/usb1/1-1
DEVTYPE=usb_device
ID_BUS=usb
ID_MODEL=Ultra_USB_3.0
ID_VENDOR=SanDisk
...

UDEV  [457.930123] add /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.0 (usb)
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.0
DEVTYPE=usb_interface
...

Real-time monitoring of device events.
ACTION indicates what's happening (add, remove, change).
Environment variables provide detailed information about each event.
Helpful for diagnosing issues with device recognition and udev rules.

Monitoring Hardware

Monitoring hardware performance and system health is crucial for proactive maintenance.

I. top and htop display real-time system processes and resource usage.

htop

Example Output:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1234 user      20   0  162476   9568   5824 S   0.7  0.1   0:03.17 bash
5678 user      20   0  322820  25520  19340 S   0.3  0.3   0:05.61 gnome-terminal
9101 user      20   0 1629380 119964  73452 S   2.0  1.5   1:12.34 firefox

Field	Description
`PID`	Process ID
`%CPU`	Percentage of CPU usage
`%MEM`	Percentage of RAM usage
`TIME+`	Total CPU time consumed

Use this to identify resource-intensive processes and manage them accordingly.

II. vmstat reports virtual memory statistics and system processes.

vmstat 5

Outputs data every 5 seconds.

Example Output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
1  0      0  823456  23456 345678    0    0     1     2    3    4  5  1 93  1  0
0  0      0  823400  23450 345690    0    0     0     1  250  500  4  1 95  0  0

Field	Description
`r`	Number of processes waiting to run
`free`	Amount of free memory
`buff` and `cache`	Memory used for buffers and cache
`us`, `sy`, `id`, `wa`	User, system, idle, and wait CPU percentages

Monitor overall system performance and identify bottlenecks.

III. iostat provides CPU and I/O statistics for devices and partitions.

iostat -xz 1

Options:

-x: Extended statistics.
-z: Omit devices with no activity.
1: Update every second.

Example Output:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
2.00    0.00    1.00    0.50    0.00   96.50

Device            r/s     w/s     rkB/s   wkB/s  rrqm/s  wrqm/s  %util
sda              1.00    2.00     50.00  100.00    0.00    0.00   0.15

Field	Description
`r/s`, `w/s`	Read/write operations per second
`rkB/s`, `wkB/s`	Kilobytes read/written per second
`%util`	How busy the device is (100% means fully utilized)

Helps in identifying disk I/O bottlenecks.

IV. netstat and ss provide network statistics and socket information.

netstat -tulnp
ss -tunap

Option	Description
`-t`	TCP connections
`-u`	UDP connections
`-l`	Listening sockets
`-n`	Show numerical addresses
`-p`	Show process using the socket

Example Output (netstat -tulnp):

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1234/sshd
tcp6       0      0 :::80                   :::*                    LISTEN      5678/nginx
udp        0      0 0.0.0.0:68              0.0.0.0:*                           9101/dhclient

Shows services listening on network ports.
Local Address: Port on which the service is listening.
PID/Program name: Process ID and associated program.
Use this to ensure that only intended services are running.

V. sensors (from lm_sensors package) monitors system temperatures, voltages, and fan speeds.

sensors

Requires configuration with sensors-detect.

Example Output:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +55.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +54.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +53.0°C  (high = +80.0°C, crit = +100.0°C)

Displays temperatures for CPU package and individual cores.
high and crit values indicate warning and critical temperature thresholds.
Useful for monitoring system thermals and preventing overheating.

VI. glances is a cross-platform monitoring tool integrating various system metrics.

glances

Comprehensive overview including CPU, memory, disks, network, processes.

Example Output:

Field	Description
CPU Usage	Real-time graph and percentage
Memory Usage	Shows used and free memory
Network Traffic	Displays current upload and download rates
Processes	Lists top processes by CPU or memory usage

Provides a holistic view of system performance.
Color-coded indicators help identify potential issues quickly.
Use keyboard shortcuts to navigate and customize the display.

VII. nmon is a performance monitoring tool providing detailed statistics.

nmon

Interactive interface for real-time monitoring.

Example Output:

Field	Description
CPU Statistics	Press `c` to view CPU usage graphs
Memory Usage	Press `m` to display memory and swap usage
Disk I/O	Press `d` to see disk read/write statistics
Network	Press `n` for network interface statistics

Interactive keys allow you to focus on specific metrics.
Graphs and numerical data provide insight into resource utilization.
Ideal for performance tuning and identifying system bottlenecks.

Configuring Hardware

Proper configuration ensures hardware devices operate efficiently and according to system requirements.

I. hdparm can be used to get/set SATA/IDE device parameters.

sudo hdparm -I /dev/sda

Options:

-I displays detailed device information.

Example Output:

/dev/sda:

ATA device, with non-removable media
Model Number:       Samsung SSD 860 EVO 500GB
Serial Number:      S3Z9NB0K123456X
Firmware Revision:  RVT02B6Q
Transport:          Serial, ATA8-AST, SATA III
Standards:
Used: ATA8-ACS revision 4
Supported: 8 7 6 5
Configuration:
Logical         max     current
cylinders       16383   16383
heads           16      16
sectors/track   63      63
--
CHS current addressable sectors:   16514064
LBA    user addressable sectors:  268435455
LBA48  user addressable sectors:  976773168
Logical  Sector size:                   512 bytes
Physical Sector size:                   512 bytes
device size with M = 1024*1024:      476940 MBytes
device size with M = 1000*1000:      500107 MBytes (500 GB)

Field	Description
Model and Serial Number	Identifies the exact disk installed
Firmware Revision	Indicates the firmware version, which may be relevant for updates
Standards Supported	Shows the ATA/ATAPI standards the device complies with
Device Size	Confirms the storage capacity
Sector Sizes	Important for alignment when partitioning disks

II. sdparm controls SCSI device parameters.

sudo sdparm --all /dev/sdb

Example Output:

/dev/sdb: ATA       WDC WD10EZEX-08WN4A0  01.01A01

Peripheral device type: disk
Mode parameter header:
Mode data length=0x00, Medium type=0x00, Device-specific parameter=0x00, Block descriptor length=0x00

Caching (SBC) mode page:
IC       (Initiator Control):  0
ABPF     (Abort Pre-fetch):    0
CAP      (Caching Analysis Permitted):  0
DISC     (Discontinuity):      0
SIZE     (Size Enable):        0
WCE      (Write Cache Enable): 1
MF       (Multiplication Factor): 0
RCD      (Read Cache Disable): 0

Peripheral Device Type confirms the type (disk).
Caching Mode Page displays cache settings like write cache enable (WCE).
Parameters can be adjusted to optimize performance or behavior.

III. xrandr configures display settings on systems using X11.

xrandr --output HDMI-1 --mode 1920x1080 --rate 60 --primary

Options:

--output specifies the display output.
--mode sets the resolution.
--rate sets the refresh rate.
--primary sets as primary display.

Example Output:

Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
HDMI-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 510mm x 290mm
1920x1080     60.00*+  59.94    50.00
1680x1050     59.88
1280x1024     75.02    60.02
1024x768      75.03    70.07    60.00

Shows which display outputs are connected (e.g., HDMI-1).
Lists resolutions and refresh rates; an asterisk (*) indicates the current mode.
Confirms which display is set as primary.

IV. alsamixer is an interactive command-line mixer for ALSA sound system.

alsamixer

Navigation:

Use arrow keys to adjust volumes.
Press M to mute/unmute channels.
Press F6 to select different sound cards.

Example Output:

┌────────────────────────────── AlsaMixer v1.2.2 ──────────────────────────────────┐
│ Card: PulseAudio                                      F1:  Help                  │
│ Chip: PulseAudio                                      F2:  System information    │
│ View: F3:[Playback] F4: Capture  F5: All              F6:  Select sound card     │
│ Item: Master [dB gain: 0.00]                          Esc: Exit                  │
│                                                                                  │
│     ┌──┐     ┌──┐     ┌──┐                                                       │
│     │▐▐│     │▐▐│     │▐▐│                                                       │
│     │▐▐│     │▐▐│     │▐▐│                                                       │
│     │▐▐│     │▐▐│     │▐▐│                                                       │
│     │▐▐│     │▐▐│     │▐▐│                                                       │
│     └──┘     └──┘     └──┘                                                       │
│      100       100       100                                                     │
│     Master     PCM      Mic                                                      │
└──────────────────────────────────────────────────────────────────────────────────┘

Master controls the overall system volume.
PCM adjusts the volume for digital audio.
Mic adjusts the microphone input level.
Bars are visual representation of volume levels; the filled areas represent the current setting.

V. amixer is a scriptable mixer for automation and scripting.

amixer set Master unmute
amixer set Master 75%

Example Output:

Simple mixer control 'Master',0
Capabilities: pvolume pswitch pswitch-joined
Playback channels: Front Left - Front Right
Limits: Playback 0 - 65536
Mono:
Front Left: Playback 49152 [75%] [on]
Front Right: Playback 49152 [75%] [on]

Playback Channels indicates stereo channels (Front Left and Front Right).
Limits and Levels shows the range and current volume setting.
[on]/[off] indicates whether the channel is muted.

VI. ip is a modern tool to configure network interfaces.

sudo ip addr show
sudo ip link set eth0 up

Example Output (ip addr show):

2: eth0: <no-carrier,broadcast,multicast,up> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
link/ether 00:0a:95:9d:68:16 brd ff:ff:ff:ff:ff:ff

eth0 is the network interface.
Indicates the interface is not active.
ip link set eth0 up brings the interface up.

VII. iwconfig configures wireless network interfaces.

sudo iwconfig wlan0 essid "YourSSID" key s:YourPassword

Example Output (iwconfig):

wlan0     IEEE 802.11  ESSID:"YourSSID"  Nickname:"<wifi@realtek>"
Mode:Managed  Frequency:2.437 GHz  Access Point: 00:14:D1:1A:2B:3C
Bit Rate:72.2 Mb/s   Sensitivity:0/0
Retry:off   RTS thr:off   Fragment thr:off
Power Management:off
Link Quality=70/70  Signal level=-40 dBm  Noise level=-96 dBm

ESSID is the name of the wireless network you're connected to.
Mode indicates the operation mode (Managed means it's a client).
Signal Level shows the strength of the connection.

VIII. rfkill is used to enable or disable wireless devices.

rfkill list
rfkill unblock bluetooth

Example Output (rfkill list):

0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
1: hci0: Bluetooth
Soft blocked: yes
Hard blocked: no

Soft Blocked is a software-level block (can be toggled via rfkill).
Hard Blocked is a physical switch or BIOS setting (cannot be changed via software).
rfkill unblock bluetooth removes the software block.

Managing Drivers in Linux

Drivers in Linux are typically part of the kernel, either built-in or as loadable kernel modules (LKMs). Understanding how to manage these modules is essential for hardware management.

I. lsmod displays currently loaded modules.

lsmod | grep modulename

Example Output:

e1000e                245760  0
intel_cstate           20480  0

e1000e is the module for Intel network cards.
245760 is the memory footprint of the module.
Last column shows the number of instances using the module (0 means it's not in use).

II. modprobe adds modules to the kernel, resolving dependencies.

sudo modprobe modulename

Example Usage:

sudo modprobe e1000e

Loads the e1000e network driver module.
Automatically handles any dependencies required.

III. modprobe -r removes modules from the kernel.

sudo modprobe -r modulename

Example Usage:

sudo modprobe -r e1000e

Unloads the e1000e module.
Will fail if the module is currently in use.

IV. insmod inserts a module into the kernel.

sudo insmod /path/to/module.ko

Note: Does not resolve dependencies; prefer modprobe when possible.

Example Usage:

sudo insmod /lib/modules/$(uname -r)/kernel/drivers/net/e1000e/e1000e.ko

Manually loads a specific module file.
Use when testing custom or third-party modules.

V. rmmod removes a module from the kernel.

sudo rmmod modulename

Example Usage:

sudo rmmod e1000e

Forcefully removes the module.
Should be used with caution as it doesn't handle dependencies.

VI. modinfo displays information about a kernel module.

modinfo modulename

Example Output:

filename:       /lib/modules/5.4.0-42-generic/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version:        3.2.6-k
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>

Filename is the location of the module.
Version and License may be useful for compatibility checks.
Description and Author provides context about the module's purpose.

VII. Kernel Module Configuration

Place configuration files in /etc/modprobe.d/.
To set options for a module:

echo "options modulename option=value" | sudo tee /etc/modprobe.d/modulename.conf

Example Usage:

echo "options e1000e InterruptThrottleRate=3000" | sudo tee /etc/modprobe.d/e1000e.conf

InterruptThrottleRate adjusts how frequently the network card interrupts the CPU.
Settings will apply on boot or when the module is loaded.

Graphics Drivers

I. NVIDIA

Install via package manager or download from NVIDIA's website.
Use ubuntu-drivers devices (Ubuntu) to identify the appropriate driver.

sudo apt install nvidia-driver-470

Example Output (nvidia-smi):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   35C    P8    N/A /  N/A |    200MiB /  4040MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

Driver Version confirms the installed NVIDIA driver.
GPU Details provides information about the GPU's status and usage.
Memory Usage shows how much GPU memory is in use.

II. AMD

Use open-source amdgpu driver or proprietary amdgpu-pro.

sudo apt install xserver-xorg-video-amdgpu

Example Output (lspci | grep VGA):

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1)

Device Identification confirms the AMD GPU model.
Driver Installation ensures that the appropriate driver is installed for the GPU.

Troubleshooting Hardware Issues

Effective troubleshooting involves systematic diagnostics to identify and resolve hardware-related problems.

I. Check Kernel Messages:

Use dmesg to look for hardware-related errors or warnings.

dmesg | grep -i 'error\|fail\|warn'

Example Output:

[   12.345678] usb 1-1: USB disconnect, device number 2
[   12.345789] xhci_hcd 0000:00:14.0: Cannot set link state.
[   12.345800] usb usb1-port1: cannot disable (err = -32)
[   15.678912] ata1.00: failed command: READ FPDMA QUEUED
[   15.678923] ata1.00: status: { DRDY ERR }
[   15.678929] ata1.00: error: { UNC }
[   20.123456] NVRM: GPU at PCI:0000:01:00: GPU has fallen off the bus.
[   20.123467] NVRM: A GPU crash dump has been created.

Each line includes a timestamp and a message from the kernel.
Look for keywords like error, fail, or warn to identify potential issues.
The example shows errors related to USB devices, disk read failures (ata1.00), and GPU problems (NVRM).
These messages help pinpoint hardware components experiencing issues.

II. Review System Logs:

Examine /var/log/syslog, /var/log/messages, or /var/log/kern.log.

sudo less /var/log/syslog

Example Output (excerpt from /var/log/syslog):

Sep 25 10:15:32 hostname kernel: [  15.678912] ata1.00: failed command: READ FPDMA QUEUED
Sep 25 10:15:32 hostname kernel: [  15.678923] ata1.00: status: { DRDY ERR }
Sep 25 10:15:32 hostname kernel: [  15.678929] ata1.00: error: { UNC }
Sep 25 10:15:35 hostname NetworkManager[1234]: <warn>  [1632560135.1234] device (wlan0): link timed out.
Sep 25 10:15:40 hostname kernel: [  20.123456] NVRM: GPU at PCI:0000:01:00: GPU has fallen off the bus.
Sep 25 10:15:40 hostname kernel: [  20.123467] NVRM: A GPU crash dump has been created.

The logs contain timestamped entries from various system components.
Messages with kernel: prefix come from the kernel, similar to dmesg.
Other services (e.g., NetworkManager) may report warnings or errors.
Use these logs to gather more context around hardware issues, correlating timestamps with observed problems.

III. Verify Driver Loading:

Ensure the correct drivers are loaded for devices.

lspci -k | less

Example Output (excerpt):

00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode]
Subsystem: Dell Device 05a4
Kernel driver in use: ahci
Kernel modules: ahci
01:00.0 VGA compatible controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev a1)
Subsystem: Dell Device 3810
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Each device is listed with its driver information.
Kernel driver in use shows the driver currently handling the device.
Kernel modules lists available modules for the device.
Verify that the appropriate driver is in use. For instance, if using an NVIDIA GPU, ensure nvidia is the driver in use, not nouveau (the open-source alternative).
If a device lacks a driver, it may not function correctly.

IV. Check Hardware Status:

Disk Health:

Use smartctl from smartmontools to check S.M.A.R.T. status.

sudo smartctl -H /dev/sda

Example Output:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

The PASSED result indicates that the disk considers itself healthy.
If the result is FAILED, the disk may be experiencing issues and should be backed up and replaced.
For more detailed information, you can run sudo smartctl -a /dev/sda.

Memory Test:

Use memtest86+ by booting from installation media or via GRUB menu.

Memtest86+ runs a series of tests on your RAM to detect errors.
If errors are found, it may indicate faulty RAM modules, which should be replaced.

CPU Stress Test:

Use stress or stress-ng to test CPU stability.

sudo stress-ng --cpu 4 --timeout 60s

Example Output:

stress-ng: info:  [1234] setting to a timeout of 60 seconds
stress-ng: info:  [1234] dispatching hogs: 4 cpu
stress-ng: info:  [1234] successful run completed in 60.00s

The tool stresses the CPU for the specified duration.
If the system remains stable and no errors occur, the CPU is likely functioning properly.
If the system crashes or reports errors, there may be issues with CPU stability or cooling.

V. Monitor System Resources:

Identify processes consuming excessive resources.

top

Example Output (partial):

top - 10:30:01 up 2 days,  4:15,  2 users,  load average: 1.25, 1.10, 1.05
Tasks: 245 total,   2 running, 243 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.0 us,  5.0 sy,  0.0 ni, 65.0 id,  5.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  16384.0 total,   2048.0 free,   8192.0 used,   6144.0 buff/cache
MiB Swap:   8192.0 total,   6144.0 free,   2048.0 used.   7168.0 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
5678 user      20   0 2625488 1.2g  9456 S 100.0  7.5  60:00.00 process_name
1234 user      20   0  162476  9568  5824 R   5.0  0.1   0:03.17 bash

process_name is consuming 100% of one CPU core.
High CPU usage may indicate a runaway process or an application under heavy load.
You may decide to investigate or terminate the process if necessary.

Use iotop to monitor disk I/O usage.

sudo iotop

Example Output:

Total DISK READ: 100.00 K/s | Total DISK WRITE: 50.00 K/s
PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
7890 be/4  user      50.00 K/s   25.00 K/s  0.00 %  10.00 %  process_a
5678 be/4  user      50.00 K/s   25.00 K/s  0.00 %   5.00 %  process_b

process_a and process_b are performing significant disk I/O.
High disk usage can slow down the system or indicate an application issue.
Investigate if the disk activity is expected or if it needs attention.

VI. Inspect Physical Connections:

For removable devices, verify all cables and connectors are securely attached.
Reseat components like RAM modules and expansion cards if necessary.

Note:

This step involves physically checking the hardware.
Ensure the system is powered off and unplugged before touching internal components.
Use anti-static precautions to prevent damage to components.

VII. Update System and Drivers:

Keep the system updated to benefit from the latest hardware support.

sudo apt update && sudo apt upgrade

Example Output:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
linux-image-generic linux-headers-generic
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 10.5 MB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n]

Lists packages that will be upgraded, including kernel images or drivers.
Updating may resolve hardware compatibility issues.
Confirm to proceed with the upgrade.

Update the kernel if necessary.

sudo apt install linux-generic

Note:

Installing a newer kernel may provide better hardware support.
Ensure compatibility with your system and applications before upgrading.

VIII. Blacklist Conflicting Modules:

If a module is causing issues, blacklist it to prevent loading.

echo "blacklist faulty_module" | sudo tee /etc/modprobe.d/blacklist-faulty_module.conf

Example Usage:

echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf

This command creates a configuration file that tells the system not to load the nouveau module.
Used when nouveau (open-source NVIDIA driver) conflicts with the proprietary nvidia driver.
After blacklisting, you may need to update the initramfs and reboot:

sudo update-initramfs -u
sudo reboot

IX. Use Live Environment:

Boot from a live USB to determine if the issue is hardware or software-related.

How to Proceed:

Create a bootable USB with a Linux distribution.
Boot the system using the USB drive.
Check if the hardware functions correctly in the live environment.
If issues persist, the problem is likely hardware-related.

X. Consult Documentation and Support Resources:

Refer to official hardware manuals, Linux documentation, and community forums.
Search for known issues with specific hardware models.

XI. Reboot the System:

Rebooting can resolve temporary glitches or hardware initialization issues.

sudo reboot

Safety Precautions:

Always back up important data before performing hardware troubleshooting.
Use anti-static precautions when handling internal components.
Disconnect power before opening the system chassis.

Challenges

Use lspci, lsusb, and lsblk commands to identify all hardware connected to your system. For each device listed, try to determine what it is and what role it serves in the system. Discuss how these commands help identify different hardware components and what information they reveal.
Monitor system resources using top or htop and identify the processes consuming the most CPU and memory. Analyze the impact of these resource-hungry processes on system performance and discuss how these tools provide insights into system health and load.
Run the lscpu command to gather detailed information about your CPU. Identify the number of cores, clock speed, and architecture type, then discuss the role of these characteristics in determining overall system performance.
Use the lspci -k or lsusb -v commands to examine hardware devices and identify any components that may be missing drivers. Explain why drivers are essential for hardware functionality and how missing drivers can impact device performance or usability.
Choose a device from the /dev directory, such as /dev/null or /dev/random, and use the file command to identify whether it is a character or block device. Then, safely read from or write to the device and discuss the purpose of device files and how they facilitate interaction between the operating system and hardware.
Use the df command to view disk usage statistics, focusing on the filesystem mounted on the root (/). Identify its total size, used space, and available space, and discuss how disk usage monitoring can help with system maintenance and preventing storage-related issues.
If available, use diagnostic tools such as smartctl (from smartmontools) to check the health status of your hard drives or memtest to perform a memory test on your RAM. Explain how these tools help detect hardware issues and the types of problems they can identify.
Simulate a hardware change, such as unplugging a USB device, and use dmesg to view system messages generated by this action. Analyze the output to identify the sequence of events logged, and discuss how dmesg can be used to troubleshoot hardware-related issues in real time.
Explore the temperature and fan speed sensors on your system by installing and using lm-sensors. Run sensors to display current temperatures and fan speeds, then discuss how monitoring hardware temperature is crucial for preventing overheating and ensuring long-term component reliability.
Investigate the memory usage on your system using the free -h command. Examine how much memory is used, free, and available, and explain the significance of these metrics. Discuss the differences between RAM and swap space and how the system manages them.

Hardware

Hardware Compatibility

Hardware Architecture Support

Accessing Hardware

Managing Hardware

Gathering Hardware Information

Internal Hardware Information

Plugged Devices Information

Monitoring Hardware

Configuring Hardware

Managing Drivers in Linux

Graphics Drivers

Troubleshooting Hardware Issues

Challenges

Table of Contents

Related Articles