Last modified: October 10, 2024
This article is written in: πΊπΈ
Files and Filesystems
In Unix, files and filesystems are fundamental components of the operating system's structure. A file is a collection of data stored on disk, which can include anything from text documents and images to executable programs. Files are organized within directories in a hierarchical structure, allowing for efficient data management and retrieval.
A filesystem, on the other hand, is a method and data structure that the operating system uses to manage files on a disk or partition. It provides a way to store, retrieve, and organize files, supporting features like file permissions, links, and metadata. Common Unix filesystems include ext4, XFS, and Btrfs, each offering different capabilities and optimizations. Understanding these concepts is essential for managing data and system resources effectively in a Unix environment.
Types of Files in a UNIX Filesystem
Unix and Unix-like systems, including Linux, organize files in a hierarchical structure called a filesystem. Files can be classified based on their purpose, storage method, and visibility.
Classification Based on Purpose
- Ordinary files are the most common type, containing text, data, or program code. They cannot contain other files or directories.
- Directory files function as folders to organize other files, with the root directory (
/
) being the top-level directory of the entire filesystem. Users' files are usually stored in their respective home directories, such as/home/adam/
. - Device files represent hardware devices as if they were files. Block-oriented devices, like hard drives, transfer data in large blocks, whereas character-oriented devices, such as keyboards or modems, handle data one byte at a time.
- Link files serve as references to other files. Hard links are essentially duplicates of the original file and behave identically, while soft links (or symbolic links) act as indirect pointers to a file or directory, similar to shortcuts in Windows.
The file
command in Linux is a powerful tool that helps identify and classify these file types by analyzing their content and structure. Below are insights on how the file
command interprets and reports on these various classifications:
Classification | file Command Example | file Command Output Example | Explanation |
Text Files | file document.txt |
document.txt: ASCII text |
The file command detects text encoding (e.g., ASCII, UTF-8) or identifies binary format for executables. |
Binary Files | file program |
program: ELF 64-bit LSB executable, x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2 |
Provides detailed information about binary files, including architecture and format. |
Directory Files | file /home/user |
/home/user: directory |
Identified as a "directory". Contains pointers to other files and directories. |
Block Devices | file /dev/sda |
/dev/sda: block special |
The file command distinguishes between block devices (e.g., "block special") and character devices (e.g., "character special"). |
Character Devices | file /dev/tty |
/dev/tty: character special |
Useful for identifying the type of device a file represents. |
Symbolic Links | file /usr/bin/python |
/usr/bin/python: symbolic link to /usr/bin/python3.8 |
Symbolic links are reported with their target file or directory. |
Hard Links | file hardlinkfile |
(Output identical to the original file) | Hard links are identical to the original file in the file command output. |
Classification Based on Storage
- Regular files contain text, data, or program code and are stored directly in the file system.
- Virtual files provide an interface to other programs or the kernel. They do not contain traditional data but rather information about processes and system parameters, typically found in directories like
/proc
and/sys
. - Remote files are stored on a remote Network File System (NFS) server. They can be accessed and manipulated as if they were stored locally.
The table below provides the most effective commands for identifying each type of file based on their specific attributes and locations:
Classification | Command Example | Command Output Example | Explanation |
Regular Files | file document.txt |
document.txt: ASCII text |
The file command detects the type of content in regular files, such as text, binary, or executable. |
Virtual Files | stat /proc/cpuinfo |
File: /proc/cpuinfo\nSize: 0\tBlocks: 0\tIO Block: 4096 regular file\nDevice: 0,5\tInode: 4026532255 |
The stat command shows a typical file size of 0, located in /proc or /sys , indicating it's a virtual file. |
Remote Files | df -T /mnt/nfs/remote_file |
Filesystem Type 1K-blocks Used Available Use% Mounted on\nnfsserver:/export nfs 1024000 102400 924000 10% /mnt/nfs |
The df -T command displays the filesystem type as nfs , identifying it as a remote file on an NFS server. |
Classification Based on Visibility
- Visible files are displayed when you list the contents of a directory using commands like
ls
. - Hidden files are not displayed in a standard directory listing. They start with a period (
.
) and typically store configuration data or system files. They can be revealed using thels -a
command.
Note About File Names
Filenames are case-sensitive. This means the operating system treats "Test," "TEST," and "test" as different files. Also, most file types in Linux are determined by file content and not by the file extension, unlike systems like Windows.
Special Directory Names
In a filesystem, certain directory names have special meanings that simplify navigation and file management:
- The
./
notation refers to the current directory, meaning the directory where you are presently located. It is commonly used when executing a script or a program located in the current directory. For instance, if you have a script namedscript.sh
in the current directory, you can run it using./script.sh
. This tells the system to look forscript.sh
in the current directory. - The
../
notation refers to the parent directory, which is the directory one level up from the current directory in the filesystem hierarchy. It is useful for navigating upwards in the directory structure. For example, if you are in/home/user/Documents
and you usecd ../
, you will move up to/home/user
. - The
~/
notation is a shorthand for the current user's home directory. The home directory is a personal space allocated to a user where personal files and settings are stored. For example,~/Documents
would refer to theDocuments
directory within the home directory of the current user. This is especially useful for referencing files and directories in a user's home space without needing to specify the full path.
These notations provide a convenient way to navigate and manage files and directories efficiently in a command-line environment. They are particularly useful in scripting and automation tasks, where paths need to be specified dynamically or relative to the current context.
Directory Structure
Linux organizes everything within a single directory hierarchy that starts with the root directory (/
).
/
βββ bin
βββ boot
βββ dev
βββ etc
β βββ network
β βββ ssh
βββ home
β βββ [user]
β βββ Documents
β βββ Downloads
β βββ Music
β βββ Pictures
β βββ Videos
β βββ Desktop
βββ lib
βββ media
βββ mnt
βββ opt
βββ proc
βββ root
βββ run
βββ sbin
βββ srv
βββ sys
βββ tmp
βββ usr
β βββ bin
β βββ include
β βββ lib
β βββ local
β βββ share
βββ var
βββ cache
βββ lib
βββ local
βββ lock
βββ log
βββ tmp
Key directories within the Linux file system include:
Directory | Description |
/ |
The root directory: the starting point of the file system hierarchy, all other directories branch off from here. |
/bin |
Essential low-level system utilities, executable by all users. |
/usr/bin |
A place for user commands and applications typically used after the system boot process. |
/sbin |
Contains system binaries crucial for booting, restoring, recovering, and/or repairing the system, supplementing the binaries in /bin . |
/lib |
Contains shared libraries essential for the binaries found in /bin and /sbin . |
/usr/lib |
Libraries necessary for /usr/bin binaries and applications. |
/tmp |
A directory for storing temporary files, which are usually cleared at each reboot. |
/home |
The home ground for personal directories of each user, where they can store their personal files. |
/etc |
A repository for system-wide configuration files and scripts utilized during the boot process. |
/dev |
A space hosting device nodes that correspond to hardware devices connected to the system. |
/var |
Manages variable data such as system logs, mail and printer spool directories, alongside transient and temporary files. |
/root |
Serves as the home directory for the root (superuser) account, distinct from the root directory (/ ). |
/boot |
Stores files vital for the boot process, including the Linux kernel and the boot loader. |
/media and /mnt |
Serve as mount points for file systems and removable devices like CDs, USB drives, etc. |
File System Types
A file system is a method of organizing, storing, and retrieving data on a storage device, like a hard drive, SSD, or USB drive. It manages the available space on the device, keeping track of which sectors belong to which files and directories.
Several types of file systems can be used on Linux systems, each designed with specific use-cases and features:
- The ext2 (Second Extended Filesystem) is among the first file systems crafted specifically for Linux, and while it is efficient, it lacks advanced features like journaling or encryption.
- An improved version of ext2, ext3 (Third Extended Filesystem), includes support for journaling, which helps protect against data loss by maintaining a log of changes that are yet to be committed to the file system.
- Currently the default Linux file system, ext4 (Fourth Extended Filesystem) supports larger file sizes and file systems, provides improved performance and reliability, and includes features such as delayed allocation and journal checksumming.
- Originally developed by IBM, the JFS (Journaled File System) is optimized to handle large file systems efficiently and includes journaling capabilities.
- The NFS (Network File System) is not a traditional file system but rather a protocol that allows a system to access files over a network as though they were on its local hard drive.
- Serving as a software layer in the kernel, the VFS (Virtual File System) provides a common interface to various file systems, enabling the operating system to uniformly access and manage different types of file systems.
- The FAT (File Allocation Table) system is simple and widely used, particularly on removable storage devices, making it a suitable choice for interoperability across various operating systems.
- A standard for Windows NT and its later versions, NTFS (New Technology File System) can be accessed on Linux but is not native, which may result in a lack of full functionality.
- Known for its strong performance and reliability, ReiserFS (Reiser File System) is a general-purpose, journaled file system that efficiently handles large numbers of small files, making it a popular choice on servers.
- The Btrfs (B-tree File System) is a copy-on-write (CoW) file system for Linux, designed to provide advanced features focused on fault tolerance, repair, and simplified administration, with capabilities like snapshots, subvolumes, and built-in RAID.
- Developed by Silicon Graphics, Inc., XFS is a high-performance, journaling file system that excels at parallel I/O, making it particularly effective for applications involving large files that require high-performance I/O operations.
Category | ext2 | ext3 | ext4 | JFS | NFS | VFS | FAT | NTFS | ReiserFS | Btrfs | XFS |
Design Purpose | Simple Linux file system | Improved ext2 with journaling | Default Linux file system | Large file systems | Network file access | Common interface for FS | Removable storage | Windows file system | Small files, servers | Advanced Linux features | High-performance, large files |
Journaling | No | Yes | Yes | Yes | N/A (protocol) | N/A | No | Yes | Yes | Yes | Yes |
Performance | Efficient for small FS | Better than ext2 | Better than ext3 | High efficiency | Depends on network | N/A | Simple, lower performance | Good, Windows optimized | Good for small files | Good, advanced features | Excellent, parallel I/O |
Maximum File/FS Size | Smaller than ext3/4 | Larger than ext2 | Very large | Very large | N/A | N/A | Limited by design | Very large | Large | Very large | Very large |
Suitability | Basic, older Linux systems | General Linux use | Modern Linux systems | Enterprise, large data | Network environments | Kernel-level operations | Wide compatibility | Windows environments | Server use | Linux, advanced use | Large file handling |
Encryption Support | No | No | Yes (since 4.1) | No | N/A | N/A | No | Yes | No | Yes | No |
Data Recovery | Harder | Easier than ext2 | Easier than ext3 | Good | Depends on implementation | N/A | Simpler but riskier | Good | Good | Very good | Good |
Use in Large Servers | Less common | Common | Very common | Yes | Yes, for shared storage | N/A | Less common | Less common | Yes | Yes | Yes |
Use in Personal Devices | Less common | Less common | Common | Less common | Less common | N/A | Very common | Common in Windows | Less common | Growing | Less common |
Managing File Systems
Managing file systems is a fundamental skill that involves various operations such as checking existing file systems, installing necessary tools, creating new file systems on fresh partitions or drives, and modifying existing ones. This comprehensive guide provides detailed steps for each of these tasks, complete with commands, expected outputs, and practical considerations.
Checking Existing File Systems
Before performing any file system operations, it's essential to understand the current state of your system.
1. List Mounted File Systems
Use the df -T
command to display all mounted file systems along with their types.
df -T
Expected Output:
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/sda1 ext4 492G 215G 253G 46% /
udev devtmpfs 16G 0 16G 0% /dev
tmpfs tmpfs 3.2G 1.3M 3.2G 1% /run
2. List Block Devices with File System Information
Use lsblk
with the -f
option to display file system types for all block devices.
lsblk -f
Expected Output:
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
ββsda1 ext4 rootfs a1b2c3d4-e5f6-7890-abcd-ef1234567890 /
ββsda2 ext4 home 12345678-90ab-cdef-1234-567890abcdef /home
ββsda3 swap 1a2b3c4d-5e6f-7890-abcd-ef1234567890 [SWAP]
3. Check Supported File Systems
List all file systems currently supported by the kernel:
cat /proc/filesystems
Expected Output:
nodev sysfs
nodev tmpfs
nodev bdev
ext3
ext4
vfat
xfs
Ensuring File System Support
Before creating a new file system, ensure that your system has the necessary support for it. This involves checking for kernel module support, installing the required utilities, and verifying that your system is compatible with the file system you intend to use.
1. Check Kernel Support
To confirm that your kernel supports the desired file system, check if the module is loaded:
lsmod | grep xfs
If the output includes lines referencing xfs
, it indicates that the XFS module is loaded. If no output appears, the module isnβt loaded and may need to be installed or enabled.
2. Install File System Utilities
Once kernel support is confirmed, install any necessary tools. For example, to manage the XFS file system, you need the xfsprogs
package:
sudo apt update
sudo apt install xfsprogs
Expected Output:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
xfsprogs
...
Setting up xfsprogs (5.4.0-1ubuntu2) ...
This output indicates that xfsprogs
has been successfully installed, and you can now proceed with XFS-specific file system tasks.
3. Verify Compatibility
Ensure that your systemβs kernel and hardware support the file system by checking the module information:
modinfo xfs
Expected Output:
This command provides details on the XFS module, including its dependencies, supported versions, and any required firmware. Reviewing this information confirms compatibility with your current setup.
Identifying the Device
Accurate identification of the target device is crucial to avoid modifying the wrong disk, which could result in data loss. These commands will help you list and inspect block devices.
1. List All Block Devices
To get an overview of all attached storage devices, use:
lsblk
Expected Output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 100G 0 disk
ββsda1 8:1 0 50G 0 part /
sdb 8:16 0 200G 0 disk
ββsdb1 8:17 0 50G 0 part /mnt/data
The lsblk
output lists all block devices and their partitions, along with their sizes and mount points. This lets you identify the device you intend to work with (e.g., /dev/sdb
).
2. Detailed Device Information
To gather further details about each device and its partitions, use fdisk
:
sudo fdisk -l
Expected Output:
Disk /dev/sdb: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
...
This output provides detailed information on each disk, including size, sector count, and partitioning scheme. Use this to confirm the correct device before making any modifications.
3. Identify Unpartitioned Space
If your target device is new and unpartitioned, youβll need to partition it first. Use fdisk
, gdisk
, or parted
to create new partitions.
Unmounting the Device (if applicable)
Before modifying a device, itβs essential to ensure that it is not in use. Unmounting prevents accidental data corruption during the process.
1. Check if the Device is Mounted
Determine if the target device is currently mounted:
mount | grep sdb1
Expected Output:
/dev/sdb1 on /mnt/data type xfs (rw)
If the output lists the device, it means itβs mounted. Note the mount point (e.g., /mnt/data
) so you can unmount it in the next step.
2. Unmount the Device
Unmount the device before making any changes:
sudo umount /dev/sdb1
Expected Output:
No output indicates the device was unmounted successfully.
3. Handle Busy Devices
If you encounter a βdevice is busyβ error, identify processes using the device:
sudo lsof /dev/sdb1
Expected Output:
The output lists any processes using the device. For example:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 1234 user cwd DIR 8,17 4096 2 /mnt/data
Terminate these processes or stop the services to release the device, allowing it to be unmounted.
Creating a File System
Once the target device is unmounted, you can proceed with creating the new file system. The file system type you choose will depend on factors like performance, reliability, and feature support.
1. Choose the File System Type
Select a file system type that meets your needs. For example, ext4
is commonly used for general-purpose storage due to its balance of performance and features. Alternatively, you may choose xfs
for large filesystems or high-performance needs.
2. Create the File System
To format the device with a specific file system type, use the appropriate mkfs
command. For ext4
, for instance:
sudo mkfs.ext4 /dev/sdb1
Expected Output:
mke2fs 1.45.5 (07-Jan-2020)
/dev/sdb1 contains a ext4 file system
Proceed anyway? (y,N) y
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: 123e4567-e89b-12d3-a456-426614174000
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, ...
Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done
This output shows the mkfs
process. It provides details on block allocation, journal creation, and the UUID for the new file system. If asked to proceed, type y
to continue. Successful completion indicates the file system is now ready for use.
3. Label the File System (Optional)
Assigning a label to your file system makes it easier to identify, especially when managing multiple disks. To label an ext4
file system:
sudo e2label /dev/sdb1 mydata
No output means the label was applied successfully. You can verify the label by running sudo e2label /dev/sdb1
.
Changing Existing File Systems
Modifying existing file systems requires extra caution. Backup any essential data before proceeding to avoid data loss. Here are common operations:
1. Resizing a File System
To resize a file system, unmount it first:
sudo umount /dev/sdb1
Then, adjust its size. For example, to shrink an ext4
file system to 50 GB:
sudo resize2fs /dev/sdb1 50G
Expected Output:
resize2fs 1.45.5 (07-Jan-2020)
Resizing the filesystem on /dev/sdb1 to 13107200 (4k) blocks.
The filesystem on /dev/sdb1 is now 13107200 (4k) blocks long.
This output confirms the new size of the file system. Note that shrinking can cause data loss if the specified size is smaller than the amount of data stored on the partition.
2. Converting File Systems
Some file systems support in-place conversions. For example, converting ext2
to ext3
to enable journaling can be done as follows:
sudo tune2fs -O has_journal /dev/sdb1
Expected Output:
tune2fs 1.45.5 (07-Jan-2020)
Setting filesystem feature 'has_journal'
Creating journal inode: done
This filesystem will be automatically checked every 29 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override.
This output shows that journaling has been enabled on the file system. This feature enhances data integrity but may slightly reduce performance.
3. Backup Before Changes
Backing up important data is always recommended before altering file systems. Tools like rsync
, tar
, or dd
can be used for this purpose. For example:
sudo rsync -av /mnt/mydata/ /mnt/backup/
Mounting the New File System
To make the new file system accessible to your system, youβll need to mount it. Mounting allows you to interact with the file system and its contents.
1. Create a Mount Point
Choose a directory to serve as the mount point. If the directory does not exist, create it with:
sudo mkdir -p /mnt/mydata
Expected Output:
No output indicates successful directory creation.
2. Mount the File System
Once the mount point exists, mount the file system:
sudo mount /dev/sdb1 /mnt/mydata
Expected Output:
Thereβs no output if the mount is successful. You can verify by listing mounted file systems or checking the mount point with df
.
3. Verify the Mount
To confirm that the file system is mounted correctly, use:
df -h /mnt/mydata
Expected Output:
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 99G 60M 94G 1% /mnt/mydata
The output shows disk usage information for /dev/sdb1
, confirming it is mounted on /mnt/mydata
. Youβre now ready to use the new file system for storage or data operations.
Additional Considerations
When working with file systems, automating mounting, ensuring file system health, and managing permissions are essential practices. Hereβs how to handle these additional considerations.
1. Automate Mounting at Boot
To ensure your new file system is automatically mounted at system startup, add it to the /etc/fstab
file.
First, edit the file:
sudo nano /etc/fstab
Add the following line to the end of the file (replace the UUID and mount point as needed):
UUID=123e4567-e89b-12d3-a456-426614174000 /mnt/mydata ext4 defaults 0 2
Entry Components:
- The UUID is a unique identifier for the partition, which can be obtained using the
blkid
command, ensuring precise identification of the partition across reboots. - /mnt/mydata is the mount point, specifying the directory where the partition will be accessible once mounted.
- The ext4 refers to the file system type, indicating how data is organized on the partition, with
ext4
being a common choice for Linux systems. - defaults are the mount options, which typically include settings like read-write access and automatic mounting during startup.
- A value of 0 means the partition will skip the dump backup process, as backups are not required for this entry.
- The 2 indicates the order in which
fsck
performs checks on the partition, with2
meaning it will be checked after the root partition, providing structure for file system checks.
Find the UUID for the device by running:
sudo blkid /dev/sdb1
Expected Output:
/dev/sdb1: UUID="123e4567-e89b-12d3-a456-426614174000" TYPE="ext4"
This output provides the UUID for /dev/sdb1
. Use this identifier in the /etc/fstab
entry to avoid issues if device names change on reboot. After adding this to /etc/fstab
, the system will automatically mount the file system at /mnt/mydata
on startup.
2. File System Maintenance
Keeping your file system healthy and monitored is crucial for data integrity and performance.
Check for Errors:
Use fsck
to check the file system for errors and repair any issues.
sudo fsck /dev/sdb1
Expected Output:
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
/dev/sdb1: clean, 10/6553600 files, 262144/26214400 blocks
This output confirms that the file system check is complete. If fsck
finds no errors, it will indicate that the file system is clean. If it detects issues, fsck
will attempt to repair them based on the options you specify.
Monitor Disk Usage:
Regularly monitor available space and usage with:
df -h
Expected Output:
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 99G 60M 94G 1% /mnt/mydata
This output shows disk usage for all mounted file systems, including the newly created one. Use this information to ensure the file system has enough space for your needs.
3. Permissions and Ownership
To control who can access your mounted file system, set ownership and permissions.
Change ownership to a specific user and group:
sudo chown user:group /mnt/mydata
Expected Output:
No output means the command succeeded.
Set permissions:
sudo chmod 755 /mnt/mydata
Expected Output:
Again, no output indicates success.
These commands set the ownership of /mnt/mydata
to a specified user
and group
. The chmod
command assigns permissions, where 755
means the owner has read, write, and execute permissions, while others have read and execute only.
4. Security Considerations
When storing sensitive data, consider additional security measures, such as encryption and access control.
Encryption:
Use LUKS (Linux Unified Key Setup) to encrypt the partition:
sudo cryptsetup luksFormat /dev/sdb1
Expected Output:
WARNING!
========
This will overwrite data on /dev/sdb1 irrevocably.
Are you sure? (Type uppercase yes): YES
Enter passphrase for /dev/sdb1:
Verify passphrase:
LUKS prompts for confirmation before encrypting the partition. Follow the prompts to set a passphrase, which will be required for future access.
Access Control (ACL):
To set fine-grained permissions, enable and configure Access Control Lists (ACLs). For example, to grant a user read access:
sudo setfacl -m u:username:r /mnt/mydata
Expected Output:
No output, indicating success.
This command grants username
read-only access to /mnt/mydata
. Use getfacl
to verify ACLs or to modify permissions for other users as needed.
Challenges
- Can you explain what the root directory is in Linux? How is it different from the root user's home directory?
- In the context of the
echo
command in Linux, what is the relationship between/bin/echo
and typingecho
at the shell prompt? - Using
/dev/random
or/dev/urandom
, how would you create a file filled with 100 lines of random characters? Can you write a shell command to do this? - What is the purpose of the
/bin
and/sbin
directories in a Linux system? Can you list some of the files you would typically find in these directories and briefly describe what they do? - In a Linux system, what is the purpose of
/usr/bin
? What kind of files are typically stored in this directory and why? - Can you explain the difference between character and block device drivers in UNIX? Can the
ls
command be used to determine whether a device file represents a character device or a block device? If so, how? - By looking at the contents of
/proc/cpuinfo
, can you determine the model of your CPU? What command would you use to display this file's contents? - What hidden files might you expect to find in the
/root
directory? Why might these files be hidden? - What command can you use to create a new file system in Linux? What options does this command typically have, and how are they used?
- How can you check disk information and partitions on a device in Linux? Please write down the command you would use and briefly explain its output.
- What steps and commands would you use to mount a file system in Linux? Please provide an example command and explain what it does.