Last modified: October 10, 2024

This article is written in: πŸ‡ΊπŸ‡Έ

Files and Filesystems

In Unix, files and filesystems are fundamental components of the operating system's structure. A file is a collection of data stored on disk, which can include anything from text documents and images to executable programs. Files are organized within directories in a hierarchical structure, allowing for efficient data management and retrieval.

A filesystem, on the other hand, is a method and data structure that the operating system uses to manage files on a disk or partition. It provides a way to store, retrieve, and organize files, supporting features like file permissions, links, and metadata. Common Unix filesystems include ext4, XFS, and Btrfs, each offering different capabilities and optimizations. Understanding these concepts is essential for managing data and system resources effectively in a Unix environment.

Types of Files in a UNIX Filesystem

Unix and Unix-like systems, including Linux, organize files in a hierarchical structure called a filesystem. Files can be classified based on their purpose, storage method, and visibility.

Classification Based on Purpose

  1. Ordinary files are the most common type, containing text, data, or program code. They cannot contain other files or directories.
  2. Directory files function as folders to organize other files, with the root directory (/) being the top-level directory of the entire filesystem. Users' files are usually stored in their respective home directories, such as /home/adam/.
  3. Device files represent hardware devices as if they were files. Block-oriented devices, like hard drives, transfer data in large blocks, whereas character-oriented devices, such as keyboards or modems, handle data one byte at a time.
  4. Link files serve as references to other files. Hard links are essentially duplicates of the original file and behave identically, while soft links (or symbolic links) act as indirect pointers to a file or directory, similar to shortcuts in Windows.

The file command in Linux is a powerful tool that helps identify and classify these file types by analyzing their content and structure. Below are insights on how the file command interprets and reports on these various classifications:

Classification file Command Example file Command Output Example Explanation
Text Files file document.txt document.txt: ASCII text The file command detects text encoding (e.g., ASCII, UTF-8) or identifies binary format for executables.
Binary Files file program program: ELF 64-bit LSB executable, x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2 Provides detailed information about binary files, including architecture and format.
Directory Files file /home/user /home/user: directory Identified as a "directory". Contains pointers to other files and directories.
Block Devices file /dev/sda /dev/sda: block special The file command distinguishes between block devices (e.g., "block special") and character devices (e.g., "character special").
Character Devices file /dev/tty /dev/tty: character special Useful for identifying the type of device a file represents.
Symbolic Links file /usr/bin/python /usr/bin/python: symbolic link to /usr/bin/python3.8 Symbolic links are reported with their target file or directory.
Hard Links file hardlinkfile (Output identical to the original file) Hard links are identical to the original file in the file command output.

Classification Based on Storage

  1. Regular files contain text, data, or program code and are stored directly in the file system.
  2. Virtual files provide an interface to other programs or the kernel. They do not contain traditional data but rather information about processes and system parameters, typically found in directories like /proc and /sys.
  3. Remote files are stored on a remote Network File System (NFS) server. They can be accessed and manipulated as if they were stored locally.

The table below provides the most effective commands for identifying each type of file based on their specific attributes and locations:

Classification Command Example Command Output Example Explanation
Regular Files file document.txt document.txt: ASCII text The file command detects the type of content in regular files, such as text, binary, or executable.
Virtual Files stat /proc/cpuinfo File: /proc/cpuinfo\nSize: 0\tBlocks: 0\tIO Block: 4096 regular file\nDevice: 0,5\tInode: 4026532255 The stat command shows a typical file size of 0, located in /proc or /sys, indicating it's a virtual file.
Remote Files df -T /mnt/nfs/remote_file Filesystem Type 1K-blocks Used Available Use% Mounted on\nnfsserver:/export nfs 1024000 102400 924000 10% /mnt/nfs The df -T command displays the filesystem type as nfs, identifying it as a remote file on an NFS server.

Classification Based on Visibility

  1. Visible files are displayed when you list the contents of a directory using commands like ls.
  2. Hidden files are not displayed in a standard directory listing. They start with a period (.) and typically store configuration data or system files. They can be revealed using the ls -a command.

Note About File Names

Filenames are case-sensitive. This means the operating system treats "Test," "TEST," and "test" as different files. Also, most file types in Linux are determined by file content and not by the file extension, unlike systems like Windows.

Special Directory Names

In a filesystem, certain directory names have special meanings that simplify navigation and file management:

  1. The ./ notation refers to the current directory, meaning the directory where you are presently located. It is commonly used when executing a script or a program located in the current directory. For instance, if you have a script named script.sh in the current directory, you can run it using ./script.sh. This tells the system to look for script.sh in the current directory.
  2. The ../ notation refers to the parent directory, which is the directory one level up from the current directory in the filesystem hierarchy. It is useful for navigating upwards in the directory structure. For example, if you are in /home/user/Documents and you use cd ../, you will move up to /home/user.
  3. The ~/ notation is a shorthand for the current user's home directory. The home directory is a personal space allocated to a user where personal files and settings are stored. For example, ~/Documents would refer to the Documents directory within the home directory of the current user. This is especially useful for referencing files and directories in a user's home space without needing to specify the full path.

These notations provide a convenient way to navigate and manage files and directories efficiently in a command-line environment. They are particularly useful in scripting and automation tasks, where paths need to be specified dynamically or relative to the current context.

Directory Structure

Linux organizes everything within a single directory hierarchy that starts with the root directory (/).

/
β”œβ”€β”€ bin
β”œβ”€β”€ boot
β”œβ”€β”€ dev
β”œβ”€β”€ etc
β”‚   β”œβ”€β”€ network
β”‚   └── ssh
β”œβ”€β”€ home
β”‚   └── [user]
β”‚       β”œβ”€β”€ Documents
β”‚       β”œβ”€β”€ Downloads
β”‚       β”œβ”€β”€ Music
β”‚       β”œβ”€β”€ Pictures
β”‚       β”œβ”€β”€ Videos
β”‚       └── Desktop
β”œβ”€β”€ lib
β”œβ”€β”€ media
β”œβ”€β”€ mnt
β”œβ”€β”€ opt
β”œβ”€β”€ proc
β”œβ”€β”€ root
β”œβ”€β”€ run
β”œβ”€β”€ sbin
β”œβ”€β”€ srv
β”œβ”€β”€ sys
β”œβ”€β”€ tmp
β”œβ”€β”€ usr
β”‚   β”œβ”€β”€ bin
β”‚   β”œβ”€β”€ include
β”‚   β”œβ”€β”€ lib
β”‚   β”œβ”€β”€ local
β”‚   └── share
└── var
    β”œβ”€β”€ cache
    β”œβ”€β”€ lib
    β”œβ”€β”€ local
    β”œβ”€β”€ lock
    β”œβ”€β”€ log
    └── tmp

Key directories within the Linux file system include:

Directory Description
/ The root directory: the starting point of the file system hierarchy, all other directories branch off from here.
/bin Essential low-level system utilities, executable by all users.
/usr/bin A place for user commands and applications typically used after the system boot process.
/sbin Contains system binaries crucial for booting, restoring, recovering, and/or repairing the system, supplementing the binaries in /bin.
/lib Contains shared libraries essential for the binaries found in /bin and /sbin.
/usr/lib Libraries necessary for /usr/bin binaries and applications.
/tmp A directory for storing temporary files, which are usually cleared at each reboot.
/home The home ground for personal directories of each user, where they can store their personal files.
/etc A repository for system-wide configuration files and scripts utilized during the boot process.
/dev A space hosting device nodes that correspond to hardware devices connected to the system.
/var Manages variable data such as system logs, mail and printer spool directories, alongside transient and temporary files.
/root Serves as the home directory for the root (superuser) account, distinct from the root directory (/).
/boot Stores files vital for the boot process, including the Linux kernel and the boot loader.
/media and /mnt Serve as mount points for file systems and removable devices like CDs, USB drives, etc.

File System Types

A file system is a method of organizing, storing, and retrieving data on a storage device, like a hard drive, SSD, or USB drive. It manages the available space on the device, keeping track of which sectors belong to which files and directories.

Several types of file systems can be used on Linux systems, each designed with specific use-cases and features:

Category ext2 ext3 ext4 JFS NFS VFS FAT NTFS ReiserFS Btrfs XFS
Design Purpose Simple Linux file system Improved ext2 with journaling Default Linux file system Large file systems Network file access Common interface for FS Removable storage Windows file system Small files, servers Advanced Linux features High-performance, large files
Journaling No Yes Yes Yes N/A (protocol) N/A No Yes Yes Yes Yes
Performance Efficient for small FS Better than ext2 Better than ext3 High efficiency Depends on network N/A Simple, lower performance Good, Windows optimized Good for small files Good, advanced features Excellent, parallel I/O
Maximum File/FS Size Smaller than ext3/4 Larger than ext2 Very large Very large N/A N/A Limited by design Very large Large Very large Very large
Suitability Basic, older Linux systems General Linux use Modern Linux systems Enterprise, large data Network environments Kernel-level operations Wide compatibility Windows environments Server use Linux, advanced use Large file handling
Encryption Support No No Yes (since 4.1) No N/A N/A No Yes No Yes No
Data Recovery Harder Easier than ext2 Easier than ext3 Good Depends on implementation N/A Simpler but riskier Good Good Very good Good
Use in Large Servers Less common Common Very common Yes Yes, for shared storage N/A Less common Less common Yes Yes Yes
Use in Personal Devices Less common Less common Common Less common Less common N/A Very common Common in Windows Less common Growing Less common

Managing File Systems

Managing file systems is a fundamental skill that involves various operations such as checking existing file systems, installing necessary tools, creating new file systems on fresh partitions or drives, and modifying existing ones. This comprehensive guide provides detailed steps for each of these tasks, complete with commands, expected outputs, and practical considerations.

Checking Existing File Systems

Before performing any file system operations, it's essential to understand the current state of your system.

1. List Mounted File Systems

Use the df -T command to display all mounted file systems along with their types.

df -T

Expected Output:

Filesystem     Type     1K-blocks     Used Available Use% Mounted on
/dev/sda1      ext4      492G  215G  253G  46% /
udev           devtmpfs   16G     0   16G   0% /dev
tmpfs          tmpfs     3.2G  1.3M  3.2G   1% /run

2. List Block Devices with File System Information

Use lsblk with the -f option to display file system types for all block devices.

lsblk -f

Expected Output:

NAME   FSTYPE LABEL    UUID                                 MOUNTPOINT
sda                                                      
β”œβ”€sda1 ext4   rootfs   a1b2c3d4-e5f6-7890-abcd-ef1234567890 /
β”œβ”€sda2 ext4   home     12345678-90ab-cdef-1234-567890abcdef /home
└─sda3 swap            1a2b3c4d-5e6f-7890-abcd-ef1234567890 [SWAP]

3. Check Supported File Systems

List all file systems currently supported by the kernel:

cat /proc/filesystems

Expected Output:

nodev   sysfs
nodev   tmpfs
nodev   bdev
        ext3
        ext4
        vfat
        xfs

Ensuring File System Support

Before creating a new file system, ensure that your system has the necessary support for it. This involves checking for kernel module support, installing the required utilities, and verifying that your system is compatible with the file system you intend to use.

1. Check Kernel Support

To confirm that your kernel supports the desired file system, check if the module is loaded:

lsmod | grep xfs

If the output includes lines referencing xfs, it indicates that the XFS module is loaded. If no output appears, the module isn’t loaded and may need to be installed or enabled.

2. Install File System Utilities

Once kernel support is confirmed, install any necessary tools. For example, to manage the XFS file system, you need the xfsprogs package:

sudo apt update
sudo apt install xfsprogs

Expected Output:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  xfsprogs
...
Setting up xfsprogs (5.4.0-1ubuntu2) ...

This output indicates that xfsprogs has been successfully installed, and you can now proceed with XFS-specific file system tasks.

3. Verify Compatibility

Ensure that your system’s kernel and hardware support the file system by checking the module information:

modinfo xfs

Expected Output:

This command provides details on the XFS module, including its dependencies, supported versions, and any required firmware. Reviewing this information confirms compatibility with your current setup.

Identifying the Device

Accurate identification of the target device is crucial to avoid modifying the wrong disk, which could result in data loss. These commands will help you list and inspect block devices.

1. List All Block Devices

To get an overview of all attached storage devices, use:

lsblk

Expected Output:

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  100G  0 disk 
└─sda1   8:1    0   50G  0 part /
sdb      8:16   0  200G  0 disk 
└─sdb1   8:17   0   50G  0 part /mnt/data

The lsblk output lists all block devices and their partitions, along with their sizes and mount points. This lets you identify the device you intend to work with (e.g., /dev/sdb).

2. Detailed Device Information

To gather further details about each device and its partitions, use fdisk:

sudo fdisk -l

Expected Output:

Disk /dev/sdb: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
...

This output provides detailed information on each disk, including size, sector count, and partitioning scheme. Use this to confirm the correct device before making any modifications.

3. Identify Unpartitioned Space

If your target device is new and unpartitioned, you’ll need to partition it first. Use fdisk, gdisk, or parted to create new partitions.

Unmounting the Device (if applicable)

Before modifying a device, it’s essential to ensure that it is not in use. Unmounting prevents accidental data corruption during the process.

1. Check if the Device is Mounted

Determine if the target device is currently mounted:

mount | grep sdb1

Expected Output:

/dev/sdb1 on /mnt/data type xfs (rw)

If the output lists the device, it means it’s mounted. Note the mount point (e.g., /mnt/data) so you can unmount it in the next step.

2. Unmount the Device

Unmount the device before making any changes:

sudo umount /dev/sdb1

Expected Output:

No output indicates the device was unmounted successfully.

3. Handle Busy Devices

If you encounter a β€œdevice is busy” error, identify processes using the device:

sudo lsof /dev/sdb1

Expected Output:

The output lists any processes using the device. For example:

COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
bash     1234 user cwd    DIR   8,17    4096   2 /mnt/data

Terminate these processes or stop the services to release the device, allowing it to be unmounted.

Creating a File System

Once the target device is unmounted, you can proceed with creating the new file system. The file system type you choose will depend on factors like performance, reliability, and feature support.

1. Choose the File System Type

Select a file system type that meets your needs. For example, ext4 is commonly used for general-purpose storage due to its balance of performance and features. Alternatively, you may choose xfs for large filesystems or high-performance needs.

2. Create the File System

To format the device with a specific file system type, use the appropriate mkfs command. For ext4, for instance:

sudo mkfs.ext4 /dev/sdb1

Expected Output:

mke2fs 1.45.5 (07-Jan-2020)
/dev/sdb1 contains a ext4 file system
Proceed anyway? (y,N) y
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: 123e4567-e89b-12d3-a456-426614174000
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, ...

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

This output shows the mkfs process. It provides details on block allocation, journal creation, and the UUID for the new file system. If asked to proceed, type y to continue. Successful completion indicates the file system is now ready for use.

3. Label the File System (Optional)

Assigning a label to your file system makes it easier to identify, especially when managing multiple disks. To label an ext4 file system:

sudo e2label /dev/sdb1 mydata

No output means the label was applied successfully. You can verify the label by running sudo e2label /dev/sdb1.

Changing Existing File Systems

Modifying existing file systems requires extra caution. Backup any essential data before proceeding to avoid data loss. Here are common operations:

1. Resizing a File System

To resize a file system, unmount it first:

sudo umount /dev/sdb1

Then, adjust its size. For example, to shrink an ext4 file system to 50 GB:

sudo resize2fs /dev/sdb1 50G

Expected Output:

resize2fs 1.45.5 (07-Jan-2020)
Resizing the filesystem on /dev/sdb1 to 13107200 (4k) blocks.
The filesystem on /dev/sdb1 is now 13107200 (4k) blocks long.

This output confirms the new size of the file system. Note that shrinking can cause data loss if the specified size is smaller than the amount of data stored on the partition.

2. Converting File Systems

Some file systems support in-place conversions. For example, converting ext2 to ext3 to enable journaling can be done as follows:

sudo tune2fs -O has_journal /dev/sdb1

Expected Output:

tune2fs 1.45.5 (07-Jan-2020)
Setting filesystem feature 'has_journal'
Creating journal inode: done
This filesystem will be automatically checked every 29 mounts or 180 days, whichever comes first.  Use tune2fs -c or -i to override.

This output shows that journaling has been enabled on the file system. This feature enhances data integrity but may slightly reduce performance.

3. Backup Before Changes

Backing up important data is always recommended before altering file systems. Tools like rsync, tar, or dd can be used for this purpose. For example:

sudo rsync -av /mnt/mydata/ /mnt/backup/

Mounting the New File System

To make the new file system accessible to your system, you’ll need to mount it. Mounting allows you to interact with the file system and its contents.

1. Create a Mount Point

Choose a directory to serve as the mount point. If the directory does not exist, create it with:

sudo mkdir -p /mnt/mydata

Expected Output:

No output indicates successful directory creation.

2. Mount the File System

Once the mount point exists, mount the file system:

sudo mount /dev/sdb1 /mnt/mydata

Expected Output:

There’s no output if the mount is successful. You can verify by listing mounted file systems or checking the mount point with df.

3. Verify the Mount

To confirm that the file system is mounted correctly, use:

df -h /mnt/mydata

Expected Output:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        99G   60M   94G   1% /mnt/mydata

The output shows disk usage information for /dev/sdb1, confirming it is mounted on /mnt/mydata. You’re now ready to use the new file system for storage or data operations.

Additional Considerations

When working with file systems, automating mounting, ensuring file system health, and managing permissions are essential practices. Here’s how to handle these additional considerations.

1. Automate Mounting at Boot

To ensure your new file system is automatically mounted at system startup, add it to the /etc/fstab file.

First, edit the file:

sudo nano /etc/fstab

Add the following line to the end of the file (replace the UUID and mount point as needed):

UUID=123e4567-e89b-12d3-a456-426614174000  /mnt/mydata  ext4  defaults  0  2

Entry Components:

Find the UUID for the device by running:

sudo blkid /dev/sdb1

Expected Output:

/dev/sdb1: UUID="123e4567-e89b-12d3-a456-426614174000" TYPE="ext4"

This output provides the UUID for /dev/sdb1. Use this identifier in the /etc/fstab entry to avoid issues if device names change on reboot. After adding this to /etc/fstab, the system will automatically mount the file system at /mnt/mydata on startup.

2. File System Maintenance

Keeping your file system healthy and monitored is crucial for data integrity and performance.

Check for Errors:

Use fsck to check the file system for errors and repair any issues.

sudo fsck /dev/sdb1

Expected Output:

fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
/dev/sdb1: clean, 10/6553600 files, 262144/26214400 blocks

This output confirms that the file system check is complete. If fsck finds no errors, it will indicate that the file system is clean. If it detects issues, fsck will attempt to repair them based on the options you specify.

Monitor Disk Usage:

Regularly monitor available space and usage with:

df -h

Expected Output:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        99G   60M   94G   1% /mnt/mydata

This output shows disk usage for all mounted file systems, including the newly created one. Use this information to ensure the file system has enough space for your needs.

3. Permissions and Ownership

To control who can access your mounted file system, set ownership and permissions.

Change ownership to a specific user and group:

sudo chown user:group /mnt/mydata

Expected Output:

No output means the command succeeded.

Set permissions:

sudo chmod 755 /mnt/mydata

Expected Output:

Again, no output indicates success.

These commands set the ownership of /mnt/mydata to a specified user and group. The chmod command assigns permissions, where 755 means the owner has read, write, and execute permissions, while others have read and execute only.

4. Security Considerations

When storing sensitive data, consider additional security measures, such as encryption and access control.

Encryption:

Use LUKS (Linux Unified Key Setup) to encrypt the partition:

sudo cryptsetup luksFormat /dev/sdb1

Expected Output:

WARNING!
========
This will overwrite data on /dev/sdb1 irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for /dev/sdb1: 
Verify passphrase:

LUKS prompts for confirmation before encrypting the partition. Follow the prompts to set a passphrase, which will be required for future access.

Access Control (ACL):

To set fine-grained permissions, enable and configure Access Control Lists (ACLs). For example, to grant a user read access:

sudo setfacl -m u:username:r /mnt/mydata

Expected Output:

No output, indicating success.

This command grants username read-only access to /mnt/mydata. Use getfacl to verify ACLs or to modify permissions for other users as needed.

Challenges

  1. Can you explain what the root directory is in Linux? How is it different from the root user's home directory?
  2. In the context of the echo command in Linux, what is the relationship between /bin/echo and typing echo at the shell prompt?
  3. Using /dev/random or /dev/urandom, how would you create a file filled with 100 lines of random characters? Can you write a shell command to do this?
  4. What is the purpose of the /bin and /sbin directories in a Linux system? Can you list some of the files you would typically find in these directories and briefly describe what they do?
  5. In a Linux system, what is the purpose of /usr/bin? What kind of files are typically stored in this directory and why?
  6. Can you explain the difference between character and block device drivers in UNIX? Can the ls command be used to determine whether a device file represents a character device or a block device? If so, how?
  7. By looking at the contents of /proc/cpuinfo, can you determine the model of your CPU? What command would you use to display this file's contents?
  8. What hidden files might you expect to find in the /root directory? Why might these files be hidden?
  9. What command can you use to create a new file system in Linux? What options does this command typically have, and how are they used?
  10. How can you check disk information and partitions on a device in Linux? Please write down the command you would use and briefly explain its output.
  11. What steps and commands would you use to mount a file system in Linux? Please provide an example command and explain what it does.

Table of Contents

    Files and Filesystems
    1. Types of Files in a UNIX Filesystem
      1. Classification Based on Purpose
      2. Classification Based on Storage
      3. Classification Based on Visibility
      4. Note About File Names
      5. Special Directory Names
    2. Directory Structure
    3. File System Types
    4. Managing File Systems
      1. Checking Existing File Systems
      2. Ensuring File System Support
      3. Identifying the Device
      4. Unmounting the Device (if applicable)
      5. Creating a File System
      6. Changing Existing File Systems
      7. Mounting the New File System
      8. Additional Considerations
    5. Challenges