Last modified: April 19, 2025
Working with files on Unix-based systems often involves managing multiple files and directories, especially when it comes to storage or transferring data. Tools like tar and gzip are invaluable for packaging and compressing files efficiently. Understanding how to use these commands can simplify tasks like backing up data, sharing files, or deploying applications.
Imagine you have a collection of files and folders that you want to bundle together into a single package. Think of it as packing items into a suitcase for a trip: tar acts as the suitcase that holds everything together.
Files and Directories:

+-----------+   +-----------+   +-----------+
|  Folder1  |   |  Folder2  |   |   File1   |
+-----------+   +-----------+   +-----------+
       \              |              /
        \             |             /
         \            |            /
          \           |           /
           \          |          /
            \         |         /
             +-----------------+
             |   Tar Archive   |
             +-----------------+
In this diagram, multiple folders and files are combined into a single tar archive. To make this package even more manageable, especially for transferring over networks or saving space, we can compress it using gzip. This is akin to vacuum-sealing your suitcase to make it as compact as possible.
Tar Archive:

  +-----------------+
  |   Tar Archive   |
  +-----------------+
           |
           v
+----------------------+
| Gzipped Tar Archive  |
+----------------------+
By compressing the tar archive, we reduce its size, making it faster to transfer and requiring less storage space.
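The two diagrams describe two separate steps. You can run those steps one at a time to confirm that a gzipped tar archive is simply the tar archive passed through gzip. The sketch below works in a temporary directory. The sample files (Folder1, Folder2, File1, named after the diagram) are hypothetical and exist only for the demonstration.

```shell
# Work in a throwaway directory so nothing real is touched (hypothetical sample data)
work=$(mktemp -d)
cd "$work" || exit 1
mkdir Folder1 Folder2
echo "first"  > Folder1/a.txt
echo "second" > Folder2/b.txt
echo "third"  > File1

# Step 1: bundle everything into one tar archive (the "suitcase")
tar -cf bundle.tar Folder1 Folder2 File1

# Step 2: compress the archive with gzip (the "vacuum seal");
# -c writes to stdout, so bundle.tar is kept for comparison
gzip -c bundle.tar > bundle.tar.gz

# Both listings show the same members: decompressing gives back the plain archive
tar -tf bundle.tar
gzip -dc bundle.tar.gz | tar -tf -
```

In everyday use, tar's -z option performs both steps in one command, as shown later in this article.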
The tar command stands for "tape archive," a name that harks back to when data was stored on magnetic tapes. Despite its historical name, tar remains a powerful utility for creating and manipulating archive files on modern systems. It consolidates multiple files and directories into a single archive file while preserving important metadata like file permissions, ownership, and timestamps.
Some common options used with the tar command include:

| Option | Description |
|--------|-------------|
| -c | Create a new archive |
| -v | Verbosely list files processed |
| -f | Specify the filename of the archive |
| -x | Extract files from an archive |
| -t | List the contents of an archive |
| -z | Compress or decompress the archive using gzip |
| -j | Compress or decompress the archive using bzip2 |
| -C | Change to a directory before performing actions |
For example, to create a tar archive named archive.tar containing the directories dir1 and dir2 and the file file1.txt, you would use:
tar -cvf archive.tar dir1 dir2 file1.txt
Breaking down this command:

-c tells tar to create a new archive.
-v enables verbose mode, so it lists the files being processed.
-f archive.tar specifies the name of the archive file to create.

Upon running this command, you might see output like:
dir1/
dir1/file2.txt
dir2/
dir2/file3.txt
file1.txt
This output shows that tar is including each specified file and directory in the archive.
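To try the command above without touching real data, the sketch below builds hypothetical sample files that match the example output (dir1/file2.txt, dir2/file3.txt, file1.txt) in a temporary directory. It then creates the archive and lists it with -t to confirm what was stored.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical sample layout matching the example output
mkdir dir1 dir2
echo "two"   > dir1/file2.txt
echo "three" > dir2/file3.txt
echo "one"   > file1.txt

# Create the archive verbosely, exactly as in the example
tar -cvf archive.tar dir1 dir2 file1.txt

# Print the stored member paths, one per line
tar -tf archive.tar
```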
Compressing Archives with gzip
While tar itself does not compress files, it can be combined with compression utilities like gzip to reduce the size of the archive. This is often done by adding the -z option to the tar command.
To create a compressed tar archive (often called a "tarball") using gzip, you would run:
tar -czvf archive.tar.gz dir1 dir2 file1.txt
Here, the -z option tells tar to compress the archive using gzip. The resulting file, archive.tar.gz, is a single tar archive that has been compressed with gzip.
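How much -z saves depends heavily on the data. The sketch below uses a hypothetical, highly repetitive log file as input. It creates both an uncompressed and a gzip-compressed archive and prints their sizes in bytes. Repetitive text like this compresses very well, while data that is already compressed (images, videos, other archives) may barely shrink.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir1
# Hypothetical, very repetitive input: 10000 identical lines
awk 'BEGIN { for (i = 0; i < 10000; i++) print "the same line of log text" }' > dir1/log.txt

tar -cf  plain.tar     dir1   # archive only
tar -czf packed.tar.gz dir1   # archive and compress with gzip

# wc -c reports the size in bytes
echo "uncompressed: $(wc -c < plain.tar) bytes"
echo "gzip:         $(wc -c < packed.tar.gz) bytes"
```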
To extract files from a tar archive, you use the -x option. For example:
tar -xvf archive.tar
This command extracts all files from archive.tar into the current directory. If the archive was compressed with gzip, you can still extract it in one step:
tar -xzvf archive.tar.gz
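Again, the -z option indicates that the archive is compressed with gzip. GNU tar can also detect the compression format automatically when extracting from a file, so tar -xvf archive.tar.gz usually works as well.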
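A quick way to confirm that extraction reproduces the original files is to unpack into a separate, empty directory and compare the two trees with diff -r. The sketch below uses hypothetical sample files and a hypothetical restored directory. It relies on the -C option from the table above to choose where files are extracted.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical sample data
mkdir -p dir1 dir2
echo "two"   > dir1/file2.txt
echo "three" > dir2/file3.txt

tar -czf archive.tar.gz dir1 dir2

# Extract into a fresh, empty directory (it must exist before using -C)
mkdir restored
tar -xzf archive.tar.gz -C restored

# diff -r prints nothing and succeeds when both trees match
diff -r dir1 restored/dir1 && diff -r dir2 restored/dir2 && echo "extraction matches"
```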
Before extracting files, you might want to see what's inside an archive. You can do this with the -t option:
tar -tvf archive.tar
Or for a compressed archive:
tar -tzvf archive.tar.gz
This command lists all the files contained in the archive without extracting them. The output might look like:
-rw-r--r-- user/group 1024 2024-10-10 12:00 dir1/file2.txt
-rw-r--r-- user/group 2048 2024-10-10 12:01 dir2/file3.txt
-rw-r--r-- user/group 512 2024-10-10 12:02 file1.txt
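Because -t only reads the archive, it is also handy in scripts for checking whether a particular file made it into an archive. The sketch below assumes a hypothetical archive.tar.gz and member name. grep -x requires the whole line to match, so a member such as dir1/file2.txt.bak would not count as a hit.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical archive to inspect
mkdir dir1
echo "two" > dir1/file2.txt
tar -czf archive.tar.gz dir1

# Without -v, -t prints one member path per line, which is easy to search
if tar -tzf archive.tar.gz | grep -qx 'dir1/file2.txt'; then
  echo "dir1/file2.txt is in the archive"
else
  echo "dir1/file2.txt is missing"
fi
```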
Using gzip Independently
The gzip command can also be used on its own to compress individual files. For example, to compress a file named largefile.txt, you can use:
gzip largefile.txt
This command replaces largefile.txt with a compressed file named largefile.txt.gz.
To decompress the file, you can use:
gzip -d largefile.txt.gz
Or equivalently:
gunzip largefile.txt.gz
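By default, gzip replaces the original file. If you want to keep the source, write the compressed data to standard output with -c instead. The sketch below uses a hypothetical largefile.txt. It compresses and decompresses this way, leaving every file in place, and then uses cmp to confirm that the round trip is lossless.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical input file
seq 1 50000 > largefile.txt

# -c writes the compressed data to stdout, so largefile.txt is kept
gzip -c largefile.txt > largefile.txt.gz

# Decompress to a different name, again leaving the .gz file in place
gunzip -c largefile.txt.gz > roundtrip.txt

# cmp is silent and succeeds when the files are byte-for-byte identical
cmp largefile.txt roundtrip.txt && echo "round trip OK"
```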
Suppose you have a directory called project that you want to back up. You can create a compressed archive of the directory with:
tar -czvf project_backup.tar.gz project
This command creates a compressed tarball named project_backup.tar.gz containing the entire project directory.
If you want to extract the contents of an archive to a specific directory, you can use the -C option. For example:
tar -xzvf project_backup.tar.gz -C /path/to/destination
This command extracts the contents of project_backup.tar.gz into /path/to/destination. The destination directory must already exist, because tar does not create it.
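The backup command and the -C extraction can be combined into a backup-and-restore round trip. In the sketch below, the project contents and the restore_here destination are hypothetical. The mkdir -p step creates the destination first, since -C only changes into an existing directory.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical project to back up
mkdir -p project/src
echo 'print("hello")' > project/src/main.py
echo "# Project" > project/README.md

# Back up the whole directory into a compressed tarball
tar -czf project_backup.tar.gz project

# Restore into another location; create it first, because -C only changes into it
mkdir -p restore_here
tar -xzf project_backup.tar.gz -C restore_here

# Show the restored tree
ls -R restore_here
```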
One of the strengths of using tar is that it records file permissions and ownership in the archive. This is important when you're archiving files that need to maintain their original access rights.

For instance, suppose a file is owned by user1 and has specific permissions. When you extract the archive as root, GNU tar restores the original owner and permissions by default. When you extract it as a regular user, the extracted files are owned by you. The stored permissions are also limited by your umask unless you add the -p (--preserve-permissions) option.
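You can inspect the recorded permissions in the verbose listing before extracting anything. The sketch below uses a hypothetical secret.txt restricted to its owner (mode 600). It extracts the file with -p into a hypothetical out directory, which asks tar to apply the stored mode exactly instead of filtering it through your umask.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical file readable and writable only by its owner
echo "owner only" > secret.txt
chmod 600 secret.txt

tar -cf perms.tar secret.txt

# The first column of the verbose listing shows the stored mode: -rw-------
tar -tvf perms.tar

# Extract with -p (--preserve-permissions) into a separate directory
mkdir out
tar -xpf perms.tar -C out
ls -l out/secret.txt
```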
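Sometimes, you might want to exclude certain files or directories when creating an archive. You can use the --exclude option to do this. Place it before the names it should apply to, because newer versions of GNU tar treat it as positional and apply it only to names that follow it.
For example:
tar -czvf archive.tar.gz --exclude='dir1/tmp/*' dir1
This command archives dir1 but excludes all files in the dir1/tmp directory.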
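To check that an exclusion pattern did what you intended, list the archive afterwards. The sketch below uses a hypothetical layout with one file to keep and two scratch files under dir1/tmp. As in the example above, --exclude comes before dir1.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical layout: one file to keep, scratch files to leave out
mkdir -p dir1/tmp
echo "keep me" > dir1/keep.txt
echo "scratch" > dir1/tmp/cache1
echo "scratch" > dir1/tmp/cache2

# --exclude goes before the directory it should filter
tar -czf archive.tar.gz --exclude='dir1/tmp/*' dir1

# dir1/keep.txt is listed; the files under dir1/tmp are not
tar -tzf archive.tar.gz
```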
You can create an archive and transfer it over SSH in one step. This is useful for backing up data from a remote server.
ssh user@remotehost "tar -czvf - /path/to/dir" > archive.tar.gz
In this command:

ssh user@remotehost connects to the remote host.
"tar -czvf - /path/to/dir" runs the tar command on the remote host with - as the filename, which means the archive is written to stdout.
The redirection > archive.tar.gz saves that output to a file on the local machine.

For very large archives, you might need to split the archive into smaller pieces. You can do this using the split command.
First, create the archive without compression:
tar -cvf large_archive.tar dir_to_archive
Then split the archive into pieces of 100MB:
split -b 100M large_archive.tar "archive_part_"
This command creates files named archive_part_aa, archive_part_ab, and so on.
To reconstruct the original archive, you can concatenate the parts:
cat archive_part_* > large_archive.tar
Then extract the archive as usual.
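The split-and-reassemble steps can be checked end to end on a small scale. The sketch below uses a hypothetical archive of about 3 MB and splits it into 1M pieces instead of 100M so that it runs quickly. It then uses cmp to confirm that the reassembled archive is byte-for-byte identical to the original.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical data: about 3 MB of random bytes
mkdir dir_to_archive
dd if=/dev/urandom of=dir_to_archive/blob.bin bs=1024 count=3072 2>/dev/null

tar -cf large_archive.tar dir_to_archive

# Split into 1M pieces: archive_part_aa, archive_part_ab, ...
split -b 1M large_archive.tar "archive_part_"
ls archive_part_*

# Reassemble; the shell sorts the glob, so the pieces are joined in order
cat archive_part_* > rebuilt.tar

# Identical bytes mean the rebuilt archive extracts exactly like the original
cmp large_archive.tar rebuilt.tar && echo "reassembly OK"
```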
Using Other Compression Tools with tar
While gzip is commonly used, tar can also work with other compression tools such as bzip2 and xz. These usually produce smaller archives at the cost of slower compression.
Using bzip2
To create a tar archive compressed with bzip2, use the -j option:
tar -cjvf archive.tar.bz2 dir1 dir2 file1.txt
To extract:
tar -xjvf archive.tar.bz2
Using xz
For xz compression, use the -J option:
tar -cJvf archive.tar.xz dir1 dir2 file1.txt
To extract:
tar -xJvf archive.tar.xz
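To compare the three compressors on your own data, the sketch below creates the same archive with -z, -j, and -J and prints each size in bytes. The input is hypothetical sample text. Because bzip2 or xz may not be installed everywhere, the sketch tries each one only if its command exists. Results vary with the data, so treat the numbers as a rough guide rather than a benchmark.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
# Hypothetical input: numbered lines of text
mkdir dir1
seq 1 200000 > dir1/numbers.txt

tar -czf archive.tar.gz dir1
# Only try bzip2 and xz if they are available on this system
if command -v bzip2 >/dev/null 2>&1; then tar -cjf archive.tar.bz2 dir1; fi
if command -v xz >/dev/null 2>&1; then tar -cJf archive.tar.xz dir1; fi

# Print the size of each archive that was actually created
for f in archive.tar.gz archive.tar.bz2 archive.tar.xz; do
  if [ -f "$f" ]; then
    echo "$f: $(wc -c < "$f") bytes"
  fi
done
```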
The order of options matters: -czvf is not the same as -cfvz. Because -f takes the next argument as the archive name, it should come last in a group of bundled options. With -cfvz, tar would use vz as the archive name. Typically, you should specify the action (-c, -x, -t) first, followed by other options.
Although it is conventional to use .tar.gz for gzip-compressed archives, the extension does not affect how the file is processed. However, using standard extensions helps others understand the file format.
Running tar with sudo when extracting can help preserve ownership.
By default, tar will overwrite existing files when extracting. Use the --keep-old-files option to prevent this.

1. Archive a folder using tar. To check that everything was included, copy your archives to the /tmp folder and extract the files there. Delete the copies from /tmp when done.
2. Use tar with and without the -z option to create an archive of any folder. Compare the sizes of your original folder, the archive, and the compressed archive.
3. Write a script that compresses all .txt files in a given folder using gzip. The script should skip already compressed files (with the .gz extension).
4. Use the tar and gzip commands to create a compressed archive of a folder. Then, extract the archive to a new location and compare the contents of the original folder and the extracted folder to ensure they are identical.
5. Use the gzip -l command to view the compression ratio and other details about the compressed archive.
6. Compress a large file using bzip2, xz, and gzip, then decompress it using the same tools. Time how long each operation takes and compare the results.
7. Use the file command to determine the types of files in the compressed archive without extracting it.
8. Write a script that compresses files using gzip. The script should display the file name and its size before and after compression.