Last modified: October 10, 2024
Working with Data Streams
Input redirection (<) allows a command to read from a file, while output redirection (>) sends a command's output to a file. Streams like stdin, stdout, and stderr control the flow of data between commands and the system: stdin is the input, stdout is the standard output, and stderr is the error output. Pipes (|) connect the output of one command directly to the input of another, letting you chain commands together. Filters, such as grep and awk, process these data streams, allowing you to search, manipulate, and extract information efficiently.
Standard Streams
Unix and Unix-like operating systems provide three standard streams for program interaction. Every process starts with these streams already open as file descriptors 0 (stdin), 1 (stdout), and 2 (stderr), normally inherited from the shell that launched it. They act as the main channels for communication between a program and its environment:
I. Standard Input
- stdin is the input stream where data is fed into a program, acting as the primary source for reading input data.
- The default source for stdin is usually the keyboard.
- Programs commonly use stdin to read user input from the terminal, though this input stream can also be redirected from files.
II. Standard Output
- stdout serves as the primary output stream for a program, where it sends data that needs to be displayed.
- The default destination for stdout is typically the terminal screen or console.
- Programs use stdout to display results, messages, or general output data, and this output can be redirected to files or piped to other programs.
III. Standard Error
- stderr is a dedicated output stream for error messages and diagnostics, which are kept separate from regular output.
- Like stdout, the default destination for stderr is usually the terminal screen.
- Programs send error messages, such as those generated by failed operations like accessing a non-existent file, to stderr, which can be independently redirected from stdout.
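The sketch below, which assumes bash, makes the three streams visible. The helper name speak and the temporary directory are illustrative only. It writes one line to each output stream, captures them in separate files, and reads a line from a file through stdin:

```shell
# Each process starts with three open file descriptors:
# 0 = stdin, 1 = stdout, 2 = stderr.
tmp=$(mktemp -d)   # scratch directory for the demo

# Hypothetical helper that writes one line to each output stream.
# echo always writes to fd 1; ">&2" points fd 1 at stderr for that one command.
speak() {
    echo "regular output"
    echo "diagnostic message" >&2
}

# Send stdout and stderr to separate files to show they are independent.
speak > "$tmp/out.txt" 2> "$tmp/err.txt"
cat "$tmp/out.txt"   # regular output
cat "$tmp/err.txt"   # diagnostic message

# stdin (fd 0) can come from a file instead of the keyboard.
echo "read from a file" > "$tmp/in.txt"
read -r line < "$tmp/in.txt"
echo "$line"
```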
Pipe
The pipe (|) character is an essential tool that lets data flow from one command to another. It is a form of redirection that captures the standard output (stdout) of one command and feeds it as the standard input (stdin) to the next.
Example 1: Filtering User Details
Suppose you want to see the login details for a user named "user_name" using the w command and, in the displayed output, replace "user_name" with "admin". This can be done with:
w | grep user_name | sed 's/user_name/admin/g'
Here, grep filters the output of w down to lines containing "user_name", and sed then replaces every occurrence of "user_name" with "admin" in those lines. Only the displayed text changes; no account is renamed.
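Because the output of w depends on who is logged in, the following sketch replays the same pipeline on a fixed, made-up sample of w-style lines so the result is predictable. The sample text is hypothetical:

```shell
# Hypothetical stand-in for the output of `w`.
sample='user_name pts/0 10:01 0.00s bash
other     pts/1 10:05 1.00s vim
user_name pts/2 10:07 0.10s top'

# grep keeps only lines mentioning user_name; sed rewrites the name in those lines.
result=$(printf '%s\n' "$sample" | grep user_name | sed 's/user_name/admin/g')
printf '%s\n' "$result"
```

Each command in the chain only sees the stdout of the one before it, so swapping printf for w gives the original example.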
Example 2: Sending Email with Current Date
You can combine the output of the date command (which gives the current date and time) with the mail command to send an email:
date | mail -s "This is a remote test" user1@rhhost1.localnet.com
Advanced Piping
- The traditional pipe | takes the standard output (stdout) of one command and sends it as input to the next, chaining the commands together. Standard error (stderr) is not passed through the pipe; it still goes to the terminal (or wherever it was redirected).
- When both stdout and stderr need to be passed to the next command, use the |& syntax (available in Bash 4 and later, and in zsh). It is shorthand for 2>&1 |, and it is useful when you want to process successful output and errors together in a pipeline.
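A minimal comparison, assuming bash 4 or later for |&. The helper both is hypothetical and simply writes one line to each stream; wc -l then counts how many lines actually travelled through the pipe:

```shell
# Hypothetical helper: one line on stdout, one on stderr.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Plain pipe: only stdout reaches wc (stderr is discarded here to keep the demo quiet).
plain=$( { both | wc -l; } 2>/dev/null )

# |& sends stdout and stderr into the pipe.
merged=$(both |& wc -l)

# The portable spelling of the same thing.
portable=$(both 2>&1 | wc -l)

echo "plain=$plain merged=$merged portable=$portable"
```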
Example: Searching for Text Files with Error Inclusion
Suppose you want to list files with ls -l and keep only the lines for .txt files using grep. Piping with |& sends any error messages through grep as well, so they are filtered rather than printed directly to the terminal:
ls -l |& grep "\.txt$"
In this example, ls -l may produce both regular output and error messages (such as "Permission denied" errors). The |& operator ensures that both are passed to grep, which then keeps only the lines ending with .txt. An error line therefore survives only if it also ends with .txt.
Example: Displaying and Saving Output
To display both stdout and stderr on the screen while saving them to a file named output.txt, you can use:
ls -l |& tee output.txt
Here, ls -l |& captures both the regular output and any errors and passes them to tee. The tee command displays the combined output on the terminal and writes it to output.txt.
Redirection
Redirection is a mechanism that controls the destination of a command's output, directing it to another command, a file, or even discarding it. It also allows commands to receive input from files instead of the keyboard.
I. Redirecting Standard Output
The > symbol redirects the standard output of a command to a file. For example:
echo "hello" > file.txt
If the file already exists, it is overwritten. To append to an existing file, use >>:
echo "Hello" > file.txt
echo "World!" >> file.txt
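A short sketch of the difference between > and >>, run in a throwaway directory (an assumption of the demo, so no real file.txt is touched):

```shell
tmp=$(mktemp -d)   # scratch directory for the demo

echo "hello" > "$tmp/file.txt"     # creates the file
echo "Hello" > "$tmp/file.txt"     # truncates it: only "Hello" remains
echo "World!" >> "$tmp/file.txt"   # appends a second line

cat "$tmp/file.txt"
```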
II. Redirecting Standard Error
Errors can be redirected separately using 2>:
less non_existent_file 2> errors.txt
To append errors to an existing file, use 2>>:
less non_existent_file 2>> errors.txt
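The sketch below uses ls instead of less, on the assumption that a non-interactive command makes the behaviour easier to check. Both paths are deliberately missing, so each ls writes one complaint to stderr, which 2> and 2>> send to the same file:

```shell
tmp=$(mktemp -d)   # scratch directory for the demo

# ls exits non-zero for a missing path; "|| true" keeps the demo going.
ls "$tmp/non_existent_file" 2> "$tmp/errors.txt" || true
ls "$tmp/another_missing_file" 2>> "$tmp/errors.txt" || true

cat "$tmp/errors.txt"   # two error lines, nothing printed to stdout by ls
```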
III. Redirecting Both Standard Output and Error
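Use &> to overwrite a file with both outputs, or &>> to append both to the file. These are Bash shorthands; in a POSIX shell, write command > output.txt 2>&1 (or >> instead of > to append):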
command &> output.txt
command &>> output.txt
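A sketch, assuming bash, comparing &> with the portable > file 2>&1 form. The helper both is hypothetical. The last line shows why the order of redirections matters: 2>&1 copies wherever stdout points at that moment, so writing it before > file leaves stderr on the terminal:

```shell
tmp=$(mktemp -d)   # scratch directory for the demo
both() { echo "out"; echo "err" >&2; }   # hypothetical helper

both &> "$tmp/a.txt"         # bash shorthand: both streams into a.txt
both > "$tmp/b.txt" 2>&1     # POSIX form: stdout to the file, then stderr follows it
both 2>&1 > "$tmp/c.txt"     # wrong order: stderr still goes to the terminal

cat "$tmp/c.txt"   # contains only "out"
```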
IV. Redirecting Standard Input
The < symbol redirects the standard input of a command so that it comes from a file instead of the keyboard. For example:
sort < unsorted_list.txt
In this example, the sort command takes its input from unsorted_list.txt instead of waiting for user input.
V. Using Input and Output Redirection Together
Commands can utilize both input and output redirection simultaneously. For example:
sort < unsorted.txt > sorted.txt
In this case, the sort command reads the contents of unsorted.txt, sorts the lines, and writes the sorted output to sorted.txt. This demonstrates how input redirection (<) takes data from a file, while output redirection (>) sends the processed result to another file.
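A sketch with made-up sample data. It also shows a common pitfall: the shell truncates the output file before the command runs, so sort < f > f would leave f empty. POSIX sort's -o option allows the output file to be one of the input files:

```shell
tmp=$(mktemp -d)   # scratch directory for the demo
printf 'pear\napple\nfig\n' > "$tmp/unsorted.txt"   # hypothetical sample data

# Input from one file, output to another.
sort < "$tmp/unsorted.txt" > "$tmp/sorted.txt"
cat "$tmp/sorted.txt"

# Sorting a file in place: use -o rather than "> same_file".
sort -o "$tmp/unsorted.txt" "$tmp/unsorted.txt"
cat "$tmp/unsorted.txt"
```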
VI. Here-Documents with <<
The << operator, known as a here-document, lets you provide multi-line input directly within a shell script or on the command line, ending the input with a specified delimiter. For example:
cat <<EOF
This is a test file
with multiple lines
of text.
EOF
In this example, everything between <<EOF and EOF is treated as input to the cat command. The delimiter EOF can be replaced with any word; the closing delimiter must appear alone on its own line to mark the end of the input block.
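One detail worth knowing: with an unquoted delimiter the shell expands variables inside the here-document, while quoting the delimiter passes the text through literally. A minimal sketch, where the variable name is illustrative:

```shell
name="world"   # illustrative variable

# Unquoted delimiter: $name is expanded.
expanded=$(cat <<EOF
Hello, $name
EOF
)

# Quoted delimiter: the body is taken literally.
literal=$(cat <<'EOF'
Hello, $name
EOF
)

echo "$expanded"
echo "$literal"
```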
VII. View and Save Output Simultaneously
The tee command is useful for displaying output on the screen while also saving it to a file:
command | tee output.txt # overwrite the file
command | tee -a output.txt # append to the file
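A small sketch of tee overwriting, appending, and writing several files at once, in a scratch directory assumed for the demo:

```shell
tmp=$(mktemp -d)   # scratch directory for the demo

echo "first run" | tee "$tmp/output.txt"        # shown on screen and written to the file
echo "second run" | tee -a "$tmp/output.txt"    # shown on screen and appended

# tee accepts several file names; here the screen copy is discarded.
echo "copy" | tee "$tmp/one.txt" "$tmp/two.txt" > /dev/null
```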
VIII. Handling Buffering Issues
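Some programs buffer their output when it is not going to a terminal, so redirected or piped output may be delayed or arrive in large chunks. On Linux, the script command (from util-linux) can work around this by running the program under a pseudo-terminal:
output=$(script -q -c "your_command" /dev/null)
echo "$output"
Here, the -c option specifies the command to run, -q suppresses script's own start and end messages, and /dev/null is given as the typescript file, so the session transcript that script would normally save is discarded. The program's output is captured in the output variable.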
Summary Table
| Syntax | StdOut Visible | StdErr Visible | StdOut in File | StdErr in File | Existing File Behavior |
|---|---|---|---|---|---|
| > | No | Yes | Yes | No | Overwrite |
| >> | No | Yes | Yes | No | Append |
| 2> | Yes | No | No | Yes | Overwrite |
| 2>> | Yes | No | No | Yes | Append |
| &> | No | No | Yes | Yes | Overwrite |
| &>> | No | No | Yes | Yes | Append |
| tee | Yes | Yes | Yes | No | Overwrite |
| tee -a | Yes | Yes | Yes | No | Append |
| \|& tee | Yes | Yes | Yes | Yes | Overwrite |
| \|& tee -a | Yes | Yes | Yes | Yes | Append |
Filters
Filters are specialized commands designed to process streams of text. They are predominantly used with pipes (|) to modify or analyze the output of another command. A filter reads its input (typically line by line), transforms it in some way, and writes the result to standard output. Because filters can be combined with other commands in a pipeline, they make complex text transformations and data analysis possible with very little code on Unix-like systems. Common examples include grep for searching text, sort for arranging lines in a particular order, and awk for pattern scanning and processing. Filters are a fundamental part of command-line data manipulation, allowing users to process large amounts of text with simple, concise commands.
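Any program that reads stdin and writes stdout can act as a filter. As a minimal sketch, the hypothetical shell function below numbers each non-empty line it reads, and then slots into a pipeline like any built-in filter:

```shell
# Hypothetical filter: number each non-empty input line.
number_lines() {
    n=0
    while IFS= read -r line; do
        [ -z "$line" ] && continue    # skip blank lines
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$line"
    done
}

# Used between other commands in a pipeline.
result=$(printf 'alpha\n\nbeta\n' | number_lines | grep beta)
echo "$result"
```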
Common Unix Filters
| Command | Description | Basic Usage | Common Options | Example |
|---|---|---|---|---|
| sort | Orders lines alphabetically or numerically. | sort [options] [file] | -n: sort numerically; -r: reverse the order; -k: specify the sort key (field) | sort -n numbers.txt sorts numbers.txt numerically. |
| uniq | Filters out repeated lines that are adjacent, so input is usually sorted first. | uniq [options] [file] | -c: prefix each line with its occurrence count; -d: only show duplicated lines; -u: only show lines that are not repeated | uniq -c sorted.txt counts occurrences of each line in sorted.txt. |
| cut | Extracts specific fields or character ranges from each line, useful for structured text. | cut [options] [file] | -f: select fields; -d: use a custom field delimiter (the default is a tab); -c: select characters or character ranges | cut -f1,3 -d',' data.csv extracts fields 1 and 3 from data.csv, using ',' as the delimiter. |
| tr | Translates characters into others or deletes specific characters. It reads only from standard input. | tr [options] string1 [string2] | -d: delete characters in string1; -s: squeeze repeated characters; -c: complement string1 | tr 'a-z' 'A-Z' < input.txt converts lowercase to uppercase in input.txt. |
| wc | Counts lines, words, and bytes or characters in text. | wc [options] [file] | -l: line count; -w: word count; -c: byte count; -m: character count | wc -l file.txt returns the line count for file.txt. |
| grep | Searches input for lines matching a pattern or regular expression. | grep [options] pattern [file] | -i: ignore case; -v: invert match; -r: search recursively in directories | grep 'error' logfile.txt searches for 'error' in logfile.txt. |
| awk | Processes text by extracting fields and performing actions based on conditions. | awk 'pattern {action}' [file] | -F: specify the field separator; -v: assign a variable before the program runs (e.g. -v n=5); -f: read the program from a file | awk '{print $1, $3}' data.txt prints fields 1 and 3 from data.txt. |
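A sketch exercising several of these filters on a small, made-up CSV file (the data and file names are hypothetical). It uses cut and awk to pick fields, sort with a numeric key, tr for case conversion, wc for counting, and awk -v to pass a threshold into the program:

```shell
tmp=$(mktemp -d)   # scratch directory for the demo
printf 'alice,30,paris\nbob,25,berlin\ncarol,35,rome\n' > "$tmp/data.csv"   # hypothetical data

names=$(cut -d',' -f1,3 "$tmp/data.csv")                           # fields 1 and 3
by_age=$(sort -t',' -k2,2n "$tmp/data.csv" | cut -d',' -f1)         # names ordered by field 2
upper=$(tr 'a-z' 'A-Z' < "$tmp/data.csv" | head -n 1)               # first line in uppercase
lines=$(wc -l < "$tmp/data.csv")                                    # number of lines
over30=$(awk -F',' -v min=30 '$2 >= min {print $1}' "$tmp/data.csv") # -v sets min

printf '%s\n---\n' "$names" "$by_age" "$upper" "$lines" "$over30"
```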
Examples
I. Combine and sort the content of file1.txt and file2.txt, and redirect the sorted output to sorted.txt:
sort file1.txt file2.txt > sorted.txt
II. Eliminate any adjacent duplicate lines from sorted.txt and save the result in deduped.txt:
uniq sorted.txt > deduped.txt
III. Display lines containing the word "error" from deduped.txt:
grep 'error' deduped.txt
IV. Show lines from deduped.txt that contain the pattern "error", along with the line number:
awk '/error/ {print NR, $0}' deduped.txt
V. Replace all occurrences of 'old_word' with 'new_word' in file.txt:
sed 's/old_word/new_word/g' file.txt
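The first four examples can be tried end to end on small, made-up input files (their contents are hypothetical). The last lines show why sorting comes before uniq: uniq only removes duplicates that sit next to each other:

```shell
tmp=$(mktemp -d)   # scratch directory for the demo
printf 'error: disk full\nok\n' > "$tmp/file1.txt"             # hypothetical input
printf 'ok\nerror: disk full\nwarning\n' > "$tmp/file2.txt"    # hypothetical input

sort "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/sorted.txt"     # I
uniq "$tmp/sorted.txt" > "$tmp/deduped.txt"                    # II
grep 'error' "$tmp/deduped.txt"                                # III
awk '/error/ {print NR, $0}' "$tmp/deduped.txt"                # IV

# Without sorting, the non-adjacent "a" lines both survive.
unsorted_uniq=$(printf 'a\nb\na\n' | uniq | wc -l)
echo "$unsorted_uniq"
```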
Combining Filters
Filters become even more powerful when combined. By chaining together multiple filters using the pipe (|), you can perform complex text transformations and analyses with a single command.
# Sort the content of a file, eliminate duplicates, and then display only lines containing "error"
sort file.txt | uniq | grep 'error'
Filters are foundational components in the Unix philosophy of creating simple, modular tools that do one job and do it well. When used effectively, they provide powerful text processing capabilities with just a few keystrokes.
Challenges
- Find the number of users currently logged in. Hint: Use the who or w command followed by a line count.
- Generate a sorted list of all system users. Hint: The /etc/passwd file contains user information.
- List .conf filenames in the /etc directory and sort them by string length. You may need to use ls, awk, and sort.
- Print the first and seventh columns of the /etc/passwd file. These columns represent the username and the user's shell, respectively.
- Display each word from the /etc/fstab file on a separate line, and then count the total number of lines in the file. This file provides information on disk drives and their mount points.
- Find out how many users have a unique shell (i.e., they're the only ones using a particular shell). Use /etc/passwd as your source.
- From any text file of your choice, identify the ten most frequently occurring words and display their counts.
- Examine the /etc/systemd/system directory and list the service files that are currently active on the system.
- Search a text file for words that are longer than 7 characters and contain the letter 'z', and display them sorted in reverse alphabetical order.
- Find the top five directories that consume the most disk space in your home directory. Hint: Use the du and sort commands.
- Starting from your home directory, list all files (recursively, including subdirectories) that were modified in the last 24 hours, sorted by their modification time.