## Working with Data Streams
Input redirection (`<`) allows a command to read from a file, while output redirection (`>`) sends a command's output to a file. Streams like stdin, stdout, and stderr control the flow of data between commands and the system: stdin is the input, stdout is the standard output, and stderr is the error output. Pipes (`|`) connect the output of one command directly into the input of another, enabling you to chain commands together seamlessly. Filters, such as `grep` and `awk`, process these data streams, allowing you to search, manipulate, and extract information efficiently.
### Standard Streams
Unix and Unix-like operating systems use three primary standard streams for program interaction. These streams are opened automatically for every process and act as the main channels of communication between a program and its environment:
I. Standard Input
- stdin is the input stream where data is fed into a program, acting as the primary source for reading input data.
- The default source for stdin is usually the keyboard.
- Programs commonly use stdin to read user input from the terminal, though this input stream can also be redirected from files.
II. Standard Output
- stdout serves as the primary output stream for a program, where it sends data that needs to be displayed.
- The default destination for stdout is typically the terminal screen or console.
- Programs use stdout to display results, messages, or general output data, and this output can be redirected to files or piped to other programs.
III. Standard Error
- stderr is a dedicated output stream for error messages and diagnostics, which are kept separate from regular output.
- Like stdout, the default destination for stderr is usually the terminal screen.
- Programs send error messages, such as those generated by failed operations like accessing a non-existent file, to stderr, which can be independently redirected from stdout.
```
+----------------------------+
|     Terminal / Screen      |
+----------------------------+
      ^               ^
      |               |
   stdout          stderr
      |               |
+------------------------------+
|           Process            |
|        (Unix Program)        |
+------------------------------+
      ^
      |
    stdin
      |
+------------------+
|  Keyboard/File   |
+------------------+
```
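A minimal sketch makes the separation concrete; here `/no_such_dir` is a made-up path used only to provoke an error:

```bash
# One real path and one nonexistent path: the listing goes to stdout,
# the error message goes to stderr, and each is redirected separately.
ls /etc /no_such_dir > out.txt 2> err.txt

cat out.txt   # the /etc listing, delivered via stdout
cat err.txt   # the "cannot access" complaint, delivered via stderr
```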
### Pipe
The pipe (`|`) character is an essential tool that allows data to flow from one command to another. It is a form of redirection that captures the standard output (stdout) of one command and feeds it as the standard input (stdin) to another.
```
+--------------+                +--------------+
|  Process A   |                |  Process B   |
|  (Producer)  |                |  (Consumer)  |
+--------------+                +--------------+
       |                               ^
       | stdout                 stdin  |
       |                               |
       +------------[PIPE]------------+
```
#### Example 1: Filtering User Details

Suppose you want to see session details for a user named "user_name" with the `w` command and then replace every occurrence of "user_name" with "admin" in the output. This can be done with:

```bash
w | grep user_name | sed 's/user_name/admin/g'
```

Here, `grep` filters the output of `w` down to lines containing "user_name", and `sed` then changes "user_name" to "admin".
#### Example 2: Sending Email with Current Date

You can combine the output of the `date` command (which prints the current date and time) with the `mail` command to send an email:

```bash
date | mail -s "This is a remote test" user1@rhhost1.localnet.com
```
Advanced Piping
- The traditional pipe
|
allows you to take the standard output (stdout) from one command and send it as input to another command, effectively chaining commands together while excluding any errors or standard error (stderr) streams. - When both the standard output and standard error need to be captured and passed to another command, the
|&
syntax is utilized. This feature is particularly useful when you want to process both successful output and errors together in a pipeline.
#### Example: Searching for Text Files with Error Inclusion

Suppose you want to list files with `ls -l` and search for `.txt` entries using `grep`. By including both output and error messages, you ensure that any issues encountered during listing are also captured:

```bash
ls -l |& grep "\.txt$"
```

In this example, `ls -l` may produce both regular output and error messages (such as "Permission denied" errors). The `|&` operator ensures that both are passed to `grep`, which then filters for lines ending with `.txt`.
#### Example: Displaying and Saving Output

To display both stdout and stderr on the screen while saving them to a file named `output.txt`, you can use:

```bash
ls -l |& tee output.txt
```

Here, `ls -l |&` captures both the regular output and any errors, which are then passed to `tee`. The `tee` command displays the combined output on the terminal and writes it to `output.txt`.
### Redirection
Redirection is a mechanism that controls the destination of a command's output, directing it to another command, a file, or even discarding it. It also allows commands to receive input from files instead of the keyboard.
```
+------------------------+
|        Process         |
|------------------------|
|  stdin:  from file     | <--------  input.txt
|  stdout: to file       | -------->  output.txt
|  stderr: to file       | -------->  error.txt
+------------------------+
```
I. Redirecting Standard Output

The `>` symbol redirects the standard output of a command to a file. For example:

```bash
echo "hello" > file.txt
```

If the file already exists, it will be overwritten. To append to an existing file, use `>>`:

```bash
echo "Hello" > file.txt
echo "World!" >> file.txt
```
II. Redirecting Standard Error

Errors can be redirected separately using `2>`:

```bash
less non_existent_file 2> errors.txt
```

To append errors to an existing file, use `2>>`:

```bash
less non_existent_file 2>> errors.txt
```
III. Redirecting Both Standard Output and Error

Use `&>` to overwrite a file with both outputs, or `&>>` to append both to the file:

```bash
command &> output.txt
command &>> output.txt
```
IV. Redirecting Standard Input

The `<` symbol redirects the standard input of a command to come from a file instead of the keyboard. For example:

```bash
sort < unsorted_list.txt
```

In this example, the `sort` command takes its input from `unsorted_list.txt` instead of waiting for user input.
V. Using Input and Output Redirection Together

Commands can use both input and output redirection simultaneously. For example:

```bash
sort < unsorted.txt > sorted.txt
```

In this case, the `sort` command reads the contents of `unsorted.txt`, sorts the lines, and writes the sorted output to `sorted.txt`. This demonstrates how input redirection (`<`) takes data from a file, while output redirection (`>`) sends the processed result to another file.
VI. Here-Documents with `<<`

The `<<` operator, known as a here-document, allows you to provide multi-line input directly within a shell script or on the command line, ending the input with a specified delimiter. For example:

```bash
cat <<EOF
This is a test file
with multiple lines
of text.
EOF
```

In this example, everything between `<<EOF` and `EOF` is treated as input to the `cat` command. The delimiter `EOF` can be replaced with any token, and it marks the end of the input block.
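Two common variations, sketched below: a here-document can be redirected into a file, and quoting the delimiter stops the shell from expanding anything inside the block (the file name `notes.txt` is just an illustration):

```bash
# Redirect the here-document into a file instead of the terminal
cat <<EOF > notes.txt
Generated on: $(date)
EOF

# With a quoted delimiter, $(date) is written out literally, not expanded
cat <<'EOF'
Generated on: $(date)
EOF
```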
VII. View and Save Output Simultaneously

The `tee` command is useful for displaying output on the screen while also saving it to a file:

```bash
command | tee output.txt     # overwrite the file
command | tee -a output.txt  # append to the file
```
VIII. Handling Buffering Issues

Sometimes programs buffer their output, causing delays or missing data when you try to redirect it. The `script` command can be a solution:

```bash
output=$(script -c your_command /dev/null)
echo "$output"
```

Here, the `-c` option specifies the command to run, while `/dev/null` serves as the transcript file, so the session recording is discarded. The command's output is captured in the `output` variable.
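On systems with GNU coreutils, `stdbuf` offers a lighter-weight alternative; the sketch below assumes a placeholder `your_command` whose output you want line by line:

```bash
# -oL forces line-buffered stdout, so each line reaches the pipe
# as soon as it is printed instead of waiting for a full buffer
stdbuf -oL your_command | tee output.txt
```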
#### Summary Table

| Syntax | StdOut Visible | StdErr Visible | StdOut in File | StdErr in File | Existing File Behavior |
| ------ | -------------- | -------------- | -------------- | -------------- | ---------------------- |
| `>` | No | Yes | Yes | No | Overwrite |
| `>>` | No | Yes | Yes | No | Append |
| `2>` | Yes | No | No | Yes | Overwrite |
| `2>>` | Yes | No | No | Yes | Append |
| `&>` | No | No | Yes | Yes | Overwrite |
| `&>>` | No | No | Yes | Yes | Append |
| `tee` | Yes | Yes | Yes | No | Overwrite |
| `tee -a` | Yes | Yes | Yes | No | Append |
| `\|& tee` | Yes | Yes | Yes | Yes | Overwrite |
| `\|& tee -a` | Yes | Yes | Yes | Yes | Append |
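The difference between the `>` and `&>` rows is easy to check directly; `/no_such_dir` is again a deliberately nonexistent path:

```bash
# '>' captures only stdout: the error message still shows on the terminal
ls /etc /no_such_dir > out.txt

# '&>' captures both streams: the terminal stays quiet
ls /etc /no_such_dir &> all.txt
```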
### Filters

Filters are specialized commands designed to process text, typically working with streams of data. They are predominantly used with pipes (`|`) to modify or analyze the output of another command. A filter reads input line by line, transforms it in some way, and then outputs the result. This processing model is particularly powerful in Unix-like operating systems, where filters can be combined in a pipeline to perform complex text transformations and analysis. Common examples include `grep` for searching text, `sort` for arranging lines in a particular order, and `awk` for pattern scanning and processing. Filters are a fundamental part of command-line data manipulation, allowing users to efficiently process large amounts of text with simple, concise commands.
```
+---------------+        +---------------+        +---------------+
|   Producer    |        |    Filter     |        |   Consumer    |
|   (Process)   |------->|  (Transformer |------->|   (Process)   |
|    stdout     |        |    process)   |        |     stdin     |
+---------------+        +---------------+        +---------------+
```
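As a quick illustration of the pattern, the pipeline below counts how many sessions each logged-in user has; every stage is a standard filter:

```bash
# who: one line per session -> awk: keep the username field
# -> sort: group identical names -> uniq -c: count each group
who | awk '{print $1}' | sort | uniq -c
```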
#### Common Unix Filters

| Command | Description | Basic Usage | Common Options | Example |
| ------- | ----------- | ----------- | -------------- | ------- |
| `sort` | Orders lines in text alphabetically or numerically. | `sort [options] [file]` | `-n` sort numerically; `-r` reverse order; `-k` specify sort key | `sort -n numbers.txt` sorts `numbers.txt` numerically. |
| `uniq` | Filters out repeated lines in adjacent positions, simplifying repeated content. | `uniq [options] [file]` | `-c` count occurrences; `-d` only show duplicates; `-u` only show unique lines | `uniq -c sorted.txt` counts occurrences of each line in `sorted.txt`. |
| `cut` | Extracts specific columns or fields from each line, useful for structured text. | `cut [options] [file]` | `-f` select fields; `-d` use a custom delimiter; `-c` select characters by position | `cut -f1,3 -d',' data.csv` extracts columns 1 and 3 from `data.csv`, using `,` as the delimiter. |
| `tr` | Translates characters into others or removes specific characters. | `tr [options] string1 [string2]` | `-d` delete characters in `string1`; `-s` squeeze repeated characters; `-c` complement `string1` | `tr 'a-z' 'A-Z' < input.txt` converts lowercase to uppercase in `input.txt`. |
| `wc` | Counts lines, words, and bytes in text. | `wc [options] [file]` | `-l` line count; `-w` word count; `-c` byte count | `wc -l file.txt` returns the line count for `file.txt`. |
| `grep` | Searches input for lines matching a pattern or regular expression. | `grep [options] pattern [file]` | `-i` ignore case; `-v` invert match; `-r` search recursively in directories | `grep 'error' logfile.txt` searches for "error" in `logfile.txt`. |
| `awk` | Processes text by extracting fields and performing actions based on conditions. | `awk 'pattern {action}' [file]` | `-F` specify field separator; `-v` assign a variable; `-f` read the program from a file | `awk '{print $1, $3}' data.txt` prints columns 1 and 3 from `data.txt`. |
#### Examples

Let's take a look at a few examples:
I. Combine and sort the contents of `file1.txt` and `file2.txt`, and redirect the sorted output to `sorted.txt`:

This command uses the `sort` utility to combine the contents of both `file1.txt` and `file2.txt` while sorting all the lines alphabetically (or numerically, if options are provided). By using the redirection operator `>`, the sorted output is saved into a new file called `sorted.txt`.

Suppose `file1.txt` contains:

```
banana
apple
```

And `file2.txt` contains:

```
cherry
apple
```

Running the command:

```bash
sort file1.txt file2.txt > sorted.txt
```

The content of `sorted.txt` would be:

```
apple
apple
banana
cherry
```
II. Eliminate any adjacent duplicate lines from `sorted.txt` and save the result in `deduped.txt`:

After sorting, duplicate lines become adjacent. The `uniq` command then reads the sorted file and removes any consecutive duplicate lines. The output, which has duplicates eliminated, is redirected into a new file called `deduped.txt`. This is particularly useful when you need a list where each line is unique.

Given the `sorted.txt` content from the previous step:

```
apple
apple
banana
cherry
```

Running:

```bash
uniq sorted.txt > deduped.txt
```

The resulting `deduped.txt` would be:

```
apple
banana
cherry
```
III. Display lines containing the word "error" from `deduped.txt`:

The `grep` command is used to search within `deduped.txt` for lines that include the word "error". It is case-sensitive by default, so only lines with exactly "error" (all lowercase) will be matched. The matched lines are then printed to the terminal.

If `deduped.txt` contains:

```
apple
error: file not found
banana
Error: system crash
cherry
```

Running:

```bash
grep 'error' deduped.txt
```

The output will be:

```
error: file not found
```

(Note: "Error: system crash" is not matched because `grep` is case-sensitive.)
IV. Show lines from `deduped.txt` that contain the pattern "error", along with the line number:

Using `awk`, this command searches for lines containing "error" in `deduped.txt` and prints the line number (`NR`, which represents the current record or line number) followed by the entire line. This tells you where each occurrence is located in the file.

Consider `deduped.txt` with the same content as above:

```
apple
error: file not found
banana
Error: system crash
cherry
```

Executing:

```bash
awk '/error/ {print NR, $0}' deduped.txt
```

Expected output:

```
2 error: file not found
```

As with `grep`, the pattern `/error/` is case-sensitive, so line 4 ("Error: system crash") is not printed.
V. Replace all occurrences of 'old_word' with 'new_word' in `file.txt`:

The `sed` (stream editor) command in this example performs a substitution. It searches through `file.txt` for every instance of `old_word` and replaces it with `new_word`. The `g` flag at the end ensures that all occurrences on each line are replaced, not just the first. Note that this command writes the changes to standard output; to update the file itself, use the `-i` (in-place) option, whose exact syntax depends on your sed implementation.

Assume `file.txt` contains:

```
old_word at the start
two here: old_word old_word
no match on this line
```

Running:

```bash
sed 's/old_word/new_word/g' file.txt
```

Would output:

```
new_word at the start
two here: new_word new_word
no match on this line
```
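To update the file in place rather than printing to stdout, GNU sed accepts `-i` with an optional backup suffix; this is a sketch (BSD/macOS sed requires an explicit suffix argument, so `-i ''` is used there when no backup is wanted):

```bash
# Rewrite file.txt in place, keeping the original as file.txt.bak
sed -i.bak 's/old_word/new_word/g' file.txt
```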
#### Combining Filters

When you chain multiple filters using the pipe operator (`|`), the output of one command becomes the input of the next. This allows you to perform complex text transformations and analyses in a single command line. Let's take a look at the following example:

```bash
cat arduino_log.txt | sort | uniq | grep 'error'
```

I. `cat arduino_log.txt`

The `cat` command reads the contents of the file `arduino_log.txt` and sends it to standard output. This file might contain log messages from your Arduino or sensor readings.

Suppose `arduino_log.txt` contains:

```
error: network timeout
error: file not found
error: network timeout
Error: system crash
error: file not found
```

The output of `cat` is exactly what's contained in the file.

II. `sort`

The `sort` command reads the incoming text and arranges the lines in lexicographical order (in the C locale, uppercase letters sort before lowercase). Sorting groups identical log messages together.

After sorting:

```
Error: system crash
error: file not found
error: file not found
error: network timeout
error: network timeout
```

III. `uniq`

The `uniq` command removes any adjacent duplicate lines. Since the logs have been sorted, identical messages are now consecutive, so this command filters out the duplicates.

After removing duplicates:

```
Error: system crash
error: file not found
error: network timeout
```

IV. `grep 'error'`

The `grep` command then filters the output, selecting only the lines that contain the word "error". Because the match is case-sensitive by default, "Error: system crash" is excluded.

After filtering:

```
error: file not found
error: network timeout
```

Filters are central to the Unix philosophy of creating simple, modular tools that do one job and do it well. Used effectively, they provide powerful text processing capabilities with just a few keystrokes.
### Challenges

- Find the number of users currently logged in. Hint: use the `who` or `w` command followed by a line count.
- Generate a sorted list of all system users. Hint: the `/etc/passwd` file contains user information.
- List `.conf` filenames in the `/etc` directory and sort them by string length. You may need to use `ls`, `awk`, and `sort`.
- Print the first and seventh columns of the `/etc/passwd` file. These columns represent the username and the user's shell, respectively.
- Display each word from the `/etc/fstab` file on a separate line, and then count the total number of lines in the file. This file provides information on disk drives and their mount points.
- Find out how many users have a unique shell (i.e., they're the only ones using a particular shell). Use `/etc/passwd` as your source.
- From any text file of your choice, identify the ten most frequently occurring words and display their counts.
- Examine the `/etc/systemd/system` directory and list the service files that are currently active on the system.
- Search for words in a text file that are longer than 7 characters, contain the letter 'z', and display them sorted in reverse alphabetical order.
- Find the top five directories that consume the most disk space in your home directory. Hint: use the `du` and `sort` commands.
- Starting from your home directory, list all files (recursively, including subdirectories) that were modified in the last 24 hours, sorted by their modification time.