Last modified: August 05, 2024


Working with Unix Data Streams

In Unix, input redirection, streams, pipes, and filters are fundamental concepts for efficient data processing. Input redirection (<) allows commands to read from files, while output redirection (>) sends output to files. Streams (stdin, stdout, stderr) manage the flow of data between commands and the system. Pipes (|) connect the output of one command to the input of another, enabling seamless command chaining. Filters, such as grep and awk, are commands that process data streams, allowing users to manipulate and extract information efficiently. Together, these tools offer powerful ways to handle and transform data in Unix.

Standard Streams

Unix and Unix-like operating systems use three primary standard streams for program interaction. These streams are opened automatically for every process and act as the main channels for communication between a program and its environment:

I. stdin (Standard Input, file descriptor 0): the stream a program reads its input from, connected to the keyboard by default.

II. stdout (Standard Output, file descriptor 1): the stream a program writes its normal output to, connected to the terminal by default.

III. stderr (Standard Error, file descriptor 2): the stream reserved for error and diagnostic messages, also connected to the terminal by default but redirectable independently of stdout.
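
Because stdout and stderr are independent channels, they can be redirected separately. A minimal sketch (the filenames are only illustrative):

```shell
# Write one line to each stream, then split them into two files:
# stdout (fd 1) goes to out.txt, stderr (fd 2) goes to err.txt.
{ echo "normal output"; echo "an error message" >&2; } > out.txt 2> err.txt

cat out.txt   # "normal output"    (came from stdout)
cat err.txt   # "an error message" (came from stderr)
```

This separation is why a command's error messages still appear on the terminal even when its regular output has been redirected to a file.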

Pipe

The pipe (|) operator lets data flow from one command to another. It is a form of redirection that connects the standard output (stdout) of one command to the standard input (stdin) of the next.

Example 1: Filtering User Details

Suppose you want to see session details for a user named "user_name" with the w command, and then replace "user_name" with "admin" in the displayed output. This can be done with:

w | grep user_name | sed 's/user_name/admin/g'

Here, grep filters the output of w down to lines containing "user_name", and sed then substitutes "admin" for every occurrence of "user_name" in those lines.

Example 2: Sending Email with Current Date

You can combine the output of the date command (which gives the current date and time) with the mail command to send an email:

date | mail -s "This is a remote test" user1@rhhost1.localnet.com

Advanced Piping

Example: Searching for Text Files with Error Inclusion

Suppose you list a directory with ls -l and filter for .txt files with grep. By piping both output and error messages, any problems encountered during the listing are captured as well:

ls -l |& grep "\.txt$"

In this example, ls -l may produce both regular output and error messages (such as "Permission denied" errors). The |& operator, a Bash shorthand for 2>&1 |, pipes both streams into grep, which then filters for lines ending in .txt.

Example: Displaying and Saving Output

To display both stdout and stderr on the screen while saving them to a file named output.txt, you can use:

ls -l |& tee output.txt

Here, ls -l |& captures both the regular output and any errors, which are then passed to tee. The tee command displays the combined output on the terminal and writes it to output.txt.

Redirection

Redirection is a mechanism that controls the destination of a command's output, directing it to another command, a file, or even discarding it. It also allows commands to receive input from files instead of the keyboard.

I. Redirecting Standard Output

The > symbol redirects the standard output of a command to a file. For example:

echo "hello" > file.txt

If the file already exists, it will be overwritten. To append to an existing file, use >>:

echo "Hello" > file.txt
echo "World!" >> file.txt

II. Redirecting Standard Error

Errors can be separately redirected using 2>:

less non_existent_file 2> errors.txt

To append errors to an existing file, use 2>>:

less non_existent_file 2>> errors.txt

III. Redirecting Both Standard Output and Error

Use &> to overwrite a file with both outputs or &>> to append both to the file:

command &> output.txt
command &>> output.txt

IV. Redirecting Standard Input

The < symbol redirects the standard input of a command to come from a file instead of the keyboard. For example:

sort < unsorted_list.txt

In this example, the sort command takes its input from unsorted_list.txt instead of waiting for user input.

V. Using Input and Output Redirection Together

Commands can utilize both input and output redirection simultaneously. For example:

sort < unsorted.txt > sorted.txt

In this case, the sort command reads the contents of unsorted.txt, sorts the lines, and writes the sorted output to sorted.txt. This demonstrates how input redirection (<) takes data from a file, while output redirection (>) sends the processed result to another file.

VI. Here-Documents with <<

The << operator, known as a here-document, allows you to provide multi-line input directly within the shell script or command line, ending the input with a specified delimiter. For example:

cat <<EOF
This is a test file
with multiple lines
of text.
EOF

In this example, everything between <<EOF and EOF is treated as input to the cat command. The delimiter EOF can be replaced with any token, and it marks the end of the input block.
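
One detail worth knowing: by default the shell expands variables and command substitutions inside a here-document, and quoting the delimiter suppresses that. A small sketch:

```shell
name="world"

# Unquoted delimiter: $name is expanded before cat sees the text.
cat <<EOF
Hello, $name
EOF

# Quoted delimiter: the body is passed through literally.
cat <<'EOF'
Hello, $name
EOF
```

The first block prints "Hello, world"; the second prints "Hello, $name" verbatim.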

VII. View and Save Output Simultaneously

The tee command is useful for displaying output on the screen while also saving it to a file:

command | tee output.txt      # overwrite the file
command | tee -a output.txt   # append to the file

VIII. Handling Buffering Issues

Some programs detect that their output is not a terminal and switch to block buffering, which delays output when piping or redirecting. The script command, which runs a command inside a pseudo-terminal, can work around this:

output=$(script -c your_command /dev/null)
echo "$output"

Here, the -c option specifies the command to run, while /dev/null is given as the typescript file, so the session transcript that script would normally save is discarded. The command's output is captured in the output variable.
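
An alternative sketch, assuming GNU coreutils is available: stdbuf can force line-buffered output, so each line reaches the pipe as soon as it is printed rather than in large buffered chunks.

```shell
# stdbuf -oL runs the command with line-buffered stdout.
# printf here is only a stand-in for a long-running, chatty command.
stdbuf -oL printf 'first line\nsecond line\n' | tee buffered.log
```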

Summary Table

Syntax       StdOut Visible   StdErr Visible   StdOut in File   StdErr in File   Existing File Behavior
>            No               Yes              Yes              No               Overwrite
>>           No               Yes              Yes              No               Append
2>           Yes              No               No               Yes              Overwrite
2>>          Yes              No               No               Yes              Append
&>           No               No               Yes              Yes              Overwrite
&>>          No               No               Yes              Yes              Append
tee          Yes              Yes              Yes              No               Overwrite
tee -a       Yes              Yes              Yes              No               Append
|& tee       Yes              Yes              Yes              Yes              Overwrite
|& tee -a    Yes              Yes              Yes              Yes              Append
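
Note that &> and &>> are Bash extensions. In a portable (POSIX) script, the same effect is achieved by redirecting stdout to the file and then duplicating stderr onto it; a minimal sketch, where the order of the redirections matters:

```shell
# POSIX equivalent of `command &> both.txt`:
#   1. `> both.txt`  sends stdout to the file
#   2. `2>&1`        then points stderr at wherever stdout now goes
{ echo "to stdout"; echo "to stderr" >&2; } > both.txt 2>&1

cat both.txt   # contains both lines
```

Writing `2>&1 > both.txt` instead would duplicate stderr onto the terminal first and only then redirect stdout, so the errors would not land in the file.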

Filters

Filters are specialized commands designed to process text, typically working with streams of text data. They are predominantly used with pipes (|) to modify or analyze the output of another command. A filter reads input line by line, transforms it in some way, and then outputs the result. This processing method is particularly useful in Unix-like operating systems, where filters can be combined with other commands in a pipeline to perform complex text transformations and data analysis. Common examples of filters include grep for searching text, sort for arranging lines in a particular order, and awk for pattern scanning and processing. Filters are a fundamental part of command-line data manipulation, allowing users to efficiently process large amounts of text with simple, concise commands.

Common Unix Filters

Frequently used filters include:

- grep: print lines matching a pattern
- sed: edit a stream of text, most often for substitutions
- awk: scan for patterns and process fields within lines
- sort: order lines alphabetically or numerically
- uniq: collapse or report adjacent duplicate lines
- cut: extract selected columns or fields
- tr: translate or delete characters
- head / tail: show the beginning or end of a stream
- wc: count lines, words, and characters

Examples

I. Combine and sort the content of file1.txt and file2.txt, and redirect the sorted output to sorted.txt:

sort file1.txt file2.txt > sorted.txt

II. Eliminate any adjacent duplicate lines from sorted.txt and save the result in deduped.txt:

uniq sorted.txt > deduped.txt

III. Display lines containing the word "error" from deduped.txt:

grep 'error' deduped.txt

IV. Show lines from deduped.txt that contain the pattern "error", along with the line number:

awk '/error/ {print NR, $0}' deduped.txt

V. Replace all occurrences of 'old_word' with 'new_word' in file.txt:

sed 's/old_word/new_word/g' file.txt

Combining Filters

Filters become even more powerful when combined. By chaining together multiple filters using the pipe (|), you can perform complex text transformations and analyses with a single command.

# Sort the content of a file, remove duplicates (adjacent after sorting),
# and then display only lines containing "error"
sort file.txt | uniq | grep 'error'

Filters are foundational components in the Unix philosophy of creating simple, modular tools that do one job and do it well. When used effectively, they provide powerful text processing capabilities with just a few keystrokes.

Challenges

  1. Find the number of users currently logged in. Hint: Use the who or w command followed by a line count.
  2. Generate a sorted list of all system users. Hint: The /etc/passwd file contains user information.
  3. List .conf filenames in the /etc directory and sort them by string length. You may need to use ls, awk, and sort.
  4. Print the first and seventh columns of the /etc/passwd file. These columns represent the username and the user's shell, respectively.
  5. Display each word from the /etc/fstab file on a separate line, and then count the total number of lines in the file. This file provides information on disk drives and their mount points.
  6. Find out how many users have a unique shell (i.e., they're the only ones using a particular shell). Use /etc/passwd as your source.
  7. From any text file of your choice, identify the ten most frequently occurring words and display their counts.
  8. Examine the /etc/systemd/system directory and list the service files that are currently active on the system.
  9. Search for words in a text file that are longer than 7 characters, contain the letter 'z', and display them sorted in reverse alphabetical order.
  10. Find the top five directories that consume the most disk space in your home directory. Hint: Use the du and sort commands.
  11. Starting from your home directory, list all files (recursively, including subdirectories) that were modified in the last 24 hours, sorted by their modification time.
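
As a warm-up, the first challenge can be solved with a pipeline in exactly the style shown above (a sketch; the count depends on who is logged in when you run it):

```shell
# `who` prints one line per login session; `wc -l` counts the lines,
# giving the number of users currently logged in.
who | wc -l
```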
