Last modified: October 11, 2024


Command-Line Stream Editors

sed (stream editor) and awk are command-line utilities that originated in Unix and are standard on Unix-like operating systems, including Linux and macOS. Both process text as a stream: they read input, transform it according to a short script, and write the result to standard output. This guide covers the history, syntax, options, and practical usage of each tool.

Sed

Developed in the 1970s by Lee E. McMahon of Bell Labs, sed is a non-interactive stream editor used to perform basic text transformations on an input stream (a file or input from a pipeline). It was designed to support scripting and command-line usage, automating repetitive editing tasks.

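Main idea: sed applies a short editing script to every line of its input and writes the result to standard output, leaving the input file unchanged unless in-place editing (-i) is requested.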

Syntax

The basic syntax of sed is:

sed [OPTIONS] 'SCRIPT' [INPUTFILE...]

How sed Works

User
 |
 | Uses 'sed' with a script and input file(s)
 v
+-------------------------------+
| sed Command                   |
|  - Reads Input Line by Line   |
|  - Applies Script to Each Line|
|  - Outputs Modified Lines     |
+-------------------------------+
 |
 | Outputs to Terminal or File
 v

sed reads the input one line at a time into a buffer called the pattern space, applies the script's commands to that buffer, and then prints it (unless the -n option suppresses automatic printing). If no input file is specified, sed reads from standard input.
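As a quick illustration of the line-by-line model, the sketch below pipes three made-up lines into sed through standard input. The sample text and the variable name result are assumptions chosen for the demonstration.

```shell
# Three made-up lines arrive on standard input; no input file is named.
# The script replaces the first "cat" on each line with "dog".
result=$(printf 'cat one\nbird two\ncat cat three\n' | sed 's/cat/dog/')
printf '%s\n' "$result"
```

Each line is processed independently, so the second line passes through unchanged and only the first "cat" on the third line is replaced.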

Common Operations with sed

Substitution

The substitution command replaces text matching a pattern with new text.

sed 's/pattern/replacement/flags' inputfile

Common Flags:

Flag   Description
g      Replace every match on the line, not only the first.
I, i   Match case-insensitively (a GNU sed extension, not part of POSIX sed).
p      Print the line if a substitution was made (usually combined with -n).

Example: Replace 'apple' with 'orange' globally:

sed 's/apple/orange/g' fruits.txt
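The effect of the flags is easiest to see side by side. The following sketch runs the same substitution three ways on a small made-up input; the variable names are illustrative only.

```shell
# Made-up sample input: two lines, the first containing "apple" twice.
input='apple pie, apple tart
cherry pie'

# No flag: only the first match on each line is replaced.
first_only=$(printf '%s\n' "$input" | sed 's/apple/orange/')

# g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$input" | sed 's/apple/orange/g')

# -n turns off automatic printing, so with p only changed lines appear.
changed_only=$(printf '%s\n' "$input" | sed -n 's/apple/orange/gp')

printf '%s\n---\n%s\n---\n%s\n' "$first_only" "$all_matches" "$changed_only"
```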

Deletion

Delete lines matching a pattern or at a specific line number.

sed '/pattern/d' inputfile

Example: Remove empty lines:

sed '/^$/d' file.txt

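Explanation: ^ anchors the start of the line and $ the end, so /^$/ matches only lines with nothing between the two, and d deletes each matching line. Lines that contain only spaces or tabs are not empty and are kept.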
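Deletion by line number works the same way, with a number or range in place of the pattern. A minimal sketch on five made-up lines (the variable names are assumptions for illustration):

```shell
# Five sample lines, made up for this sketch.
input=$(printf 'one\ntwo\nthree\nfour\nfive')

# Delete line 2 only.
without_second=$(printf '%s\n' "$input" | sed '2d')

# Delete lines 2 through 4 (the range is inclusive).
without_range=$(printf '%s\n' "$input" | sed '2,4d')

# Delete the last line; $ addresses the final line of input.
without_last=$(printf '%s\n' "$input" | sed '$d')

printf '%s\n---\n%s\n---\n%s\n' "$without_second" "$without_range" "$without_last"
```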

Insertion and Appending

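Insert text before, or append text after, each line selected by an address. The one-line forms shown here are a GNU sed extension; POSIX sed expects a newline after i\ or a\, with the text on the following line.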

Insert (i):

sed '/pattern/i\text to insert' inputfile

Example: Insert a header before the first line:

sed '1i\Header Text' file.txt

Append (a):

sed '/pattern/a\text to append' inputfile
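For scripts that should not depend on the one-line form, the text can go on the line after i\ or a\. The sketch below assumes a two-line made-up input and combines both commands in one script.

```shell
# i\ and a\ are each followed by a newline and then the text to add.
# Line 1 gets a header inserted before it; the line matching "beta"
# gets a footer appended after it.
result=$(printf 'alpha\nbeta\n' | sed '1i\
HEADER
/beta/a\
FOOTER')
printf '%s\n' "$result"
```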

Transformation

Transform characters using the y command, similar to tr.

sed 'y/source/destination/' inputfile

Example: Convert lowercase to uppercase:

sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' file.txt

Advanced sed Techniques

Regular Expressions in sed

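sed supports regular expressions for pattern matching. By default these are POSIX basic regular expressions (BRE), in which grouping and interval operators are written with backslashes, as in \( \) and \{ \}. The -E option switches to extended regular expressions (ERE), where those backslashes are not needed.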

Metacharacters:

Symbol   Description
.        Matches any single character.
*        Matches zero or more occurrences of the preceding character or group.
[]       Character class; matches any one character inside the brackets.
^        Matches the start of a line.
$        Matches the end of a line.
\        Escapes a metacharacter.

Example: Replace 'Error' with 'Warning' on lines that start with 'Error':

sed 's/^Error/Warning/' logs.txt
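A few of the metacharacters can be tried on made-up input to see how escaping, character classes, and anchors change what matches. The sample strings and variable names below are assumptions for illustration.

```shell
# An unescaped . matches any character, so both lines change.
any_char=$(printf 'v1.2\nv1x2\n' | sed 's/1.2/1.3/')

# \. matches only a literal dot, so the second line is left alone.
literal_dot=$(printf 'v1.2\nv1x2\n' | sed 's/1\.2/1.3/')

# [au] matches one character from the set; ^ and $ anchor the whole line,
# so "cot" and "cats" do not match.
class_match=$(printf 'cat\ncut\ncot\ncats\n' | sed 's/^c[au]t$/MATCH/')

printf '%s\n---\n%s\n---\n%s\n' "$any_char" "$literal_dot" "$class_match"
```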

Addressing

Specify lines to apply commands to using line numbers or patterns.

sed 'address command' inputfile

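Address Types:

Address       Selects
N             Line number N.
$             The last line of input.
/regex/       Every line that matches regex.
addr1,addr2   The inclusive range from addr1 through addr2.
addr!         Every line not selected by addr.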
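The sketch below tries several kinds of address on a five-line made-up input: a line number, a regex range, a negated address, and the last-line address $. The markers START and END are part of the sample data, not sed keywords.

```shell
# Made-up input with START and END marker lines.
input=$(printf 'a\nSTART\nb\nEND\nc')

# Line number: print only line 3 (-n suppresses all other output).
third_line=$(printf '%s\n' "$input" | sed -n '3p')

# Regex range: delete from the START line through the END line.
range_removed=$(printf '%s\n' "$input" | sed '/START/,/END/d')

# Negation: delete every line except line 1.
only_first=$(printf '%s\n' "$input" | sed '1!d')

# $ addresses the last line: substitute there and nowhere else.
last_changed=$(printf '%s\n' "$input" | sed '$s/c/C/')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$third_line" "$range_removed" "$only_first" "$last_changed"
```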

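Hold Space

Besides the pattern space, which holds the current line, sed has a second buffer called the hold space. The commands h, H, g, G, and x copy or exchange text between the two buffers, which lets a script carry text from one line to the next.

Example: Swap adjacent lines:

sed 'N; s/\(.*\)\n\(.*\)/\2\n\1/' file.txt

Explanation: N appends the next input line to the pattern space, separated by a newline. The substitution captures the text before and after that newline and writes the two parts back in reverse order. This version works entirely in the pattern space, and the \n in the replacement is understood by GNU sed but is not guaranteed by POSIX.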
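For comparison, the same swap can be sketched with the hold space commands h and G, along with a second classic hold-space script that reverses the order of all lines. Both run on made-up numbered input; treat them as illustrations rather than the only way to write these scripts.

```shell
# Swap adjacent lines using the hold space.
#   $!{...}  run the group on every line except the last
#   h        copy the current line to the hold space
#   n        replace the pattern space with the next input line
#   G        append the held line after it
#   p        print the result (-n disables automatic printing)
swapped=$(printf '1\n2\n3\n4\n5\n' | sed -n '$!{h;n;G;};p')

# Reverse the whole input.
#   1!G  on every line but the first, append the hold space
#   h    save the accumulated text back to the hold space
#   $p   on the last line, print everything
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')

printf '%s\n---\n%s\n' "$swapped" "$reversed"
```

With an odd number of lines, the final unpaired line is printed unchanged.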

Practical Examples and Use Cases

I. Replace All Occurrences of a String:

sed 's/old/new/g' file.txt

II. Delete Lines Containing a Pattern:

sed '/unwanted_pattern/d' file.txt

III. Insert Text After a Line Matching a Pattern:

sed '/pattern/a\New line of text' file.txt

IV. Edit Files In-Place with Backup:

sed -i.bak 's/foo/bar/g' file.txt

-i.bak edits the file in-place and creates a backup with .bak extension.

V. Change Delimiters in CSV Files:

sed 's/,/|/g' data.csv > data.psv

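Converts comma-separated values to pipe-separated values. Because the substitution replaces every comma, including commas inside quoted fields, it is only safe for CSV data that contains no quoted commas.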

Tips and Best Practices

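Example: Use -E (extended regular expressions) so that groups and intervals need no backslashes. The command below masks all but the last four digits of SSN-style numbers: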

sed -E 's/([0-9]{3})-([0-9]{2})-([0-9]{4})/XXX-XX-\3/' ssn.txt
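To show what -E changes, the sketch below writes an equivalent masking substitution twice, once with extended and once with basic regular expressions. The input line is made up, and only the group that is actually reused is captured.

```shell
# Extended syntax (-E): ( ) and { } are operators without backslashes.
masked_ere=$(printf 'id 123-45-6789\n' |
  sed -E 's/[0-9]{3}-[0-9]{2}-([0-9]{4})/XXX-XX-\1/')

# Basic syntax (default): the same operators need backslashes.
masked_bre=$(printf 'id 123-45-6789\n' |
  sed 's/[0-9]\{3\}-[0-9]\{2\}-\([0-9]\{4\}\)/XXX-XX-\1/')

printf '%s\n%s\n' "$masked_ere" "$masked_bre"
```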

Awk

Developed in the 1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan at Bell Labs (the name awk comes from their initials), awk is a text-processing language designed for data extraction and reporting. It offers variables, functions, and control structures with a C-like syntax.

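Main idea: awk treats each input line as a record made of fields and runs every pattern-action rule whose pattern matches that record.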

Syntax

The basic syntax of awk is:

awk [OPTIONS] 'PATTERN { ACTION }' [INPUTFILE...]

How awk Works

User
 |
 | Uses 'awk' with a program and input file(s)
 v
+-------------------------------+
| awk Command                   |
|  - Reads Input Line by Line   |
|  - Splits Line into Fields    |
|  - Applies Pattern and Action |
+-------------------------------+
 |
 | Outputs to Terminal or File
 v

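awk reads its input one record at a time (by default, one line), splits each record into fields on runs of whitespace, and runs the action of every rule whose pattern matches. A rule with no pattern applies to every record, and a rule with no action prints the record. If no input file is given, awk reads standard input.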
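The record and field model can be seen with the built-in variables NR (the number of records read so far) and NF (the number of fields in the current record). The two input lines below are made up for the sketch.

```shell
# For each record, print its number, its field count,
# its first field, and its last field ($NF).
result=$(printf 'alice 30 paris\nbob 25\n' | awk '{ print NR, NF, $1, $NF }')
printf '%s\n' "$result"
```

The second record has only two fields, so $NF refers to a different column there than in the first record.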

Common Operations with awk

Field and Record Processing

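Fields are referenced as $1, $2, and so on; $0 is the whole record, NF is the number of fields in it, and NR is the number of records read so far.

Example: Print the first and third fields: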

awk '{ print $1, $3 }' data.txt

Patterns and Actions

Execute actions only on lines matching a pattern.

Example: Print lines where the second field equals 'Error':

awk '$2 == "Error" { print }' logs.txt
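Patterns are not limited to equality tests. The sketch below compares an exact match, a regular-expression match with ~, and a negated match with !~ on a made-up three-line log; the log contents and variable names are assumptions.

```shell
# Made-up log lines: time, level, subsystem.
input='10:01 Error disk
10:02 Info boot
10:03 error net'

# Exact string comparison; with no action, the whole record is printed.
exact=$(printf '%s\n' "$input" | awk '$2 == "Error"')

# Regex match on field 2, accepting either capitalization.
regex=$(printf '%s\n' "$input" | awk '$2 ~ /^[Ee]rror$/ { print $1 }')

# Negated regex match: records whose field 2 does not contain "rror".
negated=$(printf '%s\n' "$input" | awk '$2 !~ /rror/ { print $3 }')

printf '%s\n---\n%s\n---\n%s\n' "$exact" "$regex" "$negated"
```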

Variables and Operators

awk supports arithmetic and string operations.

Example: Sum values in the third field:

awk '{ sum += $3 } END { print "Total:", sum }' data.txt

Advanced awk Techniques

Control Structures

awk supports if, while, for, and other control structures.

Example: Conditional Processing

awk '{
  if ($3 > 100) {
    print $1, $2, "High"
  } else {
    print $1, $2, "Low"
  }
}' data.txt
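Loops follow the same C-like form. As a small sketch on made-up numeric input, the program below uses a for loop to add up every field in each record.

```shell
result=$(printf '1 2 3\n4 5\n' | awk '{
  sum = 0                        # reset the total for each record
  for (i = 1; i <= NF; i++) {    # visit every field in the record
    sum += $i
  }
  print sum
}')
printf '%s\n' "$result"
```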

Built-in Functions

awk provides numerous built-in functions for mathematical and string operations.

String Functions:

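Function                     Description
length(str)                  Returns the length of str.
substr(str, start, length)   Returns the part of str that begins at position start (counting from 1) and is at most length characters long.
tolower(str)                 Returns str converted to lowercase.
toupper(str)                 Returns str converted to uppercase.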

Example: Convert the second field to uppercase:

awk '{ $2 = toupper($2); print }' data.txt
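The four string functions can be tried together on one made-up line. Note that substr counts positions from 1, so substr($2, 1, 3) returns the first three characters of the second field.

```shell
result=$(printf 'hello world\n' |
  awk '{ print length($1), substr($2, 1, 3), toupper($1), tolower("ABC") }')
printf '%s\n' "$result"
```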

User-Defined Functions

Define custom functions for reuse.

Example: Define a function to calculate the square:

awk 'function square(x) { return x * x }
     { print $1, square($2) }' data.txt

Practical Examples and Use Cases

I. Calculate Average of a Column:

awk '{ total += $3; count++ } END { if (count > 0) print "Average:", total / count }' data.txt

II. Filter Rows Based on Field Value:

awk '$4 >= 50 { print }' scores.txt

III. Reformat Output:

awk '{ printf "%-10s %-10s %5.2f\n", $1, $2, $3 }' data.txt

Formats output with fixed-width columns and two decimal places.

IV. Count Occurrences of Unique Values:

awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' words.txt
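The order in which for (key in array) visits keys is not specified, so the output order of a counting script can vary between awk implementations. One common approach, sketched here on made-up input, is to pipe the result through sort.

```shell
# Count how often each value appears in column 1, then sort the
# output so the order does not depend on the awk implementation.
result=$(printf 'b\na\nb\nc\nb\na\n' |
  awk '{ count[$1]++ } END { for (w in count) print w, count[w] }' |
  sort)
printf '%s\n' "$result"
```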

V. Process Delimited Data with Custom Field Separator:

awk -F ':' '{ print $1, $3 }' /etc/passwd
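The field separator can also be set inside the program by assigning FS in a BEGIN block, and the output separator can be changed through the OFS variable. The sketch below uses made-up passwd-style and comma-separated lines to show both.

```shell
# -F sets the input separator; -v OFS=... sets the separator that
# print places between its comma-separated arguments.
with_option=$(printf 'root:x:0:0\nalice:x:1000:1000\n' |
  awk -F ':' -v OFS=',' '{ print $1, $3 }')

# Equivalent input-side setup from inside the program, here for commas.
with_begin=$(printf 'red,apple\nyellow,banana\n' |
  awk 'BEGIN { FS = "," } { print $2 }')

printf '%s\n---\n%s\n' "$with_option" "$with_begin"
```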

Tips and Best Practices

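Example: Use BEGIN and END blocks for work that should run once, before the first record is read and after the last one has been processed: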

awk 'BEGIN { print "Start Processing" } { print $0 } END { print "End Processing" }' file.txt

Challenges

  1. Research and describe the core differences between sed and awk, focusing on their primary functionalities. Compare how each tool handles text streams and structured data, and discuss when it might be more appropriate to use sed versus awk.
  2. Describe the sequence of operations sed performs on a text stream, including how it reads input, processes it in the pattern space, and outputs results. Explain the purpose of the pattern space and how sed uses it to manage the text transformations on each line.
  3. By default, awk uses whitespace as a delimiter to separate columns. Explain how to modify this default setting to use other delimiters, such as a comma (,) or a colon (:). Provide examples demonstrating how to set these delimiters with the -F option in awk.
  4. Demonstrate how to extract data from multiple columns in awk with a specific example. Show how to retrieve data from the first, third, and fifth columns simultaneously, and explain how this syntax differs from extracting each column individually.
  5. Use awk to filter lines where a particular column does not match a specific pattern. Provide an example that demonstrates this process, such as displaying lines from a file where the second column does not contain the word "error."
  6. Explain how awk can perform data aggregation tasks, such as calculating the sum of values in a specific column. Provide an example of using awk to read a file with multiple rows of numeric data and compute the sum of all values in a given column.
  7. Show how to use sed to replace all occurrences of a word in a text file with another word. Demonstrate how sed can be used both to perform a global replacement within each line and to limit replacements to the first occurrence of the word on each line.
  8. Explain how to use awk to format and print data in a specific way. For instance, given a file with names and scores, demonstrate how you could use awk to print the names and scores in a formatted table with aligned columns.
  9. Use sed to delete lines containing a specific pattern from a text file. Describe the command you used and explain how sed processes the file to selectively remove lines based on pattern matching.
  10. Combine sed and awk in a pipeline to perform more complex text transformations. For example, use sed to remove blank lines from a file, and then use awk to calculate the average of numeric values in a specific column. Explain how combining these tools in a pipeline can solve more advanced text processing tasks.
