Last modified: October 11, 2024
Command-Line Stream Editors
sed (Stream Editor) and awk are powerful command-line utilities that originated on Unix and have become indispensable tools in Unix-like operating systems, including Linux and macOS. They are designed for processing and transforming text, allowing users to perform complex text manipulations with simple commands. This guide provides an overview of both utilities, including their history, usage, syntax, options, and practical examples.
Sed
Developed in the 1970s by Lee E. McMahon of Bell Labs, sed is a non-interactive stream editor used to perform basic text transformations on an input stream (a file or input from a pipeline). It was designed to support scripting and command-line usage, automating repetitive editing tasks.
Main idea:
- Performs editing operations automatically without user interaction.
- Processes input line by line, making it efficient for large files.
- Supports powerful pattern matching using regular expressions.
- Allows the use of scripts for complex editing tasks.
Syntax
The basic syntax of sed is:
sed [OPTIONS] 'SCRIPT' INPUTFILE...
- OPTIONS modify the behavior of sed, controlling how it reads and interprets its input.
- SCRIPT consists of one or more editing commands that sed applies to the input.
- INPUTFILE is one or more files for sed to process, so the same script can edit a single file or a batch of files.
How sed Works
User
|
| Uses 'sed' with a script and input file(s)
v
+-------------------------------+
| sed Command |
| - Reads Input Line by Line |
| - Applies Script to Each Line|
| - Outputs Modified Lines |
+-------------------------------+
|
| Outputs to Terminal or File
v
sed reads the input text line by line, applies the specified commands to each line, and outputs the result. If no input file is specified, sed reads from standard input.
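As a small illustration of the two input modes described above, the following sketch feeds the same text to sed once through a pipe and once from a file. The file name and its contents are made up for the demonstration.

```shell
# Hypothetical sample data, created in a temporary directory.
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/input.txt"

# No input file given: sed reads from standard input (here, a pipe).
from_pipe=$(printf 'first line\nsecond line\n' | sed 's/line/row/')

# Input file given: sed reads the file instead.
from_file=$(sed 's/line/row/' "$workdir/input.txt")

printf '%s\n' "$from_pipe"
printf '%s\n' "$from_file"
```

Both invocations should print the same two lines, since only the source of the input differs.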
Common Operations with sed
Substitution
The substitution command replaces text matching a pattern with new text.
sed 's/pattern/replacement/flags' inputfile
- The s command initiates a substitution, telling sed to replace matched patterns within the input.
- The pattern is the regular expression or literal text that sed searches for in each line.
- The replacement is the text that replaces each match found by the pattern.
- The flags are optional modifiers that alter the behavior of the substitution, such as making it global with g.
Common Flags:
Flag | Description |
g | Replaces every match on the line, not only the first. |
i | Case-insensitive matching (a GNU sed extension, also written I). |
p | Prints the line if a substitution occurred (usually combined with -n). |
Example: Replace 'apple' with 'orange' globally:
sed 's/apple/orange/g' fruits.txt
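To make the effect of the flags concrete, here is a minimal sketch that runs the same substitution with and without them. The sample text is invented for illustration; the i flag is left out because it is not available in every sed implementation.

```shell
line='apple pie, Apple tart, apple jam'

# No flag: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/apple/orange/')

# g flag: every match on the line is replaced (matching stays case-sensitive).
global=$(printf '%s\n' "$line" | sed 's/apple/orange/g')

# -n suppresses automatic printing; the p flag prints only changed lines.
changed_only=$(printf 'apple\nbanana\n' | sed -n 's/apple/orange/p')

printf '%s\n' "$first_only" "$global" "$changed_only"
```

The capitalized "Apple" is left alone in both runs because the pattern is matched case-sensitively by default.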
Deletion
Delete lines matching a pattern or at a specific line number.
sed '/pattern/d' inputfile
Example: Remove empty lines:
sed '/^$/d' file.txt
Explanation:
- ^$ matches empty lines.
- d deletes the matching lines.
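One detail worth seeing in practice: ^$ matches only lines with no characters at all, so lines made of spaces or tabs survive. The sketch below, using invented sample input, compares it with a pattern built on the POSIX [[:space:]] character class.

```shell
# Sample input: a text line, an empty line, a line of two spaces, a text line.
sample=$(printf 'alpha\n\n  \nbeta')

# Deletes only truly empty lines; the whitespace-only line is kept.
no_empty=$(printf '%s\n' "$sample" | sed '/^$/d')

# Deletes empty lines and lines containing only whitespace.
no_blank=$(printf '%s\n' "$sample" | sed '/^[[:space:]]*$/d')

printf '%s\n' "$no_empty"
printf '%s\n' "$no_blank"
```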
Insertion and Appending
Insert or append text before or after a line matching a pattern.
Insert (i):
sed '/pattern/i\text to insert' inputfile
Example: Insert a header before the first line:
sed '1i\Header Text' file.txt
Append (a):
sed '/pattern/a\text to append' inputfile
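The one-line forms above are accepted by GNU sed. A more portable way to write the same commands, sketched here with invented sample data, puts the text on its own line after the backslash, which is the form POSIX describes.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/lines.txt"

# i\ followed by a newline: the text is inserted before each matching line.
inserted=$(sed '/two/i\
-- before two --' "$workdir/lines.txt")

# a\ followed by a newline: the text is appended after each matching line.
appended=$(sed '/two/a\
-- after two --' "$workdir/lines.txt")

printf '%s\n' "$inserted"
printf '%s\n' "$appended"
```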
Transformation
Transform characters using the y command, similar to tr.
sed 'y/source/destination/' inputfile
Example: Convert lowercase to uppercase:
sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' file.txt
Advanced sed Techniques
Regular Expressions in sed
sed supports regular expressions for pattern matching.
Metacharacters:
Symbol | Description |
. | Matches any single character. |
* | Matches zero or more occurrences of the preceding character. |
[] | Character class; matches any one character inside the brackets. |
^ | Matches the start of a line. |
$ | Matches the end of a line. |
\ | Escapes a metacharacter. |
Example: On lines starting with 'Error', replace that leading 'Error' with 'Warning':
sed '/^Error/s/^Error/Warning/' logs.txt
Addressing
Specify lines to apply commands to using line numbers or patterns.
sed 'address command' inputfile
Address Types:
- Single line: sed '5d' file.txt deletes only line 5 of the file.
- Line range: sed '2,4d' file.txt deletes lines 2 through 4.
- Pattern range: sed '/start/,/end/d' file.txt deletes every block of lines that begins at a line matching 'start' and ends at the next line matching 'end', inclusive.
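The address types can be tried on a small numbered input. This is a sketch with generated sample data; the -n and p combination is included to show that addresses work with commands other than d.

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6)

# Single line address: delete line 5.
single=$(printf '%s\n' "$numbers" | sed '5d')

# Line range address: delete lines 2 through 4.
range=$(printf '%s\n' "$numbers" | sed '2,4d')

# The same range with -n and p prints only lines 2 through 4.
printed=$(printf '%s\n' "$numbers" | sed -n '2,4p')

# Pattern range: delete from the 'start' line through the 'end' line.
pattern_range=$(printf 'keep\nstart\ndrop\nend\nkeep too\n' | sed '/start/,/end/d')

printf '%s\n' "$single" "$range" "$printed" "$pattern_range"
```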
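Hold Space
sed works on the current line in the pattern space and provides a separate hold space as a scratch buffer for more complex, multi-line manipulations.
Example: Swap adjacent lines (this one needs only the pattern space):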
sed 'N; s/\(.*\)\n\(.*\)/\2\n\1/' file.txt
Explanation:
- N appends the next input line to the pattern space, separated by a newline.
- The s command swaps the two lines.
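The swap above works entirely in the pattern space. As a sketch of the hold space itself, the well-known idiom below reverses the order of lines: G appends the hold space to the pattern space, h copies the pattern space into the hold space, and the accumulated result is printed on the last line. The three-line input is invented for the example.

```shell
# 1!G : on every line except the first, append the hold space to the pattern space
# h   : copy the pattern space (current line + everything seen so far) to the hold space
# $p  : on the last line, print the accumulated, reversed text
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

Because the whole input accumulates in the hold space, this approach is best suited to inputs that fit comfortably in memory.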
Practical Examples and Use Cases
I. Replace All Occurrences of a String:
sed 's/old/new/g' file.txt
II. Delete Lines Containing a Pattern:
sed '/unwanted_pattern/d' file.txt
III. Insert Text After a Line Matching a Pattern:
sed '/pattern/a\New line of text' file.txt
IV. Edit Files In-Place with Backup:
sed -i.bak 's/foo/bar/g' file.txt
-i.bak edits the file in-place and creates a backup with the .bak extension.
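A quick way to see what -i.bak leaves behind is to run it on a throwaway file. The sketch below uses a hypothetical notes.txt in a temporary directory, then reads back both the edited file and the backup.

```shell
workdir=$(mktemp -d)
printf 'foo one\nfoo two\n' > "$workdir/notes.txt"

# Edit in place; the original content is saved as notes.txt.bak.
sed -i.bak 's/foo/bar/g' "$workdir/notes.txt"

edited=$(cat "$workdir/notes.txt")
backup=$(cat "$workdir/notes.txt.bak")

printf 'edited:\n%s\n' "$edited"
printf 'backup:\n%s\n' "$backup"
```

Attaching the suffix directly to -i, with no space, is the form that both GNU and BSD sed accept.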
V. Change Delimiters in CSV Files:
sed 's/,/|/g' data.csv > data.psv
Converts comma-separated values to pipe-separated values.
Tips and Best Practices
- Enclose scripts in single quotes to prevent shell interpretation.
- Use backslashes to escape characters like /, &, and \.
- Run the command without -i first to test it before modifying files in-place.
- With -E (or sed -r in GNU sed), you can use extended regex syntax.
Example:
sed -E 's/([0-9]{3})-([0-9]{2})-([0-9]{4})/XXX-XX-\3/' ssn.txt
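The escaping tip can be illustrated with a few one-liners. This is a sketch on invented input; it also shows a common alternative to escaping slashes, which is to pick a different delimiter for the s command.

```shell
# Escaping the delimiter: every / in the pattern and replacement needs a backslash.
escaped=$(printf '/usr/local/bin\n' | sed 's/\/usr\/local/\/opt/')

# Using | as the delimiter avoids those escapes.
alt_delim=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')

# An unescaped & in the replacement stands for the matched text.
wrapped=$(printf 'error 42\n' | sed 's/[0-9][0-9]*/[&]/')

# An escaped \& inserts a literal ampersand.
literal_amp=$(printf 'R and D\n' | sed 's/ and / \& /')

printf '%s\n' "$escaped" "$alt_delim" "$wrapped" "$literal_amp"
```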
Awk
Developed in the 1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan at Bell Labs (hence the name awk), awk is a powerful text-processing language. It is designed for data extraction and reporting, offering a programming language with C-like syntax and features.
Main idea:
- Treats each line as a record and splits it into fields separated by a delimiter.
- Executes actions based on pattern matches.
- Supports variables, arrays, and control flow statements.
- Provides functions for string manipulation, arithmetic, and more.
- Allows user-defined functions.
Syntax
The basic syntax of awk is:
awk 'PATTERN { ACTION }' INPUTFILE
- PATTERN is a regular expression or condition that selects which records to act on. If it is omitted, the action runs for every record.
- ACTION is the block of commands that awk executes for each matching record. If it is omitted, awk prints the record.
- INPUTFILE is the file that awk processes, applying the pattern and action rules to its contents.
How awk Works
User
|
| Uses 'awk' with a program and input file(s)
v
+-------------------------------+
| awk Command |
| - Reads Input Line by Line |
| - Splits Line into Fields |
| - Applies Pattern and Action |
+-------------------------------+
|
| Outputs to Terminal or File
v
awk reads the input file line by line, splits each line into fields based on a delimiter (default is whitespace), and then executes the specified actions on lines matching the pattern.
Common Operations with awk
Field and Record Processing
- Fields are the individual data segments within a line, accessed with positional variables such as $1, $2, and so on. NF holds the number of fields in the current record, so $NF is the last field, and $0 is the entire record.
- Records are, by default, the lines of the input. NR holds the number of the current record, counted from the start of the input.
Example: Print the first and third fields:
awk '{ print $1, $3 }' data.txt
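The built-in variables are easiest to understand by printing them. The following sketch runs awk on two invented records with different field counts and reports NR, NF, and the last field for each.

```shell
# NR is the record number, NF the field count, $NF the last field.
summary=$(printf 'alice 30 london\nbob 25\n' |
  awk '{ print NR ": " NF " fields, last = " $NF }')

printf '%s\n' "$summary"
```

Note that $NF changes meaning from record to record: it is the third field on the first line and the second field on the second.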
Patterns and Actions
Execute actions only on lines matching a pattern.
Example: Print lines where the second field equals 'Error':
awk '$2 == "Error" { print }' logs.txt
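Patterns are not limited to exact comparisons. The sketch below, run against a small invented log file, compares an exact match with a regular-expression match (~) and a negated match (!~) on the same field.

```shell
workdir=$(mktemp -d)
printf '10:01 Error disk full\n10:02 Info started\n10:03 error retry\n' > "$workdir/app.log"

# Exact, case-sensitive comparison: only the first line qualifies.
exact=$(awk '$2 == "Error" { print $1 }' "$workdir/app.log")

# Regex match on a lowercased copy of the field: case-insensitive.
any_case=$(awk 'tolower($2) ~ /^error$/ { print $1 }' "$workdir/app.log")

# Negated regex match: records whose second field does not contain "rror".
negated=$(awk '$2 !~ /rror/ { print $1 }' "$workdir/app.log")

printf '%s\n' "$exact" "$any_case" "$negated"
```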
Variables and Operators
awk supports arithmetic and string operations.
Example: Sum values in the third field:
awk '{ sum += $3 } END { print "Total:", sum }' data.txt
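Real data files often start with a header row, which should not be added to the total. A minimal sketch, assuming a made-up three-column file whose first line is a header, uses NR > 1 as the pattern to skip it.

```shell
# NR > 1 skips the header; sum starts out as 0 without explicit initialization.
total=$(printf 'item qty price\npen 2 1.50\nbook 1 12.25\n' |
  awk 'NR > 1 { sum += $3 } END { print "Total:", sum }')

printf '%s\n' "$total"
```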
Advanced awk
Techniques
Control Structures
awk supports if, while, for, and other control structures.
Example: Conditional Processing
awk '{
if ($3 > 100) {
print $1, $2, "High"
} else {
print $1, $2, "Low"
}
}' data.txt
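Loops combine naturally with NF. As a sketch on invented numeric input, the for loop below walks over every field of each record and prints a per-row sum, whatever the number of columns.

```shell
row_sums=$(printf '1 2 3\n10 20\n' | awk '{
  total = 0                      # reset for each record
  for (i = 1; i <= NF; i++) {    # visit every field in the record
    total += $i
  }
  print "row", NR, "sum", total
}')

printf '%s\n' "$row_sums"
```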
Built-in Functions
awk provides numerous built-in functions for mathematical and string operations.
String Functions:
Function | Description |
length(str) | Returns the length of str. |
substr(str, start, length) | Extracts a substring of the given length, starting at position start (counted from 1). |
tolower(str) | Converts to lowercase. |
toupper(str) | Converts to uppercase. |
Example: Convert the second field to uppercase:
awk '{ $2 = toupper($2); print }' data.txt
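The string functions can be combined in a single expression. This sketch, on an invented name record, prints the length of the first field, the first three characters of the second, and a capitalized copy of the first built from toupper and substr.

```shell
# substr positions start at 1; substr(s, 2) returns everything from position 2 on.
result=$(printf 'alice smith\n' |
  awk '{ print length($1), substr($2, 1, 3), toupper(substr($1, 1, 1)) substr($1, 2) }')

printf '%s\n' "$result"
```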
User-Defined Functions
Define custom functions for reuse.
Example: Define a function to calculate the square:
awk 'function square(x) { return x * x }
{ print $1, square($2) }' data.txt
Practical Examples and Use Cases
I. Calculate Average of a Column:
awk '{ total += $3; count++ } END { print "Average:", total/count }' data.txt
II. Filter Rows Based on Field Value:
awk '$4 >= 50 { print }' scores.txt
III. Reformat Output:
awk '{ printf "%-10s %-10s %5.2f\n", $1, $2, $3 }' data.txt
Formats output with fixed-width columns and two decimal places.
IV. Count Occurrences of Unique Values:
awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' words.txt
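The order in which for (word in count) visits array keys is not specified, so the output order can differ between awk implementations. One way to get a stable listing, sketched here on invented input, is to pipe the result through sort.

```shell
# Count each distinct value in the first field, then sort for a stable order.
counts=$(printf 'b\na\nb\nc\nb\na\n' |
  awk '{ count[$1]++ } END { for (word in count) print word, count[word] }' |
  sort)

printf '%s\n' "$counts"
```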
V. Process Delimited Data with Custom Field Separator:
awk -F ':' '{ print $1, $3 }' /etc/passwd
- -F ':' sets the field separator to a colon.
- Prints the username and UID.
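The separator can also be set from inside the program through the FS variable, and the output separator through OFS. The sketch below uses invented passwd-style and comma-separated input; assigning $1 = $1 is a common idiom that makes awk rebuild the record with OFS.

```shell
# -F on the command line: split on colons, print name and UID.
users=$(printf 'root:x:0:0\nalice:x:1000:1000\n' | awk -F ':' '{ print $1, $3 }')

# FS and OFS set in a BEGIN block: read commas, write pipes.
piped=$(printf 'a,b,c\n' | awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }')

printf '%s\n' "$users" "$piped"
```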
Tips and Best Practices
- Enclose scripts in single quotes to prevent shell interpretation.
- awk treats uninitialized variables as an empty string or 0; initialize them explicitly (for example in a BEGIN block) when that default is not what you want.
- Use -F to set custom field separators.
- Use BEGIN and END blocks for actions before processing starts or after it ends.
Example:
awk 'BEGIN { print "Start Processing" } { print $0 } END { print "End Processing" }' file.txt
Challenges
- Research and describe the core differences between sed and awk, focusing on their primary functionalities. Compare how each tool handles text streams and structured data, and discuss when it might be more appropriate to use sed versus awk.
- Describe the sequence of operations sed performs on a text stream, including how it reads input, processes it in the pattern space, and outputs results. Explain the purpose of the pattern space and how sed uses it to manage the text transformations on each line.
- By default, awk uses whitespace as a delimiter to separate columns. Explain how to modify this default setting to use other delimiters, such as a comma (,) or a colon (:). Provide examples demonstrating how to set these delimiters with the -F option in awk.
- Demonstrate how to extract data from multiple columns in awk with a specific example. Show how to retrieve data from the first, third, and fifth columns simultaneously, and explain how this syntax differs from extracting each column individually.
- Use awk to filter lines where a particular column does not match a specific pattern. Provide an example that demonstrates this process, such as displaying lines from a file where the second column does not contain the word "error."
- Explain how awk can perform data aggregation tasks, such as calculating the sum of values in a specific column. Provide an example of using awk to read a file with multiple rows of numeric data and compute the sum of all values in a given column.
- Show how to use sed to replace all occurrences of a word in a text file with another word. Demonstrate how sed can be used both to perform a global replacement within each line and to limit replacements to the first occurrence of the word on each line.
- Explain how to use awk to format and print data in a specific way. For instance, given a file with names and scores, demonstrate how you could use awk to print the names and scores in a formatted table with aligned columns.
- Use sed to delete lines containing a specific pattern from a text file. Describe the command you used and explain how sed processes the file to selectively remove lines based on pattern matching.
- Combine sed and awk in a pipeline to perform more complex text transformations. For example, use sed to remove blank lines from a file, and then use awk to calculate the average of numeric values in a specific column. Explain how combining these tools in a pipeline can solve more advanced text processing tasks.