Awk - Httqm's Docs

Altering a file in place, as sed -i does, can be done with GNU Awk's -i flag (the inplace extension shipped with gawk 4.1 and later) :

awk -i inplace '[awk instructions]' myFile

But don't forget that this replaces myFile with the output of the awk execution, which is not exactly equivalent to altering a file in place. As a result, anything that is not explicitly printed by awk will be missing from the output file.
Example :

cat << EOF > test.txt
line1
hello world
line3
EOF
awk -i inplace '/hello/ { print "hello everybody"}' test.txt; cat test.txt

hello everybody
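
To keep the lines that are not rewritten, they have to be printed too. Here is a minimal sketch reusing the test.txt content above. It writes to a temporary file and renames it instead of using -i inplace, on the assumption that the Awk at hand may lack that GNU extension. The names workDir and inPlaceResult are made up for this example :

```shell
# Work in a throwaway directory so nothing real gets overwritten.
workDir=$(mktemp -d)
printf 'line1\nhello world\nline3\n' > "$workDir/test.txt"

# Rewrite the matching line, and explicitly print every other line unchanged.
awk '/hello/ { print "hello everybody"; next } { print }' "$workDir/test.txt" > "$workDir/test.txt.new" \
  && mv "$workDir/test.txt.new" "$workDir/test.txt"

inPlaceResult=$(cat "$workDir/test.txt")
echo "$inPlaceResult"
rm -r "$workDir"
```

The same awk program should behave identically when passed to awk -i inplace, since the only thing that changes is where the output goes.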

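Detecting the last line of input from the END clause is a hack based on the following facts :

Awk loops on all lines of input
at each iteration, Awk stores the contents of the current line into its $0 variable
THEN, when present, the END clause is executed. At that time, $0 still holds the value of the previous line, i.e. the last line of input (this is what GNU Awk and other current implementations do; some older Awks did not keep $0 in END).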

display the first line (remember : { print $0 } is the default action): seq 0 9 | awk 'NR==1'
display the last line: seq 0 9 | awk 'END {print $0}'
our test matrix: for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done
display the full line 1: for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'NR==1'
display the line 1, field 3: for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'NR==1 {print $3}'
display the full last line: for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'END {print $0}'
display the last line, field 4: for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'END {print $4}'
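
When the last line has to be handled differently while the other lines are still being printed, one possible approach is to print each line one iteration late. This is only a sketch that does not depend on $0 surviving into END; the names previous and lastLineDemo are arbitrary :

```shell
lastLineDemo=$(seq 1 3 | awk '
  NR > 1 { print previous }     # print the line read during the previous iteration
  { previous = $0 }             # remember the current line for the next iteration
  END { if (NR > 0) print previous " <- last line" }   # only the last line is left
')
echo "$lastLineDemo"
```

With seq 1 3 as input, lines 1 and 2 are printed untouched and only line 3 gets the suffix.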

Usage

Like gsub, gensub allows regex-based search and replace. Some of its notable differences are :

the replacement string can refer to subparts of the matched string using backreferences
gensub returns the modified string as the result of the function. The original target string is not changed. So you'll have to explicitly store the result in a variable and print it.

gensub(regexp, "replacement", how [, target])

regexp : the regular expression to search for
replacement : backreferences are \1 to \9 (\0 stands for the whole match). Since replacement is a quoted string, the \ characters need to be escaped, so backreferences are written \\1 to \\9
how :
- if it starts with g or G (i.e. "global") : replace all matches
- otherwise, consider it as a number indicating which match of regexp to replace. If < 1, assume 1.
target : the string to search and replace in. If missing, assume $0.

Example

Basic examples :

echo 'hello world' | awk '{ result=gensub(/o/, "O", 1); print result; }'

hellO world

echo 'hello world' | awk '{ result=gensub(/o/, "O", "g"); print result; }'

hellO wOrld

The variable name passed to print does NOT need a leading $.

A not-so-basic example :

On lines starting with a vowel, change the 3rd character of the 2nd word into an X :

echo -e 'alpha bravo\ncharlie delta\necho foxtrot' | awk '/^[aeiouy]/ { result=gensub(/^(..).(.*)/, "\\1X\\2", 1, $2); print $1" "result; next; } { print }'

alpha brXvo
charlie delta
echo foXtrot
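
gensub is a GNU Awk extension, so it may be missing from other Awk implementations. For this particular example, where the replacement happens at a fixed position, a rough equivalent can be sketched with substr. This is not a general replacement for backreferences, and gensubFreeResult is a made-up name :

```shell
gensubFreeResult=$(printf 'alpha bravo\ncharlie delta\necho foxtrot\n' | awk '
  # On lines starting with a vowel, rebuild the 2nd word around its 3rd character.
  /^[aeiouy]/ && length($2) >= 3 { $2 = substr($2, 1, 2) "X" substr($2, 4) }
  { print }
')
echo "$gensubFreeResult"
```

Unlike gensub, this assigns to $2 directly, so the line is rebuilt with single spaces between fields.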

Usage

gsub(regexp, replacement [, target])

Example

replace all o with 0 :: echo 'hello world' | awk '{ gsub(/o/, "0"); print }'

hell0 w0rld
replace all e with E in the 2nd word only :: echo 'happy halloween' | awk '{ gsub(/e/, "E", $2); print }'

happy hallowEEn
remove square brackets [] :: echo '[foo]' | awk '{ gsub(/[\[\]]/, ""); print }'

foo
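
Two more details of gsub may be worth a quick illustration : it returns the number of replacements it made, and & in the replacement string stands for the matched text. The variable names below are arbitrary :

```shell
gsubDemo=$(printf 'hello world\n' | awk '{
  count = gsub(/o/, "[&]")            # & is replaced by whatever the regex matched
  print count " replacement(s) : " $0
}')
echo "$gsubDemo"
```

Here the target is omitted, so gsub works on $0 and changes it in place.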

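The basic command everybody knows for counting lines matching a regex :: grep -E '<article id="[^"]' *xml | wc -l

This construct is only relevant to get a total across several files. Even though "it works", applying it to a single file is a UUOW (useless use of wc), since grep -c already counts matching lines.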
An alternate awk-based "pipe-less" solution :: awk 'BEGIN {i=0;} /<article id="[^"]/ {i++;} END { print i;}' *xml

echo -e 'foo\nbar\nbaz' | awk '/arf/{found=1} END{exit !found}'; echo $?

1

echo -e 'foo\nbar\nbaz' | awk '/bar/{found=1} END{exit !found}'; echo $?

0

Explanations

Here is how Awk processes :

read the first line of input and search for a match
if no match found on the current line, read the next line
if a match is found, raise the specified flag : found=1 (use whatever name and non-zero value you like)
when all lines of input have been read, continue to the END section, and process instructions found there
exit terminates Awk, returning the specified exit code
since Unix exit codes use 0 for "success", we negate the flag value with !
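
The steps above can be put to work in a shell test. The sketch below wraps the command into a function (containsBar and matchStatus are names invented here) and adds an exit inside the rule, so that Awk stops reading at the first match and jumps straight to END :

```shell
containsBar() {
  # "exit" inside a rule skips the remaining input; the END clause still runs.
  awk '/bar/ { found = 1; exit } END { exit !found }'
}

if printf 'foo\nbar\nbaz\n' | containsBar; then
  matchStatus='match found'
else
  matchStatus='no match'
fi
echo "$matchStatus"
```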

Situation

How can I make sure a text file has :

(unknown number of lines before)

(some text before)EXPECTED_TAG1(some text after)

(unknown number of lines between)

(some other text before)EXPECTED_TAG2(some other text after)

(unknown number of lines after)

so far, grep-ing EXPECTED_TAG1, then grep-ing EXPECTED_TAG2 was not an acceptable solution

Solution

echo -e 'ga\nbu\nzo\nmeu' | awk -v RS='u' '/a.b/ {print $0}'
ga
b

echo -e 'ga\nbu\nzo\nmeu' | awk 'BEGIN {RS="u"} /a.b/ {print $0}'
ga
b


echo -e 'Super\ncali\nfragi\nlisti\ncexpia\nlido\ncious' | awk 'BEGIN {RS=".i"} {print $0}'
Super
ca

fra


s

cex
a

do

ous

Alternate solution

Alternate "low-tech" solution :

convert the input into a single giant line of text with tr
search strings as usual with your favorite tool
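
The two steps above could look like the sketch below, where grep is assumed to be the "favorite tool". hasBothTags, sampleFile and tagCheck are hypothetical names :

```shell
hasBothTags() {
  # Join all lines into a single one, then look for TAG1 followed later by TAG2.
  tr '\n' ' ' < "$1" | grep -q 'EXPECTED_TAG1.*EXPECTED_TAG2'
}

sampleFile=$(mktemp)
printf 'intro\nfoo EXPECTED_TAG1 bar\nmiddle\nbaz EXPECTED_TAG2 qux\noutro\n' > "$sampleFile"

if hasBothTags "$sampleFile"; then
  tagCheck='both tags found, in order'
else
  tagCheck='tags missing or out of order'
fi
echo "$tagCheck"
rm "$sampleFile"
```

Note that once the input is a single line, this also succeeds when both tags sit on the same original line.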

Situation

I have :

1;unique1
2;duplicate1
3;unique2
4;duplicate1
5;unique3
6;duplicate2
7;duplicate2
8;unique4
9;duplicate1

I want :

Anything stating whether some values are duplicated across lines

Solution

echo -e '1;unique1\n2;duplicate1\n3;unique2\n4;duplicate1\n5;unique3\n6;duplicate2\n7;duplicate2\n8;unique4\n9;duplicate1' | awk -F';' 'seen[$2]++ {print $2}'

duplicate1
duplicate2
duplicate1

This snippet only displays duplicates. Anything becomes a duplicate if it's already been seen once, which is why something that exists n times is displayed n-1 times.
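
If each duplicated value should be listed only once, a small variation is to print only when the counter is exactly 1 before the increment, i.e. on the second occurrence. A sketch on the same data (duplicatesOnce is a made-up name) :

```shell
duplicatesOnce=$(printf '1;unique1\n2;duplicate1\n3;unique2\n4;duplicate1\n5;unique3\n6;duplicate2\n7;duplicate2\n8;unique4\n9;duplicate1\n' \
  | awk -F';' 'seen[$2]++ == 1 { print $2 }')   # true only the 2nd time a value is met
echo "$duplicatesOnce"
```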

Situation

I have :

4;unique_1
3;duplicate
2;duplicate
1;unique_2

I want :

4;unique_1
3;duplicate
1;unique_2

Solution

echo -e '4;unique_1\n3;duplicate\n2;duplicate\n1;unique_2' | awk -F ';' '!seen[$2]++'

(source)

Details

This command is telling Awk which lines to print :

$2 is the 2nd field of each processed line (here : the one having duplicates)
seen[n] is the item with index n within the seen array (the array name is arbitrary and was chosen for readability)
seen[$2]++ is a post-increment : it evaluates to the value held before the increment, and ! negates that value (source)

So, for each line of the file, the item of the seen array indexed by $2 is incremented, and the line is printed only if that item was not (!) previously set, i.e. on the first occurrence of the value.
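
Written out in long form, the one-liner is roughly equivalent to the sketch below (longFormResult is an arbitrary name) :

```shell
longFormResult=$(printf '4;unique_1\n3;duplicate\n2;duplicate\n1;unique_2\n' | awk -F ';' '{
  if (seen[$2] == 0) print    # value not met so far : print the whole line
  seen[$2]++                  # then count this occurrence
}')
echo "$longFormResult"
```

The short form works because a pattern without an action prints the line whenever the pattern is true.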

Alternate solution

echo -e '4;unique_1\n3;duplicate\n2;duplicate\n1;unique_2' | sort -t ';' -uk2 | sort -nrk1
To keep the original line order, prepend line numbers (with cat -n), sort on them in ascending order once duplicates are gone, and remove them (with cut -f 2-) :
- ```
cat -n << EOF | sort -t ';' -uk2 | sort -nk1 | cut -f 2-
keepMe;unique_1
keepMe;duplicate
keepMe;duplicate
keepMe;unique_2
EOF
```
- tmpFile=$(mktemp); echo -e 'keepMe;unique_1\nkeepMe;duplicate\nkeepMe;duplicate\nkeepMe;unique_2' > "$tmpFile"; cat -n "$tmpFile" | sort -t ';' -uk2 | sort -nk1 | cut -f 2-; rm "$tmpFile"

(source)

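Fields can either be selected for printing, or excluded by emptying them :

print 2 fields out of several :: echo {1..9} | awk '{ print $3" "$4 }'

3 4
exclude 2 fields out of several :: echo {1..9} | awk '{ $3=$4=""; print }'

1 2   5 6 7 8 9
print several fields out of many :: echo {1..999} | awk '{ for (i=123; i<=127; i++) printf "%s ", $i; print ""}'

123 124 125 126 127
exclude several fields out of many :: echo {1..30} | awk '{ for (i=10; i<=19; i++) $i=""; print $0 }'

1 2 3 4 5 6 7 8 9           20 21 22 23 24 25 26 27 28 29 30

Emptied fields are still output as empty strings, so their separators remain, hence the runs of spaces.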

Situation

The syntax of Awk commands is :

awk '/a regular expression/ {deal with it}' myFile

Since the regular expression is /-delimited, we have to escape every / found in the regular expression itself :

echo -e '/a/\n/1/\n/b/\n/2/\n/c/\n/3/' | awk '/\/[a-z]\// {print}'

It is sometimes possible to work around this :

echo -e '/a/\n/1/\n/b/\n/2/\n/c/\n/3/' | awk '/.[a-z]./ {print}'

But since it changes the meaning of the regular expression, it is not always applicable / advisable.

Solution

Awk has no parameter to specify the regular expression delimiter character, but we can use a variable to preserve readability :

echo -e '/foo/foo/\n/foo/bar/\n/bar/bar/\n/foo/baz/' | awk 'BEGIN {myRegex = "/foo/.a./"} $0 ~ myRegex {print}'

Here's a very basic example showing the amount of RAM installed, in GiB :

awk '/MemTotal:/ {printf "%.0fGiB\n", $2/1024/1024}' /proc/meminfo
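
To see the rounding without depending on the local /proc/meminfo, here is a small sketch fed with a made-up MemTotal line. The 16384000 kB figure is invented, and LC_ALL=C is set on the assumption that it keeps the decimal point a dot on every system :

```shell
# 16384000 kB / 1024 / 1024 = 15.625, which %.0f rounds to 16
ramDemo=$(printf 'MemTotal:       16384000 kB\n' | LC_ALL=C awk '/MemTotal:/ { printf "%.0fGiB\n", $2/1024/1024 }')
echo "$ramDemo"

# %.0f and %.2f round to the nearest value; int() simply drops the decimals
roundDemo=$(LC_ALL=C awk 'BEGIN { printf "%.0f %.2f %d\n", 2.7, 3.14159, int(2.7) }')
echo "$roundDemo"
```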

Let's say I want to display the name + version + architecture + description of installed packages matching lsb-*. And I also want the header line so that it looks pretty. To do so :

dpkg -l 'lsb-*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/	Name		Version		Architecture	Description
+++===================================================================================================
ii	lsb-base	9.20161125	all		Linux Standard Base init script functionality
un	lsb-core	<none>		<none>		(no description available)				not installed
ii	lsb-release	9.20161125	all		Linux Standard Base version reporting utility

I want installed packages only, so let's start filtering : dpkg -l 'lsb-*' | awk '/^(ii|\|\|)/ {print $0}'

||/	Name		Version		Architecture	Description
ii	lsb-base	9.20161125	all		Linux Standard Base init script functionality
ii	lsb-release	9.20161125	all		Linux Standard Base version reporting utility

Looks better. Now I'd like to get rid of the leading ii . Here comes the magic : fieldSeparator='|'; dpkg -l 'lsb-*' | awk -v sep="$fieldSeparator" '/^(ii|\|\|)/ { for (i=2; i<=4; i++) printf "%s%s", $i, sep; for (i=5; i<=NF; i++) printf "%s ", $i; print ""}' | column -s "$fieldSeparator" -t
```
Name		Version		Architecture	Description
lsb-base	9.20161125	all		Linux Standard Base init script functionality
lsb-release	9.20161125	all		Linux Standard Base version reporting utility
```
How it works :
- All lines returned by dpkg are made of several fields : field 1 is ii (or ||/ on the header line), field 2 is the name, field 3 is the version, field 4 is the architecture and fields 5 to NF are the description
- Descriptions do not all have the same number of words, so NF is different for each returned line
- -v sep="$fieldSeparator" passes the shell variable to Awk as the sep variable
- the for (i=2; i<=4; i++) printf "%s%s", $i, sep command prints name + version + architecture (each followed by the separator)
- the for (i=5; i<=NF; i++) printf "%s ", $i command prints the description as a single data field (no separator added)
- all of this is fed into column for a nice tabular display and voilà!

Display only the nth line after the one matching pattern (source) :

awk '/pattern/ { x = NR + n } NR == x' someFile

This construct assumes pattern matches only once.
The magic in this solution is that { print $0 } is the default action.

for i in {0..9}; do echo "line $i"; done | awk '/line 4/ {lineToDisplay = NR + 3} NR == lineToDisplay'

line 7
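
When pattern may match several times, one possible variation is to remember every wanted line number in an array instead of a single variable. This is only a sketch; wanted, n and multiMatchDemo are names chosen for this example :

```shell
multiMatchDemo=$(seq 0 9 | awk '{ print "line " $0 }' | awk -v n=3 '
  /line [25]/ { wanted[NR + n] = 1 }   # schedule the line found n lines further down
  NR in wanted                         # default action : print the scheduled lines
')
echo "$multiMatchDemo"
```

"line 2" and "line 5" both match, so "line 5" and "line 8" are displayed.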

Display only the nth line before the one matching pattern :

So far, I don't know whether this is possible with Awk. I think this may be possible with sed. And a basic Bash solution would be :

for i in {0..9}; do echo "line $i"; done | grep -B3 'line 4' | head -1
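
For what it's worth, an Awk-only version looks feasible by remembering the lines already read. A minimal sketch, assuming the input is small enough to be kept in memory (line, n and nthBeforeDemo are made-up names) :

```shell
nthBeforeDemo=$(seq 0 9 | awk '{ print "line " $0 }' | awk -v n=3 '
  { line[NR] = $0 }                             # keep every line, indexed by its number
  /line 4/ && NR > n { print line[NR - n] }     # on a match, look n lines back
')
echo "$nthBeforeDemo"
```

The NR > n guard skips matches that occur too early in the input to have an nth previous line.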

Awk - HowTo's

Is it possible to alter files inplace with awk like sed -i does ?

Awk : how to detect the last line of input ?

Awk : how to alter fields with gensub ?

Usage

Example

Basic examples :

A not-so-basic example :

Awk : how to alter fields with gsub ?

Usage

Example

How to count lines matching a regex ?

How to return a different exit code stating match found / match not found ?

Explanations

How to match strings across lines ?

Situation

Solution

Alternate solution

How to detect duplicate fields in a file ?

Situation

Solution

How to filter lines having duplicate fields ?

Situation

Solution

Details

Alternate solution

How to filter fields to print ?

How to use another character instead of `/` as the regular expression delimiter ?

Situation

Solution

How to round floating numbers ?

Here's a very basic example showing the amount of RAM installed, in GiB :

How to customize the display of data fields ?

How to display the nth line after / before a match ?

Display only the nth line after the one matching pattern (source) :

Display only the nth line before the one matching pattern :
