Awk - HowTo's


Awk : how to detect the last line of input ?

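This hack is based on the facts that NR holds the number of the current line, and that $0 still holds the last line read once Awk reaches the END block (true for gawk, mawk and BWK awk; some older implementations do not preserve $0 there) :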
display the first line (remember : { print $0 } is the default action)
seq 0 9 | awk 'NR==1'
display the last line
seq 0 9 | awk 'END {print $0}'
our test matrix
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done
display the full line 1
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'NR==1'
display the line 1, field 3
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'NR==1 {print $3}'
display the full last line
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'END {print $0}'
display the last line, field 4
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'END {print $4}'
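
The END-based commands above only act once all input has been read. To treat the last line differently while still printing the other lines, one possible approach is to print each line one step late. This is a minimal sketch, not a definitive implementation : the helper name mark_last_line and the "LAST: " prefix are made up for the illustration.

```shell
# Hypothetical helper: print every line unchanged, except the last one,
# which gets a "LAST: " prefix.
mark_last_line() {
    awk '
        NR > 1 { print prev }                   # the previous line was not the last one
               { prev = $0 }                    # remember the current line
        END    { if (NR) print "LAST: " prev }  # the remembered line is the last one
    '
}

printf '0\n1\n2\n3\n' | mark_last_line
# prints:
# 0
# 1
# 2
# LAST: 3
```

Since the last line is taken from a regular variable (prev), this does not depend on what $0 contains in the END block.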

Awk : how to alter fields with gensub ?

Usage

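Like gsub, gensub allows regex-based search and replace. Some of its notable differences are :
gensub is a GNU Awk (gawk) extension
it returns the modified string and leaves the target untouched (gsub alters the target in place and returns the number of substitutions)
its how argument selects what to replace : "g" for all matches, or a number n for the nth match only
its replacement may refer to parenthesized groups of the regexp with \\1 to \\9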
gensub(regexp, "replacement", how [, target])

Example

Basic examples :

echo 'hello world' | awk '{ result=gensub(/o/, "O", 1); print result; }'
hellO world
echo 'hello world' | awk '{ result=gensub(/o/, "O", "g"); print result; }'
hellO wOrld

result is an Awk variable, so the name passed to print does NOT need a leading $ (in Awk, $ is only used for fields such as $0 or $1).

A not-so-basic example :

On lines starting with a vowel, change the 3rd character of the 2nd word into an X :
echo -e 'alpha bravo\ncharlie delta\necho foxtrot' | awk '/^[aeiouy]/ { result=gensub(/^(..).(.*)/, "\\1X\\2", 1, $2); print $1" "result; next; } { print }'
alpha brXvo
charlie delta
echo foXtrot

Awk : how to alter fields with gsub ?

Usage

gsub(regexp, replacement [, target])

Example

replace all o with 0 :
echo 'hello world' | awk '{ gsub(/o/, "0"); print }'
hell0 w0rld
replace all e with E in the 2nd word only :
echo 'happy halloween' | awk '{ gsub(/e/, "E", $2); print }'
happy hallowEEn
remove square brackets [] :
echo '[foo]' | awk '{ gsub(/[\[\]]/, ""); print }'
foo
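
gsub alters its target in place and returns the number of substitutions it made, which is why the examples above call print separately. A small sketch of using that return value (the helper name count_and_replace is made up for the illustration) :

```shell
# Hypothetical helper: replace every o with 0, and show how many
# replacements gsub() performed on each line.
count_and_replace() {
    awk '{ n = gsub(/o/, "0"); print n ": " $0 }'
}

echo 'hello world' | count_and_replace
# prints: 2: hell0 w0rld
```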

How to count lines matching a regex ?

The basic command everybody knows :
grep -E '<article id="[^"]' *xml | wc -l
This construct only makes sense when counting across several files. Even though "it works", applying it to a single file is a UUOW (Useless Use Of Wc) : grep -c already counts matching lines.
An alternate awk-based "pipe-less" solution :
awk 'BEGIN {i=0;} /<article id="[^"]/ {i++;} END { print i;}' *xml

How to return a different exit code stating match found / match not found ?

echo -e 'foo\nbar\nbaz' | awk '/arf/{found=1} END{exit !found}'; echo $?
1
echo -e 'foo\nbar\nbaz' | awk '/bar/{found=1} END{exit !found}'; echo $?
0

Explanations

Here is how Awk proceeds :
  1. read the first line of input and search for a match
  2. if no match is found on the current line, read the next line
  3. if a match is found, raise the specified flag : found=1 (use whatever name and non-zero value you like), then keep reading
  4. when all lines of input have been read, continue to the END section, and process the instructions found there
  5. exit terminates Awk, returning the specified exit code
  6. since Unix exit codes use 0 for "success", we negate the flag value with ! : a flag set to 1 gives exit code 0, a flag that was never set gives exit code 1
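
The steps above can be wrapped in a shell test. The sketch below is only one possible way to do it : the helper name has_match is made up, and it adds an early exit, which is not in the original command (calling exit inside a regular rule skips the remaining input and jumps to the END section). The regex is passed with -v, so backslashes in it would be interpreted as escape sequences by Awk.

```shell
# Hypothetical helper: exit code 0 if at least one input line matches
# the regex given as first argument, 1 otherwise.
has_match() {
    awk -v re="$1" '
        $0 ~ re { found = 1; exit }   # stop reading at the first match, go to END
        END     { exit !found }       # found=1 gives 0 (success), unset gives 1
    '
}

if printf 'foo\nbar\nbaz\n' | has_match 'bar'; then
    echo 'match found'
else
    echo 'no match'
fi
# prints: match found
```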

How to match strings across lines ?

Situation

How can I make sure a text file has :
(unknown number of lines before)

(some text before)EXPECTED_TAG1(some text after)

(unknown number of lines between)

(some other text before)EXPECTED_TAG2(some other text after)

(unknown number of lines after)

So far, grep-ing EXPECTED_TAG1, then grep-ing EXPECTED_TAG2, was not an acceptable solution.

Solution

Awk splits its input into records on RS, the record separator, which is a newline by default. Changing RS lets a single record span several lines, and since . also matches a newline in Awk regular expressions, a pattern can then match across line breaks. A multi-character RS such as ".i" is treated as a regular expression by gawk and mawk.

echo -e 'ga\nbu\nzo\nmeu' | awk -v RS='u' '/a.b/ {print $0}'
ga
b
echo -e 'ga\nbu\nzo\nmeu' | awk 'BEGIN {RS="u"} /a.b/ {print $0}'
ga
b
echo -e 'Super\ncali\nfragi\nlisti\ncexpia\nlido\ncious' | awk 'BEGIN {RS=".i"} {print $0}'
Super
ca

fra


s

cex
a

do

ous
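
Applied to the situation described above, a different technique than changing RS is to remember with a flag that the first tag has been seen. This is a sketch under assumptions : the helper name tags_in_order is made up, and it requires EXPECTED_TAG2 to be on a later line than EXPECTED_TAG1.

```shell
# Hypothetical helper: exit code 0 only if EXPECTED_TAG1 is found on some line
# and EXPECTED_TAG2 is found on a later line.
tags_in_order() {
    awk '
        tag1 && /EXPECTED_TAG2/ { ok = 1; exit }   # TAG2 found after TAG1 : done
        /EXPECTED_TAG1/         { tag1 = 1 }       # remember that TAG1 was seen
        END                     { exit !ok }
    '
}

printf 'x\na EXPECTED_TAG1 b\ny\nc EXPECTED_TAG2 d\nz\n' | tags_in_order && echo 'both tags found, in order'
# prints: both tags found, in order
```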

Alternate solution

Alternate "low-tech" solution :
  1. convert the input into a single giant line of text with tr
  2. search strings as usual with your favorite tool
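
The two steps above could look like this, with grep as the "favorite tool" (the helper name tags_in_order_flat is made up for the illustration). Unlike a line-by-line check, this also succeeds when both tags sit on the same line, and the whole input becomes one line, which may be impractical for very large files.

```shell
# Hypothetical helper: flatten the input into one line, then look for
# EXPECTED_TAG1 followed, anywhere later, by EXPECTED_TAG2.
tags_in_order_flat() {
    tr '\n' ' ' | grep -q 'EXPECTED_TAG1.*EXPECTED_TAG2'
}

printf 'x\na EXPECTED_TAG1 b\ny\nc EXPECTED_TAG2 d\n' | tags_in_order_flat && echo 'both tags found, in order'
# prints: both tags found, in order
```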

How to detect duplicate fields in a file ?

Situation

I have :
1;unique1
2;duplicate1
3;unique2
4;duplicate1
5;unique3
6;duplicate2
7;duplicate2
8;unique4
9;duplicate1
I want :
Anything stating whether some values are duplicated across lines

Solution

echo -e '1;unique1\n2;duplicate1\n3;unique2\n4;duplicate1\n5;unique3\n6;duplicate2\n7;duplicate2\n8;unique4\n9;duplicate1' | awk -F';' 'seen[$2]++ {print $2}'
duplicate1
duplicate2
duplicate1
This snippet only displays duplicates. Anything becomes a duplicate if it's already been seen once, which is why something that exists n times is displayed n-1 times.
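
To list each duplicated value only once, whatever its number of occurrences, one possible variant is to print a value only when it is seen for the second time. The helper name list_duplicates_once is made up for the illustration.

```shell
# Hypothetical helper: print each duplicated value of field 2 exactly once.
# seen[$2]++ returns the count BEFORE incrementing, so "== 1" is true
# on the second occurrence only.
list_duplicates_once() {
    awk -F';' 'seen[$2]++ == 1 { print $2 }'
}

printf '1;unique1\n2;duplicate1\n3;unique2\n4;duplicate1\n5;unique3\n6;duplicate2\n7;duplicate2\n8;unique4\n9;duplicate1\n' | list_duplicates_once
# prints:
# duplicate1
# duplicate2
```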

How to filter lines having duplicate fields ?

Situation

I have :
4;unique_1
3;duplicate
2;duplicate
1;unique_2
I want :
4;unique_1
3;duplicate
1;unique_2

Solution

echo -e '4;unique_1\n3;duplicate\n2;duplicate\n1;unique_2' | awk -F ';' '!seen[$2]++'

(source)

Details

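This command tells Awk which lines to print : !seen[$2]++ is a pattern with no action, so the default action (print the line) applies whenever it is true. For each line of the file, the entry of the array seen indexed by field 2 is incremented, and the line is printed only if that entry was not (!) previously set.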
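
Written out step by step, the same logic could look like the sketch below (the helper name first_occurrences is made up for the illustration) :

```shell
# Hypothetical helper: keep only the first line for each value of field 2.
first_occurrences() {
    awk -F';' '{
        if (seen[$2] == 0)   # this value of field 2 has not been met yet
            print            # so the line is kept
        seen[$2]++           # count the occurrence in both cases
    }'
}

printf '4;unique_1\n3;duplicate\n2;duplicate\n1;unique_2\n' | first_occurrences
# prints:
# 4;unique_1
# 3;duplicate
# 1;unique_2
```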

Alternate solution

(source)

How to filter fields to print ?

2 fields out of several
print :
echo {1..9} | awk '{ print $3" "$4 }'
3 4
exclude :
echo {1..9} | awk '{ $3=$4=""; print }'
1 2   5 6 7 8 9
several fields out of many
print :
echo {1..999} | awk '{ for (i=123; i<=127; i++) printf "%s ", $i; print "" }'
123 124 125 126 127
exclude :
echo {1..30} | awk '{ for (i=10; i<=19; i++) $i=""; print $0 }'
1 2 3 4 5 6 7 8 9           20 21 22 23 24 25 26 27 28 29 30

How to use another character instead of / as the regular expression delimiter ?

Situation

The syntax of Awk commands is :

awk '/a regular expression/ {deal with it}' myFile

Since the regular expression is /-delimited, we have to escape those found in the regular expression itself :

echo -e '/a/\n/1/\n/b/\n/2/\n/c/\n/3/' | awk '/\/[a-z]\// {print}'

It is sometimes possible to work around this :

echo -e '/a/\n/1/\n/b/\n/2/\n/c/\n/3/' | awk '/.[a-z]./ {print}'
But since it changes the meaning of the regular expression, it is not always applicable / advisable.

Solution

Awk has no parameter to specify the regular expression delimiter character, but we can use a variable to preserve readability :

echo -e '/foo/foo/\n/foo/bar/\n/bar/bar/\n/foo/baz/' | awk 'BEGIN {myRegex = "/foo/.a./"} $0 ~ myRegex {print}'
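
The regular expression can also come from the shell through -v, which keeps it out of the Awk program entirely. This is a sketch with a made-up helper name (match_path); note that Awk interprets backslash escape sequences in values passed with -v, so a regex containing backslashes would need them doubled.

```shell
# Hypothetical helper: print the lines matching the regex given as first
# argument, without any /-delimited regex in the Awk program.
match_path() {
    awk -v re="$1" '$0 ~ re'
}

printf '/foo/foo/\n/foo/bar/\n/bar/bar/\n/foo/baz/\n' | match_path '/foo/.a./'
# prints:
# /foo/bar/
# /foo/baz/
```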


How to round floating numbers ?

Here's a very basic example showing the amount of RAM installed, in GiB :

awk '/MemTotal:/ {printf "%.0fGiB\n", $2/1024/1024}' /proc/meminfo
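
The rounding is done by the printf format : the number after the dot in %.0f is the number of decimals to keep. A short sketch comparing it with int(), which truncates instead of rounding (the helper name round_demo is made up for the illustration) :

```shell
# Hypothetical helper: show three ways to shorten the number found in field 1.
round_demo() {
    awk '{
        printf "%.2f\n", $1   # rounded to 2 decimals
        printf "%.0f\n", $1   # rounded to the nearest integer
        print int($1)         # truncated, no rounding
    }'
}

echo 2.71828 | round_demo
# prints:
# 2.72
# 3
# 2
```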

How to customize the display of data fields ?

Let's say I want to display the name + version + architecture + description of installed packages matching lsb-*. And I also want the header line so that it looks pretty. To do so :
  1. dpkg -l lsb-*
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/	Name		Version		Architecture	Description
    +++===================================================================================================
    ii	lsb-base	9.20161125	all		Linux Standard Base init script functionality
    un	lsb-core	<none>		<none>		(no description available)				not installed
    ii	lsb-release	9.20161125	all		Linux Standard Base version reporting utility
  2. I want installed packages only, so let's start filtering : dpkg -l lsb-* | awk '/^(ii|\|\|)/ {print $0}'
    ||/	Name		Version		Architecture	Description
    ii	lsb-base	9.20161125	all		Linux Standard Base init script functionality
    ii	lsb-release	9.20161125	all		Linux Standard Base version reporting utility
  3. Looks better. Now I'd like to get rid of the leading ii . Here comes the magic : fieldSeparator='|'; dpkg -l lsb-* | awk '/^(ii|\|\|)/ { for (i=2; i<=4; i++) printf $i"'$fieldSeparator'"; for (i=5; i<=NF; i++) printf $i" "; print ""}' | column -s "$fieldSeparator" -t
    Name		Version		Architecture	Description
    lsb-base	9.20161125	all		Linux Standard Base init script functionality
    lsb-release	9.20161125	all		Linux Standard Base version reporting utility
    How it works :
    • All lines returned by dpkg are made of several fields : field 1 is ii , field 2 is the name, field 3 is the version, field 4 is the architecture and fields 5 to NF are the description
    • All description fields have a different number of words, so NF is different for each returned line
    • the for (i=2; i<=4; i++) printf $i"'$fieldSeparator'" command prints name + version + architecture (with a separator)
    • the for (i=5; i<=NF; i++) printf $i" " command prints the description as a single data field (no separator added)
    • all of this is fed into column for a nice tabular display and voilà!

How to display the nth line after / before a match ?

Display only the nth line after the one matching pattern (source) :

awk '/pattern/ { x = NR + n } NR == x' someFile

for i in {0..9}; do echo "line $i"; done | awk '/line 4/ {lineToDisplay = NR + 3} NR == lineToDisplay'

line 7

Display only the nth line before the one matching pattern :

So far, I don't know whether this is possible with Awk. I think this may be possible with sed. A basic Bash solution would be :

for i in {0..9}; do echo "line $i"; done | grep -B3 'line 4' | head -1
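
One possible Awk approach is to keep the most recent lines in a small rotating buffer and to look back into it when a match occurs. This is a sketch, not a definitive implementation : the helper name line_before_match and its arguments are made up, and unlike the grep | head command above, it prints one line per match rather than for the first match only.

```shell
# Hypothetical helper: print the line found n lines before each line
# matching a regex. Usage: line_before_match N REGEX
line_before_match() {
    awk -v n="$1" -v re="$2" '
                          { buf[NR % (n + 1)] = $0 }           # keep the last n+1 lines
        $0 ~ re && NR > n { print buf[(NR - n) % (n + 1)] }    # line NR-n is still in the buffer
    '
}

printf 'line %s\n' 0 1 2 3 4 5 6 7 8 9 | line_before_match 3 'line 4'
# prints: line 1
```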