Awk - Description, flags and examples

Awk : how to detect the last line of input ?

This hack is based on the following facts :
display the first line (remember : { print $0 } is the default action)
seq 0 9 | awk 'NR==1'
display the last line
seq 0 9 | awk 'END {print $0}'
our test matrix
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done
display the full line 1
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'NR==1'
display the line 1, field 3
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'NR==1 {print $3}'
display the full last line
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'END {print $0}'
display the last line, field 4
for i in {1..5}; do for j in {1..5}; do echo -n "L${i}C$j "; done; echo; done | awk 'END {print $4}'
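END only fires once all the input has been read. To treat the last line differently while still printing the others, one option is to handle each line one step late. Here's a minimal sketch (the function name mark_last_line and the "LAST: " prefix are made up for the example) :

```shell
# Hypothetical helper: print every line, prefixing the last one with "LAST: ".
# Each line is held back in "prev" and printed when the next one arrives,
# so the line still held when END runs is the last line of input.
mark_last_line() {
	awk 'NR > 1 { print prev }
	     { prev = $0 }
	     END { if (NR > 0) print "LAST: " prev }'
}

printf 'a\nb\nc\n' | mark_last_line
```

Keeping the line in a variable also avoids relying on the content of $0 inside END.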

Awk : how to alter fields with gsub ?

Usage :

gsub(regexp, replacement [, target])

Example :

echo 'abcd:' | awk '{ gsub(/:/, "", $1); print }'
abcd
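
When target is omitted, gsub works on the whole line ($0), and it returns the number of replacements it made. A small sketch (the variable name gsub_demo is only there for the example) :

```shell
# gsub() without a 3rd argument alters $0 and returns the number of substitutions.
gsub_demo=$(echo 'a:b:c' | awk '{ n = gsub(/:/, "-"); print n, $0 }')
echo "$gsub_demo"
```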

Awk : print

How to count lines matching a regex ?

The basic command everybody knows :
grep -E '<article id="[^"]' *xml | wc -l
This construct is relevant for several files only. Even though "it works", applying it to a single file is a UUOW (useless use of wc) : grep -c already counts the matching lines of a file.
An alternate awk-based "pipe-less" solution :
awk 'BEGIN {i=0;} /<article id="[^"]/ {i++;} END { print i;}' *xml
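
A variant of the same idea reading from stdin, sketched under the assumption that the regex is passed as an argument (the function name count_matching_lines is hypothetical) :

```shell
# Hypothetical helper: count the lines of stdin matching the regex given as $1.
# "n + 0" forces a numeric 0 when nothing matched, so no BEGIN block is needed.
count_matching_lines() {
	awk -v re="$1" '$0 ~ re { n++ } END { print n + 0 }'
}

printf 'foo\nbar\nbaz\n' | count_matching_lines '^ba'
```

Beware that -v interprets backslash escapes in the value, so a regex containing backslashes needs them doubled.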

Awk : print, printf, sprintf

print :

  • with no argument, print the whole input line :
    echo -e 'line 1\nline 2\nline 3' | awk '{print}'
    line 1
    line 2
    line 3
  • with 1 argument, print it :
    echo -e 'line 1\nline 2\nline 3' | awk '{print $2}'
    1
    2
    3
  • with more than 1 argument :
    • when the arguments are separated by commas : print arguments separated by SPACE (default) or the specified OFS
    • when the arguments are separated by spaces : print arguments concatenated
    echo -e 'line 1\nline 2\nline 3' | awk '{print $2,$1}'; echo -e 'line 1\nline 2\nline 3' | awk '{print $2 $1}'
    1 line
    2 line
    3 line
    1line
    2line
    3line
  • Appends a newline \n to the output (printf doesn't) :
    echo | awk '{print "Hello world"}'; echo | awk '{printf "Hello world"}'

printf :

  • Doesn't add a trailing \n
  • Supports the C-style printf(string, expression list) syntax :
    echo | awk '{printf("%d is The Answer to The Great Question.", 42)}'

sprintf :

Returns, without printing it, the string that printf would have printed with the same arguments.
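
A short sketch of sprintf storing a formatted string for later use (the variable name sprintf_demo is only there for the example) :

```shell
# sprintf() builds the string; print shows it afterwards, together with its length.
sprintf_demo=$(awk 'BEGIN { s = sprintf("%03d-%s", 7, "x"); print length(s), s }')
echo "$sprintf_demo"
```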

switch / case construct

Here's a very basic example (not perfect, but you'll get the idea). Note that switch is a gawk extension :
echo 'abc' | awk '{
	switch ($0) {
		case /[[:lower:]]+/:
			print "lowercase"
			break
		case /[[:upper:]]+/:
			print "uppercase"
			break
		}
	}'
As a one-liner that can be pasted into the shell :
echo 'abc' | awk '{ switch ($0) { case /[[:lower:]]+/: print "lowercase"; break; case /[[:upper:]]+/: print "uppercase"; break; } }'
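
With an Awk that has no switch, an if / else chain can play the same role. This is only a sketch : the function name classify_case is made up, and unlike the example above the regular expressions are anchored, so mixed-case input falls into a third "other" branch :

```shell
# Hypothetical helper: classify each input line as lowercase / uppercase / other.
# Uses plain if / else, which any POSIX Awk should accept.
classify_case() {
	awk '{
		if ($0 ~ /^[a-z]+$/)      print "lowercase"
		else if ($0 ~ /^[A-Z]+$/) print "uppercase"
		else                      print "other"
	}'
}

printf 'abc\nABC\naBc\n' | classify_case
```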

How to return a different exit code stating match found / match not found ?

echo -e 'foo\nbar\nbaz' | awk '/arf/{found=1} END{exit !found}'; echo $?
1
echo -e 'foo\nbar\nbaz' | awk '/bar/{found=1} END{exit !found}'; echo $?
0

Explanations

Here is how Awk processes :
  1. read the first line of input and search for a match
  2. if no match found on the current line, read the next line
  3. if a match is found, raise the specified flag : found=1 (use whatever name and value you like )
  4. when all lines of input have been read, continue to the END section, and process instructions found there
  5. exit terminates Awk, returning the specified exit code
  6. since the Unix exit code for "success" is 0, we negate the flag value with !
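
The steps above can be wrapped into something usable in an if statement. A minimal sketch, assuming the regex is passed as an argument (the function name has_match is hypothetical) ; it also stops reading at the first match, since exit in a regular rule jumps straight to END :

```shell
# Hypothetical helper: exit code 0 if any line of stdin matches the regex $1, 1 otherwise.
has_match() {
	awk -v re="$1" '$0 ~ re { found = 1; exit } END { exit !found }'
}

if printf 'foo\nbar\nbaz\n' | has_match 'bar'; then echo 'match found'; else echo 'match not found'; fi
```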

How to match strings across lines ?

Situation :

How can I make sure a text file has :
(unknown number of lines before)

(some text before)EXPECTED_TAG1(some text after)

(unknown number of lines between)

(some other text before)EXPECTED_TAG2(some other text after)

(unknown number of lines after)

so far, grep-ing EXPECTED_TAG1, then grep-ing EXPECTED_TAG2 was not an acceptable solution

Solution :

echo -e 'ga\nbu\nzo\nmeu' | awk -v RS='u' '/a.b/ {print $0}'
ga
b
echo -e 'ga\nbu\nzo\nmeu' | awk 'BEGIN {RS="u"} /a.b/ {print $0}'
ga
b
echo -e 'Super\ncali\nfragi\nlisti\ncexpia\nlido\ncious' | awk 'BEGIN {RS=".i"} {print $0}'
Super
ca

fra


s

cex
a

do

ous

Alternate solution :

Alternate "low-tech" solution :
  1. convert the input into a single giant line of text with tr
  2. search strings as usual with your favorite tool
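
The two steps above could look like this for the situation described earlier. It is only a sketch : the function name tags_in_order is made up, and grep stands in for "your favorite tool" :

```shell
# Hypothetical helper: succeed if EXPECTED_TAG1 appears before EXPECTED_TAG2 in stdin,
# whatever the number of lines between them.
tags_in_order() {
	# 1. join all lines into a single one, 2. search as usual
	tr '\n' ' ' | grep -q 'EXPECTED_TAG1.*EXPECTED_TAG2'
}

printf 'x\naEXPECTED_TAG1b\ny\ncEXPECTED_TAG2d\n' | tags_in_order && echo 'both tags found, in order'
```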

Make music with awk

Run this script :
#!/usr/bin/env bash

value1=120
# initial value : 160
# tested values : 200, 140, 100
#	decreasing from the initial value : more bass sounds

value2=0.5678
# initial value : 0.87055
# tested values : 0.747, 0.777, 0.789
#	increasing values above 3.xxx : extreme bass sounds (?), hardly audible
#	around 0.5xxxxx : nice chime sounds

value3=13
# initial value : 10
# tested values : 13, 17, 26
#	increasing values : more high-pitched sounds
#	26 makes some 'D2-R2' blips

value4=128	# no effect so far :-(
# initial value : 128

awk "function wl() {
		rate=64000;
		return (rate/$value1)*($value2^(int(rand()*$value3)))};
	BEGIN {
		srand();
		wla=wl();
		while(1) {
			wlb=wla;
			wla=wl();
			if (wla==wlb)
				{wla*=2;};
			d=(rand()*10+5)*rate/4;
			a=b=0; c=$value4;
			ca=40/wla; cb=20/wlb;
			de=rate/10; di=0;
			for (i=0;i<d;i++) {
				a++; b++; di++; c+=ca+cb;
				if (a>wla)
					{a=0; ca*=-1};
				if (b>wlb)
					{b=0; cb*=-1};
				if (di>de)
					{di=0; ca*=0.9; cb*=0.9};
				printf(\"%c\",c)};
			c=int(c);
			while(c!=$value4) {
				c<$value4?c++:c--;
				printf(\"%c\",c)};};}" | aplay -r 64000

How to detect duplicate fields in a file ?

Situation :

I have :
1;unique1
2;duplicate1
3;unique2
4;duplicate1
5;unique3
6;duplicate2
7;duplicate2
8;unique4
9;duplicate1
I want :
Anything stating whether some values are duplicated across lines

Solution :

echo -e '1;unique1\n2;duplicate1\n3;unique2\n4;duplicate1\n5;unique3\n6;duplicate2\n7;duplicate2\n8;unique4\n9;duplicate1' | awk -F';' 'seen[$2]++ {print $2}'
duplicate1
duplicate2
duplicate1
This snippet only displays duplicates. Anything becomes a duplicate if it's already been seen once, which is why something that exists n times is displayed n-1 times.
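
To list each duplicated value only once, the counter can be compared to 1, which is only true on the second occurrence. A sketch with a made-up function name (duplicated_values) :

```shell
# Hypothetical helper: print each value of field 2 that appears more than once, a single time.
# seen[$2]++ returns the count BEFORE incrementing, so "== 1" fires on the 2nd occurrence only.
duplicated_values() {
	awk -F';' 'seen[$2]++ == 1 { print $2 }'
}

printf '1;unique1\n2;duplicate1\n3;unique2\n4;duplicate1\n5;unique3\n6;duplicate2\n7;duplicate2\n8;unique4\n9;duplicate1\n' | duplicated_values
```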

Awk internal keywords

BEGIN
a BEGIN rule is executed once only, before the first input record is read (example)
BEGINFILE
see ENDFILE
END
an END rule is executed once only, after all the input is read (example)
ENDFILE
This is a gawk extension. The ENDFILE rule :
  • is called when gawk has finished processing the last record in an input file. For the last input file, it will be called before any END rules.
  • is executed even for empty input files.
  • makes it possible to catch read errors : with an ENDFILE rule, such errors are no longer fatal and ERRNO is set instead.
next
immediately stop processing the current record and go on to the next record
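
A small sketch putting BEGIN, next and END together (the variable names keywords_demo and kept are only there for the example) :

```shell
# BEGIN runs before any input, next skips the comment lines, END runs after the last line.
keywords_demo=$(printf '# comment\nfoo\n# other\nbar\n' | awk '
	BEGIN { print "start" }
	/^#/  { next }
	      { kept++; print }
	END   { print kept " lines kept" }')
echo "$keywords_demo"
```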

How to filter lines having duplicate fields ?

Situation :

I have :
4;unique_1
3;duplicate
2;duplicate
1;unique_2
I want :
4;unique_1
3;duplicate
1;unique_2

Solution :

echo -e '4;unique_1\n3;duplicate\n2;duplicate\n1;unique_2' | awk -F ';' '!seen[$2]++'

(source)

Details :

This command tells Awk which lines to print : for each line of the file, the node of the array seen indexed by the 2nd field is tested, then incremented. The line is printed if that node was not (!) previously set.

Alternate solution :

(source)

Awk system command

ls -l | awk '/html$/ {system("echo "$NF)}'

  • this command is absolutely useless : I just needed a dummy working example
  • don't forget that Awk variables must stay outside of quotes

How to print all fields but one ?

echo -e 'A B C D\nE F G H\nI J K L' | awk '{ for (i=2; i<=NF; i++) printf $i" "; print "" }'
  • easy when the field to skip is the first or the last one, tough otherwise
  • leaves a trailing SPACE
    echo -e 'A B C D\nE F G H\nI J K L' | awk '{ for (i=2; i<=NF; i++) printf $i" "; print "" }' | xargs -I truc echo "'truc'"
    'B C D '
    'F G H '
    'J K L '
    ==> this comes from :
    	printf $i" "
echo -e 'A B C D\nE F G H\nI J K L' | awk '{ $1=""; printf $0; print "" }'
  • prints a SPACE after every field, including the hidden ones
  • works with any field (first, last, middle), and any number of fields :
    echo -e 'A B C D\nE F G H\nI J K L' | awk '{ $1=$3=""; printf $0; print "" }'
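
Both approaches above leave extra spaces. Here is a sketch that skips any single field without them, by building the output line by hand (the function name skip_field is hypothetical, and fields are assumed to be separated by single spaces on output) :

```shell
# Hypothetical helper: print every field except the one whose number is given as $1.
# The separator is only inserted between kept fields, so there is no stray SPACE.
skip_field() {
	awk -v skip="$1" '{
		out = sep = ""
		for (i = 1; i <= NF; i++)
			if (i != skip) { out = out sep $i; sep = " " }
		print out
	}'
}

printf 'A B C D\nE F G H\nI J K L\n' | skip_field 2
```

Building the string and printing it with print also avoids using input data as a printf format, which would misbehave on fields containing %.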

awk vs gawk

The standard Debian setup comes with /usr/bin/awk (usually provided by mawk through the alternatives system), which has basic / limited functionality. Once gawk is installed :
ls -l $(which awk)
lrwxrwxrwx 1 root root 21 Oct 11 2016 /usr/bin/awk -> /etc/alternatives/awk*
md5sum /etc/alternatives/awk $(which gawk)
23a5b5a3d9ba0d2c6277dbdaf2557033	/etc/alternatives/awk
23a5b5a3d9ba0d2c6277dbdaf2557033	/usr/bin/gawk

Once gawk is installed, it can be invoked with awk.

How to use another character instead of / as the regular expression delimiter ?

Situation :

The syntax of Awk commands is :

awk '/a regular expression/ {deal with it}' myFile

Since the regular expression is /-delimited, we have to escape those found in the regular expression itself :

echo -e '/a/\n/1/\n/b/\n/2/\n/c/\n/3/' | awk '/\/[a-z]\// {print}'

It is sometimes possible to work around this :

echo -e '/a/\n/1/\n/b/\n/2/\n/c/\n/3/' | awk '/.[a-z]./ {print}'
But since it changes the meaning of the regular expression, it is not always applicable / advisable.

Solution :

Awk has no parameter to specify the regular expression delimiter character, but we can use a variable to preserve readability :

echo -e '/foo/foo/\n/foo/bar/\n/bar/bar/\n/foo/baz/' | awk 'BEGIN {myRegex = "/foo/.a./"} $0 ~ myRegex {print}'
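
The variable can also be set from the shell with -v, which keeps the regular expression out of the Awk program. A sketch (the variable name regex_var_demo is only there for the example ; beware that -v interprets backslash escapes in the value) :

```shell
# The regex is passed as a plain string, so its slashes need no escaping.
regex_var_demo=$(printf '/foo/foo/\n/foo/bar/\n/bar/bar/\n/foo/baz/\n' | awk -v myRegex='/foo/.a./' '$0 ~ myRegex')
echo "$regex_var_demo"
```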

How to round floating numbers ?

Here's a very basic example showing the amount of RAM installed, in GiB :

awk '/MemTotal:/ {printf "%.0fGiB\n", $2/1024/1024}' /proc/meminfo
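
For comparison, a sketch of the difference between truncating with int() and rounding, on an arbitrary value (the variable name rounding_demo is only there for the example) :

```shell
# int() drops the decimals, "%.0f" rounds, and int(x + 0.5) rounds positive numbers too.
rounding_demo=$(awk 'BEGIN { x = 7.8; printf "%d %.0f %d\n", int(x), x, int(x + 0.5) }')
echo "$rounding_demo"
```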

How to customize the display of data fields ?

Let's say I want to display the name + version + architecture + description of installed packages matching lsb-*. And I also want the header line so that it looks pretty. To do so :
  1. dpkg -l lsb-*
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/	Name		Version		Architecture	Description
    +++===================================================================================================
    ii	lsb-base	9.20161125	all		Linux Standard Base init script functionality
    un	lsb-core	<none>		<none>		(no description available)				not installed
    ii	lsb-release	9.20161125	all		Linux Standard Base version reporting utility
  2. I want installed packages only, so let's start filtering : dpkg -l lsb-* | awk '/^(ii|\|\|)/ {print $0}'
    ||/	Name		Version		Architecture	Description
    ii	lsb-base	9.20161125	all		Linux Standard Base init script functionality
    ii	lsb-release	9.20161125	all		Linux Standard Base version reporting utility
  3. Looks better. Now I'd like to get rid of the leading ii . Here comes the magic : fieldSeparator='|'; dpkg -l lsb-* | awk '/^(ii|\|\|)/ { for (i=2; i<=4; i++) printf $i"'$fieldSeparator'"; for (i=5; i<=NF; i++) printf $i" "; print ""}' | column -s "$fieldSeparator" -t
    Name		Version		Architecture	Description
    lsb-base	9.20161125	all		Linux Standard Base init script functionality
    lsb-release	9.20161125	all		Linux Standard Base version reporting utility
    How it works :
    • All lines returned by dpkg are made of several fields : field 1 is ii , field 2 is the name, field 3 is the version, field 4 is the architecture and fields 5 to NF are the description
    • All description fields have a different number of words, so NF is different for each returned line
    • the for (i=2; i<=4; i++) printf $i"'$fieldSeparator'" command prints name + version + architecture (with a separator)
    • the for (i=5; i<=NF; i++) printf $i" " command prints the description as a single data field (no separator added)
    • all of this is fed into column for a nice tabular display and voilà!

How to display the nth line after / before a match ?

Display only the nth line after the one matching pattern (source) :

awk '/pattern/ { x = NR + n } NR == x' someFile

This construct assumes pattern matches only once.

for i in {0..9}; do echo "line $i"; done | awk '/line 4/ {lineToDisplay = NR + 3} NR == lineToDisplay'

line 7

Display only the nth line before the one matching pattern :

So far, I don't know whether this is possible with Awk. I think this may be possible with sed. And a basic Bash solution would be :

for i in {0..9}; do echo "line $i"; done | grep -B3 'line 4' | head -1
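
It does look doable in Awk by remembering the last n lines in an array. Here's a minimal sketch, not a polished solution : the function name line_before is made up, it only handles the first match, and it prints nothing when fewer than n lines precede the match :

```shell
# Hypothetical helper: print the line found n lines before the first line matching regex $1.
# buf is used as a ring buffer holding the n previous lines, indexed by NR modulo n.
line_before() {
	awk -v re="$1" -v n="$2" '
		$0 ~ re { if (NR > n) print buf[(NR - n) % n]; exit }
		        { buf[NR % n] = $0 }'
}

printf 'line %s\n' 0 1 2 3 4 5 6 7 8 9 | line_before 'line 4' 3
```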

Awk internal variables

$n
the nth element of the current line ($0 being the whole line itself) :
for i in {1..4}; do echo 'a b c d' | awk '{print "Item '$i' of line \""$0"\" is "$'$i'"."}'; done
FS
Field Separator. Can be set with -F
NF
  • number of fields in the current line :
    echo 'a b c' | awk '{print NF}'; echo 'joe jack william averell' | awk '{print NF}';
    3
    4
  • It is often used to refer to the last field of a line :
    echo 'a b c' | awk '{print $NF}'; echo 'joe jack william averell' | awk '{print $NF}';
    c
    averell
NR
number of current row (starting at 1) : for i in {a..e}; do echo $i; done | awk '{ print "line "NR":\t" $0}'
line 1: a
line 2: b
line 3: c
line 4: d
line 5: e
OFS
Output Field Separator. It is automatically inserted between fields by print. Defaults to a single space.
There is no dedicated CLI flag for it (although -v OFS=... works) ; it usually goes into the "action" part :
  • echo {a..z} | awk '{OFS="."; print $1,$3,$5,$7}'
    a.c.e.g
  • No need to repeat the definition for every line of input :
    echo {a..z} | awk 'BEGIN{OFS="PLOP"} {print $1,$3,$5,$7}'
    aPLOPcPLOPePLOPg
RS
Record Separator. Defaults to \n (NEWLINE), which means Awk processes individual records made of a single line of text. Can also be a regular expression for gawk.
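
A special case worth a sketch : an empty RS turns on "paragraph mode", where records are separated by blank lines and newlines also separate fields (the variable name paragraph_demo is only there for the example) :

```shell
# 2 paragraphs separated by a blank line: each becomes a single record.
paragraph_demo=$(printf 'a b\nc\n\nd\n' | awk 'BEGIN { RS = "" } { print NR ": " NF }')
echo "$paragraph_demo"
```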

Tailor files / strings / substrings with Awk / Bash / PERL / sed

awk

Extract specific fields from log files :
  • awk '$9 == "searchedKeyword" {print $7}' file.log | sort | uniq -c | sort -nr | head -n 10
  • awk '$6 ~ "30." {print $5" "$6}' file | ...

~ is the Awk operator to match a regular expression.

Bash (source)

Replace a substring :
  • ${string/substring/replacement} : replace 1st occurrence
  • ${string//substring/replacement} : replace all occurrences
  • myString='Hello World'; echo ${myString//[eo]/ab} : outputs Habllab Wabrld
Test whether a string matches a RegExp (source) :
testString='Hello World'; if [[ $testString =~ ^.*o.*o.*$ ]]; then echo "MATCHES"; else echo "DOESN'T MATCH"; fi

PERL

Apply a regExp to a string :
  • perl -e '$ARGV[0]=~ m/..(.)/; print $1' abcdef
  • echo AZERqsdfWXCV | xargs perl -e '$ARGV[0]=~ m/.{4}(.{4}).*(.)$/; print "$1 $2"'

sed

Extract (in CSV format) URL + hit/miss + generation time from a Varnish log :
sed -r 's/.*GET ([^ ]*).*(hit|miss) ([0-9.]*).*/\1;\2;\3/' access.log > result.log
Extract (in CSV format) URL + HTTP error code from Lighttpd log :
sed -r 's/^.*GET ([^ ]*).*HTTP\/1\.1" ([0-9]*).*$/\1;\2;/' /var/log/lighttpd/www.example.com.log > result.log
Same as above with HTTP 500 errors only + sorting results by descending number of occurrences :

logFile='/var/log/lighttpd/www.example.com.log'; resultFile='./result.csv'; tmpFile=$(mktemp --tmpdir tmp.result.XXXXXXXX); grep '" 500 ' $logFile | sed -r 's/^.*GET ([^ ]*).*HTTP\/1\.." ([0-9]*).*$/\1;\2;/' > $tmpFile; cat $tmpFile | sort | uniq -c | sort -nr > $resultFile; rm $tmpFile

Using grep 1st because sed can't find a match on every line, as we're reporting only on HTTP 500 errors.

Extract (in CSV format) several fields from Apache logs stored in a year/month/day directory tree :
resultFile="$HOME/result.csv"; tmpFile=$(mktemp --tmpdir tmp.XXXXXXXX); csvHeader='web server;IP;HTTP method;URL used by method;full URL;'; echo $csvHeader > $tmpFile; logFilePath='/path/to/logfiles/'; startYear='2013'; endYear='2013'; startMonth='04'; endMonth='04'; startDay='01'; endDay='18'; for year in $(seq $startYear $endYear); do for month in $(seq $startMonth $endMonth); do for day in $(seq $startDay $endDay); do [ ${#month} -eq 1 ] && month='0'$month; [ ${#day} -eq 1 ] && day='0'$day; logFile=$logFilePath/$year/$month/$day/$year$month$day'-access.log'; echo "PROCESSING $logFile ..."; grep 'example.com' $logFile | grep -v 'GET' | sed -r 's/^.*(webServer(1|2)).* ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) .*\] "([A-Z]*) (.*) HTTP.*" [0-9]+ [0-9]+ "([^"]+)".*/;\1;\3;\4;\5;\6/' | sort | uniq >> $tmpFile; done; done; done; cat $tmpFile | sort | uniq -c | sort -nr >> $resultFile; rm $tmpFile

awk

Usage :

Awk is a programmable text filter. Its input can be :

Output :

An Awk script is made of 3 blocks :

  1. pre-process : BEGIN
  2. process
  3. post-process : END

Awk reads the input line by line, then applies the specified filter(s) to detect whether or not to process the current line. Before starting processing a line, Awk splits it into fields and stores fields values in $1 (1st field), $2, ..., $NF (last field). $0 is the whole input line. The fields separator (specified with FS) defaults either to [SPACE] or [TAB] (details).

There is no need to use grep together with Awk, as Awk can itself select the lines to process with a regular expression.

Filters :

Criteria select matching lines select not matching lines
line number within input
awk 'NR==n {doSomething}'
echo -e 'a\nb\nc\nd' | awk 'NR==3'
c
echo -e 'a\nb\nc\nd' | awk 'NR==3 {next}; {print}'
a
b
d
line vs regular expression
awk '/regEx/ {doSomething}'
echo -e 'foo\nbar\nbaz' | awk '/bar/ {print $0}'
bar
awk '!/regEx/ {doSomething}'
echo -e 'foo\nbar\nbaz' | awk '!/a/ {print $0}'
foo
(source, example)
line vs number of fields
echo -e 'field1\nfield1\tfield2\nfield1\tfield2\tfield3' | awk 'NF == 2 {print $0}'
field1	field2
field vs number
echo -e 'foo\t12\nbar\t34\nbaz\t56' | awk '$2 > 25 {print $0}'
bar	34
baz	56
Trying to filter data based on line numbers returned by grep -n with a construct like :
grep -n --color=always [options] | awk -F ':' '$n > x {doSomething}'
may fail because of the returned color codes.
  • echo -e 'FOO\nBAR\nBAZ' | grep -n --color=always '.A' | awk -F ':' '$1>2 {print $0}'
  • echo -e 'FOO\nBAR\nBAZ' | grep -n '.A' | awk -F ':' '$1>2 {print $0}'
field vs string
awk '$n == "value" {doSomething}'
for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 == "bar2" {print $0}'
foo2 bar2 baz2
awk '$n != "value" {doSomething}'
for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 != "bar2" {print $0}'
foo1 bar1 baz1
foo3 bar3 baz3
field vs regular expression
awk '$n ~ /regEx/ {doSomething}'
for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 ~ /a.1/ {print $0}'
foo1 bar1 baz1
awk '$n !~ /regEx/ {doSomething}'
for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 !~ /a.1/ {print $0}'
foo2 bar2 baz2
foo3 bar3 baz3
field vs regular expression with if / else construct (source)
for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '{ if($2 ~ "a.2") {print "MATCH : "$2 } else {print "NO MATCH"} }'
NO MATCH
MATCH : bar2
NO MATCH
several conditions
awk 'condition1 logicalOperator condition2 logicalOperator ... conditionN {doSomething}'
logicalOperator can be (source) :
  • && : logical AND
  • || : logical OR
for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 ~ "^ba.." && $3 == "baz3" {print $0}'
foo3 bar3 baz3
for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$1 ~ "1$" || $3 ~ "3$" {print $0}'
foo1 bar1 baz1
foo3 bar3 baz3
for i in {6..22}; do echo "a b c d e f g h $i"; done | awk '$NF==7 || $NF==21 {print $0}'
a b c d e f g h 7
a b c d e f g h 21
Numerical field with trailing unit letter or text :

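If the numerical value has a unit letter, the field is no longer a number : Awk falls back to a string comparison ("8U" sorts after "25"), and it doesn't work anymore :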

echo -e "foo\t8U\nbar\t34U\nbaz\t56U" | awk '$2 > 25 {print $0}'
foo	8U	ooops !
bar	34U
baz	56U
solution :
echo -e "foo\t8U\nbar\t34U\nbaz\t56U" | awk 'strtonum($2) > 25 {print $0}'
bar	34U
baz	56U

Try it :

strtonum() looks smart enough to handle trailing units (source) :
awk 'BEGIN {
	print "trailing unit (single letter) : " strtonum("123U")
	print "trailing unit (word) : " strtonum("123potatoes")
	print "leading unit (single letter) : " strtonum("Y123")
	print "leading unit (word) : " strtonum("banana123")
	}'
trailing unit (single letter) : 123	OK
trailing unit (word) : 123		OK
leading unit (single letter) : 0	KO
leading unit (word) : 0			KO
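
strtonum() is a gawk extension. With other Awks, adding 0 to the field seems to give the same result for trailing units, since the conversion to a number stops at the first non-numeric character. A sketch (the function name above_25 is made up) :

```shell
# Hypothetical helper: keep the lines whose 2nd field, read as a number, is greater than 25.
# "$2 + 0" forces a numeric conversion, so "34U" is compared as 34.
above_25() {
	awk '$2 + 0 > 25'
}

printf 'foo\t8U\nbar\t34U\nbaz\t56U\n' | above_25
```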

Flags :

Flag Usage
-F x use x as the input Field separator
  • x can be several characters long : echo 'GAABCDBUABCDZOABCDMEU' | awk -F 'ABCD' '{print $1,$2,$3,$4}'
  • default field separator : any run of spaces and/or tabs and/or newlines (excluding leading and trailing runs) (details)
-v variable=value declare a variable (example)

Example :

Process log files :

Count occurrences of an error message in a log file :
This code removes the [10-Oct-2012 18:15:46 UTC] fields from every logfile line. This is why Awk is taught to display all fields starting from the 4th :
awk '/^\[/ { for (i=4;i<=NF;i++) printf $i " ";print ""}' logFile

printf adds no newline after printing. print does.

Then count occurrences :
awk '/^\[/ { for (i=4;i<=NF;i++) printf $i " ";print ""}' logFile | sort | uniq -c | sort -nr
From a multiple-fields line, displays fields starting from the 4th :
In a log file such as :
[13-Nov-2013 03:03:35 Europe/Paris] PHP Warning: Memcached::touch(): ... in ....php on line 45
[13-Nov-2013 03:04:42 Europe/Paris] PHP Warning: file_get_contents(http://...): HTTP/1.0 404 Not Found in ...php on line 202
...
let's say you'd like to remove the date/time field to group and count similar errors. To do so :

awk '{ for (i=1;i<=3;i++) $i="";print }' file.log | awk '{sub(/^[ \t]+/, ""); print}' | sort | uniq -c | sort -nr

  • the 1st Awk command replaces the first 3 fields with an empty string, so that the line only contains the remaining fields, starting from the 4th as required
  • the 2nd Awk command just removes leading whitespace (source)
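
Both steps can probably be merged into a single Awk call, since sub() works on the rebuilt line. A sketch with a made-up function name (strip_first_3_fields) ; note that rebuilding the line squeezes runs of whitespace between the remaining fields into single spaces :

```shell
# Hypothetical helper: blank the first 3 fields, then strip the leading whitespace left behind.
strip_first_3_fields() {
	awk '{ $1 = $2 = $3 = ""; sub(/^[ \t]+/, ""); print }'
}

printf '[13-Nov-2013 03:03:35 Europe/Paris] PHP Warning: x\n' | strip_first_3_fields
```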
Match a keyword from a variable :
for httpCode in 301 302 304; do echo -n "Code $httpCode : "; awk -v needle="$httpCode" '$6 ~ needle {print " "}' logFile | wc -l; done

Selecting PID's :

ps --ppid 1 | awk '/d$/ {print $1}'
Lists processes whose parent's PID is 1, then selects processes whose name ends in 'd', and prints the corresponding PID, which is the line field #1.

Specifying the field separator :

awk -F ':' '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd

List all ports and PIDs on which a Mongodb instance is listening :

netstat -laputen | awk '/mongo/ {print "IP:port = "$4"\tPID = "$9}' | sort | uniq

Select non-empty lines :

echo -e 'A\tB\tC\tD\nE\tF\tG\tH\n\nI\tJ\tK\tL' | awk '!/^$/ {print $3}'

C
G
K

Dark wizardry ?

This awk command made me scratch my head quite a bit : it returns fields from 2 distinct lines, which are not even contiguous (I considered the possibility of magic).
To figure this out, I simplified it, and let the magic happen :
echo -e 'key1\tvalue1\nkey2\tvalue2' | awk '/key1|key2/ { printf $2 " " }'
value1 value2
Explanation :
  • the input (either an echo, a line "piped" in, or a whole file) is perfectly "normal" : there is no hack regarding field separators or end of line markers.
  • the /key1|key2/ part of the awk command is a "normal" regular expression alternation
  • the printf $2 " " part simply prints the 2nd field of each matching line, followed by a space
So what's the trick ?
Let's have a deeper look at how awk works and what we're instructing it to do with the echo | awk command above :
  1. no pre-process, so let's start eating lines and doing things
  2. awk splits the input into distinct lines
  3. awk reads the 1st line : key1\tvalue1
  4. does it match the regular expression ? Yes, so print the 2nd field and a space character : value1 
  5. done with this line, continue with the next line
  6. read the 2nd line : key2\tvalue2
  7. does it match the regular expression ? Yes again, so print the 2nd field and a space character : value2 
  8. The trick is that printf, unlike print, does not automatically add a newline character after printing. So the output of any step is printed right after the output of the previous step. This is why, at this step of the procedure, the output looks like : value1 value2 
  9. done with this line, no next line
  10. no post-process
  11. the end !
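
Since nothing ever prints a newline in the walkthrough above, the shell prompt ends up glued to the output. A sketch of the same command with an explicit format string and a final newline added in the post-process step (the variable name one_line_demo is only there for the example) :

```shell
# "%s " prints each matching value followed by a space; END adds the closing newline once.
one_line_demo=$(printf 'key1\tvalue1\nkey2\tvalue2\n' | awk '/key1|key2/ { printf "%s ", $2 } END { print "" }')
echo "$one_line_demo"
```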