Awk

Operator	Usage
`&&`	`expr1 && expr2` logical AND : true if both expr1 and expr2 are true. expr2 is evaluated only if expr1 is true.
`||`	`expr1 || expr2` logical OR : true if at least one of expr1 or expr2 is true. expr2 is evaluated only if expr1 is false.
`!`	`! expr` logical NOT : true if expr is false.

I'm trying to execute

awk -i inplace

but I get the error :

awk: not an option: -i

The -i flag belongs to GNU Awk and is not available for the "regular" awk program.

Install GNU Awk :

apt install gawk
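
If installing gawk is not an option, the effect of -i inplace can be approximated with any awk by writing the result to a temporary file and moving it over the original. A minimal sketch, assuming a throwaway file created with mktemp (the file content and the FOO / BAR substitution are made up for the illustration) :

```shell
# Hypothetical demo file; replace it with the file you actually want to edit.
target=$(mktemp)
printf 'FOO 1\nFOO 2\n' > "$target"

# Write the result to a temporary file, then replace the original
# only if awk succeeded.
tmp=$(mktemp)
awk '{ sub(/FOO/, "BAR"); print }' "$target" > "$tmp" && mv "$tmp" "$target"

cat "$target"
```

Unlike a real in-place edit, this replaces the file, so ownership and permissions are those of the temporary file.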

An example is worth a thousand words, so enjoy your reading !

Tutorials & basic examples :

http://sparky.rice.edu/awk.html
my own scripts, some of them using external Awk files :
- figures.sh + network_statistics.awk
- searchNowrap.sh

convert sleep durations into human-readable durations (related: numbers with trailing unit letter)

for unit in '' s m; do
	for duration in 0 1 2 10 100; do
		value="$duration$unit"
		echo -n "'$value'."
		awk '
			/^[0-9]+$/	{ if($0 < 2) print $0 " second"; else print $0 " seconds"; }
			/^[0-9]+[ms]$/	{
				number=strtonum($0)
				unit=gensub(/^[0-9]+([ms])/, "\\1", "g");

				switch (unit) {
					case /m/:
						longUnit="minute"
						break
					case /s/:
						longUnit="second"
						break
					}
				 if(number > 1) print number" "longUnit"s"; else print number" "longUnit   ;
				 }
			' <<< "$value"
	done
done | column -s '.' -t

'0'     0 second
'1'     1 second
'2'     2 seconds
'10'    10 seconds
'100'   100 seconds
'0s'    0 second
'1s'    1 second
'2s'    2 seconds
'10s'   10 seconds
'100s'  100 seconds
'0m'    0 minute
'1m'    1 minute
'2m'    2 minutes
'10m'   10 minutes
'100m'  100 minutes

a dummy script with gensub, if else, length and logical operators

cat << EOF | awk '
BEGIN	{ followingWord=""; myVariable=""; }
/FOO/	{ followingWord=gensub(/.*FOO ([^ ]+).*/, "\\1", "g") }
/BAR/	{ myVariable="A" }
/BAZ/	{ myVariable="B" }
		{ if((length(followingWord) > 0) && (length(myVariable) > 0)) {
			print followingWord" ("length(followingWord)") "myVariable" ("length(myVariable)")."
			followingWord=""; myVariable="";
			}
		}
'
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut FOO labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex BAR ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit
esse cillum dolore BAZ eu fugiat nulla pariatur.
EOF

labore (6) A (1).

List "enabled" repositories :

The URL linked below lists repositories :

[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-stable-source]
name=Docker CE Stable - Sources
baseurl=https://download.docker.com/linux/centos/$releasever/source/stable
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

I want to list the IDs of those having enabled=1 :

curl -s https://download.docker.com/linux/centos/docker-ce.repo | awk ' \
	BEGIN		{ repoId=""; } \
	/^ *$/		{ repoId=""; } \
	/^\[.*\]/	{ repoId=$0; } \
	/^enabled=1/	{ if (repoId != "") { result=gensub(/[\[\]]/, "", "g", repoId); print result; } }'

docker-ce-stable

Find who's the link / who's the target in a symlink :

The snippet below serves just to play with Awk since it can be replaced by the much more efficient command readlink.

while read line; do
	echo -e "\t$line"
	echo "$line" | awk '{ link=$1; target=""; for(i=2; i<=NF; i++) { if($i!="->") link=link" "$i; else break; } for(j=i+1; j<NF; j++) target=target""$j" "; target=target""$NF; print " LINK: \047"link"\047, TARGET: \047"target"\047"; }'
	echo
done < <(find -type l -exec ls -l {} + | awk '{ for (i=1; i<9; i++) $i=""; print $0; }')

Explanations :

find -type l -exec ls -l {} + | awk '{ for (i=1; i<9; i++) $i=""; print $0; }': turn this :
lrwxrwxrwx 1 kevin developers 57 Apr 5 11:03 'path/to/link' -> 'path/to/target'
into this :
path/to/link -> path/to/target
i.e. remove the metadata shown by ls -l, which is made of the first 8 fields of the line.
\047: code to let Awk's print display single quotes ' (see comments of this answer)

Run this script :

#!/usr/bin/env bash

value1=120
# initial value : 160
# tested values : 200, 140, 100
#	decreasing from the initial value : more bass sounds

value2=0.5678
# initial value : 0.87055
# tested values : 0.747, 0.777, 0.789
#	increasing values above 3.xxx : extreme bass sounds (?), hardly audible
#	around 0.5xxxxx : nice chime sounds

value3=13
# initial value : 10
# tested values : 13, 17, 26
#	increasing values : more high-pitched sounds
#	26 makes some 'D2-R2' blips

value4=128	# no effect so far :-(
# initial value : 128

awk "function wl() {
		rate=64000;
		return (rate/$value1)*($value2^(int(rand()*$value3)))};
	BEGIN {
		srand();
		wla=wl();
		while(1) {
			wlb=wla;
			wla=wl();
			if (wla==wlb)
				{wla*=2;};
			d=(rand()*10+5)*rate/4;
			a=b=0; c=$value4;
			ca=40/wla; cb=20/wlb;
			de=rate/10; di=0;
			for (i=0;i<d;i++) {
				a++; b++; di++; c+=ca+cb;
				if (a>wla)
					{a=0; ca*=-1};
				if (b>wlb)
					{b=0; cb*=-1};
				if (di>de)
					{di=0; ca*=0.9; cb*=0.9};
				printf(\"%c\",c)};
			c=int(c);
			while(c!=$value4) {
				c<$value4?c++:c--;
				printf(\"%c\",c)};};}" | aplay -r 64000

BEGIN

a BEGIN rule is executed once only, before the first input record is read (example)

BEGINFILE

see ENDFILE

END

an END rule is executed once only, after all the input is read (example)

ENDFILE

This is a gawk extension. The ENDFILE rule :

is called when gawk has finished processing the last record in an input file. For the last input file, it will be called before any END rules
is executed even for empty input files
makes it possible to catch errors

The standard Debian setup comes with /usr/bin/awk (usually provided by mawk), which has basic / limited functionality :

doesn't support the .{n} syntax
Only gawk supports {} (source)

Once gawk is installed :

ls -l $(which awk)

lrwxrwxrwx 1 root root 21 Oct 11 2016 /usr/bin/awk -> /etc/alternatives/awk*

md5sum /etc/alternatives/awk $(which gawk)

23a5b5a3d9ba0d2c6277dbdaf2557033	/etc/alternatives/awk
23a5b5a3d9ba0d2c6277dbdaf2557033	/usr/bin/gawk

Once gawk is installed, it can be invoked with awk.

$n

the n^th element of the current line ($0 being the whole line itself) :

for i in {1..4}; do echo 'a b c d' | awk '{print "Item '$i' of line \""$0"\" is "$'$i'"."}'; done
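
The command above works by closing and reopening the single quotes so that the shell expands $i inside the Awk program. The same result can be sketched with -v, which keeps the whole program inside one pair of quotes (the variable name n is an arbitrary choice) :

```shell
for i in 1 2 3 4; do
	# n is an awk variable holding the shell value of $i; $n is then the n-th field
	echo 'a b c d' | awk -v n="$i" '{ print "Item " n " of line \"" $0 "\" is " $n "." }'
done
```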

FILENAME

name of the current input file
- when reading from standard input
empty string inside a BEGIN rule

FS

Field Separator. Can be set with -F
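
Besides -F, FS can also be assigned inside a BEGIN rule, so that it is known before the first line is split. A short sketch of both forms on a made-up, passwd-like line :

```shell
# Both commands print the 1st and 3rd colon-separated fields.
echo 'root:x:0:0' | awk -F ':' '{ print $1, $3 }'
echo 'root:x:0:0' | awk 'BEGIN { FS=":" } { print $1, $3 }'
```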

NF

number of fields in the current line :

for string in 'a b c' 'joe jack william averell'; do
	echo "$string" | awk '{ print NF }'
done

3
4

It is often used to refer to the last field of a line :

for string in 'a b c' 'joe jack william averell'; do
	echo "$string" | awk '{ print $NF }'
done

c
averell

to refer to the last but 1, ..., last but n :
echo 'threeBeforeLast twoBeforeLast oneBeforeLast last' | awk '{ print $(NF-1)" "$NF }'
```
oneBeforeLast last
```

To display numerous data fields or for more complex situations, read : How to filter fields to print ?

NR

number of records processed so far (which can be approximated to the number of the current row, starting at 1) :

for i in {a..e}; do echo $i; done | awk '{ print "line "NR":\t" $0}'

line 1: a
line 2: b
line 3: c
line 4: d
line 5: e

OFS

Output Field Separator. It is automatically inserted between fields by print. Defaults to a single space.

This is not a CLI flag, it goes into the "action" part :

echo {a..z} | awk '{OFS="."; print $1,$3,$5,$7}'
```
a.c.e.g
```
No need to repeat the definition for every line of input :
echo {a..z} | awk 'BEGIN{OFS="PLOP"} {print $1,$3,$5,$7}'
```
aPLOPcPLOPePLOPg
```
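
OFS is only used when print receives several comma-separated arguments, or when Awk has to rebuild the record. The sketch below (sample input made up) shows that print $0 alone leaves the line untouched, while assigning to any field forces the rebuild :

```shell
# $0 is printed as it was read : OFS is not applied
echo 'a b c' | awk 'BEGIN { OFS="-" } { print $0 }'
# assigning a field to itself makes awk rebuild $0 with OFS between fields
echo 'a b c' | awk 'BEGIN { OFS="-" } { $1 = $1; print $0 }'
```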

RS

Record Separator

defaults to \n (NEWLINE) : by default Awk considers 1 record == 1 line of input
gawk also accepts a regular expression
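
A case that does not require gawk : when RS is set to the empty string, records are separated by blank lines ("paragraph mode"). A minimal sketch on made-up key=value data, with FS set to a newline so that each line of a block becomes a field :

```shell
printf 'id=1\nname=foo\n\nid=2\nname=bar\n' |
	awk 'BEGIN { RS=""; FS="\n" } { print "record " NR ": " $1 ", " $2 }'
```

This prints one line per blank-line-separated block, which is handy for files made of stanzas.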

Extract specific fields from log files :

awk '$9 == "searchedKeyword" {print $7}' file.log | sort | uniq -c | sort -nr | head -n 10
awk '$6 ~ "30." {print $5" "$6}' file | ...

~ is the Awk operator to match a regular expression.

Bash (source)

Replace a substring :

${string/substring/replacement} : replace 1^st occurrence
${string//substring/replacement} : replace all occurrences
myString='Hello World'; echo ${myString//[eo]/ab} : outputs Habllab Wabrld

Test whether a string matches a RegExp (source) :

testString='Hello World'; if [[ $testString =~ ^.*o.*o.*$ ]]; then echo "MATCHES"; else echo "DOESN'T MATCH"; fi

PERL

Apply a regExp to a string :

perl -e '$ARGV[0]=~ m/..(.)/; print $1' abcdef
echo AZERqsdfWXCV | xargs perl -e '$ARGV[0]=~ m/.{4}(.{4}).*(.)$/; print "$1 $2"'

sed

Extract (in CSV format) URL + hit/miss + generation time from a Varnish log :

sed -r 's/.*GET ([^ ]*).*(hit|miss) ([0-9.]*).*/\1;\2;\3/' access.log > result.log

Extract (in CSV format) URL + HTTP error code from Lighttpd log :

sed -r 's/^.*GET ([^ ]*).*HTTP\/1\.1" ([0-9]*).*$/\1;\2;/' /var/log/lighttpd/www.example.com.log > result.log

Same as above with HTTP 500 errors only + sorting results by descending number of occurrences :

logFile='/var/log/lighttpd/www.example.com.log'; resultFile='./result.csv'; tmpFile=$(mktemp --tmpdir tmp.result.XXXXXXXX); grep '" 500 ' $logFile | sed -r 's/^.*GET ([^ ]*).*HTTP\/1\.." ([0-9]*).*$/\1;\2;/' > $tmpFile; cat $tmpFile | sort | uniq -c | sort -nr > $resultFile; rm $tmpFile

grep runs 1^st because sed prints the lines it can't match unchanged : filtering beforehand keeps only the HTTP 500 errors we're reporting on.

Extract (in CSV format) several fields from Apache logs stored in a year/month/day directory tree :

resultFile="$HOME/result.csv"; tmpFile=$(mktemp --tmpdir tmp.XXXXXXXX); csvHeader='web server;IP;HTTP method;URL used by method;full URL;'; echo $csvHeader > $tmpFile; logFilePath='/path/to/logfiles/'; startYear='2013'; endYear='2013'; startMonth='04'; endMonth='04'; startDay='01'; endDay='18'; for year in $(seq $startYear $endYear); do for month in $(seq $startMonth $endMonth); do for day in $(seq $startDay $endDay); do [ ${#month} -eq 1 ] && month='0'$month; [ ${#day} -eq 1 ] && day='0'$day; logFile=$logFilePath/$year/$month/$day/$year$month$day'-access.log'; echo "PROCESSING $logFile ..."; grep 'example.com' $logFile | grep -v 'GET' | sed -r 's/^.*(webServer(1|2)).* ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) .*\] "([A-Z]*) (.*) HTTP.*" [0-9]+ [0-9]+ "([^"]+)".*/;\1;\3;\4;\5;\6/' | sort | uniq >> $tmpFile; done; done; done; cat $tmpFile | sort | uniq -c | sort -nr >> "$resultFile"; rm $tmpFile

Awk is a programmable text filter. Its input can be :

a file : awk [options] myFile
the standard input stdin : stream of text from previous command | awk [options]

Output :

by default : the standard output stdout (source)
any file you like with output redirection : awk [options] inputFile > outputFile
Don't forget that awk's default action is to print matching lines, so any action limited to { print $0 }, albeit correct, is redundant.
details on print

An Awk script is made of 3 blocks :

pre-process : BEGIN
process
post-process : END

Awk reads the input line by line, then applies the specified filter(s) to decide whether or not to process the current line. Before processing a line, Awk splits it into fields and stores the field values in $1 (1^st field), $2, ..., $NF (last field). $0 is the whole input line. The field separator (specified with FS) defaults to any run of spaces and/or tabs (details).
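
The 3 blocks can be seen at work in the sketch below, which sums a column of made-up numbers (the variable name total is arbitrary) :

```shell
printf '3\n4\n5\n' | awk '
	# pre-process : runs once, before any input is read
	BEGIN	{ print "start"; total = 0 }
	# process : runs for every input line
		{ total += $1; print "read " $1 }
	# post-process : runs once, after the last line
	END	{ print "total = " total " in " NR " lines" }
'
```

It prints "start", one "read ..." line per input line, then "total = 12 in 3 lines".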

There is no need to use grep together with Awk, since Awk can itself select the lines to process with a regular expression.

Filters :

line number within input

select matching lines : awk 'NR==n {doSomething}'

echo -e 'a\nb\nc\nd' | awk 'NR==3'

c

select not matching lines :

echo -e 'a\nb\nc\nd' | awk 'NR==3 {next}; {print}'

a
b
d

line vs regular expression (source, example)

select matching lines : awk '/regEx/ {doSomething}'

echo -e 'foo\nbar\nbaz' | awk '/bar/ {print $0}'

bar

echo -e 'foo\nPool ID : 1234\nbar\nID du pool : 4321\nbaz' | awk '/(Pool ID|ID du pool)/ {print $NF}'

1234
4321

select not matching lines : awk '!/regEx/ {doSomething}'

echo -e 'foo\nbar\nbaz' | awk '!/a/ {print $0}'

foo

line vs number of fields (Comparison Operators)

echo -e 'field1\nfield1\tfield2\nfield1\tfield2\tfield3' | awk 'NF == 2 {print $0}'

field1	field2

field vs number (Comparison Operators, special case with trailing unit letter)

echo -e 'foo\t12\nbar\t34\nbaz\t56' | awk '$2 > 25 {print $0}'

bar	34
baz	56

Awk is smart enough to strip leading zeroes :

echo {01..10} | awk '$3 > 2 { print "ok" }'		ok
echo {01..10} | awk '$3 > 3 { print "ok" }'		(void)
echo {01..10} | awk '$3 >= 3 { print "ok" }'		ok
echo {0001..10} | awk '$3 >= 3 { print "ok" }'		ok

Trying to filter data based on line numbers returned by grep -n with a construct like :

grep -n --color=always [options] | awk -F ':' '$n > x {doSomething}'

may fail because of the returned color codes. Compare :

echo -e 'FOO\nBAR\nBAZ' | grep -n --color=always '.A' | awk -F ':' '$1>2 {print $0}'
echo -e 'FOO\nBAR\nBAZ' | grep -n '.A' | awk -F ':' '$1>2 {print $0}'

field vs string

select matching lines : awk '$n == "value" {doSomething}'

for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 == "bar2" {print $0}'

foo2 bar2 baz2

select not matching lines : awk '$n != "value" {doSomething}'

for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 != "bar2" {print $0}'

foo1 bar1 baz1
foo3 bar3 baz3

field vs regular expression (limitations)

select matching lines : awk '$n ~ /regEx/ {doSomething}'

for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 ~ /a.1/ {print $0}'

foo1 bar1 baz1

find the shortest path :

echo -e "bla dir1/\nbla dir1/dir2/\nbla dir1/dir2/dir3/" | awk '$NF ~ /^[^/]*\/$/ {print $NF}'

dir1/

select not matching lines : awk '$n !~ /regEx/ {doSomething}'

for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 !~ /a.1/ {print $0}'

foo2 bar2 baz2
foo3 bar3 baz3

field vs regular expression with if / else construct (source)

for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '{ if($2 ~ "a.2") {print "MATCH : "$2 } else {print "NO MATCH"} }'

NO MATCH
MATCH : bar2
NO MATCH

several conditions

awk 'condition1 logicalOperator condition2 logicalOperator ... conditionN {doSomething}'

logicalOperator can be (source) :
&& : logical AND
|| : logical OR

for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$2 ~ "^ba.." && $3 == "baz3" {print $0}'

foo3 bar3 baz3

for i in {1..3}; do echo "foo$i bar$i baz$i"; done | awk '$1 ~ "1$" || $3 ~ "3$" {print $0}'

foo1 bar1 baz1
foo3 bar3 baz3

for i in {6..22}; do echo "a b c d e f g h $i"; done | awk '$NF==7 || $NF==21 {print $0}'

a b c d e f g h 7
a b c d e f g h 21

echo | awk '1==1 && (2==1 || 3==3) { print "ok" }'

ok

Numerical field with trailing unit letter or text (related: convert sleep durations into human-readable durations) :

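If the numerical value has a trailing unit letter, the comparison doesn't work anymore, because the field is no longer a numeric string and Awk falls back to a string comparison :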

echo -e "foo\t8U\nbar\t34U\nbaz\t56U" | awk '$2 > 25 {print $0}'

foo	8U	ooops !
bar	34U
baz	56U

solution :

echo -e "foo\t8U\nbar\t34U\nbaz\t56U" | awk 'strtonum($2) > 25 {print $0}'

bar	34U
baz	56U

Try it :

df -h | awk 'strtonum($5) > 75 {print $0}'
df -h | awk '{ usage=$5; gsub(/%/, "", usage); if (strtonum(usage) > 50) print $0 }'

strtonum() looks smart enough to handle trailing units (source) :

awk 'BEGIN {
	print "trailing unit (single letter) : " strtonum("123U")
	print "trailing unit (word) : " strtonum("123potatoes")
	print "leading unit (single letter) : " strtonum("Y123")
	print "leading unit (word) : " strtonum("banana123")
	}'

trailing unit (single letter) : 123	OK
trailing unit (word) : 123		OK
leading unit (single letter) : 0	KO
leading unit (word) : 0			KO
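
strtonum() is a gawk extension. Where only a "regular" awk is available, adding 0 to the field is a common way to force a numeric conversion, which keeps the leading numeric part of the string. A minimal sketch, reusing the sample data above :

```shell
# $2 + 0 converts "34U" to 34 before the comparison takes place
printf 'foo\t8U\nbar\t34U\nbaz\t56U\n' | awk '$2 + 0 > 25 { print $0 }'
```

As with strtonum(), a value with a leading unit (such as "Y123") converts to 0.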

Flag Usage


-F sep

use sep as the input Field separator, which can be (gnu.org) :

a single character
OR a regular expression

echo -e 'GA BU  ZO\tMEU' | awk '{print $1"\t"$2"\t"$3"\t"$4}'		default separator
GA	BU	ZO	MEU

echo 'GA.BU,ZO;MEU' | awk -F '.' '{print $1"\t"$2"\t"$3"\t"$4}'		single character separator
GA	BU,ZO;MEU

echo 'GA.BU,ZO;MEU' | awk -F '.,' '{print $1"\t"$2"\t"$3"\t"$4}'	several characters = regex
GA.B	ZO;MEU

echo 'GAABCDBUABCDZOABCDMEU' | awk -F 'ABCD' '{print $1"\t"$2"\t"$3"\t"$4}'	regex again
GA	BU	ZO	MEU

echo 'GA.BU,ZO;MEU' | awk -F ',|;' '{print $1"\t"$2"\t"$3"\t"$4}'	another regex
GA.BU	ZO	MEU

echo 'GA.BU,ZO;MEU' | awk -F ',|\\.' '{print $1"\t"$2"\t"$3"\t"$4}'	details
GA	BU	ZO;MEU

default value : any run of spaces and/or tabs and/or newlines (excluding leading and trailing runs) (details)
. needs to be escaped twice : \\., otherwise, awk complains :
awk: warning: escape sequence `\.' treated as plain `.'

-i awkLibrary
--include awkLibrary load the specified library awkLibrary (example)

-v variable=value

declare a variable (example)
use multiple -v to declare several variables : -v variable1=value1 -v variable2=value2
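
A short sketch of several -v flags used to pass shell values into an Awk filter (the names min, max, first and last are made up for the illustration) :

```shell
min=2; max=3	# hypothetical shell values handed over to awk
# keep only the lines whose number is between first and last
printf 'a\nb\nc\nd\n' | awk -v first="$min" -v last="$max" 'NR >= first && NR <= last'
```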

If Awk's exit statement is invoked with a numeric value, this numeric value is used as the exit status code.

Otherwise (source) :

Exit status	Description
`0` aka UNIX_SUCCESS	No problem during execution, including when no match was found. Check it : for char in a b; do echo "$char" | awk '/a/'; echo $?; echo; done prints a then 0 for the 1^st character, and only 0 for the 2^nd
`1` aka UNIX_FAILURE	An error occurred
`2`	Fatal error
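
Since a missing match still exits with 0, a grep-like status has to be built explicitly with exit. A minimal sketch, assuming a flag variable named found (an arbitrary name) that is set when a line matches and tested in the END rule :

```shell
for pattern in bar qux; do
	# exit !found : status 0 when at least one line matched, 1 otherwise
	if printf 'foo\nbar\n' | awk -v re="$pattern" '$0 ~ re { found = 1 } END { exit !found }'; then
		echo "$pattern : found (exit status 0)"
	else
		echo "$pattern : not found (exit status $?)"
	fi
done
```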

Process log files :

Count occurrences of an error message in a log file :

This code removes the [10-Oct-2012 18:15:46 UTC] fields from every logfile line. This is why Awk is taught to display all fields starting from the 4^th :

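awk '/^\[/ { for (i=4;i<=NF;i++) printf "%s ", $i; print "" }' logFile

printf adds no newline after printing, print does. Passing the field as an argument of the "%s " format (rather than as the format itself) keeps any % found in the log from being interpreted.

Then count occurrences :

awk '/^\[/ { for (i=4;i<=NF;i++) printf "%s ", $i; print "" }' logFile | sort | uniq -c | sort -nr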

From a multiple-fields line, displays fields starting from the 4^th :

In a log file such as :

[13-Nov-2013 03:03:35 Europe/Paris] PHP Warning: Memcached::touch(): ... in ....php on line 45
[13-Nov-2013 03:04:42 Europe/Paris] PHP Warning: file_get_contents(http://...): HTTP/1.0 404 Not Found in ...php on line 202
...

let's say you'd like to remove the date/time field to group and count similar errors. To do so :

awk '{ for (i=1;i<=3;i++) $i="";print }' file.log | awk '{sub(/^[ \t]+/, ""); print}' | sort | uniq -c | sort -nr

the 1^st Awk command replaces the first 3 fields with an empty string, so that the line only contains the remaining fields, starting from the 4^th as required
the 2^nd Awk command just removes leading whitespaces (source)
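
Both steps can also be done by a single Awk process, since sub() works on the rebuilt $0 by default. A minimal sketch on a made-up log line in the same format :

```shell
echo '[13-Nov-2013 03:03:35 Europe/Paris] PHP Warning: something went wrong' |
	awk '{ for (i = 1; i <= 3; i++) $i = ""; sub(/^[ \t]+/, ""); print }'
```

The output can then be piped into sort | uniq -c | sort -nr as above.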

Match a keyword from a variable :

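You can't use a variable name within the // operator to select the matching line, because /letterToMatch/ is a regular expression literal that looks for the text "letterToMatch" itself :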

DON'T : echo -e 'apple\nbanana\ncarrot' | awk -v letterToMatch='b' '/letterToMatch/'
DO : echo -e 'apple\nbanana\ncarrot' | awk -v letterToMatch='b' '$1 ~ letterToMatch'

for httpCode in 301 302 304; do echo -n "Code $httpCode : "; awk -v needle="$httpCode" '$6 ~ needle {print " "}' logFile | wc -l; done

Examples to illustrate the above :

echo -e 'fruit: apple\nfruit: banana\nvegetable: carrot' | awk -v stuffToMatch='b' '/stuffToMatch/'
```
(nothing)
```
echo -e 'fruit: apple\nfruit: banana\nvegetable: carrot' | awk -v stuffToMatch='b' '$0 ~ /stuffToMatch/'
```
(nothing)
```
no match found, as said above
echo -e 'fruit: apple\nfruit: banana\nvegetable: carrot' | awk -v stuffToMatch='b' 'stuffToMatch'
```
fruit: apple
fruit: banana
vegetable: carrot
```
matches everything
echo -e 'fruit: apple\nfruit: banana\nvegetable: carrot' | awk -v stuffToMatch='b' '$0 ~ stuffToMatch'
```
fruit: banana
vegetable: carrot
```
echo -e 'fruit: apple\nfruit: banana\nvegetable: carrot' | awk -v stuffToMatch='b' '$1 ~ stuffToMatch'
```
vegetable: carrot
```
echo -e 'fruit: apple\nfruit: banana\nvegetable: carrot' | awk -v stuffToMatch='b' '$2 ~ stuffToMatch'
```
fruit: banana
```
All examples work as expected
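
With ~, the content of the variable is interpreted as a regular expression. When the value has to be matched literally (it may contain . or other special characters), index() can be used instead. A minimal sketch on made-up input :

```shell
# "a.c" as a regular expression : the dot matches any character, so "abc" is selected too
printf 'abc\na.c\n' | awk -v needle='a.c' '$0 ~ needle'
# "a.c" as a literal string : only the line that really contains a dot is selected
printf 'abc\na.c\n' | awk -v needle='a.c' 'index($0, needle)'
```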

Quotes or not around the value to match ?

match the string "a"
echo -e '1a\n2b\n3a\n4b' | awk '$0 ~ "a"'
```
1a
3a
```
match the unset variable a
echo -e '1a\n2b\n3a\n4b' | awk '$0 ~ a'
```
1a
2b
3a
4b
```
same as above, but explicit :
echo -e '1a\n2b\n3a\n4b' | awk -v a='' '$0 ~ a'
```
1a
2b
3a
4b
```
now setting the variable :
echo -e '1a\n2b\n3a\n4b' | awk -v a='a' '$0 ~ a'
```
1a
3a
```

When regex-matching a string with a variable :

run this pseudo-code :
```
if input =~ "foo.=A"
	continue
else
	print input
```
echo -e 'foo1=A\nfoo2=B\nfoo3=C' | awk -v value='A' '$0 ~ "foo.="value {next} {print}'
other example :
for i in {A..C}; do for j in {A..C}; do echo "$i$j"; done; done | awk -v value='A' '$0 ~ "("value"|B)C"'
```
AC
BC
```

Selecting PID's :

ps --ppid 1 | awk '/d$/ {print $1}'

Lists processes whose parent's PID is 1, then selects processes whose name ends in 'd', and prints the corresponding PID, which is the line field #1.

Specifying the field separator :

awk -F ':' '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd

List all ports and PIDs on which a Mongodb instance is listening :

netstat -laputen | awk '/mongo/ {print "IP:port = "$4"\tPID = "$9}' | sort | uniq

Select non-empty lines :

echo -e 'A\tB\tC\tD\nE\tF\tG\tH\n\nI\tJ\tK\tL' | awk '!/^$/ {print $3}'

C
G
K
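
A common variant uses NF itself as the filter : it is 0 (false) on lines with no field at all, so lines containing only spaces or tabs are skipped too. A short sketch on made-up input :

```shell
# the 2nd line is empty, the 3rd one only holds a space, a tab and a space
printf 'A\tB\tC\n\n \t \nE\tF\tG\n' | awk 'NF { print $3 }'
```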

Dark wizardry ?

This awk command made me scratch my head quite a bit : it returns fields from 2 distinct lines that are not even contiguous.
To figure this out, I simplified it, and let the magic happen :

echo -e 'key1\tvalue1\nkey2\tvalue2' | awk '/key1|key2/ { printf $2 " " }'

value1 value2

Explanation :

the input (either an echo, a line "piped" in, or a whole file) is perfectly "normal" : there is no hack regarding field separators or end of line markers.
the /key1|key2/ part of the awk command is a "normal" regular expression alternation
the printf $2 " " part simply prints the 2^nd field of each matching line, followed by a space

So what's the trick ?
Let's have a deeper look at how awk works and what we're instructing it to do with the echo | awk command above :

no pre-process, so let's start eating lines and doing things
awk splits the input into distinct lines
awk reads the 1^st line : key1\tvalue1
does it match the regular expression ? Yes, so print the 2^nd field and a space character : value1
done with this line, continue with the next line
read the 2^nd line : key2\tvalue2
does it match the regular expression ? Yes again, so print the 2^nd field and a space character : value2
The trick is that printf, unlike print, does not automatically add a newline character after printing. So the output of any step is printed right after the output of the previous step. This is why, at this step of the procedure, the output looks like : value1 value2
done with this line, no next line
no post-process
the end !
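
A slightly more defensive sketch of the same command : the field is passed as an argument of an explicit "%s " format, and an END rule adds the final newline that printf never prints :

```shell
printf 'key1\tvalue1\nkey2\tvalue2\n' |
	awk '/key1|key2/ { printf "%s ", $2 } END { print "" }'
```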

