I like Bash scripting - Scripting's my favorite

The while ... read construct

Typical case : reading a file line by line :

tmpFile=$(mktemp)
echo -e 'Bob\nKevin\nStuart' > "$tmpFile";
while read name; do
	echo "Hello, '$name' !"
done < "$tmpFile"
rm "$tmpFile"
Hello, 'Bob' !
Hello, 'Kevin' !
Hello, 'Stuart' !
Keep leading and trailing spaces with IFS= :
tmpFile=$(mktemp)
echo -e ' Bob \n Kevin \n Stuart ' > "$tmpFile";
while read name; do
	echo "Hello, '$name' !"
done < "$tmpFile"
while IFS= read name; do
	echo "Hello, '$name' !"
done < "$tmpFile"
rm "$tmpFile"
Hello, 'Bob' !
Hello, 'Kevin' !
Hello, 'Stuart' !
Hello, ' Bob ' !
Hello, ' Kevin ' !
Hello, ' Stuart ' !
Trying to understand the -r :
tmpFile=$(mktemp)
echo -e 'apples\t12\nbananas\t3\ncoconuts\t42' > "$tmpFile";
while read fruit number; do
	echo -e "Number of '$fruit' : '$number'"
done < "$tmpFile"
while read -r fruit number; do
	echo -e "Number of '$fruit' : '$number'"
done < "$tmpFile"
rm "$tmpFile"
Number of 'apples' : '12'
Number of 'bananas' : '3'
Number of 'coconuts' : '42'
Number of 'apples' : '12'
Number of 'bananas' : '3'
Number of 'coconuts' : '42'
Makes no difference here, because the sample data contains no backslash : without -r, read treats \ as an escape character (and drops it), whereas with -r backslashes are kept as-is.
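To see what -r changes, the input has to contain backslashes. Here is a minimal sketch (the sample string and the variable names are made up for illustration) :

```shell
#!/usr/bin/env bash
# Sample line containing backslashes (made-up data, for illustration only).
input='C:\temp\new'

# Without -r, read treats each backslash as an escape character and drops it.
read withoutR <<< "$input"
# With -r, backslashes are kept as-is.
read -r withR <<< "$input"

echo "without -r : '$withoutR'"
echo "with -r    : '$withR'"
```

Unless backslash processing is really wanted, read -r is usually the safer default.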

It also works with process substitution :

while read line; do
	echo "$line"
done < <(ps -u $(whoami) | head -10)
Remember : done < <(command)

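... and with here-strings too :

while read line; do
	echo "$line"
done <<< "$(ps -u "$(whoami)" | head -10)"
Remember : done <<< "$(command)" (with Bash versions older than 4.4, omitting the quotes lets word splitting collapse all lines into a single one)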
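Both redirections feed the loop without a pipe, so the loop runs in the current shell and the variables it sets are still available after done. A small sketch under that assumption (the printf command only stands in for any command, and the counter names are made up) :

```shell
#!/usr/bin/env bash
# Loop fed by process substitution : 'count' is still set after 'done'.
count=0
while IFS= read -r line; do
	count=$((count + 1))
done < <(printf '%s\n' one two three)
echo "process substitution : $count lines"

# Same loop fed by a here-string ; quoting "$(...)" keeps the newlines intact.
countHere=0
while IFS= read -r line; do
	countHere=$((countHere + 1))
done <<< "$(printf '%s\n' one two three)"
echo "here-string : $countHere lines"
```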

The if ... then ... else ... fi construct

Because I can never remember this construct :
if condition; then
	some
	commands
else
	some
	different
	commands
fi
There's no need for { } surrounding the then and else blocks.
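A concrete instance of the template above, as a minimal sketch (describeDir is a hypothetical function, only here for illustration) :

```shell
#!/usr/bin/env bash
# describeDir : hypothetical helper telling whether its argument is a directory.
describeDir() {
	if [ -d "$1" ]; then
		echo "$1 is a directory"
	else
		echo "$1 is not a directory"
	fi
}

describeDir '/'
describeDir '/aDirThatDoesNotExist'
```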

Script error : Bad substitution

Situation :

Script execution fails :
./myScript.sh
./myScript.sh: errorLine: ./myScript.sh: Bad substitution

Details :

myScript.sh tries to use a parameter expansion (such as ${variable:0:3}, ${variable//a/b} or ${!variable}) that is not supported by the shell used to execute it.

Solution :

  1. Have a look at the shebang line of myScript.sh. It's very likely that you'll see something like #!/bin/sh, because the author of this script "has always done like this and never had problems"
  2. Change the shebang to the shell you'd like to interpret myScript.sh (typically : #!/bin/bash)
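For illustration, here are a few expansions that Bash understands but that a strict POSIX shell such as dash (the /bin/sh of Debian and Ubuntu) typically rejects with "Bad substitution". This is only a sketch : the file name and variable names are made up.

```shell
#!/usr/bin/env bash
# Bash-specific expansions : fine with the shebang above, typically a
# "Bad substitution" error when the same lines are run by dash.
fileName='report.csv'
upper="${fileName^^}"          # case modification (Bash 4 and later)
replaced="${fileName//r/R}"    # pattern substitution
firstChars="${fileName:0:6}"   # substring extraction
echo "$upper $replaced $firstChars"
```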

cut vs awk : which is the fastest to extract data from a CSV ?

Situation :

I work a lot with CSV files, and one of the recurrent tasks is to extract the value of the nth field from the current line ($line). There are several methods to do so : Is one faster than the other ?

Details :

Let's script this :
#!/usr/bin/env bash

nbLines=5000
nbFieldsPerLine=100
fieldSize=10
fieldSeparator=';'

fieldToExtract=76

tmpFile=$(mktemp --tmpdir='/run/shm')
resultFile=$(mktemp --tmpdir='/run/shm')


showStep() {
	stepDescription=$1
	echo -e "\n$stepDescription"
	}


showMethod() {
	methodDescription=$1
	showStep "Getting the ${fieldToExtract}th field with the '$methodDescription' method"
	}


getDurationOfAction() {
	action=$1
	# 'time' writes its report to stderr : keep only the 'real' (wall clock) value
	{ time "$action"; } 2>&1 | awk '/real/ { print $2 }'
	}


showStep "Preparing source data : $nbLines lines of $nbFieldsPerLine fields ($fieldSize characters each)"
for((i=0; i<"$nbLines"; i++)); do
	pwgen "$fieldSize" -N "$nbFieldsPerLine" -1 | xargs | tr ' ' "$fieldSeparator" >> "$tmpFile"
done


showMethod 'variable=$(echo | cut)'
getData_echoCutVariableMethod() {
	while read line; do
		data=$(echo "$line" | cut -d "$fieldSeparator" -f "$fieldToExtract")
	done < "$tmpFile"
	}
getDurationOfAction 'getData_echoCutVariableMethod'


showMethod "echo | cut > $resultFile"
getData_echoCutResultFileMethod() {
	while read line; do
		echo "$line" | cut -d "$fieldSeparator" -f "$fieldToExtract" > "$resultFile"
	done < "$tmpFile"
	}
getDurationOfAction getData_echoCutResultFileMethod


showMethod 'variable=$(echo | awk)'
getData_echoAwkVariableMethod() {
	while read line; do
		data=$(echo "$line" | awk -F "$fieldSeparator" '{print $'$fieldToExtract'}')
	done < "$tmpFile"
	}
getDurationOfAction getData_echoAwkVariableMethod


showMethod "echo | awk > $resultFile"
getData_echoAwkResultFileMethod() {
	while read line; do
		echo "$line" | awk -F "$fieldSeparator" '{print $'$fieldToExtract'}' > "$resultFile"
	done < "$tmpFile"
	}
getDurationOfAction getData_echoAwkResultFileMethod


rm "$tmpFile" "$resultFile"
Preparing source data : 5000 lines of 100 fields (10 characters each)

Getting the 76th field with the 'variable=$(echo | cut)' method
0m7.684s

Getting the 76th field with the 'echo | cut > resultFile' method
0m5.623s

Getting the 76th field with the 'variable=$(echo | awk)' method
0m12.033s

Getting the 76th field with the 'echo | awk > resultFile' method
0m10.156s

Solution :

Summary :

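  • cut is faster than awk in this test : about 5.6 to 7.7 seconds versus 10.2 to 12.0 seconds for 5000 lines
  • work on a RAMdisk whenever possible
  • remember cut can extract all specified fields from a whole file in a single operation, instead of being called once per line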
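The last point can be sketched as follows. The sample file is made up and much smaller than the one of the benchmark, and the field numbers are arbitrary :

```shell
#!/usr/bin/env bash
# Small sample file (made-up data) standing in for the CSV of the benchmark.
csvFile=$(mktemp)
printf '%s\n' 'a1;b1;c1;d1' 'a2;b2;c2;d2' 'a3;b3;c3;d3' > "$csvFile"

# A single cut process handles the whole file : fields 2 and 4 of every line.
extracted=$(cut -d ';' -f 2,4 "$csvFile")
echo "$extracted"

rm "$csvFile"
```

Compared to the while read loops above, no process is spawned per line, which is where most of the time was spent.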

The case ... esac construct

#!/bin/bash
something='hello world'
case $something in
	h*)
		echo 'foo'
		;;
	*)
		echo 'bar'
		;;
esac
foo
$something
  • this is the word the patterns of the various cases will be matched against
  • tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal are performed before matching is attempted
pattern)
  • shell pattern matching (not regexp) : do not quote the wildcard characters, since quoted characters are matched literally
  • tilde expansion, parameter expansion, command substitution, and arithmetic expansion are performed before matching is attempted
  • the final semicolons ;; are mandatory between cases. They can not be omitted to create a "fall-through" like in C or with the PHP switch (Bash 4 and later have the ;& terminator for that)
  • the commands of the first pattern that matches are executed, the following patterns are skipped
The extended pattern matching operators, such as +(...) or @(...), require the extglob shell option to be enabled (and will produce shell errors otherwise) :
shopt -s extglob
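A minimal sketch of case patterns relying on extglob (classify is a hypothetical function, and the patterns are only examples) :

```shell
#!/usr/bin/env bash
shopt -s extglob	# must be enabled before the patterns below are parsed

# classify : hypothetical helper, only for illustration.
classify() {
	case "$1" in
		+([0-9]))		# one or more digits
			echo 'integer'
			;;
		@(yes|no))		# exactly one of the listed alternatives
			echo 'answer'
			;;
		*)
			echo 'other'
			;;
	esac
}

classify 42
classify yes
classify 'hello world'
```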

Bash functions

Function declaration syntax : function foo() {} or foo() {} (source) ?

There is (almost) no difference when working on GNU/Linux :

foo() {}
  • is the POSIX syntax
  • is more portable
  • is less likely to fail when moving scripts to other shells/proprietary Unices

Possible function declaration syntaxes
foo() commands
  • POSIX syntax
  • Supported by Bourne-like shells
function foo { commands; }
  • Korn shell syntax
  • NOT POSIX
  • Supported by Bash and Zsh for compatibility with Ksh
function foo() { commands; }
do not use this : Bash accepts this mix of both syntaxes, but it is not POSIX

More about shell functions (source)

List existing functions
You may have created + sourced some shell functions, either inline or from files such as ~/.bashrc and ~/.bash_aliases. You can list them all with : declare -F
Find where a function is defined
  • shopt -s extdebug; declare -F functionName
    functionName	lineNumber	/path/to/functions.sh
  • bash --debugger; declare -F functionName
    functionName	lineNumber	/path/to/functions.sh
View the code of a function
declare -f functionName

Functions and variable scope :

As stated here (source ?) :
  • All variables assigned inside a function are global by default : they are shared with the calling environment.
  • Variables declared with local are not shared : they are visible only to the function itself and to the functions it calls.
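The two rules above can be checked with a short sketch (the function and variable names are made up) :

```shell
#!/usr/bin/env bash
# demoScope : hypothetical function, only for illustration.
demoScope() {
	sharedVar='set inside the function'	# global : visible to the caller
	local privateVar='only inside'		# local : gone when the function returns
}

demoScope
echo "sharedVar  : '${sharedVar}'"
echo "privateVar : '${privateVar}'"
```

After the call, sharedVar holds its value while privateVar is unset in the calling environment.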

Bash scripting : using the right shebang

The shebang is the first line of a script (shell, Python, Perl, ...), which tells the operating system which binary should be used to interpret and execute the script commands. A shebang always starts with #!, optionally followed by a space.
As for shell scripts, and especially Bash scripts, there are several flavors of shebangs :

Shebang : #!/bin/sh
Pro's :
  • short and simple
Con's :
  • runs whatever shell /bin/sh points to : this may be Bash, but it is not guaranteed (on Debian and Ubuntu, /bin/sh is dash) and it may change
  • works only if not using Bash-specific commands or options. Otherwise, this could lead to weird bugs / undesired side-effects, which is why I discourage using this method : the #!/bin/bash shebang is safer with only 2 extra keystrokes
Shebang : #!/bin/bash
Pro's :
  • calling THE Bash binary with its absolute path : short, simple, efficient. This is the safest.
Con's :
  • some may argue this is less portable because the Bash binary may not be /bin/bash but /usr/bin/bash or /usr/local/bin/bash or ... (but I guess symlinks would be created adequately in such situations anyway)
Shebang : #!/usr/bin/env bash
Pro's :
  • finds the Bash binary wherever it is (env runs the 1st bash found in the directories listed in $PATH) : different install path (system-wide), customized path (user-level setting). This is more portable.
Con's :
  • could be used to execute a rogue Bash binary
  • users with customized $PATH could refer to different binaries and experience different behaviors of the same script

Bash loops

for loops :

More examples
  • ugly : start=37; stop=73; increment=7; for i in $(eval echo "{$start..$stop..$increment}"); do echo $i; done (source)
  • better : start=37; stop=73; increment=7; for((i=start; i<=stop; i+=increment)); do echo $i; done (<= because the brace expansion above includes the stop value when it is reached)

for vs while (source) :

  • ugly (and wrong : this iterates over words, not over lines) : for line in $(cat file.txt); do echo $line; done
  • ugly again (UUOC) : cat file.txt | while read line; do echo $line; done
  • better : while read line; do echo $line; done < file.txt (more)
    This is better because you don't need to spawn a sub-process with |, or with $(...), or start the external cat command.

An until loop :

a=5; until [ "$a" -eq 2 ]; do echo $a; a=$((a-1)); done

until ping -c 1 192.168.144.118 &>/dev/null; do echo -n '.'; sleep 1; done; echo OK

url='http://x.y.z.y/page/content/changing/upon/reload/'
needle='some text'
tmpFile=$(mktemp --tmpdir tmp.XXXXXXXX)
loops=0
until grep -i "$needle" "$tmpFile"; do
	loops=$((loops+1))
	>"$tmpFile"
	wget "$url" -O "$tmpFile"
done
echo "Number of loops : $loops"

Associative arrays in Bash

With Bash 4, it is possible to use associative arrays (but not nested associative arrays : source).
As an alternative to associative arrays, you can use tuples.

Create an associative array :

declare -A myArray
myArray[foo]='this is "foo"...'
myArray[bar]='this is "bar"...'
myArray[baz]='this is "baz"...'

Browse by keys :

for key in "${!myArray[@]}"; do
	echo -e "key :\t$key"
	echo -e "value :\t${myArray[$key]}\n"
done
key :	bar
value :	this is "bar"...

key :	baz
value :	this is "baz"...

key :	foo
value :	this is "foo"...

Browse by values :

for value in "${myArray[@]}"; do
	echo -e "value :\t$value\n"
done
value : this is "bar"...

value : this is "baz"...

value : this is "foo"...

Variables holding a path in shell scripts

This article is about coding style, which is not only personal but also far from perfect (by design). When it comes to declaring a variable in a shell script to store a path, I see at least 2 methods to do so, and so far I've not settled on either of them (so writing this article may help) :

Method 1 : the trailing / is in the value :

somePath='/path/to/directory/'

foo="${somePath}foo"
bar="${somePath}bar"

Method 2 : no trailing / in the value :

somePath='/path/to/directory'

foo="$somePath/foo"
bar="$somePath/bar"
Method 1
Pro's :
  • Error using variable : somePath='/path/to/directory/' then baz="$somePath/baz" (instead of baz="${somePath}baz") would be interpreted as /path/to/directory//baz : ugly but works
Con's :
  • Error declaring variable : somePath='/path/to/directory' (missing trailing /) then baz="${somePath}baz" will be interpreted as : /path/to/directorybaz : error
  • The { and } are slow to type, and add some visually-cryptic-characters around the variable name which is bad for readability.
Method 2
Pro's :
  • Error declaring variable : somePath='/path/to/directory/' (extra trailing /) then baz="$somePath/baz" will be interpreted as : /path/to/directory//baz : ugly but works
  • Better readability
  • Fewer keystrokes
Con's :
  • Error using variable : somePath='/path/to/directory' then baz="${somePath}baz" (instead of baz="$somePath/baz") would be interpreted as /path/to/directorybaz : error
    This is less likely to happen because of the extra keystrokes involved.

Counting keystrokes :

The method 1 is 70 characters long, and the method 2 : 67. But after removing characters that are typed in both methods (variable names, =, $, , quotes, ...), we get (with { and } being 2 keystrokes each on a french keyboard) :

Conclusion :

Let's settle on method 2 ! (until we find further arguments)

How to load tuples in a shell script ?

data='key1;value1 key2;value2'; for tuple in $data; do key=$(echo $tuple | cut -d ';' -f 1); value=$(echo $tuple | cut -d ';' -f 2); echo "tuple: '$tuple', key : '$key', value : '$value'"; done

Real-world example : testing a list of user accounts on a FTP server :

ftpHost='my.ftp.server'; ftpPort=21; data='joe;123456 jack;password william;qwerty averell;averell'; for tuple in $data; do login=$(echo $tuple | cut -d ';' -f 1); password=$(echo $tuple | cut -d ';' -f 2); echo "Trying account '$login/$password' :"; curl --insecure --ftp-ssl --ftp-pasv --user "$login:$password" "ftp://$ftpHost:$ftpPort/"; echo; done
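The echo | cut pairs above spawn 2 sub-shells per tuple. As a possible alternative (a sketch, not the method used above), read can split each tuple by itself when IFS is set to the separator for that single command :

```shell
#!/usr/bin/env bash
data='key1;value1 key2;value2'

for tuple in $data; do
	# IFS=';' applies to this read only : the tuple is split on ';'
	# without starting echo or cut.
	IFS=';' read -r key value <<< "$tuple"
	echo "tuple: '$tuple', key : '$key', value : '$value'"
done
```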

How to use booleans in Bash ?

Playing with true and false (source) :

theWorldIsFlat=true
# ...do something interesting...
if $theWorldIsFlat; then
	echo 'Be careful not to fall off!'
fi

Bash doesn't know booleans. This hack works because $theWorldIsFlat is replaced by true at run time, and true is then executed as a command, which returns a Unix success (the same goes for false, which returns a Unix failure).
If you're not convinced that the value of the variable is executed as a command, try these :

myVariable=true; if $myVariable; then echo OK; fi
OK
myVariable=false; if $myVariable; then echo OK; fi
(nothing)
myVariable=ooops; if $myVariable; then echo OK; fi
bash: ooops : command not found

More fun ?

true && true && echo A || echo B
A
true && false && echo A || echo B
B

Testing return codes :

#!/usr/bin/env bash

UNIX_SUCCESS=0
UNIX_FAILURE=1

returnBoolean() {
	wantedReturnValue="$1"

	case "$wantedReturnValue" in
		"$UNIX_SUCCESS")
			true; return $?		# return the exit status of 'true'
			;;
		"$UNIX_FAILURE")
			false; return $?	# return the exit status of 'false'
			;;
	esac
	}

for result in $UNIX_SUCCESS $UNIX_FAILURE; do
	returnBoolean $result
	returnCode=$?
	[ "$returnCode" -eq "$result" ] && echo OK || echo KO
done

set and unset variables :

unset a; [ "$a" ] && echo OK || echo KO; a=1; [ "$a" ] && echo OK || echo KO; a=0; [ "$a" ] && echo OK || echo KO

KO
OK
OK

"Variable variables" a.k.a dynamic variables

Usage :

A dynamic variable is a variable holding the name of another variable.

Example :

A basic example :

#!/usr/bin/env bash

myVar='Hello'
dynamicVar="myVar"

echo "The value of '$dynamicVar' is '${!dynamicVar}'."
Will output :

The value of 'myVar' is 'Hello'.

Checking a list of variables :

This snippet checks that none of the variables from the list is an empty string :
for variableToCheck in variableA variableB variableC; do
	echo "The variable '$variableToCheck' has value : '${!variableToCheck}'."
	if [ -z "${!variableToCheck}" ]; then
		# deal with it ! (here : just report the problem)
		echo "The variable '$variableToCheck' is empty." >&2
	fi
done

When checking the input of a script for missing parameters, it is better to simply count parameters rather than testing all variables. Indeed, when expecting command parameterA parameterB and getting command foo, you can't tell which of parameter A or B is missing (unless you're using named parameters, of course).

List all permutations of a list of lists, then list all items :

#!/usr/bin/env bash

listOfLists='colors fruits cars'
colors='red green blue'
fruits='apple banana grape'
cars='ferrari porsche lada'

output=''
# generate all permutations of the list of lists
for aList in $listOfLists; do
	for anotherList in $listOfLists; do
		[ "$anotherList" = "$aList" ] && continue
		for oneMoreList in $listOfLists; do
			[ "$oneMoreList" = "$anotherList" ] || [ "$oneMoreList" = "$aList" ] && continue

			output="$output\nLISTS: 1.$aList 2.$anotherList 3.$oneMoreList"

			# we now have all the list of lists, let's list the contents of all those lists
			for item1 in ${!aList}; do
				for item2 in ${!anotherList}; do
					for item3 in ${!oneMoreList}; do
						output="$output\n$item1 $item2 $item3"
					done
				done
			done
			# contents over

		done
	done
done
echo -e "$output" | column -s ' ' -t

Bash tests : [ ... -x ... ], [ ... = ... ], [[ ... =~ ... ]], ...

File operators (source 1, 2) :

Option	True if ...
-b	file is a block device special file, such as /dev/sda :
	brw-rw---T 1 root disk 8, 0 sept. 22 14:23 /dev/sda
-d	file is a directory
-e	file exists, whatever type of file it is
-f	file is a regular file, not a directory or a device
-h, -L	file is a symbolic link
-s	file has data (i.e. is not zero-sized)
!	logical NOT : negates the test that follows. Must be followed by a whitespace.

String operators (source) :

It is a safe practice to always quote tested strings.

Option	Usage
-n	string is not null, i.e. has length > 0
-z	string is null, i.e. has length == 0
is equal to
[ "$a" = "$b" ]
[ "$a" == "$b" ]	(accepted by Bash, but == inside [ ... ] is not POSIX)
[[ "$a" == "$b" ]]
is not equal to
[ "$a" != "$b" ]
[[ "$a" != "$b" ]]

Integer operators (source) :

is equal to
[ "$a" -eq "$b" ]
[[ "$a" -eq "$b" ]]
(( a == b ))
is not equal to
[ "$a" -ne "$b" ]
[[ "$a" -ne "$b" ]]
(( a != b ))
Inside [[ ... ]], the =, == and != operators compare strings, not integers.
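The difference between string and integer comparison shows up as soon as the same number can be written in two ways. A minimal sketch (the values are made up) :

```shell
#!/usr/bin/env bash
a='01'; b='1'

# == compares the strings '01' and '1'
[[ "$a" == "$b" ]] && stringResult='equal' || stringResult='different'
# -eq compares the integers 1 and 1
[[ "$a" -eq "$b" ]] && integerResult='equal' || integerResult='different'

echo "string comparison  : $stringResult"
echo "integer comparison : $integerResult"
```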

Logical operators (sources : 1, 2) :

For compound tests, you can use things like :
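AND
if [ $condition1 ] && [ $condition2 ]
if [[ $condition1 && $condition2 ]]
if [ $condition1 -a $condition2 ]	(-a is marked obsolescent by POSIX : prefer the forms above)
OR
if [ $condition1 ] || [ $condition2 ]
if [[ $condition1 || $condition2 ]]
if [ $condition1 -o $condition2 ]	(-o is marked obsolescent by POSIX : prefer the forms above)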

Unary if :

Some languages have a unary if operator :
if(true) {
	...
	}
Bash lets you mimic a unary if operator, but this looks error-prone (read below). Comparing explicitly looks more reliable :
  • if [ "$myVar" -eq "$UNIX_SUCCESS" ]; then
    	...
    fi
  • if [[ "$myVar" == "$UNIX_SUCCESS" ]]; then
    	...
    fi
Unary if and booleans :
Be very careful with these constructs as they may have counter-intuitive results :
true; echo $?; false; echo $?; [ true ]; echo $?; [ false ]; echo $?; [[ true ]]; echo $?; [[ false ]]; echo $?
0
1
0
0
0
0
  • The [ whatever ] construct is short for [ -n whatever ] and tests the string length (source, details, examples)
    [ true ]; echo $?; [ false ]; echo $?; [ hello ]; echo $?; [ '' ]; echo $?
    0		actually testing the non-empty string true, so UNIX_SUCCESS
    0
    0
    1		testing an empty string, so UNIX_FAILURE
  • Same goes on when whatever evaluates to a string (i.e. command result or value returned by a function) : it will be checked with -n :
    [ $(true) ]; echo $?; [ $(false) ]; echo $?; [ $(echo hello) ]; echo $?; [ $(echo -n '') ]; echo $?
    1		true returns a UNIX_SUCCESS but displays nothing (i.e. empty string)
    1		similar reason for false
    0		echo will always display a string...
    1		...unless properly silenced 
  • If you just want to test an exit status, you don't need [ ... ], just chain commands with the proper operators.
  • Same goes on for [[ ... ]].
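Testing an exit status directly, without [ ... ], can look like the sketch below (containsNeedle is a hypothetical helper, only here for illustration) :

```shell
#!/usr/bin/env bash
# containsNeedle : hypothetical helper, succeeds when $2 contains $1.
# Its exit status is the one of grep, the last command of the pipeline.
containsNeedle() {
	printf '%s\n' "$2" | grep -q -- "$1"
}

# 'if' tests the exit status of the command itself : no [ ... ] needed.
if containsNeedle 'world' 'hello world'; then
	answer='found'
else
	answer='not found'
fi
echo "$answer"
```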

Regex operators :

Test a match (source : 1, 2) :
filesystemSize='1234M'; [[ "$filesystemSize" =~ ^[0-9]+[KMGT] ]] && echo match || echo 'no match'
match
filesystemSize='123'; [[ "$filesystemSize" =~ ^[0-9]+[KMGT] ]] && echo match || echo 'no match'
no match
ip='10.122.47.127'; regexp="([0-9]{1,3}\.){3}[0-9]{1,3}"; [[ $ip =~ $regexp ]] && echo OK || echo KO
OK
How to check a string against a list of values ?
  • The regular expression itself mustn't be quoted : a quoted regular expression is matched as a literal string.
  • If the regular expression becomes slightly complex (or contains a Bash variable), it should be stored in an extra variable.
  • Should the regular expression contain a SPACE character, it must be escaped : \ .
Test a no-match (source) :
string='abcdef'; [[ ! "$string" =~ 123 ]] && echo A || echo B
A
string='abcd123ef'; [[ ! "$string" =~ 123 ]] && echo A || echo B
B
Retrieve matched substrings (source) :
string='abcdef'; [[ $string =~ .(.).(.).(.) ]]; for i in {1..3}; do echo -n ${BASH_REMATCH[$i]}; done
bdf
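Storing the regular expression in a variable and reading BASH_REMATCH can be combined, for instance to split a size into its number and its unit. This is only a sketch : the anchored regular expression and the variable names are assumptions.

```shell
#!/usr/bin/env bash
filesystemSize='1234M'
# Single quotes : the regexp is stored as-is, then used unquoted after =~
regexp='^([0-9]+)([KMGT])$'

if [[ "$filesystemSize" =~ $regexp ]]; then
	number="${BASH_REMATCH[1]}"	# 1st parenthesized group
	unit="${BASH_REMATCH[2]}"	# 2nd parenthesized group
	echo "number : '$number', unit : '$unit'"
fi
```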

About test, [ and [[ (source : 1, 2) :

[ (test command) and [[ ("new test" command) are used to evaluate expressions :

  • [ and test are available in POSIX shells.
  • [[ works only in Bash, Zsh and the Korn shell, and is more powerful.

Bash script flags

Many flags can be raised or lowered to alter the behavior of Bash scripts (defensive scripting, debugging, ...). Flags can be raised by 2 means : with set -flag inside the script, or by passing -flag to the bash binary (on its command line or in the shebang). To lower a flag : set +flag
Flag	Usage
-e	exit immediately if a simple command exits with a non-zero status
-n	read commands but do not execute them
-u	leave script and display an error message when using an unset variable
-v	show the code as it is read
-x	show the code as it is executed
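The effect of -u can be observed without ending the current script by raising the flag inside a command substitution, which runs in a sub-shell. A minimal sketch (someUnsetVariable is, on purpose, never defined) :

```shell
#!/usr/bin/env bash
# The risky part runs in a command substitution (a sub-shell), so the error
# raised by -u ends the sub-shell and not this script.
message=$( { set -u; echo "value : $someUnsetVariable"; echo 'never reached'; } 2>/dev/null )
status=$?

echo "exit status : $status, output : '$message'"
```

The sub-shell stops at the first use of the unset variable : nothing is printed by either echo, and the exit status is non-zero.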

Notes and examples with -e (details) :

-e does not trigger an exit for :

  • commands tested as the condition of an if, until or while block
  • commands of a list using && or ||, except the last one
  • commands with return value being inverted via !

set -e; echo -n 'hello'; true; echo ' world'
Outputs : hello world
echo "set -e; echo -n 'hello'; false; echo ' world'" | bash
Outputs : hello (the echo ... | bash hack is just to be able to see the result, since because of the false, an exit is executed, forcing to leave the current shell)
set -e; dir='/tmp'; if [ -d "$dir" ] ; then echo "$dir exists"; else echo "$dir does not exist"; fi
Outputs : /tmp exists
No non-success exit code met, -e keeps sleeping.
set -e; dir='/aDirThatDoesNotExist'; if [ -d "$dir" ] ; then echo "$dir exists"; else echo "$dir does not exist"; fi
Outputs : /aDirThatDoesNotExist does not exist
A non-success exit code is met, but -e is muzzled by if.
set -e; dir='/tmp'; [ -d "$dir" ] && echo "$dir exists" || echo "$dir does not exist"
Outputs : /tmp exists
No non-success exit code met, -e keeps sleeping.
set -e; dir='/aDirThatDoesNotExist'; [ -d "$dir" ] && echo "$dir exists" || echo "$dir does not exist"
Outputs : /aDirThatDoesNotExist does not exist
A non-success exit code is met, but -e is muzzled by the && ... || list : the failing [ -d "$dir" ] is not the last command of the list.
#!/usr/bin/env bash
set -e
echo -n 'hello'
true
echo ' world'
Outputs : hello world
#!/usr/bin/env bash
set -e
echo -n 'hello'
false
echo ' world'
Outputs : hello, and returns the exit code 1
#!/usr/bin/env bash
set -e
echo -n 'hello'
if true; then
	echo -n ' wonderful'
fi
echo ' world'
Outputs : hello wonderful world
#!/usr/bin/env bash
set -e
echo -n 'hello'
if false; then
	echo -n ' wonderful'
fi
echo ' world'
Outputs : hello world, and returns the exit code 0.
A non-success exit code is met, but -e is muzzled by if.

Opportunity for a joke :

If you run set -e in a terminal, this will affect the current shell and any further command your "victim" will type. At the 1st non-success return code met (which is VERY easy : try TAB-completing like cd TAB), an exit will be fired, closing the terminal.

If you _unintentionally_ run that joke on yourself, you can disable the -e flag with : set +e