I like Bash scripting - Scripting's my favorite


Script error : Bad substitution

Situation

Script execution fails :
./myScript.sh
./myScript.sh: errorLine: ./myScript.sh: Bad substitution

Details

myScript.sh uses a substitution construct (typically a Bash-specific parameter expansion such as ${variable^^} or ${!variable}) that is not supported by the shell actually executing the script.

Solution

  1. Have a look at the shebang line of myScript.sh. It's very likely that you'll see something like #!/bin/sh, because the author of this script "has always done it like this and never had problems".
  2. Change the shebang to the shell that should interpret myScript.sh (typically : #!/bin/bash), as shown below.
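A minimal sketch reproducing the error and the fix, assuming a Debian-like system where /bin/sh points to dash (the demo.sh file name and the ${greeting^^} expansion are only illustrative, and the exact error message may differ) :

cat > demo.sh << 'EOF'
#!/bin/sh
greeting='hello'
echo "${greeting^^}"	# ${variable^^} (uppercasing) is a Bash-only expansion
EOF
chmod +x demo.sh

./demo.sh					# ./demo.sh: 3: ./demo.sh: Bad substitution

sed -i '1s|^#!/bin/sh$|#!/bin/bash|' demo.sh	# fix the shebang ...
./demo.sh					# ... and enjoy : HELLO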

cut vs awk : which is the fastest to extract data from a CSV ?

Situation

I work a lot with CSV files, and one of the recurrent tasks is to extract the value of the nth field from the current line ($line). There are several methods to do so : is one faster than the others ?

Details

Let's script this :
#!/usr/bin/env bash

nbLines=5000
nbFieldsPerLine=100
fieldSize=10
fieldSeparator=';'

fieldToExtract=76

tmpFile=$(mktemp --tmpdir='/run/shm')
resultFile=$(mktemp --tmpdir='/run/shm')


showStep() {
	stepDescription=$1
	echo -e "\n$stepDescription"
	}


showMethod() {
	methodDescription=$1
	showStep "Getting the ${fieldToExtract}th field with the '$methodDescription' method"
	}


getDurationOfAction() {
	action=$1
	# run the given command, then keep only the 'real' duration reported by the 'time' keyword
	{ time "$action"; } 2>&1 | awk '/real/ { print $2 }'
	}


showStep "Preparing source data : $nbLines lines of $nbFieldsPerLine fields ($fieldSize characters each)"
for((i=0; i<"$nbLines"; i++)); do
	pwgen "$fieldSize" -N "$nbFieldsPerLine" -1 | xargs | tr ' ' "$fieldSeparator" >> "$tmpFile"
done


showMethod 'variable=$(echo | cut)'
getData_echoCutVariableMethod() {
	while read line; do
		data=$(echo "$line" | cut -d "$fieldSeparator" -f "$fieldToExtract")
	done < "$tmpFile"
	}
getDurationOfAction 'getData_echoCutVariableMethod'


showMethod "echo | cut > $resultFile"
getData_echoCutResultFileMethod() {
	while read line; do
		echo "$line" | cut -d "$fieldSeparator" -f "$fieldToExtract" > "$resultFile"
	done < "$tmpFile"
	}
getDurationOfAction getData_echoCutResultFileMethod


showMethod 'variable=$(echo | awk)'
getData_echoAwkVariableMethod() {
	while read line; do
		data=$(echo "$line" | awk -F "$fieldSeparator" '{print $'$fieldToExtract'}')
	done < "$tmpFile"
	}
getDurationOfAction getData_echoAwkVariableMethod


showMethod "echo | awk > $resultFile"
getData_echoAwkResultFileMethod() {
	while read line; do
		echo "$line" | awk -F "$fieldSeparator" '{print $'$fieldToExtract'}' > "$resultFile"
	done < "$tmpFile"
	}
getDurationOfAction getData_echoAwkResultFileMethod


rm "$tmpFile" "$resultFile"
Preparing source data : 5000 lines of 100 fields (10 characters each)

Getting the 76th field with the 'variable=$(echo | cut)' method
0m7.684s

Getting the 76th field with the 'echo | cut > resultFile' method
0m5.623s

Getting the 76th field with the 'variable=$(echo | awk)' method
0m12.033s

Getting the 76th field with the 'echo | awk > resultFile' method
0m10.156s

Solution

Summary :

  • cut is faster than awk
  • work on a RAMdisk whenever possible
  • remember cut can extract all specified fields from a file in a single operation :
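For example (the data.csv file name and the field numbers below are only illustrative) :
cut -d ';' -f 3,7,12 data.csv > extractedFields.csv	# fields 3, 7 and 12 of every line, in a single pass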

Alternate method for parsing a CSV file (source) :

The methods shown above retrieve CSV data fields in a 2-step process :
  1. read the line from the CSV file
  2. split the line into fields and store values in the corresponding variables
It is possible to get CSV values in a single operation like this :
while IFS=, read -r field1 field2; do
	# do something with "$field1" and "$field2"
done < input.csv
Depending on the context, one of these methods may be more appropriate :
  • The cut / awk methods are not the "pure Bash" ones but they can prove useful when the source has MANY fields (like a log file) and you're only interested in SOME of them, not ALL.
  • The IFS= + read -r method is the "proper" way of doing this, but it requires naming ALL the data fields, even those not used inside the loop. Moreover, this can make the while... line longer and decrease code readability when there are MANY data fields (see the sketch below for a workaround).
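A hedged sketch of a workaround : unused leading fields can be read into a throwaway name such as _, and everything after the last interesting field can be caught by a final catch-all variable (the input.csv, field3 and rest names are made up) :
while IFS=';' read -r _ _ field3 rest; do	# fields 1 and 2 are discarded into '_', fields 4+ all land in 'rest'
	echo "3rd field : $field3"
done < input.csv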

Bash functions

Function declaration syntax : function foo() {} or foo() {} (source) ?

There is (almost) no difference when working on GNU/Linux :

foo() {}
  • is the POSIX syntax
  • is more portable
  • is less likely to fail when moving scripts to other shells / proprietary Unices

Possible function declaration syntaxes

foo() { commands; }
  • POSIX syntax
  • supported by Bourne-like shells (Bash, ...)
function foo { commands; }
  • Ksh-style syntax, also accepted by Bash
function foo() { commands; }
  • hybrid of the 2 syntaxes above : do not use this !
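A quick sketch using the POSIX syntax (the sayHello function is made up) :
sayHello() {
	echo "Hello, $1 !"
	}

sayHello 'world'	# Hello, world !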

More about shell functions (source)

List existing functions

You may have created + sourced some shell functions, either inline or from files such as ~/.bashrc and ~/.bash_aliases. You can list them all with : declare -F

Find where a function is defined

  • shopt -s extdebug; declare -F functionName
    functionName	lineNumber	/path/to/functions.sh
  • bash --debugger; declare -F functionName
    functionName	lineNumber	/path/to/functions.sh

View the code of a function

declare -f functionName

Functions and variable scope (source) :

Variables declared :
  • inside a function without any keyword are shared with the calling environment
  • with the local keyword are not shared : they only exist within the function
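A minimal sketch illustrating this (the function and variable names are made up) :
#!/usr/bin/env bash

demoScope() {
	globalVar='I leak into the calling environment'		# no keyword : shared with the caller
	local localVar='I only exist within the function'	# 'local' : function scope only
	}

demoScope
echo "globalVar : '$globalVar'"		# globalVar : 'I leak into the calling environment'
echo "localVar : '$localVar'"		# localVar : ''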

Bash scripting : using the right shebang

The shebang is the first line of a script (shell, Python, PERL, ...) telling the operating system which interpreter must be used to execute the script commands. Shebangs start with #!, optionally followed by a space.
As for shell scripts, and especially Bash scripts, there are several flavors of shebangs :

#!/bin/sh
  Pro's :
  • short and simple
  Con's :
  • expects /bin/sh to symlink to /bin/bash, which is common but not mandatory / may change
  • works only if the script uses no Bash-specific commands or options. Otherwise, this can lead to weird bugs / undesired side effects, which is why I discourage using this method : the #!/bin/bash shebang is safer with only 2 extra keystrokes
#!/bin/bash
  Pro's :
  • calls THE Bash binary with its absolute path : short, simple, efficient. This is the safest.
  Con's :
  • some may argue this is less portable because the Bash binary may not be /bin/bash but /usr/bin/bash or /usr/local/bin/bash or ... (but I guess symlinks would be created adequately in such situations anyway)
#!/usr/bin/env bash
  Pro's :
  • finds the Bash binary wherever it is (it picks the 1st answer from the output of env) : different install path (system-wide), customized path (user-level setting). This is more portable.
  Con's :
  • could be used to execute a rogue Bash binary
  • users with a customized $PATH could refer to different binaries and experience different behaviors of the same script
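A few quick checks of what each shebang would resolve to on the current machine (the outputs in comments are typical examples, not guarantees) :
readlink -f /bin/sh	# e.g. /usr/bin/dash on Debian-like systems, /bin/bash elsewhere
command -v bash		# the Bash binary '#!/usr/bin/env bash' would pick from $PATH, e.g. /usr/bin/bash
echo "$BASH_VERSION"	# non-empty only if the shell running this really is Bash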


Bash arrays


Indexed arrays (source)

Initialize an array :

declare -a myArray

myArray[0]='john'
myArray[1]='paul'
myArray[2]='george'
myArray[3]='ringo'
alternate syntax (it is valid to declare + assign at once) :
declare -a myArray=(john paul george ringo)

Display array values :

all at once :
echo "${myArray[@]}"
john paul george ringo
a single one :
echo "${myArray[2]}"
george

Browse array in a for loop :

for item in "${myArray[@]}"; do
	echo "$item"
done
john
paul
george
ringo

Append a value to the array :

myArray+=('the_5th_guy')
echo "${myArray[@]}"
john paul george ringo the_5th_guy

Associative arrays (source)

With Bash 4, it is possible to use associative arrays (but not nested associative arrays : source).
As an alternative to associative arrays, you can use tuples.

Create an associative array :

declare -A myArray
myArray[foo]='this is "foo"'
myArray[bar]='this is "bar"'
myArray[baz]='this is "baz"'

Browse by keys :

for key in "${!myArray[@]}"; do
	echo -e "key :\t$key"
	echo -e "value :\t${myArray[$key]}\n"
done
key :	bar
value :	this is "bar"

key :	baz
value :	this is "baz"

key :	foo
value :	this is "foo"

Browse by values :

for value in "${myArray[@]}"; do
	echo -e "value :\t$value\n"
done
value : this is "bar"

value : this is "baz"

value : this is "foo"

Array length

When dealing with Bash arrays, some documents declare that ${#myArray} is :
  • the length of myArray, i.e. the number of items it contains
  • the maximum index value found in myArray
No idea whether this was true in previous Bash versions.
As of today (summer 2020), the ${#} construct has only 1 meaning : return the length of the enclosed string. What's tricky with ${#myArray} is that omitting the index actually refers to the array's 1st item (aka myArray[0], sources : 1, 2).
${#myArray} is the length of the 1st item of myArray

Let's check it :

for i in a ab abc abcd; do unset myArray; declare -a myArray; myArray+=($i); echo ${#myArray}; done
1
2
3
4

How can I get the number of items in an array ?

Use ${#myArray[@]} :
declare -a fruits; fruits=(apple banana coconut); echo ${#fruits[@]}; echo ${#fruits}
3	number of fruits
5	length of string apple

Arrays can be sparse :

declare -a myArray; myArray=(Lorem ipsum dolor sit amet); for index in "${!myArray[@]}"; do echo "$index ${myArray[index]}"; done; echo -e "Length : ${#myArray[@]}\n"; unset myArray[3]; for index in "${!myArray[@]}"; do echo "$index ${myArray[index]}"; done; echo "Length : ${#myArray[@]}"
0 Lorem
1 ipsum
2 dolor
3 sit
4 amet
Length : 5

0 Lorem
1 ipsum
2 dolor
4 amet	no more myArray[3]
Length : 4

Variables holding a path in shell scripts

This article is about coding styles, which is not only a personal matter but also far from perfect (by design). When it comes to declaring a variable in a shell script to store a path, I see at least 2 methods to do so, and so far I've still not settled on one or the other (so writing this article may help) :

Method 1 : the trailing / is in the value :

somePath='/path/to/directory/'

foo="${somePath}foo"
bar="${somePath}bar"

Method 2 : no trailing / in the value :

somePath='/path/to/directory'

foo="$somePath/foo"
bar="$somePath/bar"
Method 1 :
  Pro's :
  • Error using the variable : somePath='/path/to/directory/' then baz="$somePath/baz" (instead of baz="${somePath}baz") would be interpreted as /path/to/directory//baz : ugly but works
  Con's :
  • Error declaring the variable : somePath='/path/to/directory' (missing trailing /) then baz="${somePath}baz" will be interpreted as /path/to/directorybaz : error
  • The { and } are slow to type, and add some visually-cryptic characters around the variable name, which is bad for readability.
Method 2 :
  Pro's :
  • Error declaring the variable : somePath='/path/to/directory/' (extra trailing /) then baz="$somePath/baz" will be interpreted as /path/to/directory//baz : ugly but works
  • Better readability
  • Fewer keystrokes
  Con's :
  • Error using the variable : somePath='/path/to/directory' then baz="${somePath}baz" (instead of baz="$somePath/baz") would be interpreted as /path/to/directorybaz : error
    This is less likely to happen because of the extra keystrokes involved.
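Not part of the comparison above, but worth a note : parameter expansion can normalize the value so that both declaration styles behave the same (a hedged sketch, the paths are made up) :
somePath='/path/to/directory/'		# works the same with or without the trailing /
baz="${somePath%/}/baz"			# ${somePath%/} strips the trailing / if there is one
echo "$baz"				# /path/to/directory/baz in both cases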

Counting keystrokes :

Method 1 is 70 characters long, and method 2 is 67. But after removing the characters that are typed in both methods (variable names, =, $, quotes, ...), we get (with { and } counting as 2 keystrokes each on a French keyboard) :

Conclusion :

Let's settle on method 2 ! (until we find further arguments )


How to load tuples in a shell script ?

Method 1 :

data='key1;value1 key2;value2'; for tuple in $data; do key=$(echo $tuple | cut -d ';' -f 1); value=$(echo $tuple | cut -d ';' -f 2); echo "tuple: '$tuple', key : '$key', value : '$value'"; done

Real-world example : testing a list of user accounts on a FTP server :

ftpHost='myFtpServer'; ftpPort=21; data='joe;123456 jack;password william;qwerty averell;averell'; for tuple in $data; do login=$(echo $tuple | cut -d ';' -f 1); password=$(echo $tuple | cut -d ';' -f 2); echo "Trying account '$login/$password' :"; curl --insecure --ftp-ssl --ftp-pasv --user "$login:$password" "ftp://$ftpHost:$ftpPort/"; echo; done

Method 2 :

#!/usr/bin/env bash

# The ' \' right after the opening quote joins it with the 1st data line :
# without it, the 1st iteration of the loop would get empty values.
# Likewise, if the closing " were on a line of its own, the last iteration would get empty values.
data=" \
key1 value1
key2 value2"

while read -r key value; do
	echo "key: '$key', value: '$value'"
done <<< "$data"

How to use booleans in Bash ?

Playing with true and false (source) :

theWorldIsFlat=true
# ...do something interesting...
if $theWorldIsFlat; then
	echo 'Be careful not to fall off!'
fi

Bash doesn't know booleans. This hack works because $theWorldIsFlat is replaced by true at run time, which returns a Unix success (the same goes with false, which returns a Unix failure).
If you're not convinced $myVariable simply expands to the true command, try these :

myVariable=true; if $myVariable; then echo OK; fi
OK
myVariable=false; if $myVariable; then echo OK; fi
(nothing)
myVariable=ooops; if $myVariable; then echo OK; fi
bash: ooops : command not found

More fun ?

true && true && echo A || echo B
A
true && false && echo A || echo B
B

Testing return codes :

#!/usr/bin/env bash

UNIX_SUCCESS=0
UNIX_FAILURE=1

returnBoolean() {
	wantedReturnValue="$1"

	case "$wantedReturnValue" in
		"$UNIX_SUCCESS")
			true; return	# 'return' with no argument propagates the exit status of the last command : here, success
			;;
		"$UNIX_FAILURE")
			false; return	# here : failure
			;;
	esac
	}

for result in $UNIX_SUCCESS $UNIX_FAILURE; do
	returnBoolean $result
	returnCode=$?
	[ "$returnCode" -eq "$result" ] && echo OK || echo KO
done

set and unset variables :

unset a; [ "$a" ] && echo OK || echo KO; a=1; [ "$a" ] && echo OK || echo KO; a=0; [ "$a" ] && echo OK || echo KO

KO
OK
OK

"Variable variables" a.k.a dynamic variables

Usage

A dynamic variable is a variable holding the name of another variable.

Example

A basic example :

#!/usr/bin/env bash

myVar='Hello'
dynamicVar="myVar"

echo "The value of '$dynamicVar' is '${!dynamicVar}'."
Will output :

The value of 'myVar' is 'Hello'.

Checking a list of variables :

This snippet checks that none of the variables from the list is an empty string :
for variableToCheck in variableA variableB variableC; do
	echo "The variable '$variableToCheck' has value : '${!variableToCheck}'."
	if [ -z "${!variableToCheck}" ]; then
		<deal with it !>
	fi
done

When it comes to checking the input of a script for missing parameters, it is better to simply count the parameters rather than testing every variable. Indeed, when expecting command parameterA parameterB and getting command foo, you can't tell whether parameter A or B is missing (unless you're using named parameters, of course).
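A minimal sketch of the "count the parameters" approach (the expected number of parameters and the usage message are only illustrative) :
if [ "$#" -ne 2 ]; then
	echo "Usage : $0 parameterA parameterB" >&2	# complain on stderr ...
	exit 1						# ... and stop there
fi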

List all permutations of a list of lists, then list all items :

#!/usr/bin/env bash

listOfLists='colors fruits cars'
colors='red green blue'
fruits='apple banana grape'
cars='ferrari porsche lada'

output=''
# generate all permutations of the list of lists
for aList in $listOfLists; do
	for anotherList in $listOfLists; do
		[ "$anotherList" = "$aList" ] && continue
		for oneMoreList in $listOfLists; do
			[ "$oneMoreList" = "$anotherList" ] || [ "$oneMoreList" = "$aList" ] && continue

			output="$output\nLISTS: 1.$aList 2.$anotherList 3.$oneMoreList"

			# we now have all the list of lists, let's list the contents of all those lists
			for item1 in ${!aList}; do
				for item2 in ${!anotherList}; do
					for item3 in ${!oneMoreList}; do
						output="$output\n$item1 $item2 $item3"
					done
				done
			done
			# contents over

		done
	done
done
echo -e "$output" | column -s ' ' -t

Bash tests : [ -x ], [ = ], [[ =~ ]], ...

File operators (source 1, 2) :

In the manual, these tests are phrased : true if file exists and is <whatever>. For the sake of brevity, I've cut the file exists part because no file can be anything if it doesn't exist.
Option true if ...
-b file is a block device special file, such as /dev/sda :
brw-rw---T 1 root disk 8, 0 sept. 22 14:23 /dev/sda
-d file is a directory
-e file exists, whatever type of file it is
-f file is a regular file, not a directory or a device
-h, -L file is a symbolic link (both options are equivalent)
-r file is readable
-s file has data (i.e. is not zero-sized)
-t fd file descriptor fd is open and refers to a terminal (which highlights an interactive shell when testing the 0, 1 or 2 file descriptors)
-x file is executable
! logical NOT : negates the test. Must be followed by a whitespace.
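A few examples (the paths used below exist on most GNU/Linux systems, but this is not guaranteed) :
[ -d /etc ] && echo 'directory'				# directory
[ -f /etc/passwd ] && echo 'regular file'		# regular file
[ -b /dev/sda ] && echo 'block device'			# block device (only if such a disk actually exists)
[ ! -e /nonexistent ] && echo 'does not exist'		# does not exist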

String operators (source) :

It is a safe practice to always quote tested strings.

Option Usage
-n string length is not null (length > 0)
-z string length is zero (length == 0)

empty vs unset strings :

nonEmptyString='hello'; [ -n "$nonEmptyString" ] && echo A || echo B; [ -z "$nonEmptyString" ] && echo C || echo D
A
D

emptyString=''; [ -n "$emptyString" ] && echo A || echo B; [ -z "$emptyString" ] && echo C || echo D
B
C

unset unsetString; [ -n "$unsetString" ] && echo A || echo B; [ -z "$unsetString" ] && echo C || echo D
B
C
As far as these tests are concerned, Bash makes no difference between an empty and an unset string : both are considered null (source).
is equal to
[ "$a" = "$b" ]
[ "$a" == "$b" ]
[[ "$a" == "$b" ]]
is not equal to
[ "$a" != "$b" ]
[[ "$a" != "$b" ]]

Integer operators (source) :

is equal to
[ "$a" -eq "$b" ]
[[ "$a" -eq "$b" ]]
(( a == b ))
is not equal to
[ "$a" -ne "$b" ]
[[ "$a" -ne "$b" ]]
(( a != b ))
Beware : inside [ ] and [[ ]], = / == / != compare strings, not integers, so they only "work" on numbers that are formatted identically.

Logical operators (sources : 1, 2) :

For compound tests, you can use things like :
AND
if [ $condition1 ] && [ $condition2 ]
if [[ $condition1 && $condition2 ]]
if [ $condition1 -a $condition2 ]
OR
if [ $condition1 ] || [ $condition2 ]
if [[ $condition1 || $condition2 ]]
if [ $condition1 -o $condition2 ]
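For example (the tested file is just a common default) :
file='/etc/passwd'
if [ -f "$file" ] && [ -r "$file" ]; then echo 'regular file and readable'; fi
if [[ -f $file || -d $file ]]; then echo 'regular file or directory'; fi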

Unary if :

Some languages have a unary if operator :
if(true) {
	
	}
Bash allows mimicking a unary if operator, but this is error-prone (read below). Instead, these constructs look more reliable :
  • if [ "$myVar" -eq "$UNIX_SUCCESS" ]; then
    	
    	fi
  • if [[ "$myVar" == "$UNIX_SUCCESS" ]]; then
    	
    	fi

Unary if and booleans :

Be very careful with these constructs as they may have counter-intuitive results :
true; echo $?; false; echo $?; [ true ]; echo $?; [ false ]; echo $?; [[ true ]]; echo $?; [[ false ]]; echo $?
0
1
0
0
0
0
  • The [ whatever ] construct is short for [ -n whatever ] and tests the string length (source, details, examples)
    [ true ]; echo $?; [ false ]; echo $?; [ hello ]; echo $?; [ '' ]; echo $?
    0		actually testing the non-empty string true, so UNIX_SUCCESS
    0
    0
    1		testing an empty string, so UNIX_FAILURE
  • Same goes on when whatever evaluates to a string (i.e. command result or value returned by a function) : it will be checked with -n :
    [ $(true) ]; echo $?; [ $(false) ]; echo $?; [ $(echo hello) ]; echo $?; [ $(echo -n '') ]; echo $?
    1		true returns a UNIX_SUCCESS but displays nothing (i.e. empty string)
    1		similar reason for false
    0		echo will always display a string...
    1		...unless properly silenced 
  • If you just want to test an exit status, you don't need [ ], just chain commands with the proper operators (see the example after this list).
  • Same goes on for [[ ]].
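Testing an exit status directly, without [ ] :
if grep -q 'root' /etc/passwd; then echo 'found'; fi	# found
grep -q 'root' /etc/passwd && echo 'found'		# same test, chained style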

Regex operators :

Test a match (source) :

filesystemSize='1234M'; [[ "$filesystemSize" =~ ^[0-9]+[KMGT] ]] && echo match || echo 'no match'
match
filesystemSize='123'; [[ "$filesystemSize" =~ ^[0-9]+[KMGT] ]] && echo match || echo 'no match'
no match
ip='10.122.47.127'; regexp="([0-9]{1,3}\.){3}[0-9]{1,3}"; [[ $ip =~ $regexp ]] && echo OK || echo KO
OK
How to check a string against a list of values ?
  • The regular expression itself mustn't be quoted : a quoted pattern is matched literally (see the example below).
  • If the regular expression becomes slightly complex (or contains a Bash variable), it should be stored in an extra variable.
  • Should the regular expression contain a SPACE character, it must be escaped : \ .
  • =~ supports ERE (source).
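Illustrating why the regular expression mustn't be quoted (behavior of Bash 3.2 and later) :
value='abc123'
[[ $value =~ [0-9]+ ]] && echo 'regex match'		# regex match
[[ $value =~ "[0-9]+" ]] && echo 'literal match'	# (nothing : a quoted pattern is searched literally)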

Test a no-match (source) :

string='abcdef'; [[ ! "$string" =~ 123 ]] && echo A || echo B
A
string='abcd123ef'; [[ ! "$string" =~ 123 ]] && echo A || echo B
B

Retrieve matched substrings (source) :

string='abcdef'; [[ $string =~ .(.).(.).(.) ]]; for i in {1..3}; do echo -n ${BASH_REMATCH[$i]}; done
bdf

About test, [ and [[ (source : 1, 2) :

[ (test command) and [[ ("new test" command) are used to evaluate expressions :

  • [ and test are available in POSIX shells.
  • [[ works only in Bash, Zsh and the Korn shell, and is more powerful.

Examples :

[ hello ] && echo ok || echo ko
ok								testing any non-empty string is a success (details)

[ '' ] && echo ok || echo ko
ko								testing an empty string is a failure (details)

[ -x /bin/true ] && echo ok
ok

[ -x /bin/true -a -e /bin/plop ] && echo ok || echo ko
ko

[ $(which cp) ] && echo ok || echo ko
ok

[ $(which plop) ] && echo ok || echo ko
ko								the $() construct returns an empty string, which causes the test to fail

[ "$(which plop)" ] && echo ok || echo ko
ko								no change when quoting a regular empty string 

[ $(which cp) -a $(which ls) ] && echo ok || echo ko
ok

[ $(which cp) -a $(which plop) ] && echo ok || echo ko
bash: [: /bin/cp: unary operator expected			this is because $(which cp) evaluates to /bin/cp
								and $(which plop) evaluates to nothing (i.e. an unquoted empty string, aka no argument),
								resulting in : [ /bin/cp -a ]

[ "$(which cp)" -a "$(which plop)" ] && echo ok || echo ko
ko								no more error because "" is a valid argument to -a

[ '$(which cp)' -a '$(which plop)' ] && echo ok || echo ko
ok								testing the literal strings $(which cp) and $(which plop) : not empty, hence success

Bash script flags

Bash comes with flags that can be raised / lowered to enable / disable some specific behavior (for defensive coding, debugging, ...) while executing commands or scripts.

Read full details and examples about Bash script flags in the article dedicated to set.