goto X
: aka unconditional branching
if X goto Y
: aka conditional branching

Unconditional branching :
:label
: the point in code you'll jump to. Declaration is preceded by a colon :
command
: one or more sed commands
b label
: this is the actual goto label part. If no label is specified, jump to the end of the script.

Conditional branching :
:label
: the point in code you'll jump to. Declaration is preceded by a colon :
command
: one or more sed commands
t label
: this is the actual if goto label part (the jump happens only if a substitution was made)
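As a minimal sketch of the unconditional form described above (the input lines and the label name end are made up for this illustration), b end skips the substitution for the lines matching /keep/ :

```shell
# Hypothetical input : lines containing "keep" must stay untouched.
result=$(printf 'keep me\nchange me\n' | sed '
/keep/ b end
s/me/ME/
:end
')
echo "$result"
# keep me
# change ME
```

The label is declared on its own line, which is the form that should work with most sed implementations.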
Aaa
AAA
AAA
How to replace , with ; within the parentheses only ?
:repeat s/\(([^,)]*\),\([^)]*)\)/\1;\2/;t repeat
':repeat s/ \(([^,)]*\),\([^)]*)\) / \1;\2 / ;t repeat
:repeat                         the label we'll jump to later
s/                              the substitute command s/search/replace/
\(([^,)]*\),\([^)]*)\)          the search part
/
\1;\2                           the replace part, with backreferences
/
;t repeat                       the conditional branching
\(([^,)]*\),\([^)]*)\)
Without -r, ( is a literal parenthesis and \( opens a group. With -r, ( and \( would switch places.

\(  ([^,)]*  \)    ==> group 1
,
\(  [^)]*)   \)    ==> group 2

group 1 : a ( followed by "anything except , and )". This matches (a
group 2 : comes after the , and contains anything except ) and ends with ). This matches  b, c)

After the 1st pass, the line is :
Letters (a; b, c), numbers (1, 2, 3), fruits (apple, banana, coconut).
and, since a replacement was made, t jumps to the start. After the 2nd pass :
Letters (a; b; c), numbers (1, 2, 3), fruits (apple, banana, coconut).
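Putting the pieces above together, here is a runnable sketch of the whole loop. The input sentence is inferred from the intermediate results shown above, so treat it as an assumption :

```shell
input='Letters (a, b, c), numbers (1, 2, 3), fruits (apple, banana, coconut).'

# Each pass replaces one comma inside parentheses, then "t repeat"
# jumps back to the label as long as a substitution was made.
result=$(echo "$input" | sed ':repeat
s/\(([^,)]*\),\([^)]*)\)/\1;\2/
t repeat')

echo "$result"
# Letters (a; b; c), numbers (1; 2; 3), fruits (apple; banana; coconut).
```

The commas outside the parentheses are left alone because group 1 must start with a ( and can't run past a ).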
The script below turns SPACE into _ only to make them visible in the shell.
#!/usr/bin/env bash
tmpFile=$(mktemp)
echo -e 'no trailing space\n1 trailing space \nno trailing space\n2 trailing spaces  \nno trailing space' > "$tmpFile"
# display a file with its spaces shown as underscores
showSpacesInFile() {
	fileToShow=$1
	tr ' ' '_' < "$fileToShow"
}
echo 'BEFORE :'
showSpacesInFile "$tmpFile"
# remove the trailing spaces, directly in the file
sed -ri 's/ +$//g' "$tmpFile"
echo -e '\nAFTER :'
showSpacesInFile "$tmpFile"
[ -f "$tmpFile" ] && rm "$tmpFile"
BEFORE :
no_trailing_space
1_trailing_space_
no_trailing_space
2_trailing_spaces__
no_trailing_space

AFTER :
no_trailing_space
1_trailing_space
no_trailing_space
2_trailing_spaces
no_trailing_space
textToInsert='foo'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/i $textToInsert"
line 1
foo
line 2
line 3
textToInsert='foo'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/a $textToInsert"
line 1
line 2
foo
line 3
textToInsert='foo\nbar\n\tbaz'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/a $textToInsert"
line 1
line 2
foo
bar
	baz
line 3
script itself can be made of an address and a command (source).
#hello fine ! #PATTERN fine ! world
#PATTERN
#hello
#world all lines are altered, there must be a bug
#PATTERN finally ok !
hello
world
($ is an alias for "the last line of the file") :
The p shown in examples above has NOTHING to do with address specification, it's the print command.
In 'anyCondition,$ p', the $ p part is not a typo about the $p variable, it's the $ sign (meaning "until the end of inputFile"), then the print command.

Function | Usage |
---|---|
a someText | append someText after the line matched by the specified address (source) |
d | delete (example) |
g | globally substitute (when used with s///). Read more. |
i someText | insert someText before the line matched by the specified address (source) |
p | print
Due to the way sed works (i.e. copy input to work space, alter it, move to output), and considering the fact that this command actually instructs sed to print the current line, it has the effect of displaying it twice, unless silenced by -n. Check it :
tmpFile=$(mktemp --tmpdir playingWithSed.XXXX); for i in {1..3}; do echo "line $i" >> "$tmpFile"; done; sed '2 p' "$tmpFile"; echo; sed -n '2 p' "$tmpFile"; echo; sed 's/i/a/' "$tmpFile"; echo; sed 's/ne 2/ne TWO/' "$tmpFile"; rm "$tmpFile"
line 1		nothing specified : display as-is
line 2		nothing specified : display as-is
line 2		explicitly asked to print, so here it is
line 3		nothing specified : display as-is

line 2		-n specified : only print line(s) matching criteria

lane 1		changed line is displayed
lane 2		changed line is displayed
lane 3		changed line is displayed

line 1		unchanged line is displayed as-is
line TWO	changed line is displayed
line 3		unchanged line is displayed as-is |
q, nqc Q, nQc |
stop processing input after reading n lines :
|
s/search/replace/ | the pattern substitution operator
|
y/searchList/replaceList/ | the transliteration operator replaces all occurrences of the characters found in searchList with the positionally corresponding character of replaceList (which is what tr does)
echo 'The quick brown fox jumps over the lazy dog' | sed 'y/aeiouy/AEIOUY/'
ThE qUIck brOwn fOx jUmps OvEr thE lAzY dOg
I've found no means to pass lists of characters using the [A-Z] / [a-z] syntax. |
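To make the p and q rows of the table above (and the $ address) more concrete, here is a small sketch. The 5-line input and the variable names are assumptions made for this example only :

```shell
# Hypothetical 5-line input.
input=$(printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5')

# -n silences the default output; '4,$ p' prints from line 4 to the last line.
fromFourth=$(echo "$input" | sed -n '4,$ p')

# '2 q' prints lines as usual, then stops processing after line 2.
firstTwo=$(echo "$input" | sed '2 q')

echo "$fromFourth"
# line 4
# line 5
echo "$firstTwo"
# line 1
# line 2
```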
Flag | Usage | Example |
---|---|---|
n | only replace the nth match of the regexp |
echo 'hello world' | sed 's/l/L/2'
helLo world |
e | allows one to pipe input from a shell command into pattern space | |
g | replace all matches of the regexp |
echo 'hello world' | sed 's/l/L/g'
heLLo worLd |
i I | make the regexp match case-insensitive |
echo '1_ab 2_AB 3_aB 4_Ab' | sed 's/ab/xy/ig'
1_xy 2_xy 3_xy 4_xy |
m M | match the regular expression in multi-line mode | |
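p | If the substitution was made, then print the new pattern space, i.e. the line as it stands after the substitution. Combined with -n, only the altered lines are displayed. |
echo -e 'apple\nbanana\ncarrot' | sed -nr 's/(.)\1/__/p'
a__le
ca__ot |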
w file | If the substitution was made, then write out the result (the altered line) to the named file, in addition to the normal output. |
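A hedged sketch of the w flag from the table above, reusing the apple / banana / carrot input of the p example. The temporary file and the variable names are assumptions :

```shell
changedFile=$(mktemp)

# Every line still goes to stdout; only the lines where the substitution
# was made are also written to "$changedFile".
stdoutResult=$(printf 'apple\nbanana\ncarrot\n' | sed -E "s/(.)\1/__/w $changedFile")
fileResult=$(cat "$changedFile")

echo "$stdoutResult"
# a__le
# banana
# ca__ot
echo "$fileResult"
# a__le
# ca__ot
rm "$changedFile"
```

The file name runs up to the end of the command, so w has to be the last flag of the s command.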
When sed is puzzled by non-ASCII characters, try a PERL equivalent. Instead of : sed 'regEx', run : perl -pe 'regEx'.
NEVER forget that, although this g means "globally replace ...", anything sed is aware of is lines of text, not the text itself. So what is "global" to sed at any time is a single line of text. Thus, using the g option will replace ALL occurrences found in every line.
AAAA
bbbb
AAAA
bbbb
bAbA
AbAb
All occurrences have been substituted.
Aaaa
bbbb
Aaaa
bbbb
bAba
Abab
Only the 1st occurrence of each line has been substituted.
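The "global means per line" behaviour described above can be checked with a short sketch. The 4-line input is an assumption, not necessarily the one used for the results above :

```shell
input=$(printf 'aaaa\nbbbb\nbaba\nabab\n')

# With g : every occurrence on every line is replaced.
withG=$(echo "$input" | sed 's/a/A/g')
# Without g : only the 1st occurrence on each line is replaced.
withoutG=$(echo "$input" | sed 's/a/A/')

echo "$withG"
# AAAA
# bbbb
# bAbA
# AbAb
echo "$withoutG"
# Aaaa
# bbbb
# bAba
# Abab
```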
Flag | Usage |
---|---|
-e script --expression=script |
add script to the commands to be executed (see example, grymoire.com) |
-f scriptFile --file=scriptFile |
add the contents of scriptFile to the commands to be executed |
-i suffix --in-place=suffix |
Alter directly the specified file instead of returning result to stdout. If suffix is provided while altering inputFile, sed will make a backup named inputFilesuffix.
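sed is weird if using -i suffix : with the short option, the suffix must be attached to the flag (-i.bak), otherwise sed takes suffix for the script (or for a file name) rather than for the backup suffix. --in-place=suffix works like a charm.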
If the target is a symlink, sed will make the required changes and save them in a regular file instead of altering the target of the link. --follow-symlinks prevents this (source).
|
-n | Only lines explicitly selected for output are written : suppress the default output in which each line, after it is examined for editing, is written to standard output. This allows discarding lines having no match. |
-r --regexp-extended -E |
Inside character classes,
\ is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.: []^-] (source, example)
|
-z --null-data |
treat input lines as separated by NUL characters rather than \n
echo -e 'foo\nbar\nbaz\n' | sed 's/^\(.\)/-\1/'; echo -e 'foo\0bar\0baz\0' | sed -z 's/^\(.\)/-\1/'
-foo
-bar
-baz

-foo-bar-baz- |
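As an illustration of the -i row of the table above, here is a minimal sketch of an in-place edit with a backup. The file name demo.txt and the .bak suffix are arbitrary choices for this example :

```shell
workDir=$(mktemp -d)
target="$workDir/demo.txt"
printf 'hello world\n' > "$target"

# The suffix is attached to -i : the original content is kept in demo.txt.bak.
sed -i.bak 's/world/sed/' "$target"

edited=$(cat "$target")
backup=$(cat "$target.bak")
echo "$edited"
# hello sed
echo "$backup"
# hello world
rm -r "$workDir"
```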
sed -n 'np' fileName | less
It is also possible to remove the 1st line of a file with : tail -n+2 fileBefore > fileAfter (source)
head can do this easier when processing a stream of text, but sed saves a | and a temp file by directly altering the source file.
This is a "If you can't beat them, just ignore them" method : using a different LANG setting skips the non-ASCII characters.
Just do : LANG=C, then run your sed command without worrying about special characters anymore.
Changing LANG (currently fr_FR.UTF-8) may have side effects in the next commands of your shell session. To avoid this :
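One possible way, sketched here under assumptions : put the assignment in front of the sed command itself, so that it applies to that single command only. This sketch uses LC_ALL rather than LANG, because LC_ALL takes precedence when both are set. The sample text is "café" written with explicit UTF-8 bytes :

```shell
langBefore=$LANG

# In the C locale, the 2 bytes of "é" are seen as 2 separate non-ASCII characters.
bytesReplaced=$(printf 'caf\303\251\n' | LC_ALL=C sed 's/[^a-z]/?/g')

langAfter=$LANG
echo "$bytesReplaced"
# caf??
```

The shell session keeps its own settings : $LANG is the same before and after the command.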
sed -r 's/regExp/\x;\y/' logFile > resultFile
Extracting the substring is actually made by substituting all characters that don't match with an empty string (=keeping only those that match). In the RegExp, ( and ) are used to surround the characters that match. The \1 stands for "anything found in the 1st group of (...)"
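A small sketch of this idea, with a made-up log line (the user= and id= fields are assumptions for the example). Everything outside the groups is matched by .* and dropped, and only \1;\2 is kept :

```shell
logLine='2024-01-01 user=alice id=42 status=ok'

# Group 1 captures the user name, group 2 the numeric id.
extracted=$(echo "$logLine" | sed -E 's/^.*user=([a-z]+) id=([0-9]+).*$/\1;\2/')

echo "$extracted"
# alice;42
```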
line="foo;1.2.3.4;bar;5.6.7.8;baz;"; pattern='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; echo "$line" | sed -r 's/('$pattern')/\n\1/g' | sed -r 's/^('$pattern')?.*$/\1/g' | sed '/^$/ d'
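How it works :
the 1st sed inserts a newline before each match, which splits $line into shorter lines :
whatever
matchedSubstring1somethingElse
matchedSubstring2everythingElse
matchedSubstringNwhoCares
the 2nd sed keeps only the match found at the beginning of each line (a line not starting with a match becomes empty) :

matchedSubstring1
matchedSubstring2
matchedSubstringN
the 3rd sed deletes the empty lines :
matchedSubstring1
matchedSubstring2
matchedSubstringN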
s#/\*.*\*/##g
'Lorem ipsum dolor sit adipiscing elit.
s#/\*.*?\*/##g
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.
In the s/search/replace/ command, sed and PERL (and possibly others) allow replacing the / with anything you like to increase readability (here : #)
/ and * characters, which are reserved in the context of regular expressions, hence some extra \ to escape them
s/search/replace/ command)
SPACE characters where the text has been removed : I didn't want to make the code too complex for this example, but this can easily be fixed

<article id="anchor"> <titre>title</titre> ... content ... </article>
This command counts the number of anchors starting with each letter of the alphabet :
Even though sed is very good for "search and replace" tasks, these commands will fail :
because sed processes its input line by line. The commands above will just replace an empty line with... an empty line.
To effectively delete empty lines :
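A minimal sketch, assuming "empty" means "containing no character at all" : address the empty lines with /^$/ and delete them with d. The sample input is made up :

```shell
withBlanks=$(printf 'first\n\nsecond\n\n\nthird\n')

# /^$/ matches lines with nothing between start and end; d deletes them.
noBlanks=$(echo "$withBlanks" | sed '/^$/ d')

echo "$noBlanks"
# first
# second
# third
```

Lines made only of spaces or tabs are not empty in this sense and are kept.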
b c c
tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/a/b/' -e 's/b/c/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
    a->b      b->c
a         b         c
b   ==>   b   ==>   c
c         c         c

tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/b/c/' -e 's/a/b/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
    b->c      a->b
a         a         b
b   ==>   c   ==>   c
c         c         c
echo -e "aaaa\nbbbb\nbaba\nabab" | sed '0,/a/ s//A/'
Aaaa
bbbb
baba
abab
Explanation :
0,/a/ is an address instructing sed to focus exclusively on a subset of lines. This subset starts at the line 0 and ends after the first line matching the regular expression /a/. The 0,/regexp/ form is a GNU sed extension.
s//A/ is the function telling sed to substitute a with A once, since there is no g option.
In s//A/, the a is implicit, since this is what was matched earlier : an empty regular expression reuses the last one used. Check this with : echo 'abc' | sed '0,/a/ s//A/'.

To run any sed command on a line NOT matching a pattern, just negate the match with the ! operator :
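For instance, this sketch adds a # in front of every line that does NOT contain PATTERN. The 3-line input is an assumption :

```shell
# The ! after the address inverts it : the s command runs on non-matching lines only.
notMatching=$(printf '#PATTERN\nhello\nworld\n' | sed '/PATTERN/! s/^/#/')

echo "$notMatching"
# #PATTERN
# #hello
# #world
```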
There's a hack involving the b (branch) operator, instructing sed to jump to the end of the script (i.e. skip the matched line). The solution above looks better, but I keep it for future reference :
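A sketch of what such a hack could look like, on a made-up 3-line input : b without a label jumps to the end of the script, so the matching line never reaches the s command. The two commands are passed as separate -e expressions, which should be accepted by most sed implementations :

```shell
# Lines matching PATTERN branch to the end of the script and are left as-is.
viaBranch=$(printf '#PATTERN\nhello\nworld\n' | sed -e '/PATTERN/ b' -e 's/^/#/')

echo "$viaBranch"
# #PATTERN
# #hello
# #world
```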
This article is about handling 2-character strings like \n and \t with sed, but NOT the special characters they represent : newline and TAB.
It was inspired by the examples above not being displayed correctly after processing the text of these pages
tmpFile=$(mktemp); echo "correct \t horse \n battery \t staple" > "$tmpFile"; cat "$tmpFile"; tmpStringNewline=$(pwgen 6 1); tmpStringTab=$(pwgen 6 1); sed -ri "s/[\d92]n/$tmpStringNewline/g; s/[\d92]t/$tmpStringTab/g" "$tmpFile"; cat "$tmpFile"; sed -ri 's/'$tmpStringNewline'/\\n/g; s/'$tmpStringTab'/\\t/g' "$tmpFile"; cat "$tmpFile"; rm "$tmpFile"
How it works :
echo stores \ + n (and \ + t) distinctly, which is confirmed by the first cat.
The 1st sed replaces \n and \t with unique strings. [\d92]n and [\d92]t are the regexp-style ways of saying literally \n and literally \t. 92 is the decimal ASCII code of \.
The 2nd sed puts them back : single quotes surround the s/// command string, and the \ are \-escaped. When found within double quotes by sed, \n and \t have their special meanings.
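A shorter check of the same idea, without pwgen or a temporary file. The <NL> and <TAB> markers are arbitrary replacement strings chosen for this sketch. Inside single quotes, \\n reaches sed untouched and means "a literal \ followed by n" :

```shell
# printf '%s\n' prints its argument verbatim : \t and \n stay 2-character strings.
literal=$(printf '%s\n' 'correct \t horse \n battery' | sed 's/\\n/<NL>/; s/\\t/<TAB>/')

echo "$literal"
# correct <TAB> horse <NL> battery
```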