"sed" the 'Stream EDitor' - Description, flags and examples

sed branching

sed comes with branching capabilities, which make things like these possible :

Unconditional branching

Syntax :

sed ':label command; b label'
  • :label : the point in the script where you'll jump to. The declaration is preceded by a colon :
  • command : one or more sed commands
  • b label : this is the actual goto label part. If no label is specified, jump to the end of the script.

Example :
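
A minimal sketch of unconditional branching (not from the original notes) : the classic "join all lines" loop. The label name a and the sample input are arbitrary choices for illustration.

```shell
# :a    declare the label "a"
# N     append the next input line to the pattern space
# $!ba  if this is not the last line, branch back to "a" (no condition on any s///)
# s///  once every line has been loaded, replace each newline with a space
joined=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
echo "$joined"
```

Each command gets its own -e here so that the end of the label is unambiguous, whatever the sed implementation.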

Conditional branching

Syntax :

sed ':label command; t label'
  • :label : the point in the script where you'll jump to. The declaration is preceded by a colon :
  • command : one or more sed commands
  • t label : this is the actual if goto label part :
    • if the last substitute command modified the pattern, jump to the label
    • if no label is specified, jump to the end of the script

Example :

  • default behavior : change the 1st occurrence only :
    echo 'aaa' | sed 's/a/A/'
    Aaa
  • change all occurrences using g :
    echo 'aaa' | sed 's/a/A/g'
    AAA
  • change all occurrences using conditional branching :
    echo 'aaa' | sed ':repeat s/a/A/; t repeat'
    AAA
    Branching is overkill in such a basic situation, but you get the idea
  • a not-so-basic example, now : how to replace , with ; within the parentheses only ?
    echo "Letters (a, b, c), numbers (1, 2, 3), fruits (apple, banana, coconut)." | sed ':repeat s/\(([^,)]*\),\([^)]*)\)/\1;\2/;t repeat'
    • the interesting part is :
      :repeat s/\(([^,)]*\),\([^)]*)\)/\1;\2/;t repeat
    • let's give it some air :
      :repeat    s/    \(([^,)]*\),\([^)]*)\)    /    \1;\2    /    ;t repeat
    • MORE air :
      :repeat					the label we'll jump to later
      s/					the substitute command s/search/replace/
      	\(([^,)]*\),\([^)]*)\)		the search part
      	/
      	\1;\2				the replace part, with backreferences
      	/
      ;t repeat				the conditional branching
    • replace is pretty straightforward, so let's focus on search : \(([^,)]*\),\([^)]*)\)
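      depending on whether we're using basic or extended regular expressions with sed, ( and \( switch roles : this example uses basic regular expressions, where \( \) delimit a group and ( ) are literal parentheses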
      once given some air, search looks like :
      \(			|
      	([^,)]*		|==> group 1
      \)			|
      ,
      \(			|
      	[^)]*)		|==> group 2
      \)			|
    • the "group 1" is : an opening ( followed by "anything except , and )". This matches (a
    • the "group 2" follows a , and contains anything except ) and ends with ). This matches  b, c)
    • sed proceeds with the replacement, giving :
      Letters (a; b, c), numbers (1, 2, 3), fruits (apple, banana, coconut).
      and, since a replacement was made, jumps back to the :repeat label
    • so on the next pass we have :
      • for "group 1" : (a; b
      • for "group 2" :  c)
    • and after replacement :
      Letters (a; b; c), numbers (1, 2, 3), fruits (apple, banana, coconut).
    • and so on until there is nothing left to replace : no more "jump to the start". sed ends.

How to remove trailing SPACEs ?

This example displays every SPACE as _ only to make the trailing ones visible in the shell.
#!/usr/bin/env bash
tmpFile=$(mktemp)

echo -e 'no trailing space\n1 trailing space \nno trailing space\n2 trailing spaces  \nno trailing space' > "$tmpFile"

showSpacesInFile() {
	fileToShow=$1
	tr ' ' '_' < "$fileToShow"
	}

echo 'BEFORE :'
showSpacesInFile "$tmpFile"

sed -ri 's/ +$//g' "$tmpFile"

echo -e '\nAFTER :'
showSpacesInFile "$tmpFile"

[ -f "$tmpFile" ] && rm "$tmpFile"
BEFORE :
no_trailing_space
1_trailing_space_
no_trailing_space
2_trailing_spaces__
no_trailing_space

AFTER :
no_trailing_space
1_trailing_space
no_trailing_space
2_trailing_spaces
no_trailing_space

How to insert a string before / after a line matching a pattern ?

insert text before the line matching a pattern :
textToInsert='foo'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/i $textToInsert"
line 1
foo
line 2
line 3
append text after the line matching a pattern :
textToInsert='foo'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/a $textToInsert"
line 1
line 2
foo
line 3
It even works with special characters :
textToInsert='foo\nbar\n\tbaz'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/a $textToInsert"
line 1
line 2
foo
bar
	baz
line 3

How to display the nth line of a file ?

Piece of cake with sed :

echo -e "Line 1\nLine 2\nLine 3\nLine 4" | sed -n '2 p'

Line 2

sed

Usage

sed [flags] 'script' inputFile
The script itself is made of an optional address and a command (source).

How sed works (source) :

  1. read one line from inputFile and copy it into the "work space" (a memory buffer that the sed documentation calls the pattern space)
  2. if the line matches the selection criteria (line number, range, regular expression, ...) :
    1. apply the 1st editing command to the contents of the work space
    2. apply the 2nd editing command (if any) to the (possibly altered) contents of the work space
    3. apply the 3rd, ..., editing command (if any) to the (possibly altered) contents of the work space
  3. write the contents of the work space to the output
  4. repeat until the end of inputFile
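
The cycle above can be observed with a small sketch (the input and the two substitutions are made up for illustration) : only line 2 matches the address, the 2nd command works on the result of the 1st one, and the other lines go straight to the output.

```shell
# Line 2 is selected by the address "2" :
#   s/2/two/   turns the work space "2" into "two"
#   s/two/TWO/ then works on the already altered work space
# Lines 1 and 3 do not match the address and are written unchanged.
cycle=$(seq 3 | sed '2 { s/2/two/; s/two/TWO/; }')
echo "$cycle"
```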

An address can be (source) :

a line number :
sed -n '2 p' inputFile
a range of line numbers :
sed -n '12,15 p' inputFile
lines matching a pattern :
  • sed -n '/pattern/ p' inputFile
  • makes it possible to match + alter lines at once :
    display all lines
    printf 'Line %s\n' {1..3} | sed '/2/ s/Line/LINE/'
    Line 1
    LINE 2		only line that was altered
    Line 3
    display altered line(s) only
    printf 'Line %s\n' {1..3} | sed -n '/2/ s/Line/LINE/p'
    LINE 2
lines not matching a pattern (the ! negates the address) :
sed -n '/pattern/! p' inputFile
lines from the beginning of a file up to a regular expression :
sed -n '0,/pattern/ p' inputFile
Many examples consider line 1 as 'the beginning of the file'. However, if we want to alter lines from 'the beginning of the file' up to PATTERN, look what happens :
  • echo -e 'hello\nPATTERN\nworld' | sed '1,/PATTERN/ s/^/#/'
    #hello		fine !
    #PATTERN	fine !
    world
  • echo -e 'PATTERN\nhello\nworld' | sed '1,/PATTERN/ s/^/#/'
    #PATTERN
    #hello
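    #world		all lines are altered : with 1,/PATTERN/, sed only looks for PATTERN from line 2 onwards, so the match on line 1 is missed
  • echo -e 'PATTERN\nhello\nworld' | sed '0,/PATTERN/ s/^/#/'
    #PATTERN	finally ok ! (0,/PATTERN/, a GNU sed extension, also checks line 1)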
    hello
    world
lines from a regular expression up to the end of the file ($ is an alias for "the last line of the file") :
sed -n '/pattern/,$ p' inputFile
lines between two regular expressions (included) :
sed -n '/patternStart/,/patternStop/ p' inputFile
Useless use of grep :
The grep | sed 's///' construct is pretty common, albeit unnecessary. Indeed, here, grep is used to select the line of input we're going to alter, while sed actually alters it. But sed is able to do it all on its own : select + alter. Both commands below are equivalent :
  • echo -e "apple\nbanana\ncarrot" | grep 'banana' | sed 's/a/o/g'
  • echo -e "apple\nbanana\ncarrot" | sed -n '/banana/ s/a/o/gp'
  • The p shown in examples above has NOTHING to do with address specification, it's the print command.
  • It is perfectly legal to have something looking like : 'anyCondition,$ p'. The $ p part is not a typo about the $p variable, it's the $ sign (meaning "until the end of inputFile"), then the print command.

sed commands (sources : common ones, others, others again) :

Function Usage
a someText append someText after the line matched by the specified address (source)
d delete (example)
g globally substitute (when used with s///). Read more.
i someText insert someText before the line matched by the specified address (source)
p print

Due to the way sed works (i.e. copy input to work space, alter it, move to output), and considering the fact that this command actually instructs sed to print the current line, it has the effect of displaying it twice, unless silenced by -n. Check it :

tmpFile=$(mktemp --tmpdir playingWithSed.XXXX); for i in {1..3}; do echo "line $i" >> "$tmpFile"; done; sed '2 p' "$tmpFile"; echo; sed -n '2 p' "$tmpFile"; echo; sed 's/i/a/' "$tmpFile"; echo; sed 's/ne 2/ne TWO/' "$tmpFile"; rm "$tmpFile"
line 1		nothing specified : display as-is
line 2		nothing specified : display as-is
line 2		explicitly asked to print, so here it is
line 3		nothing specified : display as-is

line 2		-n specified : only print line(s) matching criteria

lane 1		changed line is displayed
lane 2		changed line is displayed
lane 3		changed line is displayed

line 1		unchanged line is displayed as-is
line TWO	changed line is displayed
line 3		unchanged line is displayed as-is

q, nqc
Q, nQc
stop processing input after reading n lines :
  1. the optional n parameter (an address) instructs sed to leave after reading the nth line
  2. then quit (and return the optional exit code c)
examples (this is no big deal with short inputs but can make quite a difference on fat ones) :
  • seq 10 | sed -n '1,4 p;3q'
    seq 10 | sed -ne '1,4 p' -e '3q'
    1
    2
    3
  • seq 10 | sed '3Q42'; echo $?
    1
    2		sed left after reading the 3rd line without printing it
    42		the specified exit code
s/search/replace/ substitute patterns
  • s/// has flags
  • The character right after the s is used as the field separator, which saves some headaches when working with paths (source) :
    echo "/path/to/index.html" | sed 's_/path/to/_http://www.example.com/_'
    • Be VERY careful when using _ as a separator together with variables within the search/replace construct : _ is a valid character in a variable name, so "s_$search_$replace_" will try to expand the variables $search_ and $replace_. Writing "s_${search}_${replace}_" avoids this.
    • The error Invalid preceding regular expression comes from the (misused) single quotes in : search='o'; replace='0'; echo 'Hello World' | sed -r 's_${search}_${replace}_g'. Within single quotes, the shell does not expand variables : sed receives ${search} literally and tries to read {search} as a repetition operator. {} is used in a search pattern to express something repeated : echo 'hello' | sed -r 's_l{2}_X_' returns heXo. (More about regular expressions)
  • Using backreferences :
    • & refers to the whole matched string, while \1, \2, ... refer to what was matched by each group. Groups are written \( \) in basic regular expressions and ( ) in extended ones (-r). In each example below, both commands are equivalent :
      • example 1 :
        • echo abc | sed 's/abc/(&)/'
        • echo abc | sed -r 's/(abc)/(\1)/'
      • example 2 :
        • echo xyz | sed 's/.\(.\)./\1/'
        • echo xyz | sed -r 's/.(.)./\1/'
    • A numbered backreference (\1, \2, ...) can even be used in the 'search' part of the expression, e.g. to match duplicated words (source). This example replaces foowhateverfoo with whatever :
      echo 'foo1BARfoo2BAZfoo3foo4' | sed -r 's|(foo)(.*)\1|\2|'
      1BARfoo2BAZfoo34
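
A short sketch of the safe way to combine _ as a separator with shell variables, as discussed in the notes above. The variable names and the sample string are the ones from those notes ; the double quotes and the braces are what make it work.

```shell
search='o'
replace='0'
# Double quotes let the shell expand the variables ; the braces stop it from
# reading "search_" and "replace_" as the variable names.
fixed=$(echo 'Hello World' | sed "s_${search}_${replace}_g")
echo "$fixed"
```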

Flags of the substitute command : s///X (source):

Flag Usage Example
n only replace the nth match of the regexp echo 'hello world' | sed 's/l/L/2'
helLo world
e allows one to pipe input from a shell command into pattern space
g replace all matches of the regexp echo 'hello world' | sed 's/l/L/g'
heLLo worLd
i I make the regexp match case-insensitive echo '1_ab 2_AB 3_aB 4_Ab' | sed 's/ab/xy/ig'
1_xy 2_xy 3_xy 4_xy
m M match the regular expression in multi-line mode
p If the substitution was made, then print the new pattern space. Without -n, a modified line is therefore displayed twice.
With -n, print only the lines where a change was made. details
echo -e 'apple\nbanana\ncarrot' | sed -nr 's/(.)\1/__/p'
a__le
ca__ot
w file If the substitution was made, then write out the result to the named file : only the modified lines end up in file, regardless of what is displayed on stdout. details
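
To make the w flag less mysterious, here is a minimal sketch (the file comes from mktemp, the input is made up) : lines where the substitution succeeded are written to the named file, independently of what goes to stdout.

```shell
outFile=$(mktemp)
# -n silences the default output, so nothing is displayed by sed itself ;
# the w flag writes every line where s/an/AN/ succeeded into $outFile.
printf 'apple\nbanana\ncarrot\n' | sed -n "s/an/AN/w $outFile"
written=$(cat "$outFile")
echo "$written"
rm "$outFile"
```

Only banana contains an, so the file ends up with a single line.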

About non-ASCII charsets (source) :

When sed is puzzled by non-ASCII characters, try a PERL equivalent. Instead of : sed 'regEx', run : perl -pe 'regEx'.

About the g option of the substitute command s///g :

NEVER forget that, although this g means "globally replace ...", sed only ever sees one line of text at a time, not the text as a whole. So what is "global" to sed at any time is a single line of text. Thus, using the g option replaces ALL occurrences found in each line, while omitting it replaces only the 1st occurrence of each line.

echo -e "aaaa\nbbbb\naaaa\nbbbb\nbaba\nabab" | sed 's/a/A/g'
AAAA
bbbb
AAAA
bbbb
bAbA
AbAb
All occurrences have been substituted.
echo -e "aaaa\nbbbb\naaaa\nbbbb\nbaba\nabab" | sed 's/a/A/'
Aaaa
bbbb
Aaaa
bbbb
bAba
Abab
Only the 1st occurrence of each line has been substituted.

Flags

Flag Usage
-e script
--expression=script
add script to the commands to be executed (See example)
-f scriptFile
--file=scriptFile
add the contents of scriptFile to the commands to be executed
-i suffix
--in-place=suffix
Alter directly the specified file instead of returning result to stdout. If suffix is provided while altering inputFile, sed will make a backup named inputFilesuffix.
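With GNU sed, the suffix of the short option must be glued to the flag : -i.bak. In -i .bak, the suffix is read as a separate argument (the script or a file name), hence the weird behavior, whereas --in-place=.bak is unambiguous.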
If the target is a symlink, sed will make the required changes and save them in a regular file instead of altering the target of the link. --follow-symlinks prevents this (source).
-n Only lines explicitly selected for output are written : suppress the default output in which each line, after it is examined for editing, is written to standard output. This allows discarding lines having no match.
Requires the p function to actually output something when used with s/// :
echo -e "whatever\nvalue=1\nvalue=42\nvalue=1000\nwho cares?" | sed -nr 's/value=(.*)$/\1/p'
-r
--regexp-extended
-E
  • use extended regular expressions
  • -E is the POSIX option, which should be preferred for portability
Inside character classes, \ is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.: []^-] (source, example)
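
Back to the -i flag described above : a minimal sketch of an in-place edit with a backup. The directory, the file name demo.txt and the .bak suffix are arbitrary choices for illustration.

```shell
workDir=$(mktemp -d)
echo 'hello' > "$workDir/demo.txt"
# The suffix is attached to -i : the original content is saved as demo.txt.bak
sed -i.bak 's/hello/goodbye/' "$workDir/demo.txt"
edited=$(cat "$workDir/demo.txt")
backup=$(cat "$workDir/demo.txt.bak")
echo "edited : $edited / backup : $backup"
rm -r "$workDir"
```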

Example

Display the nth line of a file :

sed -n 'np' fileName | less

Remove the nth line of a file :

  • sed 'nd' fileBefore > fileAfter
  • or : sed -i 'nd' file

It is also possible to remove the 1st line of a file with : tail -n+2 fileBefore > fileAfter (source)

Delete lines matching a pattern :

sed -i '/pattern/d' file

Delete lines, starting at line n included, until the end (i.e. keep only the first n-1 lines of input) :

sed -i 'n,$ d' file
head can do this more easily when processing a stream of text, but sed saves a | and a temp file by directly altering the source file.

Delete lines from file, starting at line n, until line matching pattern (included) (source) :

sed -i 'n,/pattern/d' file
Using !d instead of d swaps the kept / removed regions : lines n up to the line matching pattern (included) are kept, and every other line is deleted.

Substitute words :

sed -i 's/search/replace/g' file
Replace every occurrence of search with replace. The final g is for global replacement (=replace ALL occurrences)
regExpNeedle='test\.example\.com'; regExpReplacement='www\.example\.com'; echo "My website name is test.example.com. test.example.com is a good name." | sed "s/$regExpNeedle/$regExpReplacement/g"
Same using variables.
needle='search'; replacement='replace'; grep -lrZ "$needle" . | xargs -0 sed -i "s/$needle/$replacement/g"
Replace every occurrence of search with replace within all files matching the grep search pattern. -Z and -0 keep file names containing spaces in one piece.

Remove all non-ASCII characters from file / string

Method 1 (source) :

  • sed -i 's/[\d128-\d255]//g' file
  • sed -i 's/[^[:print:]]//g' file

Method 2 "The dirty method" (source) :

echo 'La lettre à Elise' | sed -n 'l0' | sed -r 's/(\\[0-9]+)//g'
sed -n 'l0' : show the string "as seen by sed"

Method 3 "The C method" (source) :

This is a "If you can't beat them, just ignore them" method : using a different LANG setting skips the non-ASCII characters.
Just do : LANG=C, then run your sed command without worrying about special characters anymore.

Changing LANG (currently fr_FR.UTF-8) may have side effects in the next commands of your shell session. To avoid this :

  1. SAVED_LANG=$LANG
  2. Have fun with sed !
  3. LANG=$SAVED_LANG
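
An alternative sketch that needs no save / restore : a variable assignment placed in front of a command applies to that command only. LC_ALL is used here instead of LANG because it takes precedence over the other locale variables ; the [^ -~] class (anything but printable ASCII) and the sample string are illustrative choices, not part of the method above.

```shell
# \303\240 is the UTF-8 encoding of "à", written as octal escapes so that the
# example does not depend on the encoding of this page.
# The locale change applies to this single sed invocation only.
ascii=$(printf 'La lettre \303\240 Elise\n' | LC_ALL=C sed 's/[^ -~]//g')
echo "$ascii"
```

The rest of the shell session keeps its original locale settings.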

Extract fields from a logfile

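sed -r 's/regExp/\x;\y/' logFile > resultFile
Here regExp contains (...) groups around the fields to extract, and \x, \y are placeholders for the numbers of the wanted groups (e.g. \1;\3).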
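
A worked sketch of this pattern, assuming a made-up log line in the common web server access log layout (the log format, the chosen fields and the variable names are hypothetical) : it extracts the client address, the requested path and the status code.

```shell
logLine='192.168.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
# group 1 : client address, group 2 : HTTP method, group 3 : path, group 4 : status code
# only groups 1, 3 and 4 are kept, separated by ;
fields=$(echo "$logLine" | sed -E 's/^([^ ]+) .*"([A-Z]+) ([^ ]+) [^"]*" ([0-9]+) .*$/\1;\3;\4/')
echo "$fields"
```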

Extract a substring with a RegExp

Extract the 4 last characters of a string :

echo "foo1234" | sed -r 's/^.*(.{4})$/\1/'

Extracting the substring is actually made by substituting all characters that don't match with an empty string (=keeping only those that match). In the RegExp, ( and ) are used to surround the characters that match. The \1 stands for "anything found in the 1st group of (...)"

Extract an unknown number of substrings matching a pattern :

In the example below, let's imagine we have a string containing an unknown number of IP addresses mixed with some text I don't care about, and I'd like to get those IP addresses :
line="foo;1.2.3.4;bar;5.6.7.8;baz;"; pattern='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; echo "$line" | sed -r 's/('$pattern')/\n\1/g' | sed -r 's/^('$pattern')?.*$/\1/g' | sed '/^$/ d'
How it works :
  1. the 1st sed splits $line into shorter lines :
    whatever
    matchedSubstring1somethingElse
    matchedSubstring2everythingElse
    matchedSubstringNwhoCares
  2. the 2nd sed processes these short lines by keeping only what matches the pattern. This generates an empty line if nothing matches :
    matchedSubstring1
    matchedSubstring2
    matchedSubstringN
  3. the 3rd sed removes empty lines from the output :
    matchedSubstring1
    matchedSubstring2
    matchedSubstringN
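
For comparison, a sketch of the same extraction with a different tool : grep -o prints only the matched parts, one per line, which replaces the three sed steps. This is an alternative technique, not the sed method described above ; line and pattern are the values from the example.

```shell
line="foo;1.2.3.4;bar;5.6.7.8;baz;"
pattern='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
# -o : output only the matching parts ; -E : extended regular expressions
ips=$(echo "$line" | grep -oE "$pattern")
echo "$ips"
```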

Need a RegExp "lazy star" ?

sed doesn't support "lazy star" but PERL can help :

grep -i 'someText' someFile | perl -pe 's/.*(a href|A HREF)="(http:\/\/.*?)">.*/\2/'

Other examples :

without lazy star :
echo 'Lorem ipsum dolor sit /*REMOVE ME*/ amet, consectetur /*REMOVE ME*/ adipiscing elit.' | perl -pe 's#/\*.*\*/##g'
Lorem ipsum dolor sit  adipiscing elit.
with lazy star :
echo 'Lorem ipsum dolor sit /*REMOVE ME*/ amet, consectetur /*REMOVE ME*/ adipiscing elit.' | perl -pe 's#/\*.*?\*/##g'
Lorem ipsum dolor sit  amet, consectetur  adipiscing elit.
  • in the s/search/replace/ command, sed and PERL (and possibly others) allow replacing the / with anything you like to increase readability (here : #)
  • this example is not the most basic since it matches a string having itself / and * characters, which are reserved in the context of regular expressions, hence some extra \ to escape them
  • without the lazy star, the match runs from the first /* to the last */ ; with it, each /*REMOVE ME*/ is matched (and removed by the s/search/replace/ command) separately
  • there are still some double-SPACE characters where the text has been removed : I didn't want to make the code too complex for this example, but this can easily be fixed
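
When the match ends on a single character, sed can often do without the lazy star : a negated character class stops at the first occurrence of that character. A minimal sketch, with a made-up HTML snippet (this workaround does not cover multi-character terminators such as */) :

```shell
html='<a href="http://www.example.com/page">link</a> <a href="http://www.example.com/other">other</a>'
# [^"]* cannot run past a double quote, so the group stops at the first closing quote
first=$(echo "$html" | sed -E 's/^[^"]*"([^"]*)".*$/\1/')
echo "$first"
```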

Remove ANSI color codes
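
The original notes give no command here. A possible sketch, assuming the codes to remove are the usual color sequences (ESC, then [, then numbers separated by ;, then m) ; other ANSI escape sequences are not handled, and the variable names are illustrative.

```shell
# Store the ESC character in a variable so that the sed expression does not
# depend on a particular escape syntax.
esc=$(printf '\033')
colored=$(printf '\033[1;31mred\033[0m and \033[32mgreen\033[0m')
plain=$(printf '%s\n' "$colored" | sed -E "s/${esc}\[[0-9;]*m//g")
echo "$plain"
```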

Count the number of articles in this file with anchor starting with each alphabet letter

In the (earliest!) original format of this file (BashIndex.xml), articles are formatted as :
<article id="anchor">
	<titre>title</titre>
	...
	content
	...
</article>
This command counts the number of anchors starting with each letter of the alphabet :
sed -n '/<article id=/ s/.*article id="\(.\).*".*/\1/gp' BashIndex.xml | sort | uniq -c | sort -nr

Delete empty lines

Even though sed is very good at "search and replace" tasks, trying to delete empty lines with a substitution (s///) will fail,

because sed processes its input line by line. Such a substitution just replaces an empty line with... an empty line

To effectively delete empty lines :

sed -i '/^$/ d' inputFile


How to remove newline characters ?

Since sed's input is sliced in single-line chunks, sed is not fully aware of line endings. Hence this is not the right tool for the job. Consider tr instead.
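
A minimal sketch of the tr approach (the sample input is made up) :

```shell
# -d deletes every occurrence of the listed character, here the newline
oneLine=$(printf 'a\nb\nc\n' | tr -d '\n')
echo "$oneLine"
```

To turn newlines into another character instead of deleting them, tr '\n' ' ' does the job.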

How to chain sed commands ?

echo 'Hello world' | sed -e 's/Hello/Hi/' -e 's/world/people/'
Hi people
echo 'Hello world' | sed 's/Hello/Hi/; s/world/people/'
Hi people
In this context, successive commands are actually "piped" into each other :
echo 'a' | sed -e 's/a/b/' -e 's/b/c/'
c
echo 'a' | sed 's/a/b/; s/b/c/'
c
tmpFile=$(mktemp); echo 'a' > "$tmpFile"; sed -e 's/a/b/' -e 's/b/c/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
c
tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/b/c/' -e 's/a/b/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
b
c
c
tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/a/b/' -e 's/b/c/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
	a->b			b->c
a			b			c
b	==>		b	==>		c
c			c			c

tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/b/c/' -e 's/a/b/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
	b->c			a->b
a			a			b
b	==>		c	==>		c
c			c			c

How to substitute the 1st occurrence only of a pattern in a stream ?

echo -e "aaaa\nbbbb\nbaba\nabab" | sed '0,/a/ s//A/'

Aaaa
bbbb
baba
abab
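Explanation : 0,/a/ is an address range running from the start of the input up to the first line matching a, so the s command is not applied beyond that line. The empty search pattern in s//A/ means "reuse the last regular expression", i.e. a. With no g flag, only the 1st a of that line is replaced. The 0 address is a GNU sed extension.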

Negative matching

When it comes to negative matching, regular expressions are said to be "difficult" / "not-designed-to-do-this". However, this _may_ still be possible (sources : 1, 2) at the cost of increased complexity and loss of readability
KISS suggests there is no shame in using several |-separated grep's or sed's .

To run any sed command on the lines NOT matching a pattern, simply negate the match with the ! operator :

Make all not happy become VERY HAPPY :
echo -e 'happy\nhappy\nunhappy\nhappy' | sed '/^happy/! s/.*/VERY HAPPY/'
Keep only the good guys :
echo -e 'good guy\nbad guy\nbad guy\ngood guy' | sed '/^good/! d'

There's a hack involving the b (branch) operator, instructing sed to jump to the end of the script (i.e. skip the matched line). The solution above looks better, but I keep this one for future reference :

Make all not happy become VERY HAPPY :
echo -e 'happy\nhappy\nunhappy\nhappy' | sed '/^happy/b; s/.*/VERY HAPPY/'
Keep only the good guys :
echo -e 'good guy\nbad guy\nbad guy\ngood guy' | sed '/^good/b; d'

Need to handle \n or \t strings ?

This article is about handling 2-character strings like \n and \t with sed, but NOT the special characters they represent : newline and TAB.
It was inspired by the examples above not being displayed correctly after processing the text of these pages

tmpFile=$(mktemp); echo "correct \t horse \n battery \t staple" > "$tmpFile"; cat "$tmpFile"; tmpStringNewline=$(pwgen 6 1); tmpStringTab=$(pwgen 6 1); sed -ri "s/[\d92]n/$tmpStringNewline/g; s/[\d92]t/$tmpStringTab/g" "$tmpFile"; cat "$tmpFile"; sed -ri 's/'$tmpStringNewline'/\\n/g; s/'$tmpStringTab'/\\t/g' "$tmpFile"; cat "$tmpFile"; rm "$tmpFile"

  1. We echo a literal string into a temporary file. No -e flag, so we're effectively writing the \ + n (and \ + t) distinctly, which is confirmed by the first cat.
  2. Then pwgen generates 2 random strings, so that we can replace \n and \t with unique strings.
  3. [\d92]n and [\d92]t are the regexp-style ways of saying literally \n and literally \t. 92 is the decimal ASCII code of \.
  4. Then cat to confirm the first round of replacements worked.
  5. Reverting these replacements : mind the single quotes (double quotes on the first round). The variables holding the random strings are outside of the sed s/// command string, and the \ are \-escaped. Within double quotes, the shell would reduce \\n to \n (and \\t to \t) before sed sees them, and sed would then give them their special meanings : newline and TAB.