"sed" the 'Stream EDitor' - Description, flags and examples

How to remove trailing SPACEs ?

This example changes SPACE into _ only to make them visible in the shell.
#!/usr/bin/env bash
tmpFile=$(mktemp)

echo -e 'no trailing space\n1 trailing space \nno trailing space\n2 trailing spaces  \nno trailing space' > "$tmpFile"

showSpacesInFile() {
	fileToShow=$1
	tr ' ' '_' < "$fileToShow"
	}

echo 'BEFORE :'
showSpacesInFile "$tmpFile"

sed -ri 's/ +$//g' "$tmpFile"

echo -e '\nAFTER :'
showSpacesInFile "$tmpFile"

[ -f "$tmpFile" ] && rm "$tmpFile"
BEFORE :
no_trailing_space
1_trailing_space_
no_trailing_space
2_trailing_spaces__
no_trailing_space

AFTER :
no_trailing_space
1_trailing_space
no_trailing_space
2_trailing_spaces
no_trailing_space
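
The script above only strips SPACE characters. Here is a minimal sketch, assuming trailing TABs should be removed as well : the [[:space:]] class covers both. stripTrailingBlanks is a hypothetical helper name, it is not part of the original script.

```shell
# Hypothetical helper: strip trailing SPACEs and TABs from each line of stdin.
stripTrailingBlanks() {
	sed 's/[[:space:]][[:space:]]*$//'
}

# tr turns the remaining blanks into visible characters, as in the example above.
printf 'space \ntab\t\nboth \t \nnone\n' | stripTrailingBlanks | tr ' \t' '_>'
```

No _ nor > shows up in the output, since no blank is left at the end of any line.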

How to insert a string before / after a specific line of a file ?

insert text before the line matching a pattern :
textToInsert='foo'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/i $textToInsert"
line 1
foo
line 2
line 3
append text after the line matching a pattern :
textToInsert='foo'; pattern='line 2'; echo -e "line 1\nline 2\nline 3" | sed "/$pattern/a $textToInsert"
line 1
line 2
foo
line 3

How to display the nth line of a file ?

Piece of cake with sed :

echo -e "Line 1\nLine 2\nLine 3\nLine 4" | sed -n '2 p'

Line 2

sed

Usage :

sed [flags] 'script' inputFile
script itself can be made of an address and a function (source).

How sed works (source) :

  1. read one line from inputFile and copy it into the "work space" (a memory buffer, called "pattern space" in the sed documentation)
  2. for each editing command whose selection criteria (line number, range, regular expression, ...) match the line :
    1. apply the 1st editing command to the contents of the work space
    2. apply the 2nd editing command (if any) to the (possibly altered) contents of the work space
    3. apply the 3rd, ..., editing command (if any) to the (possibly altered) contents of the work space
  3. write the contents of the work space to the output (unless -n is given)
  4. repeat until end of inputFile
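
The steps above can be sketched as a plain shell loop. This is only an illustration of the read / alter / write cycle, roughly emulating sed '2 s/^/# /' ; the emulateSed, workSpace and lineNumber names are made up for the example, and real sed is of course not implemented this way.

```shell
# Rough emulation of: sed '2 s/^/# /'
emulateSed() {
	lineNumber=0
	# 1. read one line into the "work space"
	while IFS= read -r workSpace; do
		lineNumber=$((lineNumber + 1))
		# 2. apply the command only if the address (line 2) matches
		if [ "$lineNumber" -eq 2 ]; then
			workSpace="# $workSpace"
		fi
		# 3. write the work space to the output
		printf '%s\n' "$workSpace"
	done	# 4. repeat until end of input
}

printf 'line 1\nline 2\nline 3\n' | emulateSed
```

Only the 2nd line comes out altered (# line 2), the others are copied as-is.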

An address can be (source) :

a line number :
sed -n '2 p' inputFile
a range of line numbers :
sed -n '12,15 p' inputFile
lines matching a pattern :
  • sed -n '/pattern/ p' inputFile
  • echo -e "\nLine "{1..5} | sed '/3/ s/Line/LINE/'
lines not matching a pattern (append ! to the address) :
sed -n '/pattern/! p' inputFile
lines from the beginning of a file up to a regular expression :
sed -n '0,/pattern/ p' inputFile
Many examples consider line 1 as 'the beginning of the file'. However, if we wanted to alter files from 'the beginning of the file' up to PATTERN, look what happens :
  • echo -e 'hello\nPATTERN\nworld' | sed '1,/PATTERN/ s/^/#/'
    #hello		fine !
    #PATTERN	fine !
    world
  • echo -e 'PATTERN\nhello\nworld' | sed '1,/PATTERN/ s/^/#/'
    #PATTERN
    #hello
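    #world		all lines are altered : the end of a 1,/PATTERN/ range is only searched from line 2 onwards, and no such line matches here
  • echo -e 'PATTERN\nhello\nworld' | sed '0,/PATTERN/ s/^/#/'
    #PATTERN	finally ok : with 0,/PATTERN/ (GNU sed), the end of the range may match on the 1st line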
    hello
    world
lines from a regular expression up to the end of the file ($ is an alias for "the last line of the file") :
sed -n '/pattern/,$ p' inputFile
lines between two regular expressions (included) :
sed -n '/patternStart/,/patternStop/ p' inputFile
  • The p shown in examples above has NOTHING to do with address specification, it's the print command.
  • It is perfectly legal to have something looking like : 'anyCondition,$ p'. The $ p part is not a typo about the $p variable, it's the $ sign (meaning "until the end of inputFile"), then the print command.

sed functions (sources : common ones, others, others again (scroll down)) :

Function Usage
a someText append someText after the line matched by the specified address (source)
d delete
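g globally substitute (when used with s///) : this is actually an option of the s function rather than a function of its own. See the note about the g option below.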
i someText insert someText before the line matched by the specified address (source)
p print

Due to the way sed works (i.e. copy input to work space, alter it, move to output), and considering the fact that this command actually instructs sed to print the current line, it has the effect of displaying it twice, unless silenced by -n. Check it :

tmpFile=$(mktemp --tmpdir playingWithSed.XXXX); for i in {1..3}; do echo "line $i" >> "$tmpFile"; done; sed '2 p' "$tmpFile"; echo; sed -n '2 p' "$tmpFile"; echo; sed 's/i/a/' "$tmpFile"; echo; sed 's/ne 2/ne TWO/' "$tmpFile"; rm "$tmpFile"
line 1		nothing specified : display as-is
line 2		nothing specified : display as-is
line 2		explicitly asked to print, so here it is
line 3		nothing specified : display as-is

line 2		-n specified : only print line(s) matching criteria

lane 1		changed line is displayed
lane 2		changed line is displayed
lane 3		changed line is displayed

line 1		unchanged line is displayed as-is
line TWO	changed line is displayed
line 3		unchanged line is displayed as-is

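q, [n]q[c]
Q, [n]Q[c]
stop processing input and quit (and return the optional exit code c). q still prints the current line before leaving (unless -n is given), Q leaves without printing it. Q and the exit code are GNU extensions. No big deal with short files but this can make quite a difference on fat ones (source) :
  1. generate a big file (500 million lines, this may take a while) :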
    tmpFile='/run/shm/myBigFile'; tmpFile2='/run/shm/myBigFile2'; > "$tmpFile"; > "$tmpFile2"; for ((i=1;i<=1000000;i++)); do echo $i >> "$tmpFile"; done; for ((i=1;i<=500;i++)); do cat "$tmpFile" >> "$tmpFile2"; done; wc -l "$tmpFile2"
  2. compare :
    • time { sed -n '1p' "$tmpFile2"; }
      1			the 1st line, as requested
      
      real    0m20.444s	sed continued reading the input since it has no further instruction
      user    0m19.211s
      sys     0m1.043s
    • time { sed -n '1p;1q' "$tmpFile2"; }
      1			line 1 again
      
      real    0m0.002s	sed stopped reading the input just after the line 1, which is WAY faster 
      user    0m0.002s
      sys     0m0.001s
The optional n parameter (an address) instructs sed to leave after reading the nth line :
  • seq 10 | sed -n '1,4 p;3q'
    seq 10 | sed -ne '1,4 p' -e '3q'
    1
    2
    3
  • seq 10 | sed '3Q42'; echo $?
    1
    2		sed left after reading the 3rd line without printing it
    42		the specified exit code
s/search/replace/ substitute patterns
The character right after the s is used as the delimiter, which saves some headaches when working with paths (source) :
echo "/path/to/index.html" | sed 's_/path/to/_http://www.example.com/_'
  • Be VERY careful while using _ as a separator together with variables within the search/replace construct : _ is a valid character in a variable name, so "s_$search_$replace_" will try to use variables $search_ and $replace_. Write "s_${search}_${replace}_" instead.
  • The error Invalid preceding regular expression comes from the (misused) single quotes in : search='o'; replace='0'; echo 'Hello World' | sed -r 's_${search}_${replace}_g'. Single quotes prevent the shell from expanding the variables, so sed receives ${search} literally and, with -r, reads {search} as a (malformed) repetition count. Use double quotes to get the variables expanded. {} can be used in a search pattern to express something repeated : echo 'hello' | sed -r 's_l{2}_X_' returns heXo. (More about regular expressions)
Using back-references :
  • & stands for the whole matched string
  • echo 'xyz' | sed 's/.\(.\)./\1/' returns y
  • echo 'xyz' | sed -r 's/.(.)./\1/' returns y (same thing with extended regular expressions)
  • A \1 ... \9 back-reference can even be used in the 'search' part of the expression to match duplicated words (source). This example replaces foowhateverfoo with whatever :
    echo 'foo1BARfoo2BAZfoo3foo4' | sed -r 's|(foo)(.*)\1|\2|'
    1BARfoo2BAZfoo34
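
A short sketch of & and of numbered groups in the 'replace' part. The bracketNumbers and swapWords helper names are made up for the example ; -E is used here as the spelling of -r that both GNU and BSD sed accept.

```shell
# & stands for the whole match: wrap every number in brackets
bracketNumbers() { sed 's/[0-9][0-9]*/[&]/g'; }
echo 'port 80 and port 443' | bracketNumbers
# port [80] and port [443]

# \1 and \2 refer to the 1st and 2nd groups: swap two words
swapWords() { sed -E 's/^([^ ]+) ([^ ]+)$/\2 \1/'; }
echo 'hello world' | swapWords
# world hello
```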

About non-ASCII charsets (source) :

When sed is puzzled by non-ASCII characters, try a Perl equivalent. Instead of : sed 'regEx', run : perl -pe 'regEx'.

A word about the g option of the substitute command : s///g :

We must NEVER forget that, although this g means "globally replace ...", anything sed is aware of is lines of text, not the text itself. So what is "global" to sed at any time is a single line of text. Thus, using the g option will replace ALL occurrences found within each line, whereas without g only the first occurrence of each line is replaced.

echo -e "aaaa\nbbbb\naaaa\nbbbb\nbaba\nabab" | sed 's/a/A/g'
AAAA
bbbb
AAAA
bbbb
bAbA
AbAb
All occurrences have been substituted.
echo -e "aaaa\nbbbb\naaaa\nbbbb\nbaba\nabab" | sed 's/a/A/'
Aaaa
bbbb
Aaaa
bbbb
bAba
Abab
Only the 1st occurrence of each line has been substituted.
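
Between "first occurrence only" and "all occurrences", the s command also accepts a number as option to target the Nth occurrence of each line. A small sketch (replaceSecond is a name made up for the example) :

```shell
# Replace only the 2nd occurrence of a on each line
replaceSecond() { sed 's/a/A/2'; }
printf 'aaaa\nbaba\nbbbb\n' | replaceSecond
# aAaa
# babA
# bbbb
```

Lines having fewer than 2 occurrences are left untouched.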

Flags :

Flag Usage
-e script --expression=script add script to the commands to be executed (See example)
-f scriptFile --file=scriptFile add the contents of scriptFile to the commands to be executed
-i[suffix]
--in-place[=suffix]
Alter directly the specified file instead of returning result to stdout. If suffix is provided while altering inputFile, sed will make a backup named inputFilesuffix.
With GNU sed, the suffix must be glued to the option : -i.bak or --in-place=.bak. In -i .bak (with a SPACE), .bak is not taken as the suffix but as the next argument, i.e. the script.
If the target is a symlink, sed will make the required changes and save them in a regular file instead of altering the target of the link. --follow-symlinks prevents this (source).
-n Only lines explicitly selected for output are written : suppress the default output in which each line, after it is examined for editing, is written to standard output. This allows discarding lines having no match.
Requires the p function to actually output something when used with s/// :
echo -e "whatever\nvalue=1\nvalue=42\nvalue=1000\nwho cares?" | sed -nr 's/value=(.*)$/\1/p'
-r, -E use extended regular expressions (-E is the spelling shared by GNU and BSD sed)
Inside character classes, \ is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.: []^-] (source, example)
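
A minimal sketch of the in-place flag with a backup suffix, assuming GNU sed (the .bak suffix and the variable names are arbitrary choices for the example) :

```shell
# Edit a file in place, keeping a backup with the .bak suffix.
# The suffix is glued to -i (no SPACE in between).
tmpFile=$(mktemp)
printf 'hello\n' > "$tmpFile"
sed -i.bak 's/hello/bye/' "$tmpFile"
edited=$(cat "$tmpFile")
backup=$(cat "$tmpFile.bak")
echo "edited : $edited"
echo "backup : $backup"
rm -f "$tmpFile" "$tmpFile.bak"
```

The edited file holds bye while the backup still holds hello.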

Example :

Display the nth line of a file :

sed -n 'np' fileName | less

Remove the nth line of a file :

  • sed 'nd' fileBefore > fileAfter
  • or : sed -i 'nd' file

It is also possible to remove the 1st line of a file with : tail -n+2 fileBefore > fileAfter (source)

Delete lines matching a pattern :

sed -i '/pattern/d' file

Delete lines, starting at line n (included !), until the end (i.e. keep only the n-1 heading lines of file/stream) :

sed -i 'n,$ d' file
head can do this more easily when processing a stream of text, but sed saves a | and a temp file by directly altering the source file.

Delete lines from file, starting at line n, until line matching pattern (included !) (source) :

sed -i 'n,/pattern/d' file
!d instead of d swaps the kept / removed regions : 'n,/pattern/!d' keeps the range and deletes everything else, i.e. the lines before line n and the lines after the one matching pattern (which is kept).
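
A quick way to check which region is kept or removed, with n=3 and 6 as the pattern (deleteRange and keepRange are names made up for the example) :

```shell
# d : delete from line 3 up to the first later line matching 6
deleteRange() { sed '3,/6/d'; }
# !d : delete everything EXCEPT that range
keepRange() { sed '3,/6/!d'; }

seq 10 | deleteRange | tr '\n' ' '; echo	# 1 2 7 8 9 10
seq 10 | keepRange | tr '\n' ' '; echo		# 3 4 5 6
```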

Substitute words :

sed -i 's/search/replace/g' file
Replace every occurrence of search with replace. The final g is for global replacement (=replace ALL occurrences)
regExpNeedle='test\.example\.com'; regExpReplacement='www\.example\.com'; echo "My website name is test.example.com. test.example.com is a good name." | sed "s/$regExpNeedle/$regExpReplacement/g"
Same using variables.
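needle='search'; replacement='replace'; grep -lrZ -- "$needle" . | xargs -0 -r sed -i "s/$needle/$replacement/g"
Replace every occurrence of search with replace within all files matching grep search pattern. grep -Z and xargs -0 pass the file names NUL-separated so that names containing SPACEs survive, and xargs -r skips sed when grep finds nothing (GNU options).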

Replace a string with a newline

sed 's/replaceMe/\n/g' fileBefore > fileAfter
GNU sed understands \n in the replacement part.
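
A newline can also be written in the replacement as a backslash followed by an actual line break, which does not rely on any escape sequence being understood by sed. A minimal sketch (commasToNewlines is a name made up for the example) :

```shell
# A backslash followed by a real line break inserts a newline
commasToNewlines() {
	sed 's/,/\
/g'
}

echo 'a,b,c' | commasToNewlines
# a
# b
# c
```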

Remove all non-ASCII characters from file/string

Method 1 (source) :

  • sed -i 's/[\d128-\d255]//g' file
  • sed -i 's/[^[:print:]]//g' file

Method 2 "The dirty method" (source) :

echo 'La lettre à Elise' | sed -n 'l0' | sed -r 's/(\\[0-9]+)//g'
sed -n 'l0' : show the string "as seen by sed"

Method 3 "The C method" (source) :

This is a "If you can't beat them, just ignore them" method : using a different LANG setting skips the non-ASCII characters.
Just do : LANG=C, then run your sed command without worrying about special characters anymore.

Changing LANG (currently fr_FR.UTF-8) may have side effects in the next commands of your shell session. To avoid this :

  1. SAVED_LANG=$LANG
  2. Have fun with sed !
  3. LANG=$SAVED_LANG
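
Another way to limit side effects is to set the locale for the sed command only, by prefixing it with the assignment. The sketch below uses LC_ALL (which takes precedence over LANG) and assumes that "non-ASCII" means "anything outside the SPACE to ~ range", so TABs and other control characters are removed too ; stripNonAscii is a name made up for the example.

```shell
# The LC_ALL=C prefix applies to this single sed invocation only
stripNonAscii() { LC_ALL=C sed 's/[^ -~]//g'; }

# \303\240 are the 2 bytes of "à" in UTF-8
printf 'La lettre \303\240 Elise\n' | stripNonAscii
# La lettre  Elise
```

The locale of the shell session itself is left untouched, so there is nothing to save and restore.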

Extract fields from a logfile

sed -r 's/regExp/\x;\y/' logFile > resultFile
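
In the command above, \x and \y stand for the numbers of the groups to keep. Here is a sketch on a hypothetical log format (DATE TIME LEVEL MESSAGE), keeping the 1st and 3rd fields ; both the log format and the extractDateAndLevel name are assumptions made for the example.

```shell
# Hypothetical log format: DATE TIME LEVEL MESSAGE
extractDateAndLevel() {
	sed -E 's/^([0-9-]+) ([0-9:]+) ([A-Z]+) .*$/\1;\3/'
}

printf '2024-01-15 10:23:45 ERROR disk full\n2024-01-15 10:24:02 INFO retrying\n' | extractDateAndLevel
# 2024-01-15;ERROR
# 2024-01-15;INFO
```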

Extract a substring with a RegExp

Extract the 4 last characters of a string :

echo "foo1234" | sed -r 's/^.*(.{4})$/\1/'

Extracting the substring is actually made by substituting all characters that don't match with an empty string (=keeping only those that match). In the RegExp, ( and ) are used to surround the characters that match. The \1 stands for "anything found in the 1st group of (...)"

Extract an unknown number of substrings matching a pattern :

In the example below, let's imagine we have a string containing an unknown number of IP addresses mixed with some text I don't care about, and I'd like to get those IP addresses :
line="foo;1.2.3.4;bar;5.6.7.8;baz;"; pattern='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; echo "$line" | sed -r 's/('$pattern')/\n\1/g' | sed -r 's/^('$pattern')?.*$/\1/g' | sed '/^$/ d'
How it works :
  1. the 1st sed splits $line into shorter lines :
    whatever
    matchedSubstring1somethingElse
    matchedSubstring2everythingElse
    matchedSubstringNwhoCares
  2. the 2nd sed processes these short lines by keeping only what matches the pattern. This generates an empty line if nothing matches :
    matchedSubstring1
    matchedSubstring2
    matchedSubstringN
  3. the 3rd sed removes empty lines from the output :
    matchedSubstring1
    matchedSubstring2
    matchedSubstringN
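
For the record, when all that is needed is the list of matches, grep can do the same without sed. This is a different tool, not a sed trick, and -o is not POSIX (GNU and BSD grep have it) ; extractMatches is a name made up for the example.

```shell
line="foo;1.2.3.4;bar;5.6.7.8;baz;"
pattern='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'

# -o prints each match on its own line, -E enables extended regular expressions
extractMatches() { grep -oE "$pattern"; }

echo "$line" | extractMatches
# 1.2.3.4
# 5.6.7.8
```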

Need a RegExp "lazy star" ?

sed doesn't support "lazy star" but Perl can help :

grep -i 'someText' someFile | perl -pe 's/.*(a href|A HREF)="(http:\/\/.*?)">.*/\2/'

Other examples :

without lazy star :
echo 'Lorem ipsum dolor sit /*REMOVE ME*/ amet, consectetur /*REMOVE ME*/ adipiscing elit.' | perl -pe 's#/\*.*\*/##g'
Lorem ipsum dolor sit  adipiscing elit.
with lazy star :
echo 'Lorem ipsum dolor sit /*REMOVE ME*/ amet, consectetur /*REMOVE ME*/ adipiscing elit.' | perl -pe 's#/\*.*?\*/##g'
Lorem ipsum dolor sit  amet, consectetur  adipiscing elit.
  • in the s/search/replace/ command, sed and Perl (and possibly others) allow replacing the / with anything you like to increase readability (here : #)
  • this example is not the most basic since it matches a string having itself / and * characters : * is reserved in the context of regular expressions, hence some extra \ to escape it, and / needs no escaping only because # is used as the delimiter
  • the output lines show what is left once the part matched by each regular expression has been removed by the s/search/replace/ command
  • there are still some double-SPACE characters where the text has been removed : I didn't want to make the code too complex for this example, but this can easily be fixed
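
When the match must stop at a single known character, sed can often get away without a lazy star by using a negated character class. A minimal sketch (greedy and lazyLike are names made up for the example) ; it does not cover multi-character terminators such as */ in the example above.

```shell
# Greedy: .* runs up to the LAST double quote
greedy() { sed 's/".*"/X/'; }
# "Lazy" workaround: [^"]* cannot run past the next double quote
lazyLike() { sed 's/"[^"]*"/X/'; }

echo 'say "one" and "two"' | greedy	# say X
echo 'say "one" and "two"' | lazyLike	# say X and "two"
```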

Remove ANSI colors codes
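
A possible sketch, assuming "color codes" means the usual ESC [ ... m sequences (other kinds of ANSI escape sequences are not handled) ; stripAnsiColors and esc are names made up for the example.

```shell
# ESC is built with printf so that no sed-specific escape sequence is needed
esc=$(printf '\033')
stripAnsiColors() { sed "s/${esc}\[[0-9;]*m//g"; }

printf '\033[1;31mred\033[0m and plain\n' | stripAnsiColors
# red and plain
```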

Count the number of articles in this file with anchor starting with each alphabet letter

In the (earliest!) original format of this file (BashIndex.xml), articles are formatted as :
<article id="anchor">
	<titre>title</titre>
	...
	content
	...
</article>
This command counts the number of anchors starting with each letter of the alphabet :
sed -n '/<article id=/ s/.*article id="\(.\).*".*/\1/gp' BashIndex.xml | sort | uniq -c | sort -nr

Delete empty lines

Even though sed is very good for "search and replace" tasks, these commands will fail :

because sed processes its input line by line. The commands above will just replace an empty line with... an empty line

To effectively delete empty lines :

sed -i '/^$/ d' inputFile
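
Lines that look empty but hold SPACEs or TABs are not matched by /^$/. A small variant, assuming such lines should go too (deleteBlankLines is a name made up for the example) :

```shell
# Delete lines that are empty or made of blanks only
deleteBlankLines() { sed '/^[[:space:]]*$/d'; }

printf 'a\n\n  \n\t\nb\n' | deleteBlankLines
# a
# b
```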

How to remove newline characters ?

Since sed's input is sliced in single-line chunks, sed is not fully aware of line endings. Hence this is not the right tool for the job. Consider tr instead.
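
A minimal sketch of the tr way (joinLines is a name made up for the example) :

```shell
# Delete every newline character of the stream
joinLines() { tr -d '\n'; }

# the trailing echo only adds a final newline for the terminal
printf 'a\nb\nc\n' | joinLines; echo
# abc
```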

How to chain sed commands ?

echo 'Hello world' | sed -e 's/Hello/Hi/' -e 's/world/people/'
Hi people
echo 'Hello world' | sed 's/Hello/Hi/; s/world/people/'
Hi people
In this context, successive commands are actually "piped" into each other :
echo 'a' | sed -e 's/a/b/' -e 's/b/c/'
c
echo 'a' | sed 's/a/b/; s/b/c/'
c
tmpFile=$(mktemp); echo 'a' > "$tmpFile"; sed -e 's/a/b/' -e 's/b/c/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
c
tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/b/c/' -e 's/a/b/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
b
c
c
tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/a/b/' -e 's/b/c/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
	a->b			b->c
a			b			c
b	==>		b	==>		c
c			c			c

tmpFile=$(mktemp); echo -e 'a\nb\nc' > "$tmpFile"; sed -e 's/b/c/' -e 's/a/b/' "$tmpFile"; [ -f "$tmpFile" ] && rm "$tmpFile"
	b->c			a->b
a			a			b
b	==>		c	==>		c
c			c			c

How to substitute the 1st occurrence only of a pattern in a stream ?

echo -e "aaaa\nbbbb\nbaba\nabab" | sed '0,/a/ s//A/'

Aaaa
bbbb
baba
abab
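Explanation :
0,/a/ is an address range going from the beginning of the input up to the 1st line matching a (a GNU sed extension), so the s command only runs on those lines. The empty search pattern in s//A/ means "reuse the last regular expression", i.e. a.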

Negative matching

When it comes to negative matching, regular expressions are said to be "difficult" / "not-designed-to-do-this". However, this _may_ still be possible (sources : 1, 2) at the cost of increased complexity and loss of readability
KISS suggests there is no shame in using several |-separated grep's or sed's .

To run any sed command on lines NOT matching a pattern, just negate the match with the ! operator :

Make all not happy become VERY HAPPY :
echo -e 'happy\nhappy\nunhappy\nhappy' | sed '/^happy/! s/.*/VERY HAPPY/'
Keep only the good guys :
echo -e 'good guy\nbad guy\nbad guy\ngood guy' | sed '/^good/! d'

There's a hack involving the b (branch) command, instructing sed to jump to the end of the script (i.e. skip the remaining commands for the matched line). The solution above looks better, but I keep it for future reference :

Make all not happy become VERY HAPPY :
echo -e 'happy\nhappy\nunhappy\nhappy' | sed '/^happy/b; s/.*/VERY HAPPY/'
Keep only the good guys :
echo -e 'good guy\nbad guy\nbad guy\ngood guy' | sed '/^good/b; d'

Need to handle \n or \t strings ?

This article is about handling 2-character strings like \n and \t with sed, but NOT the special characters they represent : newline and TAB.
It was inspired by the examples above not being displayed correctly after processing the text of these pages

tmpFile=$(mktemp); echo "correct \t horse \n battery \t staple" > "$tmpFile"; cat "$tmpFile"; tmpStringNewline=$(pwgen 6 1); tmpStringTab=$(pwgen 6 1); sed -ri "s/[\d92]n/$tmpStringNewline/g; s/[\d92]t/$tmpStringTab/g" "$tmpFile"; cat "$tmpFile"; sed -ri 's/'$tmpStringNewline'/\\n/g; s/'$tmpStringTab'/\\t/g' "$tmpFile"; cat "$tmpFile"; rm "$tmpFile"

  1. We echo a literal string into a temporary file. No -e flag, so we're effectively writing the \ + n (and \ + t) distinctly, which is confirmed by the first cat.
  2. Then pwgen generates 2 random strings, so that we can replace \n and \t with unique strings.
  3. [\d92]n and [\d92]t are the regexp-style ways of saying "literally \n" and "literal \t". 92 is the decimal ASCII code of \.
  4. Then cat to confirm the first round of replacements worked.
  5. Reverting these replacements : mind the single quotes (double quotes on the first round). The variables holding the random strings are outside of the single-quoted sed s/// command string, and the \ are \-escaped. Within double quotes, the shell would turn \\n into \n before sed sees it, and sed would then insert a real newline (resp. a TAB for \t).
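
When the goal is only to match the 2-character strings, a shorter sketch is to escape the \ itself in a single-quoted script : in the regular expression, \\ matches one literal \. The <NL> and <TAB> markers and the showLiteralEscapes name are arbitrary choices for the example.

```shell
# In the regexp, \\ matches one literal backslash
showLiteralEscapes() { sed 's/\\n/<NL>/g; s/\\t/<TAB>/g'; }

# printf '%s\n' writes its argument as-is, without interpreting \n and \t
printf '%s\n' 'correct \t horse \n battery \t staple' | showLiteralEscapes
# correct <TAB> horse <NL> battery <TAB> staple
```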