
SED

setup

??? quote "First click and follow the instructions below only if you start the course at this stage! Otherwise skip this step!"
    {% include-markdown "pages/bash_manip/bash_manip-0-setup.md" %}

Concept

sed (short for Stream Editor) is a powerful command-line tool used for text manipulation, allowing you to search, replace, delete, and modify text within files or streams.

Syntax

sed [Option(s)] 'Command(s)' [File(s)]

??? Note "Available Options"
    | Option | Description |
    |----------|----------|
    | -n | Suppress automatic printing of pattern space |
    | -e | Add the script to the commands to be executed |
    | -f | Add the script file to the commands to be executed |
    | -i | Edit files in place (makes backup if extension supplied) |
    | -E, -r | Use extended regular expressions in the script (-E also works with BSD sed) |
    | -s | Treat files as separate rather than as a single continuous long stream |

At our level, the most useful options are -n, -i and -e.
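As a minimal sketch of these three options, the block below works on a small temporary file created only for the demonstration (its content is hypothetical). It uses `-i.bak`, a form that writes a backup next to the edited file.

```bash
# Hypothetical sample file, created only for this demonstration
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

# -n: nothing is printed by default; the p command prints line 2 only
only_second=$(sed -n '2p' "$tmp")
echo "$only_second"

# -e: chain several commands in one call (here two substitutions)
chained=$(sed -e 's/alpha/ALPHA/' -e 's/gamma/GAMMA/' "$tmp")
echo "$chained"

# -i with a suffix: edit the file in place and keep the original as "$tmp.bak"
sed -i.bak 's/beta/BETA/' "$tmp"
edited=$(cat "$tmp")
backup=$(cat "$tmp.bak")
echo "$edited"

# Clean up the temporary files
rm -f "$tmp" "$tmp.bak"
```

Without `-i`, sed only writes to standard output and the file itself is left unchanged.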

Leaving the options aside, sed commands can take several forms:

# case1: by line number
sed '<integer>FLAG'
# case2: by line matching
sed '/<pattern>/FLAG'
# case2.2: by line matching, followed by text
sed '/<pattern>/FLAG <string>'
# case3: by match
sed 'FLAG/<pattern>/<string>/'
# case3.2: by match, followed by flag(s)
sed 'FLAG/<pattern>/<string>/FLAG'

??? Note "Available FLAGs"
    | Command | Description | Comment | Case1 `sed '<integer>FLAG'` | Case2 `sed '/<pattern>/FLAG'` | Case2.2 `sed '/<pattern>/FLAG <string>'` | Case3 `sed 'FLAG/<pattern>/<string>/'` | Case3.2 `sed 'FLAG/<pattern>/<string>/FLAG'` |
    |----------|----------|----|----|----|----|----|----|
    | q | Quit after a line (`/<pattern>/q` or `<integer>q`) | | x | x | | | |
    | d | Delete lines (`/<pattern>/d` or `<integer>d`) | | x | x | | | |
    | p | Print matched lines (`-n '/<pattern>/p'` or `-n '<integer>p'`) | Use with the -n option, otherwise matched lines are printed twice | x | x | | | |
    | a | Append text after a line (`/<pattern>/a Add new text after`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline | | | x | | |
    | i | Insert text before a line (`/<pattern>/i Add new text before`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline | | | x | | |
    | c | Change entire line (`/<pattern>/c This is a new line`) | | | | x | | |
    | y | Character transliteration (`y/<characters>/<characters>/`) | | | | | x | |
    | s | Substitute first match on each line (`s/<pattern>/<string>/`) | | | | | x | |
    | s + g | Global: substitute all occurrences on each line (`s/<pattern>/<string>/g`) | | | | | x | x |
    | s + i | Case-insensitive: substitute the first match on each line, ignoring case (`s/<pattern>/<string>/i`) | GNU sed extension | | | | x | x |
    | s + p | Print modified lines (`s/<pattern>/<string>/p`) | | | | | x | x |
    | s + g + i + p | The flags i, p and g can be combined (`s/<pattern>/<string>/pig`) | | | | | x | x |
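The forms above can be tried on a few lines of text. The following is a small sketch on a hypothetical four-line input; Case 2.2 (`a`, `i`, `c`) is left out because its syntax differs between GNU and BSD sed.

```bash
# Hypothetical sample input, used only for illustration
sample=$(printf 'apple\nbanana\ncherry\ndate\n')

# Case 1, by line number: quit after line 2
by_number=$(printf '%s\n' "$sample" | sed '2q')
echo "$by_number"

# Case 2, by line matching: delete lines containing "an"
by_pattern=$(printf '%s\n' "$sample" | sed '/an/d')
echo "$by_pattern"

# Case 3, by match: transliterate a to A and e to E
translit=$(printf '%s\n' "$sample" | sed 'y/ae/AE/')
echo "$translit"

# Case 3.2, by match with a trailing flag: replace every "a" with "_"
global_sub=$(printf '%s\n' "$sample" | sed 's/a/_/g')
echo "$global_sub"
```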

Line selection

Syntax

sed -n 'line p' file
| Command | Description | Comment |
|----------|----------|----------|
| `sed -n '8p' file` | Print line 8 | |
| `sed -n '8p; 16p' file` | Print lines 8 and 16 | |
| `sed -n -e '8p' -e '16p' file` | Print lines 8 and 16 | |
| `sed -n '8,16 p' file` | Print lines from 8 to 16 | |
| `sed -n '8,$ p' file` | Print lines from line 8 to the end of the file | |
| `sed -n '1~8 p' file` | Print every 8th line, starting at line 1 (lines 1, 9, 17, ...) | `~` is not supported by BSD sed (macOS) |
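To see these selections in action without a real data file, the sketch below uses `seq` to generate numbered lines as stand-in input (an assumption made for the demonstration). The last command shows what happens when `-n` is forgotten.

```bash
# seq prints the numbers 1..20, one per line, so each line equals its line number
single=$(seq 1 20 | sed -n '8p')
echo "$single"

# Two separate lines
pair=$(seq 1 20 | sed -n '8p; 16p')
echo "$pair"

# A closed range of lines
range=$(seq 1 20 | sed -n '8,11p')
echo "$range"

# From a given line to the end of the input ($ means "last line")
tail_part=$(seq 1 20 | sed -n '18,$p')
echo "$tail_part"

# Without -n every line is printed once, and the selected line a second time
dup=$(seq 1 3 | sed '2p')
echo "$dup"
```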

Exercise

!!! question "Print the header, then every line from 686529 to the end of the file."

??? example "Click to show the solution"
    `sed -n '1p; 686529,$p' nat2021.csv`

Line deletion

Syntax

sed 'line d' file
| Command | Description | Comment |
|----------|----------|----------|
| `sed '8d' file` | Delete line 8 | |
| `sed '8d; 16d' file` | Delete lines 8 and 16 | |
| `sed -e '8d' -e '16d' file` | Delete lines 8 and 16 | |
| `sed '8,16 d' file` | Delete lines from 8 to 16 | |
| `sed '8,$ d' file` | Delete lines from line 8 to the end of the file | |
| `sed '1~8d' file` | Delete every 8th line, starting at line 1 (lines 1, 9, 17, ...) | `~` is not supported by BSD sed (macOS) |
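Deletion uses the same addresses as printing, but needs no `-n` because the remaining lines are printed by default. A short sketch, again with `seq` output as hypothetical stand-in input:

```bash
# Delete a single line
one=$(seq 1 5 | sed '2d')
echo "$one"

# Delete several separate lines with two -e scripts
several=$(seq 1 5 | sed -e '1d' -e '5d')
echo "$several"

# Delete a closed range of lines
range=$(seq 1 10 | sed '3,8d')
echo "$range"

# Delete from a given line to the end of the input
to_end=$(seq 1 10 | sed '4,$d')
echo "$to_end"
```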

Exercise

!!! question "Delete lines 10 through 686529."

??? example "Click to show the solution"
    `sed '10,686529d' nat2021.csv`

Use of Regular Expression

Syntax

sed '/RegEx/command' file

| Command | Description |
|----------|----------|
| `sed '/^#/d' file` | Delete lines starting with # |
| `sed -n '/[0-9][0-9][0-9][0-9]/p' file` | Print lines containing four consecutive digits |
| `sed -E -n '/[0-9]{4}/p' file` | Print lines containing four consecutive digits, using extended regular expressions |

??? Note "Summary of sed Regex Operators"
    | Operator | Description | Example |
    |----------|----------|----------|
    | `.` | Matches any character except newline | `sed 's/a.b/c/g'` |
    | `^` | Matches the start of a line | `sed '/^apple/d'` |
    | `$` | Matches the end of a line | `sed '/end$/d'` |
    | `*` | Matches 0 or more occurrences of the preceding character | `sed 's/a*b/c/g'` |
    | `[]` | Matches any one character in the class | `sed 's/[aeiou]/X/g'` |
    | `[^]` | Matches any character not in the class | `sed 's/[^a-z]/X/g'` |
    | `()` | Groups characters (Extended Regex) | |
    | `\|` | OR operator (Extended Regex) | |
    | `+` | Matches 1 or more occurrences (Extended Regex) | `sed -E 's/a+b/c/g'` |
    | `?` | Matches 0 or 1 occurrence (Extended Regex) | `sed -E 's/a?b/c/g'` |
    | `{n,m}` | Matches between n and m occurrences (Extended Regex) | `sed -E 's/a{2,4}/c/g'` |
    | `\` | Escapes special characters | `sed 's/\./X/g'` |
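A few of these operators, including grouping and alternation, can be checked quickly on made-up input. The sample lines below are hypothetical and serve only as a sketch.

```bash
# Hypothetical sample input, used only for illustration
sample=$(printf '# comment\nid=42\nyear=2021\ncolor or colour\ncat and dog\n')

# ^ anchors the match at the start of the line: drop comment lines
no_comments=$(printf '%s\n' "$sample" | sed '/^#/d')
echo "$no_comments"

# {4} needs -E: keep only lines that contain four consecutive digits
four_digits=$(printf '%s\n' "$sample" | sed -E -n '/[0-9]{4}/p')
echo "$four_digits"

# ? makes the preceding character optional: matches "color" and "colour"
optional=$(printf '%s\n' "$sample" | sed -E -n 's/colou?r/X/gp')
echo "$optional"

# ( ) groups and | separates alternatives: matches "cat" or "dog"
alternation=$(printf '%s\n' "$sample" | sed -E -n 's/(cat|dog)/pet/gp')
echo "$alternation"
```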

Exercise

!!! question "Select all lines that match PIERRE in the 2000s."

??? example "Click to show the solution"
    `sed -E -n '/;PIERRE;2[0-9]{3}/p' nat2021.csv`

Substitution

Syntax

sed 's/pattern/replacement/' file
| Command | Description |
|----------|----------|
| `s/pattern/replacement/` | Substitute the first occurrence of pattern with replacement |
| `s/pattern/replacement/2` | Substitute the second occurrence of pattern with replacement |
| `s/pattern/replacement/g` | Substitute all occurrences of pattern with replacement |
| `s/pattern/replacement/i` | Substitute the first occurrence of pattern with replacement, ignoring case (the i flag is a GNU sed extension) |
| `s/pattern/replacement/gi` | Substitute all occurrences of pattern with replacement, ignoring case |
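The effect of each flag is easiest to see on a single line. This sketch uses a hypothetical test string; the last command assumes GNU sed, since the `i` flag is a GNU extension.

```bash
# Hypothetical test line with the same word in different cases
line='cat Cat cat CAT'

# No flag: only the first match on the line is replaced
first=$(echo "$line" | sed 's/cat/dog/')
echo "$first"

# Numeric flag: only the second match is replaced
second=$(echo "$line" | sed 's/cat/dog/2')
echo "$second"

# g: every (case-sensitive) match is replaced
all=$(echo "$line" | sed 's/cat/dog/g')
echo "$all"

# g + i (assumes GNU sed): every match is replaced, ignoring case
nocase=$(echo "$line" | sed 's/cat/dog/gi')
echo "$nocase"
```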

Exercise

!!! question "Replace the number in the last column with XX."

??? example "Click to show the solution"
    `sed -E 's/[0-9]+.?$/XX/' nat2021.csv`

    /!\ `s/[0-9]+$/XX/` does not work here because each line ends with an unprintable carriage return (`\r`). Adding `.?` before `$` lets the pattern match this character as well.
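The carriage-return problem can be reproduced on a single record. The line below is a hypothetical example with a Windows-style ending (`\r\n`), not an actual row of nat2021.csv; stripping `\r` with `tr` first is shown as one possible alternative.

```bash
# Strict pattern: the digits are followed by \r, not by the end of line,
# so nothing is substituted (tr removes the \r only to make the result easy to compare)
strict=$(printf '1;PIERRE;2020;15\r\n' | sed -E 's/[0-9]+$/XX/' | tr -d '\r')
echo "$strict"

# Lenient pattern: .? absorbs the trailing \r, so the last number is replaced
lenient=$(printf '1;PIERRE;2020;15\r\n' | sed -E 's/[0-9]+.?$/XX/')
echo "$lenient"

# Alternative: remove the carriage returns first, then use the strict pattern
cleaned=$(printf '1;PIERRE;2020;15\r\n' | tr -d '\r' | sed -E 's/[0-9]+$/XX/')
echo "$cleaned"
```

Note that the lenient pattern also removes the `\r` from the output, since it is part of the replaced text.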

!!! question "Replace every digit with X."

??? example "Click to show the solution"
    `sed -E 's/[0-9]/X/g' nat2021.csv`

Capturing

sed can also extract part of a line. A typical example is extracting the value of an attribute (tag=value), such as the Name tag in the 9th column of a GFF/GTF file.

Syntax

sed -n 's/.*START\([^END]*\)END.*/\1/p' file.txt
  • -n Suppresses default output (only prints matches).
  • s/.../.../p Replaces the whole matched line with the captured part and prints it.
  • .* Matches anything before the START marker.
  • START The fixed pattern before the part we want.
  • \( Start of capture group (tells sed to remember this part).
  • [^END]* Matches any run of characters other than E, N or D. It behaves as "everything up to the END marker" only when END is a single delimiter character (for example ;), because a bracket expression lists individual characters, not a word.
  • \) End of capture group.
  • END The fixed text after the part we want.
  • .* Matches everything after the END marker.
  • \1 Refers to the first captured group in the replacement (here only 1 has been captured).
  • p Explicitly prints the result (only used with -n).
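As a sketch of the GFF use case, the block below extracts the value of the Name tag from a hypothetical GFF line (the coordinates and identifiers are invented). It also shows how the literal `[^END]` pattern behaves when the wanted text itself contains the letters E, N or D.

```bash
# Hypothetical GFF line (tab-separated); the 9th column holds tag=value pairs
gff_line=$(printf 'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=ABC1;biotype=protein_coding')

# Capture everything after "Name=" up to the next ";" (or the end of the line)
name=$(printf '%s\n' "$gff_line" | sed -n 's/.*Name=\([^;]*\).*/\1/p')
echo "$name"

# With word markers, [^END]* works as long as the text has no E, N or D ...
works=$(echo 'xSTARThelloENDy' | sed -n 's/.*START\([^END]*\)END.*/\1/p')
echo "$works"

# ... but the match fails when the text contains one of those letters
fails=$(echo 'xSTARTDNAENDy' | sed -n 's/.*START\([^END]*\)END.*/\1/p')
echo "[$fails]"
```

This is why single-character delimiters such as `;` are the safest choice for the negated class.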

Exercise

!!! question "List all names that are combined with PIERRE in compound names (e.g. OLIVIER, as in PIERRE-OLIVIER)."

??? example "Click to show the solution"
    `sed -n 's/.*;PIERRE-\([^;]*\);.*/\1/p' nat2021.csv | sort -u`