# SED
## Setup
??? quote "First click and follow the instructions below only if you start the course at this stage! Otherwise skip this step!"
    {%
    include-markdown "pages/bash_manip/bash_manip-0-setup.md"
    %}


## Concept

`sed` (short for Stream Editor) is a powerful command-line tool used for text manipulation, allowing you to **search**, **replace**, **delete**, and **modify** text within files or streams.

**Syntax**

```bash
sed [Option(s)] 'Command(s)' [File(s)]
```

??? Note "Available Options"
    | Option | Description |
    |----------|----------|
    | `-n` | Suppress automatic printing of the pattern space |
    | `-e` | Add the script to the commands to be executed |
    | `-f` | Add the contents of the script file to the commands to be executed |
    | `-i` | Edit files in place (makes a backup if a suffix is supplied) |
    | `-E`, `-r` | Use extended regular expressions in the script (`-E` also works with BSD sed; `-r` is the older GNU spelling) |
    | `-s` | Treat files as separate rather than as a single continuous stream |

At our level, the most useful options are `-n`, `-i` and `-e`.
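
As a minimal sketch of these three options, the commands below work on a throwaway three-line file. The file content and the variable names (`demo`, `only_second`, `chained`, `edited`, `backup`) are assumptions made up for this illustration.

```bash
# Build a small temporary file to experiment on (hypothetical content).
demo=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$demo"

# -n: nothing is printed by default, so only the line selected by `p` is shown.
only_second=$(sed -n '2p' "$demo")
echo "$only_second"

# -e: chain several commands; each one is applied in order to every line.
chained=$(sed -e 's/alpha/ALPHA/' -e 's/gamma/GAMMA/' "$demo")
echo "$chained"

# -i with a suffix: edit the file in place and keep the original in "$demo.bak".
sed -i.bak 's/beta/BETA/' "$demo"
edited=$(cat "$demo")
backup=$(cat "$demo.bak")
echo "$edited"

# Clean up the temporary files.
rm -f "$demo" "$demo.bak"
```

Attaching the suffix directly to `-i` (as in `-i.bak`) is accepted by both GNU and BSD sed, which is why the sketch uses that form.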

Leaving the options aside, `sed` commands can take several forms:

```bash
# case1 by line number
sed '<integer>FLAG'
# case2 by line matching
sed '/<pattern>/FLAG'
# case2.2 by line matching
sed '/<pattern>/FLAG <string>'
# case3 by match
sed 'FLAG/<pattern>/<string>/'
# case3.2 by match
sed 'FLAG/<pattern>/<string>/FLAG'
```
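
To make these shapes concrete, here is a small sketch that applies one command of each kind to the same three-line input. The sample text and the variable names are assumptions chosen for the illustration.

```bash
# Three-line sample input reused by every command below (made-up content).
input=$(printf 'one\ntwo\nthree\n')

# Case 1, by line number: delete line 2.
case1=$(printf '%s\n' "$input" | sed '2d')

# Case 2, by line matching: delete every line containing "two".
case2=$(printf '%s\n' "$input" | sed '/two/d')

# Case 3, by match: replace the first "t" of each line with "T".
case3=$(printf '%s\n' "$input" | sed 's/t/T/')

# Case 3.2, by match with a trailing flag: replace every "e" with "E".
case32=$(printf '%s\n' "$input" | sed 's/e/E/g')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$case1" "$case2" "$case3" "$case32"
```

Cases 1 and 2 give the same result here because the line matching `two` happens to be line 2; they differ as soon as the pattern matches somewhere else.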

??? Note "Available FLAGs"
    | Command | Description | Comment | Case1 `sed '<integer>FLAG'` | Case2 `sed '/<pattern>/FLAG'` | Case2.2 `sed '/<pattern>/FLAG <string>'` | Case3 `sed 'FLAG/<pattern>/<string>/'` | Case3.2 `sed 'FLAG/<pattern>/<string>/FLAG'` |
    |----------|----------|----|----|----|----|----|----|
    | q | Quit after a line (`/<pattern>/q` or `<integer>q`) | | x | x | | | |
    | d | Delete lines (`/<pattern>/d` or `<integer>d`) | | x | x | | | |
    | p | Print selected lines (`-n '/<pattern>/p'` or `-n '<integer>p'`) | Use with the `-n` option, otherwise the selected lines are printed twice | x | x | | | |
    | a | Append text after a line (`/<pattern>/a Add new text after`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline before the text | | | x | | |
    | i | Insert text before a line (`/<pattern>/i Add new text before`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline before the text | | | x | | |
    | c | Change entire line (`/<pattern>/c This is a new line`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline before the text | | | x | | |
    | y | Character transliteration (`y/<characters>/<characters>/`) | | | | | x | |
    | s | Substitute the first match on each line (`s/<pattern>/<string>/`) | | | | | x | |
    | s + g | Global: substitute all occurrences on each line (`s/<pattern>/<string>/g`) | | | | | x | x |
    | s + i | Case-insensitive: substitute the first match on each line, ignoring case (`s/<pattern>/<string>/i`) | GNU sed extension | | | | x | x |
    | s + p | Print modified lines (`s/<pattern>/<string>/p`) | | | | | x | x |
    | s + g + i + p | The flags `g`, `i` and `p` can be combined (`s/<pattern>/<string>/pig`) | | | | | x | x |
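
The `a`, `i` and `c` commands are the ones that differ most between GNU and BSD sed. The sketch below uses the backslash-plus-newline form, which both flavours accept; the sample lines and variable names are assumptions for the illustration.

```bash
# Made-up three-line input.
input=$(printf 'first\nsecond\nthird\n')

# a: append a new line after each line matching "second".
appended=$(printf '%s\n' "$input" | sed '/second/a\
added after')

# i: insert a new line before each line matching "second".
inserted=$(printf '%s\n' "$input" | sed '/second/i\
added before')

# c: replace each line matching "second" with a new line.
changed=$(printf '%s\n' "$input" | sed '/second/c\
replacement line')

printf '%s\n---\n%s\n---\n%s\n' "$appended" "$inserted" "$changed"
```

The one-line form (`/second/a added after`) is shorter, but it is a GNU convenience, so the two-line form is the safer choice for scripts that may run on macOS.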


## Line selection 

### Syntax

```bash
sed -n 'line p' file
```

| Command | Description | Comment |
|----------|----------|----------|
| `sed -n '8p' file` | Print line 8 | |
| `sed -n '8p; 16p' file` | Print lines 8 and 16 | |
| `sed -n -e '8p' -e '16p' file` | Print lines 8 and 16 | Same result, with one `-e` per command |
| `sed -n '8,16 p' file` | Print lines 8 to 16 | |
| `sed -n '8,$ p' file` | Print from line 8 to the end of the file | |
| `sed -n '1~8 p' file` | Print every 8th line, starting at line 1 (lines 1, 9, 17, ...) | `~` is a GNU extension, not supported by BSD sed (macOS) |

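
The table can be checked quickly on a stream whose content equals its line numbers. This sketch assumes the `seq` and `awk` utilities are available; the variable names are made up for the illustration.

```bash
# seq prints one integer per line, so line N of the stream contains the number N.
single=$(seq 1 20 | sed -n '8p')          # line 8 only
pair=$(seq 1 20 | sed -n '8p; 16p')       # lines 8 and 16
range=$(seq 1 20 | sed -n '8,10p')        # lines 8 to 10
tail_part=$(seq 1 20 | sed -n '18,$p')    # line 18 to the end

# Without -n, `p` prints the selected line in addition to the automatic output,
# so line 2 appears twice here.
doubled=$(seq 1 3 | sed '2p')

# Possible stand-in for the GNU-only '1~8p' on BSD sed: let awk pick lines 1, 9, 17.
every8=$(seq 1 20 | awk 'NR % 8 == 1')

printf '%s\n' "$single" "$pair" "$range" "$tail_part" "$doubled" "$every8"
```
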
### Exercise

!!! question "Print the header, then all lines from line 686 529 to the end of the file."

??? example "Click to show the solution"  
    ```bash
    sed -n '1p; 686529,$p' nat2021.csv
    ```

## Line deletion 

### Syntax

```bash
sed 'line d' file
```

| Command | Description | Comment |
|----------|----------|----------|
| `sed '8d' file` | Delete line 8 | |
| `sed '8d; 16d' file` | Delete lines 8 and 16 | |
| `sed -e '8d' -e '16d' file` | Delete lines 8 and 16 | Same result, with one `-e` per command |
| `sed '8,16 d' file` | Delete lines 8 to 16 | |
| `sed '8,$ d' file` | Delete from line 8 to the end of the file | |
| `sed '1~8d' file` | Delete every 8th line, starting at line 1 (lines 1, 9, 17, ...) | `~` is a GNU extension, not supported by BSD sed (macOS) |

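
Deletion uses the same addresses as printing, without `-n`. A short sketch on a five-line stream follows; it assumes `seq` is available, and the variable names are made up for the illustration.

```bash
# Line N of the stream contains the number N.
first_gone=$(seq 1 5 | sed '1d')       # drop line 1
middle_gone=$(seq 1 5 | sed '2,4d')    # drop lines 2 to 4
tail_gone=$(seq 1 5 | sed '3,$d')      # drop line 3 to the end
last_gone=$(seq 1 5 | sed '$d')        # `$` alone addresses the last line

# `!` negates an address: delete everything except lines 2 to 4,
# which gives the same output as `sed -n '2,4p'`.
kept=$(seq 1 5 | sed '2,4!d')

printf '%s\n' "$first_gone" "$middle_gone" "$tail_gone" "$last_gone" "$kept"
```
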
### Exercise

!!! question "Delete everything from line 10 to 686 529."
??? example "Click to show the solution"  
    ```bash
    sed '10,686529d' nat2021.csv
    ```
## Use of Regular Expression 

### Syntax

```bash
sed '/RegEx/command' file
```

| Command | Description |
|----------|----------|
| `sed '/^#/d' file` | Delete lines starting with `#` |
| `sed -n '/[0-9][0-9][0-9][0-9]/p' file` | Print lines containing four consecutive digits |
| `sed -E -n '/[0-9]{4}/p' file` | Print lines containing four consecutive digits, using extended regular expressions |

??? Note "Summary of sed Regex Operators"
    | Operator | Description | Example |
    |----------|----------|----------|
    | `.` | Matches any character except newline | `sed 's/a.b/c/g'` |
    | `^` | Matches the start of a line | `sed '/^apple/d'` |
    | `$` | Matches the end of a line | `sed '/end$/d'` |
    | `*` | Matches 0 or more occurrences of the preceding character | `sed 's/a*b/c/g'` |
    | `[]` | Matches any one character in the class | `sed 's/[aeiou]/X/g'` |
    | `[^]` | Matches any one character not in the class | `sed 's/[^a-z]/X/g'` |
    | `()` | Groups characters (Extended Regex) | `sed -E 's/(ab)+/c/g'` |
    | `|` | OR operator (Extended Regex) | `sed -E 's/(apple|orange)/fruit/g'` |
    | `+` | Matches 1 or more occurrences (Extended Regex) | `sed -E 's/a+b/c/g'` |
    | `?` | Matches 0 or 1 occurrence (Extended Regex) | `sed -E 's/a?b/c/g'` |
    | `{n,m}` | Matches between n and m occurrences (Extended Regex) | `sed -E 's/a{2,4}/c/g'` |
    | \ | Escapes special characters | `sed 's/\./X/g'` |

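
The difference between basic and extended syntax is easiest to see side by side. The following sketch runs equivalent patterns on a few made-up lines; the sample text and variable names are assumptions for the illustration.

```bash
# Made-up sample lines.
sample=$(printf 'id=7\nid=2021\n# comment\nyear 1999 and 2000\n')

# Basic regex: repeat the bracket expression by hand.
bre=$(printf '%s\n' "$sample" | sed -n '/[0-9][0-9][0-9][0-9]/p')

# Basic regex with an interval: the braces must be escaped.
bre_interval=$(printf '%s\n' "$sample" | sed -n '/[0-9]\{4\}/p')

# Extended regex (-E): the braces are written as they are.
ere=$(printf '%s\n' "$sample" | sed -E -n '/[0-9]{4}/p')

# Anchor: drop the lines starting with "#".
no_comments=$(printf '%s\n' "$sample" | sed '/^#/d')

# Grouping and alternation need -E.
alt=$(printf 'apple pie\npear tart\nplum cake\n' | sed -E 's/^(apple|pear) /FRUIT /')

printf '%s\n---\n%s\n---\n%s\n' "$ere" "$no_comments" "$alt"
```

All three four-digit patterns select the same two lines (`id=2021` and `year 1999 and 2000`); only the notation changes.
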
### Exercise

!!! question "Select all lines that match PIERRE in the 2000s."
??? example "Click to show the solution"  
    ```bash
    sed -E -n '/;PIERRE;2[0-9]{3}/p' nat2021.csv
    ```
## Substitution

### Syntax

```bash
sed 's/pattern/replacement/' file
```

| Command | Description |
|----------|----------|
| `s/pattern/replacement/` | Substitute the first occurrence of pattern with replacement |
| `s/pattern/replacement/2` | Substitute the second occurrence of pattern with replacement |
| `s/pattern/replacement/g` | Substitute all occurrences of pattern with replacement |
| `s/pattern/replacement/i` | Substitute the first occurrence of pattern with replacement, ignoring case |
| `s/pattern/replacement/gi` | Substitute all occurrences of pattern with replacement, ignoring case |
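
The effect of each flag shows up clearly on a single line that contains the pattern several times. This is a small sketch; the sample line and variable names are assumptions for the illustration.

```bash
# Made-up line with three occurrences of "cat", one of them capitalised.
line='cat Cat cat'

# No flag: only the first match is replaced.
first=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# Numeric flag: only the second match is replaced ("Cat" is not a match).
second=$(printf '%s\n' "$line" | sed 's/cat/dog/2')

# g: every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

# gi: every match, ignoring case. The `i` flag is a GNU sed extension,
# so this last command may fail with other sed flavours.
all_nocase=$(printf '%s\n' "$line" | sed 's/cat/dog/gi')

printf '%s\n' "$first" "$second" "$all" "$all_nocase"
```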

### Exercise

!!! question "Replace all numbers in the last column with XX"

??? example "Click to show the solution"  
    ```bash
    sed -E 's/[0-9]+.?$/XX/' nat2021.csv
    # /!\ s/[0-9]+$/XX/ does not work because each line ends with an unprintable character (\r). Adding .? matches this character, which is then replaced together with the digits.
    ```
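
The carriage-return issue can be reproduced without the real file. The sketch below builds one made-up line ending in `\r` and compares the failing pattern with two workarounds; the line content and the variable names are assumptions for the illustration.

```bash
# A made-up CSV-like line ending with a Windows line break (\r\n).
# Command substitution strips the final \n but keeps the \r.
crlf_line=$(printf '1;PIERRE;2001;75;42\r\n')

# A literal carriage return, so the script does not rely on sed understanding "\r".
cr=$(printf '\r')

# Fails: the digits are followed by \r, not by the end of the line.
unchanged=$(printf '%s\n' "$crlf_line" | sed -E 's/[0-9]+$/XX/')

# Works: .? absorbs the \r, which is replaced along with the digits.
absorbed=$(printf '%s\n' "$crlf_line" | sed -E 's/[0-9]+.?$/XX/')

# Alternative: strip the \r first, then anchor on the real end of the line.
stripped=$(printf '%s\n' "$crlf_line" | sed -E -e "s/${cr}\$//" -e 's/[0-9]+$/XX/')

printf '%s\n' "$absorbed" "$stripped"
```

Both workarounds produce `1;PIERRE;2001;75;XX`, and neither keeps the `\r` in the output line.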

!!! question "Replace every digit with X"
??? example "Click to show the solution"  
    ```bash
    sed -E 's/[0-9]/X/g' nat2021.csv
    ```

## Capturing

It is possible to extract part of a line. A typical example is extracting the value of the `Name` attribute (`tag=value`) from the 9th column of a GFF/GTF file.

### Syntax

```bash
sed -n 's/.*START\([^END]*\)END.*/\1/p' file.txt
```

* `-n` Suppresses the default output, so only what `p` prints is shown.
* `s/.../.../p` Substitutes text and prints the line only if a substitution took place.
* `.*` Matches anything before the START marker.
* `START` The fixed text before the part we want.
* `\(` Start of the capture group (tells sed to remember this part).
* `[^END]*` Matches any run of characters other than the END character. A bracket expression lists single characters, not a word, so END must be a one-character delimiter such as `;` (written `[^;]*`).
* `\)` End of the capture group.
* `END` The fixed text after the part we want.
* `.*` Matches everything after the END marker.
* `\1` Replaces the whole matched line with the first captured group (only one group is captured here).
* `p` Prints the result of the substitution (needed because of `-n`).
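
Applied to the GFF case mentioned above, the pattern could look like the following sketch. The GFF lines are made up for the illustration, and the attribute is assumed to be written `Name=value` with `;` as the separator.

```bash
# Made-up GFF line: nine tab-separated columns, attributes in the 9th.
gff_line=$(printf 'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=ABC1;biotype=protein_coding\n')

# Capture everything after "Name=" up to the next ";" (or the end of the line).
name=$(printf '%s\n' "$gff_line" | sed -n 's/.*Name=\([^;]*\).*/\1/p')

# Same extraction with extended regular expressions: the parentheses are not escaped.
name_ere=$(printf '%s\n' "$gff_line" | sed -E -n 's/.*Name=([^;]*).*/\1/p')

# A line without a Name attribute produces no output, thanks to -n and the p flag.
missing=$(printf 'chr1\tdemo\tgene\t1\t2\t.\t+\t.\tID=gene2\n' | sed -n 's/.*Name=\([^;]*\).*/\1/p')

echo "$name"
```

Leaving out the closing `;` in the pattern is a deliberate choice here: it lets the command also work when `Name` is the last attribute on the line.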

### Exercise

!!! question "List all names that are combined with PIERRE (e.g. OLIVIER, which is used to form PIERRE-OLIVIER)"

??? example "Click to show the solution"  
    ```bash
    sed -n 's/.*;PIERRE-\([^;]*\);.*/\1/p' nat2021.csv | sort -u
    ```