Commit 117fa308 authored by jacques.dainat_ird.fr's avatar jacques.dainat_ird.fr

add flags

parent b8ebf795
Pipeline #84431 passed
# Extracting from files
# SED
## setup
```bash
# or curl -O instead of wget
wget https://ftp.ensembl.org/pub/release-113/gff3/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.113.gff3.gz
gunzip Saccharomyces_cerevisiae.R64-1-1.113.gff3.gz
mv Saccharomyces_cerevisiae.R64-1-1.113.gff3 yeast.gff
```
You should now have a file called `yeast.gff` in your working directory.
```
##gff-version 3
###
I sgd gene 335 649 . + . ID=gene:YAL069W;biotype=protein_coding;description=Dubious open reading frame%3B unlikely to encode a functional protein%2C based on available experimental and comparative sequence data [Source:SGD%3BAcc:S000002143];gene_id=YAL069W;logic_name=sgd
I sgd mRNA 335 649 . + . ID=transcript:YAL069W_mRNA;Parent=gene:YAL069W;biotype=protein_coding;tag=Ensembl_canonical;transcript_id=YAL069W_mRNA
I sgd exon 335 649 . + . Parent=transcript:YAL069W_mRNA;Name=YAL069W_mRNA-E1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=YAL069W_mRNA-E1;rank=1
I sgd CDS 335 649 . + 0 ID=CDS:YAL069W;Parent=transcript:YAL069W_mRNA;protein_id=YAL069W
###
```
The GFF/GTF formats describe genomic features, such as genes, exons and CDS, in a standardized way.
Every line starting with `#` is a comment.
Every other line describes one feature and contains 9 tab-separated fields.
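The layout described above can be checked with a minimal sketch. The file `sample.gff` and its two features are made up for illustration (they only imitate the first lines of `yeast.gff`); the sketch drops the comment lines with `sed` and counts the tab-separated fields with `awk`.

```bash
# Hypothetical mini GFF file, written to a temporary directory:
# 2 comment lines and 2 feature lines whose fields are separated by tabs
workdir=$(mktemp -d)
printf '##gff-version 3\n' > "$workdir/sample.gff"
printf 'I\tsgd\tgene\t335\t649\t.\t+\t.\tID=gene:YAL069W\n' >> "$workdir/sample.gff"
printf '###\n' >> "$workdir/sample.gff"
printf 'I\tsgd\tCDS\t335\t649\t.\t+\t0\tID=CDS:YAL069W\n' >> "$workdir/sample.gff"

# Delete every line starting with '#', keeping only the feature lines
features=$(sed '/^#/d' "$workdir/sample.gff")

# Number of feature lines
n_features=$(printf '%s\n' "$features" | wc -l | tr -d ' ')

# Distinct field counts among the feature lines (a well-formed file gives one value)
n_fields=$(printf '%s\n' "$features" | awk -F'\t' '{ print NF }' | sort -u)

echo "features: $n_features, fields per feature: $n_fields"
```

The same two commands can be pointed at `yeast.gff` to confirm that every feature line has the expected number of fields.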
??? quote "First click and follow the instructions below only if you start the course at this stage! Otherwise skip this step!"
{%
include-markdown "pages/bash_manip/bash_manip-0-setup.md"
%}
## Concept
......@@ -44,7 +27,39 @@ sed [Option(s)] 'Command(s)' [File(s)]
| -r | Use extended regular expressions in the script
| -s | Treat files as separate rather than as a single continuous long stream
At our level, the most useful options are `-n`, `-i` and `-e`.
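As a rough illustration of these three options, the sketch below works on a throwaway file (`demo.txt` is a hypothetical name, created in a temporary directory). The in-place edit uses an attached backup suffix (`-i.bak`), a form that both GNU sed and BSD sed on macOS accept.

```bash
# Hypothetical 3-line demo file in a temporary directory
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/demo.txt"

# -n: suppress automatic printing, so only lines selected by 'p' are shown
only_second=$(sed -n '2p' "$workdir/demo.txt")

# -e: give several commands in a single call
first_and_last=$(sed -n -e '1p' -e '3p' "$workdir/demo.txt")

# -i: edit the file in place; the attached suffix keeps a backup (demo.txt.bak)
sed -i.bak '2d' "$workdir/demo.txt"
after_edit=$(cat "$workdir/demo.txt")

echo "$only_second"
echo "$first_and_last"
echo "$after_edit"
```

Keeping a backup suffix is a cautious default while learning, since `-i` overwrites the original file.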
Leaving the options aside, `sed` commands can take several forms:
```bash
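# case1: select lines by line number
sed '<integer>FLAG'
# case2: select lines matching a pattern
sed '/<pattern>/FLAG'
# case2.2: select lines matching a pattern, and give a text argument
sed '/<pattern>/FLAG <string>'
# case3: act on the matched text itself
sed 'FLAG/<pattern>/<string>/'
# case3.2: act on the matched text itself, with a trailing flag
sed 'FLAG/<pattern>/<string>/FLAG'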
```
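To make the placeholders concrete, here is one possible instance of each case. The three-line input is invented for the example, and the one-line `c <string>` form used for case2.2 is assumed to run under GNU sed.

```bash
# Hypothetical input used for every case below
input=$(printf 'gene A\nexon A1\ngene B\n')

# case1: act on a line number (delete line 2)
case1=$(printf '%s\n' "$input" | sed '2d')

# case2: act on lines matching a pattern (delete lines containing "exon")
case2=$(printf '%s\n' "$input" | sed '/exon/d')

# case2.2: pattern plus text (replace each matching line with "removed";
# the one-line form is a GNU sed extension)
case2_2=$(printf '%s\n' "$input" | sed '/exon/c removed')

# case3: act on the match itself (replace the first "gene" of each line)
case3=$(printf '%s\n' "$input" | sed 's/gene/locus/')

# case3.2: same, with a trailing flag (g = every match on the line)
case3_2=$(printf '%s\n' "$input" | sed 's/A/X/g')

printf '%s\n' "$case1" "$case2" "$case2_2" "$case3" "$case3_2"
```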
??? Note "Available FLAGs"
| Command | Description | Comment | Case1 `sed '<integer>FLAG'` | Case2 `sed '/<pattern>/FLAG'` | Case2.2 `sed '/<pattern>/FLAG <string>'` | Case3 `sed 'FLAG/<pattern>/<string>/'` | Case3.2 `sed 'FLAG/<pattern>/<string>/FLAG'` |
|----------|----------|----|----|----|----|----|----|
| q | Quit after a line (`/<pattern>/q` or `<integer>q`) | | x | x | | | |
| d | Delete lines (`/<pattern>/d` or `<integer>d`) | | x | x | | | |
| p | Print selected lines (`-n '/<pattern>/p'` or `-n '<integer>p'`) | Use with `-n`, otherwise the selected lines are printed twice. | x | x | | | |
| a | Append text after a line (`/<pattern>/a Add new text after`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline. | | | x | | |
| i | Insert text before a line (`/<pattern>/i Add new text before`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline. | | | x | | |
| c | Change entire line (`/<pattern>/c This is a new line`) | On macOS (BSD sed) the command requires a backslash (`\`) and a newline. | | | x | | |
| y | Character transliteration (`y/<characters>/<characters>/`) | | | | | x | |
| s | Substitute first match on each line (`s/<pattern>/<string>/`) | | | | | x | |
| s + g | Global - Substitute all occurrences on each line (`s/<pattern>/<string>/g`) | | | | | x | x |
| s + i | Case-insensitive - Substitute the first match on each line, ignoring case (`s/<pattern>/<string>/i`) | GNU sed extension. | | | | x | x |
| s + p | Print modified lines (`s/<pattern>/<string>/p`) | Usually combined with `-n`. | | | | x | x |
| s + g + i + p | The flags g, i and p can be combined (`s/<pattern>/<string>/pig`) | | | | | x | x |
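A few of the commands and flags listed above, tried on invented one-line inputs. This is only a sketch: the `i` flag is assumed to be available, which is the case with GNU sed but not with every other implementation.

```bash
line='ATG atg ATG'

# s: only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/ATG/xxx/')        # xxx atg ATG

# s + g: every match on the line is replaced
all_matches=$(printf '%s\n' "$line" | sed 's/ATG/xxx/g')      # xxx atg xxx

# s + g + i: every match, ignoring case (i is a GNU sed extension)
any_case=$(printf '%s\n' "$line" | sed 's/atg/xxx/gi')        # xxx xxx xxx

# s + p with -n: print only the lines where a substitution happened
changed_only=$(printf 'ATG\nccc\n' | sed -n 's/ATG/xxx/p')    # xxx

# y: map characters one to one (A->U, T->A, G->C)
translit=$(printf '%s\n' "$line" | sed 'y/ATG/UAC/')          # UAC atg UAC

# q: stop after line 2, so line 3 is never read
first_two=$(printf 'a\nb\nc\n' | sed '2q')

printf '%s\n' "$first_only" "$all_matches" "$any_case" "$changed_only" "$translit" "$first_two"
```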
## Line selection
......@@ -58,6 +73,8 @@ sed -n 'line p' file
|----------|----------|
| `sed -n '8p' file` | Print line 8 |
| `sed -n '8p; 16p' file` | Print lines 8 and 16 |
| `sed -n -e '8p' -e '16p' file` | Print lines 8 and 16 |
| `sed -n '8,16 p' file` | Print lines from 8 to 16 |
| `sed -n '8,$ p' file` | Print lines from line 8 to the end of the file |
| `sed -n '1~8 p' file` | Print every 8th line, starting at line 1 (GNU sed only) |
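These print addresses can be tried without any input file. In the sketch below, `seq 20` (assumed to be installed) writes the integers 1 to 20, one per line, so the content of each line is also its line number.

```bash
# One line
single=$(seq 20 | sed -n '8p')

# Two separate lines, written with -e
pair=$(seq 20 | sed -n -e '8p' -e '16p')

# A range of lines
range=$(seq 20 | sed -n '8,10 p')

# From a line to the end of the input ($ is the last line)
to_end=$(seq 20 | sed -n '18,$ p')

# Every 8th line starting at line 1 (first~step is a GNU sed extension)
every_8th=$(seq 20 | sed -n '1~8 p')

# Unquoted expansion joins the lines with spaces for a compact display
echo "single: $single"
echo "pair: "$pair
echo "range: "$range
echo "to_end: "$to_end
echo "every_8th: "$every_8th
```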
......@@ -75,6 +92,7 @@ sed 'line d' file
|----------|----------|
| `sed '8d' file` | Delete line 8 |
| `sed '8d; 16d' file` | Delete lines 8 and 16 |
| `sed -e '8d' -e '16d' file` | Delete lines 8 and 16 |
| `sed '8,16 d' file` | Delete lines from 8 to 16 |
| `sed '8,$ d' file` | Delete lines from line 8 to the end of the file |
| `sed '1~8d' file` | Delete every 8th line, starting at line 1 (GNU sed only) |
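Deletion uses the same addresses as printing, but without `-n`, because the lines that are not deleted are printed automatically. A small sketch, again assuming `seq` is available to generate numbered lines:

```bash
# Delete one line: everything except line 8 remains
without_8=$(seq 10 | sed '8d')

# Delete a range: lines 3 to 8 disappear
outside_range=$(seq 10 | sed '3,8 d')

# Delete from a line to the end of the input
head_only=$(seq 10 | sed '4,$ d')

# Delete every 3rd line starting at line 1 (first~step is a GNU sed extension)
without_every_3rd=$(seq 10 | sed '1~3d')

# Unquoted expansion joins the remaining lines with spaces for display
echo "without_8: "$without_8
echo "outside_range: "$outside_range
echo "head_only: "$head_only
echo "without_every_3rd: "$without_every_3rd
```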
......