# GREP (part2)
## Setup
??? quote "Click and follow the instructions below only if you are starting the course at this stage. Otherwise, skip this step."
    {%
    include-markdown "pages/bash_manip/bash_manip-0-setup.md"
    %}
## Searching patterns (grep)
By default, grep interprets patterns as basic regular expressions. To use extended regular expressions (with quantifiers such as `+` and `{n}`), pass the `-E` option:
```bash
grep -E pattern file
```
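
To see what `-E` changes, here is a minimal sketch on two made-up rows in a `sex;name;year;count` layout (the sample rows, the temporary file and the variable names are assumptions for illustration, not part of the course data). Without `-E`, `+` and `{4}` are treated as literal characters, so the same pattern finds nothing.

```bash
# Made-up sample rows, written to a temporary file
sample=$(mktemp)
printf '%s\n' 'sexe;preusuel;annais;nombre' '1;PAUL;1980;1234' '2;MARIE;1980;98' > "$sample"

# Basic regex: `{4}` and `+` are literal text, so no line matches
# (grep -c exits with status 1 on zero matches, hence `|| true`)
basic_count=$(grep -c '[0-9]{4};[0-9]+' "$sample" || true)

# Extended regex: `{4}` and `+` are quantifiers, both data rows match
ere_count=$(grep -E -c '[0-9]{4};[0-9]+' "$sample")

echo "basic: $basic_count, extended: $ere_count"   # prints: basic: 0, extended: 2
rm -f "$sample"
```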
Let's go back to our data file `nat2021.csv`, which contains the first names given to children born in France since 1900.
Let's play with some RegEx...

!!! question "Which regex describes the structure of every data line, i.e. all lines except the header (e.g. `1;PRENOMS;1904;1430`)?"

??? example "Click to show the solution"
    ```bash
    [12];[A-Za-z]+;[0-9]{4};[0-9]+
    [12];[A-Za-z-]+;[0-9]{4};[0-9]+   # also accepts compound first names (-)
    [12];[A-Za-z_-]+;[0-9]{4};[0-9]+  # also accepts _PRENOMS_RARES (- and _)
    ```
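
The three variants can be compared on a few made-up rows (the sample file and variable names below are assumptions for illustration). Two details are worth noting in this sketch: inside brackets `|` is an ordinary character, so `[12]` is enough to mean "1 or 2", and a hyphen placed last in the brackets is taken literally. The `-x` option makes grep accept a line only if the pattern matches it entirely.

```bash
# Made-up sample rows in the same layout as the data file
sample=$(mktemp)
cat > "$sample" <<'EOF'
sexe;preusuel;annais;nombre
1;PAUL;1980;1234
1;JEAN-PIERRE;1960;5678
2;_PRENOMS_RARES;1900;1249
EOF

# -x: whole-line match, -c: count matching lines
letters_only=$(grep -E -x -c '[12];[A-Za-z]+;[0-9]{4};[0-9]+' "$sample")
with_hyphen=$(grep -E -x -c '[12];[A-Za-z-]+;[0-9]{4};[0-9]+' "$sample")
with_both=$(grep -E -x -c '[12];[A-Za-z_-]+;[0-9]{4};[0-9]+' "$sample")
echo "$letters_only $with_hyphen $with_both"   # prints: 1 2 3

# -v inverts the selection: lines that do NOT fit the structure (here, the header)
unmatched=$(grep -E -x -v '[12];[A-Za-z_-]+;[0-9]{4};[0-9]+' "$sample")
echo "$unmatched"
rm -f "$sample"
```

Running the inverted search (`-v`) on the real file is a quick way to check whether any line besides the header escapes the pattern.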

!!! question "Which first names were given at least 10 000 times in 1980?"

??? example "Click to show the solution"
    ```bash
    grep -E '[12];[A-Za-z]+;1980;[0-9]{5,}' nat2021.csv # append `| wc -l` to count
    ```
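
The idea behind `[0-9]{5,}` is that a count with five or more digits is at least 10 000. As a cross-check, the sketch below compares the regex with a numeric filter written in `awk` (a different tool, used here only for verification) on made-up rows; the sample data and variable names are assumptions.

```bash
# Made-up sample rows around the 10 000 threshold
sample=$(mktemp)
cat > "$sample" <<'EOF'
1;PAUL;1980;9999
1;LUC;1980;10000
2;ANNE;1980;15432
2;ANNE;1981;20000
EOF

# Regex approach: year 1980 and a count of five or more digits
by_regex=$(grep -E '^[12];[A-Za-z]+;1980;[0-9]{5,}$' "$sample")

# Numeric approach: field 3 is the year, field 4 is the count
by_number=$(awk -F ';' '$3 == 1980 && $4 >= 10000' "$sample")

# Both approaches select the LUC and ANNE rows for 1980
echo "$by_regex"
rm -f "$sample"
```

Note that the digit-count trick includes a count of exactly 10 000; a strict "more than" would need a numeric comparison such as `$4 > 10000`.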

!!! question "Which first names were given at least 20 000 times in 1980?"

??? example "Click to show the solution"
    ```bash
    grep -E '[12];[A-Za-z]+;1980;[2-9][0-9]{4,}' nat2021.csv
    ```
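
Constraining the first digit to `[2-9]` works as long as counts stay below 100 000, because a six-digit count starting with `1` would be rejected. The sketch below illustrates this limit on made-up numbers and shows one possible more general pattern; both the numbers and the alternative pattern are assumptions for illustration, not part of the exercise.

```bash
# Made-up counts, one per line
counts='19999
20000
54321
100000'

# First digit 2-9 followed by at least four digits: misses 100000
first_digit=$(printf '%s\n' "$counts" | grep -E -x -c '[2-9][0-9]{4,}')

# Five digits starting with 2-9, OR any number of six or more digits
general=$(printf '%s\n' "$counts" | grep -E -x -c '([2-9][0-9]{4}|[1-9][0-9]{5,})')

echo "$first_digit $general"   # prints: 2 3
```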

!!! question "List all first names given at least 20 000 times in a single year, across all years. Then remove duplicates (using `cut`, `sort` and `uniq`). Finally, count the number of lines."

??? example "Click to show the solution"
    ```bash
    grep -E '[12];[A-Za-z]+;.*;[2-9][0-9]{4,}' nat2021.csv
    ```

??? example "Click to show the solution without redundancy"
    ```bash
    # sort -u is equivalent to sort | uniq
    grep -E '[12];[A-Za-z]+;.*;[2-9][0-9]{4,}' nat2021.csv | cut -d ';' -f 2 | sort -u
    ```

??? example "Click to show the line count"
    ```bash
    grep -E '[12];[A-Za-z]+;.*;[2-9][0-9]{4,}' nat2021.csv | cut -d ';' -f 2 | sort -u | wc -l
    # Result = 21
    ```
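
To follow what each stage of the pipeline does, here is a minimal sketch on made-up rows (the names, counts and variable names are assumptions, not values from `nat2021.csv`). `cut` keeps the name column, `sort -u` collapses repeated names, and `wc -l` counts what is left.

```bash
# Made-up sample rows: one name passes the threshold in two different years
sample=$(mktemp)
cat > "$sample" <<'EOF'
1;JEAN;1946;53000
1;JEAN;1947;51000
2;MARIE;1901;48000
1;PAUL;1980;1234
EOF

# Stage 1: keep rows with a count of at least 20 000 (three rows here)
hits=$(grep -E '[12];[A-Za-z]+;.*;[2-9][0-9]{4,}' "$sample")

# Stage 2: keep only the name column, then drop duplicates
names=$(printf '%s\n' "$hits" | cut -d ';' -f 2 | sort -u)

# Stage 3: count the distinct names (tr strips padding some wc versions add)
n=$(printf '%s\n' "$names" | wc -l | tr -d ' ')

echo "$names"
echo "distinct names: $n"   # prints: distinct names: 2
rm -f "$sample"
```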