# Grep
## Setup
??? quote "Click and follow the instructions below only if you are starting the course at this stage. Otherwise, skip this step."
    {%
    include-markdown "pages/bash_manip/bash_manip-0-setup.md"
    %}

## Concept

`grep` stands for "global regular expression print". It searches the contents of files for lines that match a specified **pattern**.  
The basic syntax is:
```bash
grep [options] pattern [file...]
```
It has many options; the most common ones are:


| Option | Explanation |
|--------|-------------|
| `-i` | Ignore case distinctions in patterns and data. |
| `-v` | Invert the match, showing lines that do not match the pattern. |
| `-n` | Prefix each line of output with its line number. |
| `-c` | Print only a count of matching lines per file. |
| `-o` | Print only the matched parts of each matching line, each on its own line. |
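
The options above can be tried on a small file first. The following is a minimal sketch: `toy.csv` is a hypothetical file with made-up values, assumed to use the same `sexe;preusuel;annais;nombre` layout as the course data.

```bash
# Hypothetical toy file (made-up values): sexe;preusuel;annais;nombre
printf '%s\n' \
  '1;ADAM;2020;12' \
  '2;ALICE;2021;2021' \
  '1;PARIS;2021;5' \
  '2;PARIS;2021;11' > toy.csv

grep -i 'paris' toy.csv   # case-insensitive: prints both PARIS lines
grep -v 'PARIS' toy.csv   # inverted: prints the ADAM and ALICE lines
grep -n 'PARIS' toy.csv   # prefixes line numbers: 3:1;PARIS;2021;5 and 4:2;PARIS;2021;11
grep -c '2021' toy.csv    # counts matching lines: 3
grep -o '2021' toy.csv    # prints each match on its own line: 4 lines
```

The last two commands differ because the ALICE line contains `2021` twice: `-c` counts it once as a line, while `-o` reports both matches.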

## Exercise

!!! question "How many lines contain the digit `2` in the `nat2021.csv` file?"

??? example "Click to show the solution"
    ```bash
    # -c prints the number of matching lines instead of the lines themselves
    grep -c 2 nat2021.csv
    # result: 553461
    ```

!!! question "How many occurrences of the digit `2` are there in the `nat2021.csv` file?"

??? example "Click to show the solution"  
    ```bash
    grep -o 2 nat2021.csv | wc -l
    # -o makes grep print each match on a new line.
    # wc -l counts the number of lines, which equals the total occurrences
    # 871258
    ```

!!! question "Select all lines for the year 2021 in the `nat2021.csv` file"

Note that the value 2021 may occur in two different columns: `annais` (column 3) and `nombre` (column 4).

??? example "Click to show the solution"  
    ```bash
    grep ";2021;" nat2021.csv
    ```
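
    To see why the surrounding semicolons matter, here is a minimal sketch on a hypothetical toy file (`toy_year.csv`, made-up values). It assumes `nombre` is the last column, so it is never followed by `;` and the pattern `;2021;` can only match the `annais` column.

    ```bash
    # Hypothetical toy file (made-up values): sexe;preusuel;annais;nombre
    printf '%s\n' \
      '1;ADAM;2020;2021' \
      '2;ALICE;2021;40' \
      '1;PARIS;2021;5' > toy_year.csv

    grep -c '2021' toy_year.csv     # 3: also counts the line where nombre is 2021
    grep -c ';2021;' toy_year.csv   # 2: only the lines whose annais column is 2021
    ```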

!!! question "How many different names were given in 2021 (`_PRENOMS_RARES` counts as 1)?"

??? example "Click to show the solution"  
    ```bash
    grep ";2021;" nat2021.csv | wc -l
    # result: 13501 lines (one line per name and sex, so a name given to both sexes is counted twice)
    ```

!!! question "Is there more diversity in male or female names in 2021?"

??? example "Click to show the solution"  
    ```bash
    # Field 1 holds the sex (-f 1): 1 = male, 2 = female.
    # Female: count the lines whose first field is 2
    grep ";2021;" nat2021.csv | cut -d ';' -f 1 | grep -c 2
    # result: 7112
    # Male: count the lines whose first field is 1
    grep ";2021;" nat2021.csv | cut -d ';' -f 1 | grep -c 1
    # result: 6389
    # 7112 > 6389: there are more distinct female names than male names
    ```
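
    As an alternative sketch, the `cut` step can be replaced by anchoring the pattern to the start of the line with `^`, so that it only matches the first column. The file `toy_sex.csv` is hypothetical and contains made-up values.

    ```bash
    # Hypothetical toy file (made-up values): sexe;preusuel;annais;nombre
    printf '%s\n' \
      '1;ADAM;2021;12' \
      '2;ALICE;2021;40' \
      '2;PARIS;2021;11' \
      '2;ALICE;2020;35' > toy_sex.csv

    # ^ anchors the match to the start of the line, i.e. to the sexe column
    grep ';2021;' toy_sex.csv | grep -c '^2;'   # female lines in 2021: 2
    grep ';2021;' toy_sex.csv | grep -c '^1;'   # male lines in 2021: 1
    ```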

!!! question "How many people were named PARIS in 2021?"

??? example "Click to show the solution"  
    ```bash
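    # Prints one line per sex; add up the last column (nombre).
    # The leading ';' keeps longer names that merely end in PARIS from matching.
    grep ";PARIS;2021;" nat2021.csv
    # result: 16 (5 male and 11 female)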
    ```
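
    The addition can also be left to the shell. This is a minimal sketch that assumes `awk` is available. The file `toy_paris.csv` is hypothetical: its 2021 values repeat the result above and its 2020 line is made up.

    ```bash
    # Hypothetical toy file: sexe;preusuel;annais;nombre
    printf '%s\n' \
      '1;PARIS;2021;5' \
      '2;PARIS;2021;11' \
      '2;PARIS;2020;9' > toy_paris.csv

    # Add up the nombre column (field 4) of the matching lines
    grep ';PARIS;2021;' toy_paris.csv | awk -F ';' '{ total += $4 } END { print total }'
    # prints 16
    ```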

Rare names ([see here for documentation](https://www.insee.fr/fr/statistiques/2540004?sommaire=4767262#documentation)) are grouped under `_PRENOMS_RARES`.

!!! question "Can you find the number of rare names per year? Do you see a pattern?"

??? example "Click to show the solution"  
    ```bash
    grep ";_PRENOMS_RARES;" nat2021.csv
    ```
    People tend to give more and more rare names over the years.


!!! question "In which year was the name ZINEDINE given most often?"

??? example "Click to show the solution"  
    ```bash
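    # Sort numerically (-n) on the 4th ';'-separated field (nombre);
    # the last line of the output is the peak year
    grep ";ZINEDINE;" nat2021.csv | sort -n -t ';' -k4
    # result: 1998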
    ```
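
    To print only the peak year instead of reading it off the last line, the pipeline can be extended with `tail` and `cut`. This is a minimal sketch on a hypothetical toy file (`toy_zinedine.csv`) whose counts are made up.

    ```bash
    # Hypothetical toy file (made-up counts): sexe;preusuel;annais;nombre
    printf '%s\n' \
      '1;ZINEDINE;1997;30' \
      '1;ZINEDINE;1998;200' \
      '1;ZINEDINE;1999;90' > toy_zinedine.csv

    # Sort by count (field 4), keep the last line, then print only the year (field 3)
    grep ';ZINEDINE;' toy_zinedine.csv | sort -n -t ';' -k4 | tail -n 1 | cut -d ';' -f 3
    # prints 1998
    ```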