diff --git a/docs/pages/bash_manip/bash_manip-0-setup.md b/docs/pages/bash_manip/bash_manip-0-setup.md new file mode 100644 index 0000000000000000000000000000000000000000..4ea75296f028e0ecd77cbbc7fd03c73f66e34ae1 --- /dev/null +++ b/docs/pages/bash_manip/bash_manip-0-setup.md @@ -0,0 +1,21 @@ +To do this exercise, you will use first-name data for children born in France since 1900, downloaded from the "Institut national de la statistique et des études économiques" (see [here](https://www.insee.fr/fr/statistiques/8205621?sommaire=8205628) for details). + +```bash +wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip +unzip nat2021_csv.zip +``` + +You should now have a file called `nat2021.csv` in your working directory. + +The data in this file have the following shape: +``` +sexe;preusuel;annais;nombre +2;SANDRINE;1973;17605 +1;JEAN;1960;17607 +1;_PRENOMS_RARES;1904;1430 +``` + +The first line is the header, where `preusuel` stands for `prénom usuel` (usual first name) and `annais` stands for `année de naissance` (year of birth). +The subsequent lines are the data. +In the `sexe` column, 1 means male and 2 means female. +`_PRENOMS_RARES` groups rare first names. They are classified as rare according to the criteria described [here](https://www.insee.fr/fr/statistiques/8205621?sommaire=8205628#documentation). \ No newline at end of file diff --git a/docs/pages/bash_manip/bash_manip-2-basics.md b/docs/pages/bash_manip/bash_manip-2-basics.md index 7414c27eff6d4de716a508c3e9f1043deb2a0395..9454def00bf564bc6abda9de847d6378f70febd0 100644 --- a/docs/pages/bash_manip/bash_manip-2-basics.md +++ b/docs/pages/bash_manip/bash_manip-2-basics.md @@ -1,25 +1,10 @@ -# Extracting from files +# Basic commands -To do this exercice you will use data of first names given to children born in France since 1900 downloaded from "Institut national de la statistique et des études économiques" (see [here](https://www.insee.fr/fr/statistiques/8205621?sommaire=8205628) for details). 
+## setup -```bash -wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip -unzip nat2021_csv.zip -``` - -You should now have a file called `nat2021.csv` in your working directory. - -The data contained in this file have this shape: -``` -sexe;preusuel;annais;nombre -2;SANDRINE;1973;17605 -1;JEAN;1960;17607 -1;_PRENOMS_RARES;1904;1430 -``` - -The first line is the header where `preusuel` means `prenom usuel` and `annais` means `année naissance`. -The subsequent lines are the data. -`_PRENOMS_RARES` are rare first names. They are classified as rare following criteria described [here](https://www.insee.fr/fr/statistiques/8205621?sommaire=8205628#documentation). +{% + include-markdown "pages/bash_manip/bash_manip-0-setup.md" +%} ## Displaying sample (head, tail) @@ -139,12 +124,12 @@ The `uniq` command can be used to remove the redundancy. But result need to be s You can redirect a result and store it in a file thanks to the `>` redirection: `command > filename` -!!! question "Save all the names from 2005 in a dedicated file?" +!!! question "Choose a command used before and save the result in a dedicated file?" ??? example "Click to show the solution" ```bash # command - grep ";2025;" nat2021.csv > names2005.txt + sort -n -t ';' -k4 nat2021.csv | tail -n 100 | cut -d";" -f 2 | sort | uniq > names2005.txt ``` ## Final question diff --git a/docs/pages/bash_manip/bash_manip-3-grep.md b/docs/pages/bash_manip/bash_manip-3-grep.md index f5a375dd9410ef6d8e2f0b536108ab2d5feb2a76..ecfebc41ca67984de5ad1431fdd061ff70dab07f 100644 --- a/docs/pages/bash_manip/bash_manip-3-grep.md +++ b/docs/pages/bash_manip/bash_manip-3-grep.md @@ -1,27 +1,61 @@ -# Extracting from files +# Grep -To do this exercice you will need to download French First name data from "Institut national de la statistique -et des études économiques" +## setup +??? quote "First click and follow the instructions below only if you start the course at this stage! Otherwise skip this step!" 
+ {% + include-markdown "pages/bash_manip/bash_manip-0-setup.md" + %} + +## Concept + +`grep` stands for "global regular expression print". It searches through the contents of files for lines that match a specified **pattern**. +The basic syntax is: ```bash -wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip -unzip nat2021_csv.zip +grep [options] pattern [file...] ``` -You should now have a file called `nat2021.csv` in your working directory. +It has many options; the most common ones are: + + +| Option | Explanation | +|---------|-------------| +| -i | Ignore case distinctions in patterns and data. | +| -v | Invert the match, showing lines that do not match the pattern. | +| -n | Prefix each line of output with the line number. | +| -c | Print only a count of matching lines per file. | +| -o | Print only the matched part of each line, one match per output line. | +## Exercise +!!! question "How many lines contain the digit `2` in the `nat2021.csv` file?" -## Searching patterns (grep) +??? example "Click to show the solution" + ```bash + grep -c 2 nat2021.csv + # 553461 + ``` + +!!! question "How many occurrences of the digit `2` exist in the `nat2021.csv` file?" + +??? example "Click to show the solution" + ```bash + grep -o 2 nat2021.csv | wc -l + # -o makes grep print each match on its own line. + # wc -l counts the number of lines, which equals the total number of occurrences + # 871258 + ``` !!! question "Select all line related of the year 2001 in `nat2021.csv` file" +Note that the value 2021 may occur in two different columns: `annais` (column 3) and `nombre` (column 4), so the pattern must include the surrounding `;` separators. + ??? example "Click to show the solution" ```bash grep ";2021;" nat2021.csv ``` -!!! question "How many names have been provided in 2021?" +!!! question "How many different names were given in 2021 (`_PRENOMS_RARES` counts as 1)?" ??? example "Click to show the solution" ```bash @@ -29,19 +63,19 @@ You should now have a file called `nat2021.csv` in your working directory. # result: 13501 ``` -!!! 
question "Is there more diversity in male or female names in 2021"? +!!! question "Is there more diversity in male or female names in 2021?" ??? example "Click to show the solution" ```bash - # female - grep ";2021;" nat2021.csv | grep "^2" | wc -l + # female: field 1 holds the sex (cut -f 1), then count the lines containing 2 (grep -c 2) + grep ";2021;" nat2021.csv | cut -d ';' -f 1 | grep -c 2 # result: 7112 - # male - grep ";2021;" nat2021.csv | grep "^1" | wc -l + # male: field 1 holds the sex (cut -f 1), then count the lines containing 1 (grep -c 1) + grep ";2021;" nat2021.csv | cut -d ';' -f 1 | grep -c 1 # result: 6389 ``` -!!! question "How many person are called PARIS in 2021"? +!!! question "How many people were named PARIS in 2021?" ??? example "Click to show the solution" ```bash @@ -52,7 +86,7 @@ You should now have a file called `nat2021.csv` in your working directory. The rare name ([see here for documentation](https://www.insee.fr/fr/statistiques/2540004?sommaire=4767262#documentation)) are set as `_PRENOMS_RARES`. -!!! question "Could you find all rare name ? Do you see any pattern?" +!!! question "Could you find the number of rare names per year? Do you see any pattern?" ??? example "Click to show the solution" ```bash @@ -61,7 +95,7 @@ The rare name ([see here for documentation](https://www.insee.fr/fr/statistiques People tends to provide more and more rare names. -!!! question "What year was the most prolific fot the name ZINEDINE?" +!!! question "Which year was the most prolific for the name ZINEDINE?" ??? example "Click to show the solution" ```bash @@ -71,20 +105,3 @@ The rare name ([see here for documentation](https://www.insee.fr/fr/statistiques ``` - -## Redirecting an output (>) - -You can redirect a result and store it in a file thanks to the `>` redirection: -`command > filename` - -!!! question "Save all the names from 2005 in a dedicated file?" - -??? 
example "Click to show the solution" - ```bash - # command - grep ";2025;" nat2021.csv > names2005.txt - ``` - - - - diff --git a/docs/pages/bash_manip/bash_manip-4-awk copy.md b/docs/pages/bash_manip/bash_manip-4-awk copy.md deleted file mode 100644 index 449859824017fe06683bef2ad4da66561b09127c..0000000000000000000000000000000000000000 --- a/docs/pages/bash_manip/bash_manip-4-awk copy.md +++ /dev/null @@ -1,23 +0,0 @@ -# Extracting from files - -To do this exercice you will need to download French First name data from "Institut national de la statistique -et des études économiques" - -```bash -wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip -unzip nat2021_csv.zip -``` - -You should now have a file called `nat2021.csv` in your working directory. - - -## Filtering a file (awk) - - - -## Replacing patterns (sed) - - - - - diff --git a/docs/pages/bash_manip/bash_manip-4-awk.md b/docs/pages/bash_manip/bash_manip-4-awk.md new file mode 100644 index 0000000000000000000000000000000000000000..bf121f8b5c721da9205e5da5faaab98f7a0a33a8 --- /dev/null +++ b/docs/pages/bash_manip/bash_manip-4-awk.md @@ -0,0 +1,20 @@ +# AWK + +## setup + +??? quote "First click and follow the instructions below only if you start the course at this stage! Otherwise skip this step!" + {% + include-markdown "pages/bash_manip/bash_manip-0-setup.md" + %} + + +## Filtering a file (awk) + + + +## Replacing patterns (sed) + + + + + diff --git a/docs/pages/bash_manip/bash_manip-4-regex.md b/docs/pages/bash_manip/bash_manip-4-regex.md new file mode 100644 index 0000000000000000000000000000000000000000..d846cca7f03d2aacc6e8866ecfdc5388c579ceed --- /dev/null +++ b/docs/pages/bash_manip/bash_manip-4-regex.md @@ -0,0 +1,46 @@ +# Regular Expression + +Regular expressions (regex) are sequences of characters that define a search pattern. They are used for pattern matching within strings. 
+They are a powerful tool for text processing and can be used in various command-line utilities like `grep`, `sed`, and `awk` to search, match, and manipulate text. + +## Regular Expression Summary + +| Symbol | Description | Example | Matches | +|--------|-------------|---------|---------| +| `.` | Any single character except newline | `a.b` | `aab`, `acb`, `a1b` | +| `^` | Start of a line | `^abc` | `abc` at the start of a line | +| `$` | End of a line | `abc$` | `abc` at the end of a line | +| `*` | Zero or more of the preceding element | `ab*c` | `ac`, `abc`, `abbc` | +| `+` | One or more of the preceding element | `ab+c` | `abc`, `abbc` | +| `?` | Zero or one of the preceding element | `ab?c` | `ac`, `abc` | +| `{n}` | Exactly n of the preceding element | `a{3}` | `aaa` | +| `{n,}` | n or more of the preceding element | `a{2,}` | `aa`, `aaa`, `aaaa` | +| `{n,m}`| Between n and m of the preceding element | `a{2,3}` | `aa`, `aaa` | +| `[]` | Any one of the characters within the brackets | `[abc]` | `a`, `b`, `c` | +| `[^]` | Any one character not within the brackets | `[^abc]` | Any character except `a`, `b`, `c` | +| `|` | Alternation (OR) | `a|b` | `a`, `b` | +| `()` | Grouping | `(abc)` | `abc` | +| `\d` | Any digit (0-9); Perl-style, requires `-P` in GNU `grep` | `\d` | `0`, `1`, `2`, ..., `9` | +| `\D` | Any non-digit; Perl-style, requires `-P` in GNU `grep` | `\D` | Any character except `0-9` | +| `\w` | Any word character (alphanumeric + underscore) | `\w` | `a`, `b`, `1`, `_` | +| `\W` | Any non-word character | `\W` | Any character except `a-z`, `A-Z`, `0-9`, `_` | +| `\s` | Any whitespace character | `\s` | Space, tab, newline | +| `\S` | Any non-whitespace character | `\S` | Any character except space, tab, newline | + +POSIX character classes can also be used inside a bracket expression, for example `[[:digit:]]`: + +| Symbol | Description | +|--------|-------------| +| [:alnum:] | equivalent to A-Za-z0-9 | +| [:alpha:] | equivalent to A-Za-z | +| [:blank:] | equivalent to space or tab | +| [:digit:] | equivalent to 0-9 | + + +!!! 
Warning + Do not confuse regular expressions with **globbing** (pathname expansion), which is used to match filenames! + `?` Any single character + `*` Zero or more characters + `[]` Specifies a set or range: matches any one character in it, or any one character not in it when `!` is placed first inside the brackets. + `{term1,term2}` Specifies a comma-separated list of terms; each term must be a name or a wildcard. + `{term1..term2}` Called brace expansion, this syntax expands to all the terms between term1 and term2 (letters or integers). \ No newline at end of file diff --git a/docs/pages/bash_manip/bash_manip-5-grep2.md b/docs/pages/bash_manip/bash_manip-5-grep2.md new file mode 100644 index 0000000000000000000000000000000000000000..81f11cfc7a42402725a3ce9def4be7b357b0201b --- /dev/null +++ b/docs/pages/bash_manip/bash_manip-5-grep2.md @@ -0,0 +1,83 @@ +# GREP (part2) + +## setup + +??? quote "First click and follow the instructions below only if you start the course at this stage! Otherwise skip this step!" + {% + include-markdown "pages/bash_manip/bash_manip-0-setup.md" + %} + + +## Searching patterns (grep) + +!!! question "Select all lines related to the year 2021 in the `nat2021.csv` file" + +??? example "Click to show the solution" + ```bash + grep ";2021;" nat2021.csv + ``` + +!!! question "How many names were given in 2021?" + +??? example "Click to show the solution" + ```bash + grep ";2021;" nat2021.csv | wc -l + # result: 13501 + ``` + +!!! question "Is there more diversity in male or female names in 2021?" + +??? example "Click to show the solution" + ```bash + # female + grep ";2021;" nat2021.csv | grep "^2" | wc -l + # result: 7112 + # male + grep ";2021;" nat2021.csv | grep "^1" | wc -l + # result: 6389 + ``` + +!!! question "How many people were named PARIS in 2021?" + +??? 
example "Click to show the solution" + ```bash + # command + grep "PARIS;2021;" nat2021.csv + # result: 16 (5 male and 11 female) + ``` + +Rare names ([see here for documentation](https://www.insee.fr/fr/statistiques/2540004?sommaire=4767262#documentation)) are recorded as `_PRENOMS_RARES`. + +!!! question "Could you find all rare names? Do you see any pattern?" + +??? example "Click to show the solution" + ```bash + grep ";_PRENOMS_RARES;" nat2021.csv + ``` + People tend to give more and more rare names. + + +!!! question "Which year was the most prolific for the name ZINEDINE?" + +??? example "Click to show the solution" + ```bash + # command + grep ";ZINEDINE;" nat2021.csv | sort -n -t ';' -k4 + # result: 1998 + ``` + + +You can redirect a result and store it in a file with the `>` redirection: +`command > filename` + +!!! question "Save all the names from 2005 in a dedicated file?" + +??? example "Click to show the solution" + ```bash + # command + grep ";2005;" nat2021.csv > names2005.txt + ``` + + + + diff --git a/docs/pages/cheat_sheet/bash/bash.md b/docs/pages/cheat_sheet/bash/bash.md index 67c6a4160a5a6f15da6e0c8303a125f10e5c034d..cd96e058b5586ad85ac10514e9407241ad6e1b2e 100644 --- a/docs/pages/cheat_sheet/bash/bash.md +++ b/docs/pages/cheat_sheet/bash/bash.md @@ -5,4 +5,9 @@ <iframe id="iframepdf" src="../Bash_cheat_sheet_level2.pdf" frameborder="0" width="640" height="480" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe> </br> # level 3 - Programming -<iframe id="iframepdf" src="../Bash_cheat_sheet_level3.pdf" frameborder="0" width="640" height="480" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe> \ No newline at end of file +<iframe id="iframepdf" src="../Bash_cheat_sheet_level3.pdf" frameborder="0" width="640" height="480" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe> + +# Interesting resources + +* [Software 
Carpentry](https://swcarpentry.github.io/shell-novice/index.html) +* [Gentoo Linux](https://devmanual.gentoo.org/tools-reference/bash/index.html) \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index c6d5a3c44fbd4c3ec0e932c6c2d5c01e6a710ad2..a989ce08f904fcfd355a6dd65752f283df5e7101 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -107,8 +107,9 @@ nav: - Course overview: pages/bash_manip/bash_manip-0-overview.md - Introduction: pages/bash_manip/bash_manip-1-introduction.md - Basic commands: pages/bash_manip/bash_manip-2-basics.md - - RegEx: pages/bash_manip/bash_manip-3-grep.md - - Grep: pages/bash_manip/bash_manip-3-grep.md + - Grep (part1): pages/bash_manip/bash_manip-3-grep.md + - Regular expressions: pages/bash_manip/bash_manip-4-regex.md + - Grep (part2): pages/bash_manip/bash_manip-5-grep2.md - Awk: pages/bash_manip/bash_manip-4-awk.md - Sed: pages/bash_manip/bash_manip-5-sed.md - Bash scripting: