Commit bcd1acb6 authored by jacques.dainat_ird.fr

improve everything

parent 43222ca5
Pipeline #84265 passed
Showing 338 additions and 76 deletions
# Unix basics
# 🚀 **Ready to master Unix? Let's get started!**
<img src="pages/images/index/flatpak-steam.avif" alt="drawing" width="300"/>
<img src="pages/images/index/MacOSX.png" alt="drawing" width="300"/>
......@@ -9,11 +9,59 @@
width="20" height="20"/>
[GitHub repository]( {{config.repo_url}})
Like most users, we interact with our computers daily through a Graphical User Interface (GUI), typing text and clicking drop-down menus and option buttons. Before GUIs appeared, everything was done via the Command Line Interface (CLI). The simplicity of the GUI came at the cost of the flexibility and control available in the CLI.
By understanding the underlying concepts of the Unix command line, you will better understand how computers work. By learning to use a Unix command-line interface, you will become more productive at many tasks, such as file management and job control. It will also allow you to work remotely on high-performance computing (HPC) clusters and give you access to numerous bioinformatics tools that are only available on the command line.
Learning it may seem like a waste of time and energy, but in bioinformatics the Unix command line is central and unlocks great potential for automation, tool usage, data/file processing, and more.
The courses in this repository are designed to guide participants from the fundamentals of the Unix terminal to advanced scripting techniques. It is divided into **three modules**, each focusing on key aspects of Unix usage, from navigating the command line to processing data efficiently and automating tasks with Bash scripting.
## Unix Basics
<!--part1-start-->
**Introduction to the Terminal: Demystifying the Black Screen**
Topics covered include:
- Introduction to the terminal and command line
- Navigating the file system
- Basic commands for file and directory management
- Understanding permissions and file ownership
- Using basic text editors
- Understanding and using bash commands
🎯 **Who is this course for?**
✅ Beginners who want to gain confidence using the terminal
<!--part1-end-->
## Data & File Manipulation
<!--part2-start-->
**Working with Text and Data**
This part focuses on data and file manipulation using powerful tools like `awk`, `sed`, and `grep`. Topics covered include:
- Searching and filtering with `grep` and `find`
- Editing and transforming data using `sed` and `awk`
- Sorting, cutting, and extracting data from structured files
- Handling large datasets efficiently with Unix tools
- Combining commands with pipes (`|`) and redirections (`>` and `>>`)
🎯 **Who is this course for?**
✅ Anyone needing efficient data manipulation tools
<!--part2-end-->
## Bash Scripting
<!--part3-start-->
**Automating Tasks in Unix**
This part of the course is dedicated to writing Bash scripts to automate repetitive and complex tasks. Topics covered include:
- Writing and executing Bash scripts (`.sh` files)
- Understanding variables, loops, and conditionals
- Handling user inputs and arguments
- Using functions to structure scripts
- Error handling and debugging
- Best practices for writing maintainable scripts
🎯 **Who is this course for?**
✅ Anyone looking to automate repetitive tasks in Unix
<!--part3-end-->
<!--rest-start-->
## Pre-course setup
The [setup](pages/course-information/setup)
......@@ -27,3 +75,5 @@ course, so make sure you've gone through them all before the course begins.
## Contact
To contact me, please send an email to {{config.extra.contact}}.
<!--rest-end-->
\ No newline at end of file
# Unix Basics
![](../images/1-intro/DEC_VT100_terminal_transparent.png){: style="height:250px;width:250px;float:left"}
Like most users, we interact with our computers daily through a Graphical User Interface (GUI), typing text and clicking drop-down menus and option buttons. Before GUIs appeared, everything was done via the Command Line Interface (CLI). The simplicity of the GUI came at the cost of the flexibility and control available in the CLI.
By understanding the underlying concepts of the Unix command line, you will better understand how computers work. By learning to use a Unix command-line interface, you will become more productive at many tasks, such as file management and job control. It will also allow you to work remotely on high-performance computing (HPC) clusters and give you access to numerous bioinformatics tools that are only available on the command line.
Learning it may seem like a waste of time and energy, but in bioinformatics the Unix command line is central and unlocks great potential for automation, tool usage, data/file processing, and more.
{%
include-markdown "../../index.md"
start="<!--part1-start-->"
end="<!--part1-end-->"
%}
{%
include-markdown "../../index.md"
start="<!--rest-start-->"
end="<!--rest-end-->"
%}
\ No newline at end of file
# Commands
Bash 4.3 comes with 58 embedded commands:
Bash 4.3 comes with 58 embedded (built-in) commands:
```
bash defines the following built-in commands: :, ., [, alias, bg, bind, break, builtin,
......@@ -12,9 +12,13 @@ Bash 4.3 comes with 58 embeded commands:
```
They are built-in for performance reasons.
There are many more commands available on your machine. Unix machines have several hundred different commands.
There are many more commands available on your machine (e.g. `ls`). Unix machines have several hundred different commands.
Good places to browse them are [http://ss64.com/mac](http://ss64.com/mac) and [http://ss64.com/bash](http://ss64.com/bash).
!!! note
To check whether a command is a built-in, run `type command`.
Try `type cd` and `type ls`.
**No worries! Like most people, you will only need a very small subset of these commands.**
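To see the built-in/external distinction from the note above in action, the short loop below prints one word per command: `builtin`, `file` (an external program found in your `PATH`), `keyword`, `alias` or `function`. It is a sketch that assumes Bash, because `type -t` is a Bash option and is not available in every `sh`. In an interactive terminal, `ls` may report `alias` if your configuration defines one.

```bash
#!/usr/bin/env bash
# Print the kind of each command using the Bash built-in `type -t`
for cmd in cd ls echo if; do
    printf '%s -> %s\n' "$cmd" "$(type -t "$cmd")"
done
# e.g. "cd -> builtin" and "if -> keyword"
```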
......@@ -34,6 +38,17 @@ It is possible to save the output of a command in a variable:
var=$(command)
```
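This is a minimal illustration of command substitution. Bash runs the command inside `$(...)`, and the text that command prints becomes the value of the variable. The strings used here are arbitrary examples.

```bash
#!/usr/bin/env bash
# Capture the output of a command in a variable with $(...)
name=$(echo "jean" | tr 'a-z' 'A-Z')
echo "$name"                               # prints: JEAN

# Capture a count, then reuse it; $((...)) turns it into a plain number
nb_lines=$(printf 'a\nb\nc\n' | wc -l)
echo "The input has $((nb_lines)) lines"   # prints: The input has 3 lines
```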
## Getting help
There are several ways to get help for a command in Bash:
| Command | Works For | Example | Notes |
|-------------------------|---------------------------------|----------------|-------|
| `man <command>` | Most external commands | `man ls` | Doesn't work for shell built-ins like `cd` |
| `<command> --help` | Many external commands | `ls --help` | Some commands may not support `--help` |
| `info <command>` | Commands with info pages | `info grep` | Not all commands have an info page |
| `help <command>` | Shell built-ins | `help cd` | Works only for built-in commands like `cd`, `exit` |
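The table above can be turned into a small habit. The function below, `show_help`, is a hypothetical helper and not a standard command. It uses `type -t` to choose between `help` for built-ins and `man` for everything else, and it assumes Bash.

```bash
#!/usr/bin/env bash
# Hypothetical helper: use `help` for shell built-ins, `man` otherwise
show_help() {
    if [ "$(type -t "$1")" = "builtin" ]; then
        help "$1"
    else
        man "$1"
    fi
}

show_help cd | head -n 1    # first line of the built-in help for cd
```

Calling `show_help ls` would open the manual page of `ls` in the same way as `man ls`.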
## String Commands Together
It is possible to execute several commands in one line using `;` in between commands:
......
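Here is a minimal sketch of the difference between `;`, `&&` and `|`. It uses the standard commands `true` and `false`, which do nothing except succeed or fail.

```bash
#!/usr/bin/env bash
# `;` runs the next command whatever the exit status of the previous one
false; echo "printed even though 'false' failed"

# `&&` runs the next command only if the previous one succeeded
true && echo "printed because 'true' succeeded"
false && echo "never printed"

# `|` sends the output of one command to the input of the next
printf 'JEAN\nSANDRINE\n' | wc -l
```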
# Unix Data/File Manipulation
![](images/gff.png){: style="width:450px;float:left;margin-right: 10px;"}
Unlock the power of your data with the magic of Bash and Linux! Learning how to manipulate files and process data directly from the command line is an essential skill for anyone dealing with large datasets, automation, or efficient workflows. Imagine being able to extract, filter, and transform information in seconds—without ever opening a spreadsheet or writing complex scripts in high-level programming languages.
Mastering tools like grep, awk, sed, and cut allows you to navigate through massive files, clean up messy data, and reshape information with just a few keystrokes. Whether you’re a researcher handling genomic sequences, a data analyst working with logs, or a developer automating tasks, knowing how to harness the power of text processing in Linux will save you time and effort.
There’s something incredibly satisfying about chaining simple commands together to solve complex problems effortlessly. Once you start discovering the possibilities, you’ll never look at data the same way again!
{%
include-markdown "../../index.md"
start="<!--part2-start-->"
end="<!--part2-end-->"
%}
{%
include-markdown "../../index.md"
start="<!--rest-start-->"
end="<!--rest-end-->"
%}
\ No newline at end of file
To analyze and manipulate data and files, you must know how commands work and which commands are useful for these tasks.
{%
include-markdown "../bash/bash-7-commands.md"
%}
## Useful commands
To analyze and manipulate data and files efficiently from the command line, a handful of commands are essential:
| Command | Explanation |
|---------|-------------|
| **`grep`** | Search for patterns within files |
| **`awk`** | A powerful programming language for pattern scanning and processing |
| **`sed`** | Stream editor for filtering and transforming text |
| `cut` | Remove sections from each line of files |
| `sort` | Sort lines of text files |
| `uniq` | Report or omit repeated lines |
| `head` | Display the beginning of a file |
| `tail` | Display the end of a file |
| `tr` | Translate or delete characters |
| `wc` | Print newline, word, and byte counts for each file |
| `find` | Search for files in a directory hierarchy |
| `diff` | Compare files line by line |
| `comm` | Compare two sorted files line by line |
| `tee` | Read from standard input and write to standard output and files |
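This sketch shows how several of these commands combine. The script writes a tiny file named `sample.csv` (a hypothetical name) in the same `;`-separated layout as the INSEE file used in the exercises, reusing the example rows shown there. It then queries the file with `tail`, `wc`, `cut`, `sort`, `uniq`, `tr` and `tee`. On the real data, replace `sample.csv` with `nat2021.csv`.

```bash
#!/usr/bin/env bash
# Tiny sample in the nat2021.csv format (header + 3 records)
printf '%s\n' 'sexe;preusuel;annais;nombre' \
    '2;SANDRINE;1973;17605' \
    '1;JEAN;1960;17607' \
    '1;_PRENOMS_RARES;1904;1430' > sample.csv

# tail + wc: count the records, skipping the header line
tail -n +2 sample.csv | wc -l

# cut + sort + uniq: distinct values of column 1 (sexe)
tail -n +2 sample.csv | cut -d ';' -f 1 | sort | uniq

# tr: lower-case the first names
tail -n +2 sample.csv | cut -d ';' -f 2 | tr 'A-Z' 'a-z'

# tee: save the names to names.txt and print them at the same time
tail -n +2 sample.csv | cut -d ';' -f 2 | tee names.txt
```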
# Extracting from files
To do this exercise, you will need to download French first-name data from the "Institut national de la statistique et des études économiques" (INSEE).
To do this exercise, you will use data on the first names given to children born in France since 1900, downloaded from the "Institut national de la statistique et des études économiques" (see [here](https://www.insee.fr/fr/statistiques/8205621?sommaire=8205628) for details).
```bash
wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip
......@@ -10,6 +9,18 @@ unzip nat2021_csv.zip
You should now have a file called `nat2021.csv` in your working directory.
The data contained in this file have this shape:
```
sexe;preusuel;annais;nombre
2;SANDRINE;1973;17605
1;JEAN;1960;17607
1;_PRENOMS_RARES;1904;1430
```
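The first line is the header. In it, `sexe` is the sex (`1` for male, `2` for female), `preusuel` stands for `prénom usuel` (usual first name), `annais` stands for `année de naissance` (year of birth), and `nombre` is the number of children given that name that year.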
The subsequent lines are the data.
`_PRENOMS_RARES` are rare first names. They are classified as rare following criteria described [here](https://www.insee.fr/fr/statistiques/8205621?sommaire=8205628#documentation).
## Displaying sample (head, tail)
When you have a huge dataset, it is often useful to display only the beginning or the end of the file to get an idea of how it is structured.
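Here is a minimal sketch using a hypothetical three-record `sample.csv` built from the example rows above. On the real data, use `nat2021.csv` instead. By default `head` and `tail` show 10 lines, and `-n` changes that number. `tail -n +2` starts at line 2, which is a common way to skip a header.

```bash
#!/usr/bin/env bash
# Small sample with the same layout as nat2021.csv
printf '%s\n' 'sexe;preusuel;annais;nombre' \
    '2;SANDRINE;1973;17605' \
    '1;JEAN;1960;17607' \
    '1;_PRENOMS_RARES;1904;1430' > sample.csv

head -n 2 sample.csv        # header + first record
tail -n 1 sample.csv        # last record only
tail -n +2 sample.csv       # everything except the header (start at line 2)

# List the column names, one per line
head -n 1 sample.csv | tr ';' '\n'
```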
......@@ -38,61 +49,21 @@ Using commands `head` and `tail` allows to do this tasks.
wc -c nat2021.csv
```
!!! question "Count the number of lines in the `nat2021.csv` file"
??? example "Click to show the solution"
```bash
wc -l nat2021.csv
```
## Searching patterns (grep)
!!! question "Select all lines related to the year 2021 in the `nat2021.csv` file"
??? example "Click to show the solution"
```bash
grep ";2021;" nat2021.csv
```
!!! question "How many names have been provided in 2021?"
??? example "Click to show the solution"
```bash
grep ";2021;" nat2021.csv | wc -l
# result: 13501
```
!!! question "Is there more diversity in male or female names in 2021?"
??? example "Click to show the solution"
```bash
# female
grep ";2021;" nat2021.csv | grep "^2" | wc -l
# result: 7112
# male
grep ";2021;" nat2021.csv | grep "^1" | wc -l
# result: 6389
```
!!! question "How many people were named PARIS in 2021?"
!!! question "Count the number of words in the `nat2021.csv` file"
??? example "Click to show the solution"
```bash
# both sexes (one line per sex)
grep "PARIS;2021;" nat2021.csv
# result 16 (5 male and 11 female)
wc -w nat2021.csv
```
Rare names ([see here for documentation](https://www.insee.fr/fr/statistiques/2540004?sommaire=4767262#documentation)) are recorded as `_PRENOMS_RARES`.
!!! question "Can you find all rare names? Do you see any pattern?"
!!! question "Count the number of lines in the `nat2021.csv` file"
??? example "Click to show the solution"
```bash
grep ";_PRENOMS_RARES;" nat2021.csv
wc -l nat2021.csv
```
People tend to give more and more rare names.
!!! question "Can you explain why the word count and the line count are similar?"
## Sorting a tabular file (sort)
......@@ -117,22 +88,21 @@ sort -n nat2021.csv | head
!!! question "Do you observe any difference?"
!!! question "Which name, and in which year, was given most often among the records?"
!!! question "Which name was given most often in a single year among the records? Which year was that?"
??? example "Click to show the solution"
```bash
# command
sort -n -t ';' -k4 nat2021.csv | tail -n 1
# result: JEAN
# result: JEAN in 1946 (53547 times)
```
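A note on the `sort` options used above: `-k4` means "from field 4 to the end of the line". Because `nombre` is the last column, `-k4,4` (field 4 only) gives the same result here. It is still the safer habit when the key is not the last column. The sketch below uses a hypothetical three-record `sample.csv` built from the example rows of this page.

```bash
#!/usr/bin/env bash
printf '%s\n' 'sexe;preusuel;annais;nombre' \
    '2;SANDRINE;1973;17605' \
    '1;JEAN;1960;17607' \
    '1;_PRENOMS_RARES;1904;1430' > sample.csv

# -t ';'  : fields are separated by ';'
# -k4,4n  : sort on field 4 only, numerically (the header counts as 0)
sort -t ';' -k4,4n sample.csv | tail -n 1     # largest count is last

# adding r reverses the order, so the largest count comes first
sort -t ';' -k4,4nr sample.csv | head -n 1
```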
!!! question "What year was the most prolific for the name ZINEDINE?"
!!! question "Can you refine the previous command to list the 100 name/year records with the highest counts?"
??? example "Click to show the solution"
```bash
# command
grep ";ZINEDINE;" nat2021.csv | sort -n -t ';' -k4
# result: 1998
sort -n -t ';' -k4 nat2021.csv | tail -n 100
```
## Extracting columns (cut)
......@@ -142,13 +112,14 @@ The `cut` command allows to cut a line at a specific character and extract a sel
* `-d` specify the separator
* `-f` specify the field to extract
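Here is a short illustration using one record from the data, stored in a variable for the example. `-f` also accepts lists (`2,3`) and open ranges (`3-`).

```bash
#!/usr/bin/env bash
line='2;SANDRINE;1973;17605'

echo "$line" | cut -d ';' -f 2      # prints: SANDRINE
echo "$line" | cut -d ';' -f 2,3    # prints: SANDRINE;1973
echo "$line" | cut -d ';' -f 3-     # prints: 1973;17605
```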
!!! question "How can you extract the names from the top 100 name/year records?"
!!! question "How can you extract only the names from the top 100 name/year records?"
??? example "Click to show the solution"
```bash
# command
sort -n -t ';' -k4 nat2021.csv | tail -n 100 | cut -d";" -f 2
```
## Remove redundancy (uniq)
The `uniq` command removes redundancy, but its input must be sorted for it to work properly.
......@@ -160,11 +131,8 @@ The `uniq` command can be used to remove the redundancy. But result need to be s
sort -n -t ';' -k4 nat2021.csv | tail -n 100 | cut -d";" -f 2 | sort | uniq
```
!!! question "How many times has the name JEAN been given in total?"
??? example "Click to show the solution"
This becomes too complicated for the commands you have seen so far. You need `awk`, a command designed for column data.
!!! warning
Remember that `uniq` only removes adjacent duplicate lines, so its input must be sorted for it to work properly.
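This minimal demonstration shows why sorting matters: `uniq` only compares each line with the one just before it. The names below are arbitrary examples.

```bash
#!/usr/bin/env bash
# Without sort, the second JEAN is not next to the first one, so it is kept
printf 'JEAN\nMARIE\nJEAN\n' | uniq           # JEAN, MARIE, JEAN

# After sort, identical lines are adjacent and uniq collapses them
printf 'JEAN\nMARIE\nJEAN\n' | sort | uniq    # JEAN, MARIE

# -c prefixes each remaining line with its number of occurrences
printf 'JEAN\nMARIE\nJEAN\n' | sort | uniq -c
```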
## Redirecting an output (>)
......@@ -179,16 +147,13 @@ You can redirect a result and store it in a file thanks to the `>` redirection:
grep ";2005;" nat2021.csv > names2005.txt
```
## Final question
## Filtering a file (awk)
## Replacing patterns (sed)
!!! question "How many times has the name JEAN been given in total?"
??? example "Click to show the solution"
This becomes too complicated for the commands you have seen so far. You need `awk`, a command designed for column data.
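As a preview of the `awk` section, here is a minimal sketch of one way to answer this question. It runs on a hypothetical three-record `sample.csv` built from the example rows of this page; use `nat2021.csv` on the real data. `awk` splits each line into fields `$1`, `$2`, ..., so it can filter on one column and sum another.

```bash
#!/usr/bin/env bash
printf '%s\n' 'sexe;preusuel;annais;nombre' \
    '2;SANDRINE;1973;17605' \
    '1;JEAN;1960;17607' \
    '1;_PRENOMS_RARES;1904;1430' > sample.csv

# -F ';'       : fields are separated by ';'
# $2 == "JEAN" : keep only lines whose 2nd field is exactly JEAN
# total += $4  : add up the 4th field (nombre)
# END { ... }  : print the sum once all lines have been read
awk -F ';' '$2 == "JEAN" { total += $4 } END { print total }' sample.csv
```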
## Combining commands (| && ;)
......
# Extracting from files
To do this exercise, you will need to download French first-name data from the "Institut national de la statistique et des études économiques" (INSEE):
```bash
wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip
unzip nat2021_csv.zip
```
You should now have a file called `nat2021.csv` in your working directory.
## Searching patterns (grep)
!!! question "Select all lines related to the year 2021 in the `nat2021.csv` file"
??? example "Click to show the solution"
```bash
grep ";2021;" nat2021.csv
```
!!! question "How many names have been provided in 2021?"
??? example "Click to show the solution"
```bash
grep ";2021;" nat2021.csv | wc -l
# result: 13501
```
!!! question "Is there more diversity in male or female names in 2021?"
??? example "Click to show the solution"
```bash
# female
grep ";2021;" nat2021.csv | grep "^2" | wc -l
# result: 7112
# male
grep ";2021;" nat2021.csv | grep "^1" | wc -l
# result: 6389
```
!!! question "How many people were named PARIS in 2021?"
??? example "Click to show the solution"
```bash
# both sexes (one line per sex)
grep "PARIS;2021;" nat2021.csv
# result 16 (5 male and 11 female)
```
Rare names ([see here for documentation](https://www.insee.fr/fr/statistiques/2540004?sommaire=4767262#documentation)) are recorded as `_PRENOMS_RARES`.
!!! question "Can you find all rare names? Do you see any pattern?"
??? example "Click to show the solution"
```bash
grep ";_PRENOMS_RARES;" nat2021.csv
```
People tend to give more and more rare names.
!!! question "What year was the most prolific for the name ZINEDINE?"
??? example "Click to show the solution"
```bash
# command
grep ";ZINEDINE;" nat2021.csv | sort -n -t ';' -k4
# result: 1998
```
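This section uses two `grep` details, shown here on a hypothetical three-record `sample.csv` built from the example rows of this page. `-c` counts matching lines directly, as an alternative to `| wc -l`. `^` anchors a pattern at the start of the line, which is how the sex column can be selected.

```bash
#!/usr/bin/env bash
printf '%s\n' 'sexe;preusuel;annais;nombre' \
    '2;SANDRINE;1973;17605' \
    '1;JEAN;1960;17607' \
    '1;_PRENOMS_RARES;1904;1430' > sample.csv

# -c prints the number of matching lines
grep -c ';1960;' sample.csv      # prints: 1

# ^ anchors the pattern at the start of the line: records with sexe = 1
grep -c '^1;' sample.csv         # prints: 2
```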
## Redirecting an output (>)
You can redirect a result and store it in a file thanks to the `>` redirection:
`command > filename`
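Here is a minimal sketch contrasting `>` with `>>` (append). It writes to a hypothetical file `out.txt`.

```bash
#!/usr/bin/env bash
# > creates the file, or overwrites it if it already exists
echo "2;SANDRINE;1973;17605" > out.txt
# >> appends to the end of the file instead of overwriting it
echo "1;JEAN;1960;17607" >> out.txt
cat out.txt        # both lines are present

# > again: the previous content is replaced
echo "1;_PRENOMS_RARES;1904;1430" > out.txt
cat out.txt        # only this last line remains
```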
!!! question "Save all the names from 2005 in a dedicated file."
??? example "Click to show the solution"
```bash
# command
grep ";2005;" nat2021.csv > names2005.txt
```
# Extracting from files
To do this exercise, you will need to download French first-name data from the "Institut national de la statistique et des études économiques" (INSEE):
```bash
wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip
unzip nat2021_csv.zip
```
You should now have a file called `nat2021.csv` in your working directory.
## Filtering a file (awk)
## Replacing patterns (sed)
# Extracting from files
To do this exercise, you will need to download French first-name data from the "Institut national de la statistique et des études économiques" (INSEE):
```bash
wget https://www.insee.fr/fr/statistiques/fichier/2540004/nat2021_csv.zip
unzip nat2021_csv.zip
```
You should now have a file called `nat2021.csv` in your working directory.
## Filtering a file (awk)
## Replacing patterns (sed)
docs/pages/bash_manip/images/gff.png

694 KiB

# Bash scripting
![](images/bash_scripting.png){: style="width:300px;float:left;margin-right: 10px;"}
Mastering Bash scripting is like unlocking a hidden superpower in the world of computing. It transforms repetitive tasks into automated workflows, giving you more time to focus on what truly matters. Whether you’re managing files, processing data, or controlling system operations, a well-crafted Bash script can accomplish in seconds what would take minutes—or even hours—by hand.
With Bash, you gain control over your environment, harnessing the full potential of Linux with just a few lines of code. It’s not just about efficiency; it’s about elegance—writing commands that work seamlessly together to streamline complex operations. As you dive into scripting, you’ll develop a deeper understanding of how systems function, making you a more capable and confident user.
Imagine setting up automated backups, processing large datasets, or creating custom command-line tools tailored to your exact needs. The possibilities are endless, and the skills you gain will serve you across any technical field. Learning Bash scripting isn’t just useful—it’s empowering, turning the terminal from a black box into a canvas for your creativity.
{%
include-markdown "../../index.md"
start="<!--part3-start-->"
end="<!--part3-end-->"
%}
{%
include-markdown "../../index.md"
start="<!--rest-start-->"
end="<!--rest-end-->"
%}
\ No newline at end of file
# Scripting
In this section, we will learn how to write scripts in Bash, allowing us to automate tasks and create powerful command-line programs.
## What is a script?
* A Bash script is a text file containing a sequence of commands.
......
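Here is a minimal sketch of a Bash script. The block writes a file named `hello.sh` (a hypothetical name), makes it executable with `chmod +x` and runs it. In the script, `${1:-world}` expands to the first argument, or to `world` when no argument is given.

```bash
#!/usr/bin/env bash
# Write a minimal script to hello.sh (quoted 'EOF' keeps $1 unexpanded here)
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
# Greet the name given as first argument, or "world" by default
name="${1:-world}"
echo "Hello, $name!"
EOF

chmod +x hello.sh   # make the script executable
./hello.sh          # prints: Hello, world!
./hello.sh Unix     # prints: Hello, Unix!
```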
docs/pages/bash_script/images/bash_scripting.png

120 KiB

......@@ -91,9 +91,11 @@ extra:
# page tree
nav:
- Unix basics:
- Home:
- Introduction: index.md
- Setup: pages/course-information/setup.md
- Unix basics:
- Course overview: pages/bash/bash-0-overview.md
- Introduction: pages/bash/bash-1-introduction.md
- The basics: pages/bash/bash-2-the-basics.md
- Navigating files and directories: pages/bash/bash-3-navigating.md
......@@ -101,9 +103,16 @@ nav:
- Special_characters: pages/bash/bash-5-special_characters.md
- PATH: pages/bash/bash-6-path.md
- Commands: pages/bash/bash-7-commands.md
- Unix file manipulation:
- File manipulation: pages/bash_file/bash-extracting_from_files.md
- Unix data/file manipulation:
- Course overview: pages/bash_manip/bash_manip-0-overview.md
- Introduction: pages/bash_manip/bash_manip-1-introduction.md
- Basic commands: pages/bash_manip/bash_manip-2-basics.md
- RegEx: pages/bash_manip/bash_manip-3-grep.md
- Grep: pages/bash_manip/bash_manip-3-grep.md
- Awk: pages/bash_manip/bash_manip-4-awk.md
- Sed: pages/bash_manip/bash_manip-5-sed.md
- Bash scripting:
- Course overview: pages/bash_script/bash_script-0-overview.md
- Introduction: pages/bash_script/bash_script-1-intro.md
- Data structures: pages/bash_script/bash_script-2-data_structure.md
- Conditional structures: pages/bash_script/bash_script-3-conditional.md
......