# Regular Expression

Regular expressions (regex) are sequences of characters that define a search pattern. They are used for pattern matching within **strings**.  
They are a powerful tool for text processing and can be used in command-line utilities such as `grep`, `sed`, and `awk` to search, match, and manipulate text.

## Regular Expression Summary

| Symbol | Description | Example | Matches |
|--------|-------------|---------|---------|
| `.`    | Any single character except newline | `a.b` | `aab`, `acb`, `a1b` |
| `^`    | Start of a line | `^abc` | `abc` at the start of a line |
| `$`    | End of a line | `abc$` | `abc` at the end of a line |
| `*`    | Zero or more of the preceding element | `ab*c` | `ac`, `abc`, `abbc` |
| `+`    | One or more of the preceding element | `ab+c` | `abc`, `abbc` |
| `?`    | Zero or one of the preceding element | `ab?c` | `ac`, `abc` |
| `{n}`  | Exactly n of the preceding element | `a{3}` | `aaa` |
| `{n,}` | n or more of the preceding element | `a{2,}` | `aa`, `aaa`, `aaaa` |
| `{n,m}`| Between n and m of the preceding element | `a{2,3}` | `aa`, `aaa` |
| `[]`   | Any one of the characters within the brackets | `[abc]` | `a`, `b`, `c` |
| `[^]`  | Any one character not within the brackets | `[^abc]` | Any character except `a`, `b`, `c` |
| `|`    | Alternation (OR) | `a|b` | `a`, `b` |
| `()`   | Grouping | `(abc)` | `abc` |
| `\d`   | Any digit (0-9) | `\d` | `0`, `1`, `2`, ..., `9` |
| `\D`   | Any non-digit | `\D` | Any character except `0-9` |
| `\w`   | Any word character (alphanumeric + underscore) | `\w` | `a`, `b`, `1`, `_` |
| `\W`   | Any non-word character | `\W` | Any character except `a-z`, `A-Z`, `0-9`, `_` |
| `\s`   | Any whitespace character | `\s` | Space, tab, newline |
| `\S`   | Any non-whitespace character | `\S` | Any character except space, tab, newline |
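
The quantifiers in the table can be tried directly on a few strings. The sketch below is only an illustration: the variable names (`words`, `star`, and so on) are made up for the example, and each pattern is anchored with `^` and `$` so that it has to match the whole line.

```bash
# Sample input: one candidate string per line (illustrative data).
words=$(printf '%s\n' ac abc abbc abbbc adc)

# * : zero or more "b" between "a" and "c" (ac, abc, abbc, abbbc)
star=$(printf '%s\n' "$words" | grep -E '^ab*c$')

# + : one or more "b" (abc, abbc, abbbc)
plus=$(printf '%s\n' "$words" | grep -E '^ab+c$')

# ? : zero or one "b" (ac, abc)
optional=$(printf '%s\n' "$words" | grep -E '^ab?c$')

# {2,3} : two or three "b" (abbc, abbbc)
bounded=$(printf '%s\n' "$words" | grep -E '^ab{2,3}c$')

# . : exactly one arbitrary character between "a" and "c" (abc, adc)
dot=$(printf '%s\n' "$words" | grep -E '^a.c$')

# Print each result on one line; the unquoted variable joins the matches with spaces.
printf '%-6s %s\n' '*' "$(echo $star)"
printf '%-6s %s\n' '+' "$(echo $plus)"
printf '%-6s %s\n' '?' "$(echo $optional)"
printf '%-6s %s\n' '{2,3}' "$(echo $bounded)"
printf '%-6s %s\n' '.' "$(echo $dot)"
```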

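POSIX character classes are a portable alternative to the `\d`, `\w`, and `\s` shorthands, which not every tool supports (`grep -E`, for example, does not understand `\d`). They must be written inside a bracket expression, for example `[[:digit:]]`:

| Symbol | Description |
|--------|-------------|
| `[:alnum:]` | equivalent to `A-Za-z0-9` |
| `[:alpha:]` | equivalent to `A-Za-z` |
| `[:blank:]` | equivalent to space or tab |
| `[:digit:]` | equivalent to `0-9` |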
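
As a quick illustration of POSIX classes inside a bracket expression, the sketch below pulls pieces out of a single sample sentence. The variable names are hypothetical, and the `-o` option (print only the matched part) is assumed to be available, as it is in GNU and BSD `grep`.

```bash
# One sample sentence to search in (illustrative data).
line='Otherwise give me a call at 123-456-789.'

# [[:digit:]]+ : every run of digits; -o prints each match on its own line.
digits=$(printf '%s\n' "$line" | grep -oE '[[:digit:]]+')

# [[:alpha:]]+ : every run of letters; keep only the first one.
first_word=$(printf '%s\n' "$line" | grep -oE '[[:alpha:]]+' | head -n 1)

printf 'digit runs: %s\n' "$(echo $digits)"
printf 'first word: %s\n' "$first_word"
```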


!!! Warning
    Do not confuse regular expressions with **globbing** (pathname expansion), which is used to match filenames! There, the same symbols have different meanings:  
    `?`  Any single character  
    `*`  Zero or more characters  
    `[]` Any one character from the listed set or range; a leading `!` inside the brackets negates it, matching any character that is not listed.  
    `{term1,term2}`  A comma-separated list of terms; each term must be a name or a wildcard.  
    `{term1..term2}` Called brace expansion, this syntax expands to all the terms between term1 and term2 (letters or integers).  
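
To see the difference in practice, the sketch below applies a glob and a regular expression, both containing `?`, to the same three file names. The directory and variable names are made up for the illustration, and `mktemp -d` is assumed to be available.

```bash
# Create three empty files in a throwaway directory (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/ac" "$demo_dir/abc" "$demo_dir/abbc"

# Globbing: the shell expands a?c itself; "?" stands for exactly one character.
glob_matches=$(cd "$demo_dir" && printf '%s\n' a?c)

# Regex: in '^ab?c$' the "?" makes the preceding "b" optional.
regex_matches=$(cd "$demo_dir" && ls | grep -E '^ab?c$')

printf 'glob  a?c    : %s\n' "$(echo $glob_matches)"
printf 'regex ^ab?c$ : %s\n' "$(echo $regex_matches)"

# Clean up the throwaway directory.
rm -r "$demo_dir"
```

The glob only expands to `abc`, while the regular expression accepts both `ac` and `abc`.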

## Example

```
??????@ start @??????
I love having cake on Sundays.
Macarons are great, but Mille-feuille is on another level!
What are you up to next Sunday?
Feel free to reach out by email at me@example.com.
Otherwise give me a call at 123-456-789.
Cheers!
??????@ end @??????
```
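
To try the exercises that follow, the text has to be saved in a file. One possible way, assuming the file is called `text.txt` as in the solutions, is a here-document with a quoted delimiter, so that the shell copies every character literally:

```bash
# Write the sample text to text.txt in the current directory.
# Quoting the delimiter ('EOF') disables variable and command expansion.
cat > text.txt <<'EOF'
??????@ start @??????
I love having cake on Sundays.
Macarons are great, but Mille-feuille is on another level!
What are you up to next Sunday?
Feel free to reach out by email at me@example.com.
Otherwise give me a call at 123-456-789.
Cheers!
??????@ end @??????
EOF

# Sanity check: print the number of lines that were written.
grep -c '' text.txt
```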

!!! question "Find lines with a question"

Right, these are the lines ending with `?`, but how do you avoid matching the first and last lines?

??? example "Click to show the solution"  

    ```bash
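    grep -E "[A-Za-z ]+\?$" text.txt      # letters or spaces, then a final ?
    grep -E "[[:alnum:] ]+\?$" text.txt   # same idea with a POSIX class
    grep -E "[^?]+\?$" text.txt           # at least one non-? character right before the final ?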
    ```

!!! question "Find lines with email address"

??? example "Click to show the solution"  

    ```bash
    grep -E "[a-zA-Z0-9]+@[a-zA-Z0-9]+\.com" text.txt
    grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" text.txt # more generalized
    ```

!!! question "Find the phone number ensuring the format XXX-XXX-XXX"

??? example "Click to show the solution"  

    ```bash
    grep -E '[0-9]{3}-[0-9]{3}-[0-9]{3}' text.txt
    grep -P '\d{3}-\d{3}-\d{3}' text.txt  # \d needs Perl-compatible regex (GNU grep -P); grep -E does not support it
    ```