Regular expressions (regex) are sequences of characters that define a search pattern. They are used for pattern matching within **strings**.
They are a powerful tool for text processing and are used by command-line utilities such as `grep`, `sed`, and `awk` to search, match, and manipulate text.
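As a minimal sketch of how one pattern can serve all three utilities, the snippet below applies `^error` with `grep`, `sed`, and `awk`. The log lines and the variable names are hypothetical, chosen only for illustration.

```bash
# Hypothetical sample input, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'error: disk full' 'info: all good' 'error: timeout' > "$sample"

# grep: print the lines that match the pattern.
grep_out=$(grep -E '^error' "$sample")

# sed: rewrite the matched text on the lines where it occurs.
sed_out=$(sed -E 's/^error/ERROR/' "$sample")

# awk: print the second whitespace-separated field of each matching line.
awk_out=$(awk '/^error/ { print $2 }' "$sample")

printf '%s\n' "--- grep" "$grep_out" "--- sed" "$sed_out" "--- awk" "$awk_out"
rm -f "$sample"
```

`grep` selects lines, `sed` edits them, and `awk` extracts fields, but the anchor `^` means "start of line" in all three.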
## Regular Expression Summary
| Symbol | Description | Example | Matches |
|--------|-------------|---------|---------|
| `.` | Any single character except newline | `a.b` | `aab`, `acb`, `a1b` |
| `^` | Start of a line | `^abc` | `abc` at the start of a line |
| `$` | End of a line | `abc$` | `abc` at the end of a line |
| `*` | Zero or more of the preceding element | `ab*c` | `ac`, `abc`, `abbc` |
| `+` | One or more of the preceding element | `ab+c` | `abc`, `abbc` |
| `?` | Zero or one of the preceding element | `ab?c` | `ac`, `abc` |
| `{n}` | Exactly n of the preceding element | `a{3}` | `aaa` |
| `{n,}` | n or more of the preceding element | `a{2,}` | `aa`, `aaa`, `aaaa` |
| `{n,m}`| Between n and m of the preceding element | `a{2,3}` | `aa`, `aaa` |
| `[]` | Any one of the characters within the brackets | `[abc]` | `a`, `b`, `c` |
| `[^]` | Any one character not within the brackets | `[^abc]` | Any character except `a`, `b`, `c` |
| `|` | Alternation (OR) | `a|b` | `a`, `b` |
| `()` | Grouping | `(abc)` | `abc` |
| `\d` | Any digit (0-9); Perl-style shorthand, needs `grep -P` | `\d` | `0`, `1`, `2`, ..., `9` |
| `\D` | Any non-digit; Perl-style shorthand, needs `grep -P` | `\D` | Any character except `0-9` |
| `\w` | Any word character (alphanumeric + underscore) | `\w` | `a`, `b`, `1`, `_` |
| `\W` | Any non-word character | `\W` | Any character except `a-z`, `A-Z`, `0-9`, `_` |
| `\s` | Any whitespace character | `\s` | Space, tab, newline |
| `\S` | Any non-whitespace character | `\S` | Any character except space, tab, newline |
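The quantifier rows of the table can be checked with a short experiment. This is a sketch that assumes a `grep` supporting `-E` and `-x`; the helper name `match` and the candidate strings are hypothetical.

```bash
# Hypothetical helper: test a pattern against four candidate strings.
# -x forces the pattern to match the whole line, so the effect of each
# quantifier is visible; paste joins the surviving lines with spaces.
match() {
  printf '%s\n' ac abc abbc abbbc | grep -E -x "$1" | paste -sd ' ' -
}

echo "ab*c      -> $(match 'ab*c')"       # ac abc abbc abbbc
echo "ab+c      -> $(match 'ab+c')"       # abc abbc abbbc
echo "ab?c      -> $(match 'ab?c')"       # ac abc
echo "ab{2,3}c  -> $(match 'ab{2,3}c')"   # abbc abbbc
```

Without `-x` (or explicit `^` and `$` anchors), `grep` reports any line that contains a match anywhere, which hides the difference between the quantifiers.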
POSIX character classes can also be used. They are only valid inside a bracket expression, for example `[[:digit:]]`:
| Symbol | Description |
|--------|-------------|
| `[:alnum:]` | equivalent to `A-Za-z0-9` |
| `[:alpha:]` | equivalent to `A-Za-z` |
| `[:blank:]` | equivalent to space or tab |
| `[:digit:]` | equivalent to `0-9` |
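The sketch below shows the classes in use. It assumes a `grep` that supports `-o` (GNU and BSD `grep` both do); the input line and variable names are hypothetical.

```bash
# Hypothetical input line, used only for illustration.
line='Order 42 shipped on 2024-05-01'

# The class name goes inside a bracket expression: [[:digit:]], not [:digit:].
digits=$(printf '%s\n' "$line" | grep -E -o '[[:digit:]]+' | paste -sd ' ' -)

# A class can share its bracket expression with literal characters.
dates=$(printf '%s\n' "$line" | grep -E -o '[[:digit:]-]+' | paste -sd ' ' -)

# [[:alpha:]] keeps only runs of letters.
words=$(printf '%s\n' "$line" | grep -E -o '[[:alpha:]]+' | paste -sd ' ' -)

echo "$digits"   # 42 2024 05 01
echo "$dates"    # 42 2024-05-01
echo "$words"    # Order shipped on
```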
!!! Warning
Do not confuse regular expressions with **globbing** (pathname expansion), which the shell uses to match filenames!
`?` Any single character
`*` Zero or more characters
`[]` Any one character from the listed set or range; with `!` as the first character inside the brackets, any one character not in the set.
`{term1,term2}` A comma-separated list of terms; each term can be a name or a wildcard pattern.
`{term1..term2}` Called brace expansion, this syntax expands to every term between term1 and term2 (letters or integers).
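To make the difference concrete, the following sketch runs glob patterns and an equivalent regex against the same file names. It assumes bash; the directory is a throwaway created with `mktemp -d`, and the file names are hypothetical.

```bash
# Work in a throwaway directory with hypothetical file names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a1.txt a22.txt b1.txt notes.md

# Globbing: the shell expands each pattern to file names before echo runs.
glob_one=$(echo a?.txt)     # ? = exactly one character
glob_any=$(echo a*.txt)     # * = zero or more characters
glob_not=$(echo [!a]*)      # [!a] = one character that is not "a"

# Regex: grep filters the list of names. The glob a*.txt corresponds to the
# regex a.*\.txt, because in a regex "*" repeats the preceding element and
# "." must be escaped to mean a literal dot.
regex_any=$(printf '%s\n' * | grep -E -x 'a.*\.txt' | paste -sd ' ' -)

echo "$glob_one"    # a1.txt
echo "$glob_any"    # a1.txt a22.txt
echo "$glob_not"    # b1.txt notes.md
echo "$regex_any"   # a1.txt a22.txt

# Brace expansion (bash) generates words whether or not such files exist.
echo file{1..3}.log

cd - >/dev/null && rm -rf "$dir"
```

Note that `?` means "exactly one character" in a glob but "zero or one of the preceding element" in a regex, which is the most common source of confusion between the two.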
## Example
```
??????@ start @??????
I love having cake on Sundays.
Macarons are great, but Mille-feuille is on another level!
What are you up to next Sunday?
Feel free to reach out by email at me@example.com.
Otherwise give me a call at 123-456-789.
Cheers!
??????@ end @??????
```
!!! question "Find lines with a question"
The answer is any line ending with `?`, but how do you exclude the first and last lines?
??? example "Click to show the solution"
```bash
grep -E "[A-Za-z ]+\?$" text.txt
grep -E "[[:alnum:] ]+\?$" text.txt
grep -E "[^?]+\?$" text.txt
```
!!! question "Find lines with email address"
??? example "Click to show the solution"
```bash
grep -E "[a-zA-Z0-9]+@[a-zA-Z0-9]+\.com" text.txt
grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" text.txt # more generalized
```
!!! question "Find the phone number ensuring the format XXX-XXX-XXX"
??? example "Click to show the solution"
```bash
grep -E '[0-9]{3}-[0-9]{3}-[0-9]{3}' text.txt
grep -P '\d{3}-\d{3}-\d{3}' text.txt  # \d needs PCRE (-P, GNU grep); -E does not support it
```
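To try the three exercises end to end, the sketch below recreates the sample text in a temporary file (a stand-in for `text.txt`) and runs one solution for each. The `-n` and `-o` options are additions for illustration and assume GNU or BSD `grep`; the variable names are hypothetical.

```bash
# Recreate the sample text in a temporary file (stand-in for text.txt).
text=$(mktemp)
cat > "$text" <<'EOF'
??????@ start @??????
I love having cake on Sundays.
Macarons are great, but Mille-feuille is on another level!
What are you up to next Sunday?
Feel free to reach out by email at me@example.com.
Otherwise give me a call at 123-456-789.
Cheers!
??????@ end @??????
EOF

# -n prefixes each matching line with its line number.
question=$(grep -n -E '[^?]+\?$' "$text")

# -o prints only the part of the line that matched.
email=$(grep -o -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' "$text")
phone=$(grep -o -E '[0-9]{3}-[0-9]{3}-[0-9]{3}' "$text")

echo "$question"   # 4:What are you up to next Sunday?
echo "$email"      # me@example.com
echo "$phone"      # 123-456-789
rm -f "$text"
```

The first and last lines are skipped by the question pattern because the character just before their final `?` is another `?`, which `[^?]+` cannot match.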