# Regular Expression

Regular expressions (regex) are sequences of characters that define a search pattern. They are used for pattern matching within **strings**. They are powerful tools for text processing and are supported by command-line utilities such as `grep`, `sed`, and `awk` to search, match, and manipulate text.

## Regular Expression Summary

| Symbol | Description | Example | Matches |
|--------|-------------|---------|---------|
| `.` | Any single character except newline | `a.b` | `aab`, `acb`, `a1b` |
| `^` | Start of a line | `^abc` | `abc` at the start of a line |
| `$` | End of a line | `abc$` | `abc` at the end of a line |
| `*` | Zero or more of the preceding element | `ab*c` | `ac`, `abc`, `abbc` |
| `+` | One or more of the preceding element | `ab+c` | `abc`, `abbc` |
| `?` | Zero or one of the preceding element | `ab?c` | `ac`, `abc` |
| `{n}` | Exactly n of the preceding element | `a{3}` | `aaa` |
| `{n,}` | n or more of the preceding element | `a{2,}` | `aa`, `aaa`, `aaaa` |
| `{n,m}`| Between n and m of the preceding element | `a{2,3}` | `aa`, `aaa` |
| `[]` | Any one of the characters within the brackets | `[abc]` | `a`, `b`, `c` |
| `[^]` | Any one character not within the brackets | `[^abc]` | Any character except `a`, `b`, `c` |
| `|` | Alternation (OR) | `a|b` | `a`, `b` |
| `()` | Grouping | `(abc)` | `abc` |
| `\d` | Any digit (0-9) | `\d` | `0`, `1`, `2`, ..., `9` |
| `\D` | Any non-digit | `\D` | Any character except `0-9` |
| `\w` | Any word character (alphanumeric + underscore) | `\w` | `a`, `b`, `1`, `_` |
| `\W` | Any non-word character | `\W` | Any character except `a-z`, `A-Z`, `0-9`, `_` |
| `\s` | Any whitespace character | `\s` | Space, tab, newline |
| `\S` | Any non-whitespace character | `\S` | Any character except space, tab, newline |

The table uses extended syntax, as enabled by `grep -E` (or `sed -E`). Within it, `\d` and `\D` are Perl-style classes that POSIX extended regular expressions do not define: use `[0-9]` or `[[:digit:]]` instead, or GNU `grep -P` for Perl-compatible syntax. `\w`, `\W`, `\s` and `\S` are not POSIX either, but GNU `grep` accepts them as extensions.

It is also possible to use POSIX character classes. They must be placed inside a bracket expression, e.g. `[[:digit:]]`:

| Symbol | Description |
|--------|-------------|
| `[:alnum:]` | Equivalent to `A-Za-z0-9` |
| `[:alpha:]` | Equivalent to `A-Za-z` |
| `[:blank:]` | Equivalent to space or tab |
| `[:digit:]` | Equivalent to `0-9` |

!!! Warning
    Do not confuse regular expressions with **globbing** (pathname expansion), which the shell uses to match filenames:

    - `?` Any single character.
    - `*` Zero or more characters.
    - `[]` Any one character from the set or range inside the brackets, e.g. `[a-c]`; put `!` first (`[!a-c]`) to match any character *not* in it.
    - `{term1,term2}` A comma-separated list of terms; each term can be a name or a wildcard pattern.
    - `{term1..term2}` Expands to every term in the sequence from term1 to term2 (letters or integers).

    Strictly speaking, both brace forms are *brace expansion*: the shell generates the listed strings whether or not matching files exist.

## Example

Save the following text as `text.txt`; the exercises below search it.

```
??????@ start @??????
I love having cake on Sundays.
Macarons are great, but Mille-feuille is on another level!
What are you up to next Sunday?
Feel free to reach out by email at me@example.com.
Otherwise give me a call at 123-456-789.
Cheers!
??????@ end @??????
```

!!! question "Find lines with a question"
    Such lines end with `?`, but how do you avoid the first and last lines, which end with `?` too?

    ??? example "Click to show the solution"
        ```bash
        # Each pattern needs a non-"?" character right before the final "?", which the marker lines lack
        grep -E "[A-Za-z ]+\?$" text.txt
        grep -E "[[:alnum:] ]+\?$" text.txt
        grep -E "[^?]+\?$" text.txt
        ```

!!! question "Find lines with an email address"

    ??? example "Click to show the solution"
        ```bash
        grep -E "[a-zA-Z0-9]+@[a-zA-Z0-9]+\.com" text.txt
        # More general: allows ".", "_", "%", "+", "-" in the local part and any top-level domain of 2+ letters
        grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" text.txt
        ```

!!! question "Find the phone number ensuring the format XXX-XXX-XXX"

    ??? example "Click to show the solution"
        ```bash
        grep -E '[0-9]{3}-[0-9]{3}-[0-9]{3}' text.txt
        # \d is not POSIX ERE syntax, so it needs Perl-compatible mode (GNU grep -P)
        grep -P '\d{3}-\d{3}-\d{3}' text.txt
        ```
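To try the solutions end to end, the script below is a minimal sketch that recreates the sample text as `text.txt`, assuming one sentence per line, and runs the exercise patterns with `grep -E`. The variable names (`naive_count`, `question`, `email`, `phone`) are illustrative only. It also shows why the first exercise needs care: a bare `\?$` also matches both marker lines.

```bash
#!/usr/bin/env bash
# Sketch: recreate the sample text (assumed one sentence per line) as text.txt
# and run the exercise patterns against it with grep -E.
set -euo pipefail

cat > text.txt <<'EOF'
??????@ start @??????
I love having cake on Sundays.
Macarons are great, but Mille-feuille is on another level!
What are you up to next Sunday?
Feel free to reach out by email at me@example.com.
Otherwise give me a call at 123-456-789.
Cheers!
??????@ end @??????
EOF

# Naive pattern: counts every line ending with "?", including both marker lines
naive_count=$(grep -cE '\?$' text.txt)

# Requiring a non-"?" character right before the final "?" excludes the markers
question=$(grep -E '[^?]+\?$' text.txt)

# General email pattern from the second exercise
email=$(grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' text.txt)

# Phone number in XXX-XXX-XXX form, using a POSIX class instead of \d
phone=$(grep -E '[[:digit:]]{3}-[[:digit:]]{3}-[[:digit:]]{3}' text.txt)

echo "lines ending with ?: $naive_count"
echo "question: $question"
echo "email:    $email"
echo "phone:    $phone"
```

Running it with `bash` reports 3 lines ending with `?` for the naive pattern, while the anchored pattern keeps only the line with the actual question. The phone pattern uses `[[:digit:]]` so it runs with plain `grep -E`, without relying on `-P` support.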