Regular Expressions
A regular expression (or "regex") is a sequence of characters that defines a search pattern, used to find matching pieces of a longer text.
For example, this is a regular expression:
`[df]og`
which matches the words `dog` and `fog`.
You can combine different symbols in order to match different sets of characters.
Here, saying that a regex "matches" a text means that it finds one or more matches in that text.
For example, let's take this text:
"There's a dog running in the fog."
- regex 1: `dog`, matches: 1. This regex matches the text.
- regex 2: `[df]og`, matches: 2. This regex matches the text.
- regex 3: `rain`, matches: 0. This regex doesn't match the text.
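If you want to verify these counts yourself, here's a small sketch in Python (just my choice of language for the examples on this page; simple regexes like these behave the same way in most regex engines). It uses the standard `re` module: `re.findall` returns the list of all non-overlapping matches, so its length is the number of matches.

```python
import re

text = "There's a dog running in the fog."

# re.findall returns every non-overlapping match, so len() gives the match count.
counts = {}
for pattern in ["dog", "[df]og", "rain"]:
    matches = re.findall(pattern, text)
    counts[pattern] = len(matches)
    print(f"{pattern}: {len(matches)} match(es) -> {matches}")
```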
Square brackets define a set of characters and match any single one of them: think of it as an "OR" relation between what's inside them.
For example, `[ab]` matches either "a" or "b".
The regex `file[01234]` will match:
- file0
- file1
- file2
- file3
- file4
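Here's a quick Python check of the square-bracket examples (the sample strings are made up for illustration). Notice that `[ab]` consumes exactly one character per match.

```python
import re

# [ab] matches ONE character, either "a" or "b".
letters = re.findall(r"[ab]", "abc cab")
print(letters)  # ['a', 'b', 'a', 'b']

# file[01234] matches "file" followed by exactly one of the digits 0 to 4.
files = re.findall(r"file[01234]", "file0 file3 file5 file4 files")
print(files)  # ['file0', 'file3', 'file4']
```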
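You can also exclude characters from your regex with the `^` operator, placed right after the opening square bracket.
For instance, the regex `[^\s]*` means: zero or more characters that are not whitespace (spaces, tabs, newlines, ...).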
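A small Python sketch of exclusion. `re.match` only tries to match at the start of the string, so `[^\s]*` grabs the first run of non-whitespace characters; the second example reuses the `[df]og` idea, but this time excluding `d` and `f`.

```python
import re

# [^\s]* = zero or more characters that are NOT whitespace, starting at index 0.
first = re.match(r"[^\s]*", "hello world").group()
print(first)  # hello

# [^df]og = any single character except "d" or "f", followed by "og".
others = re.findall(r"[^df]og", "dog fog log")
print(others)  # ['log']
```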
A quantifier is a symbol that tells how many times you want to match the character (or group) right before it.
Symbol | Times | Example | Example match |
---|---|---|---|
`?` | 0-1 times | `hello?` | "hell" or "hello" |
`+` | >= 1 times | `hello+` | "hello" or "helloo" or "hellooo" or ... |
`*` | >= 0 times | `hello*` | "hell" or "hello" or "helloo" or ... |
`{n}` | exactly n times | `hel{2}o` | "hello" |
`{n,m}` | n to m times | `hel{0,3}o` | "heo" or "helo" or "hello" or "helllo" |
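To double-check the table, here's a Python sketch that tests every example with `re.fullmatch`, which only succeeds if the whole string matches. Keep in mind that each quantifier applies only to the character right before it (the last `o` or the `l`), not to the whole word.

```python
import re

# Each pattern from the table, with the strings the table says it matches.
examples = {
    r"hello?": ["hell", "hello"],
    r"hello+": ["hello", "helloo", "hellooo"],
    r"hello*": ["hell", "hello", "helloo"],
    r"hel{2}o": ["hello"],
    r"hel{0,3}o": ["heo", "helo", "hello", "helllo"],
}

# re.fullmatch returns a Match object only when the WHOLE string matches.
for pattern, words in examples.items():
    for word in words:
        print(pattern, word, re.fullmatch(pattern, word) is not None)  # every line ends with True

# Strings outside the allowed range are rejected.
print(re.fullmatch(r"hello+", "hell"))        # None: "+" needs at least one "o"
print(re.fullmatch(r"hel{0,3}o", "hellllo"))  # None: four "l"s is too many
```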
A dot (`.`) matches any single character, except a newline.
For example, the regex `.{4}` matches any sequence of exactly 4 characters, spaces included, so it isn't limited to 4-letter words.
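A Python sketch showing that the dot doesn't care about words: `.{4}` happily matches chunks that include spaces, and by default it refuses to match a newline (Python's `re.DOTALL` flag changes that, if you need it).

```python
import re

# "." matches any single character except "\n" (by default).
chunks = re.findall(r".{4}", "dogs and fog")
print(chunks)  # ['dogs', ' and', ' fog']

print(re.fullmatch(r".{4}", "a b!") is not None)   # True: spaces count too
print(re.fullmatch(r".{4}", "ab\nc") is not None)  # False: "." skips "\n"
print(re.fullmatch(r".{4}", "ab\nc", re.DOTALL) is not None)  # True: DOTALL lets "." match "\n"
```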
It's recommended to use the symbol `\s` instead of a plain space or a newline character to match blank space. `\s` matches any whitespace character (space, tab, newline, ...), so your regex keeps working when, for example, the text separates words with tabs instead of spaces.
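Here's a Python sketch of the kind of surprise `\s` saves you from: two words separated by a tab instead of a space (the sample text is made up).

```python
import re

text = "dog\tfog"  # the two words are separated by a TAB, not a space

literal = re.findall(r"dog fog", text)    # a literal space doesn't match a tab
flexible = re.findall(r"dog\sfog", text)  # \s matches any whitespace character
print(literal)   # []
print(flexible)  # ['dog\tfog']
```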
In order to match a digit you can use the symbol `\d` instead of `[0123456789]`.
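A Python sketch comparing the two forms. One Python-specific detail: for normal (str) patterns, `\d` also matches decimal digits from other scripts, not just 0 to 9; the `re.ASCII` flag restricts it to `[0123456789]`.

```python
import re

text = "file0 file12 fileX"
short = re.findall(r"\d", text)
spelled_out = re.findall(r"[0123456789]", text)
print(short, spelled_out)  # ['0', '1', '2'] ['0', '1', '2']

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
print(len(re.findall(r"\d", "\u0663")))            # 1
print(len(re.findall(r"\d", "\u0663", re.ASCII)))  # 0
```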
A capturing group, written with parentheses, saves the part of the text matched by the pattern inside it, so that you can use it later. This short explanation doesn't show how powerful groups are, but trust me, they're powerful.
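Let's start with an example:
`([^\s]*)`
This regex creates a new group containing everything up to the first whitespace character, i.e. the first word of the string (assuming the string doesn't start with whitespace).
For example, if you pass the string `hello world`, group(1) will be `hello`.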
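In Python, groups are read from the match object returned by `re.match`: group 0 is the whole match and group 1 is the text captured by the first pair of parentheses.

```python
import re

m = re.match(r"([^\s]*)", "hello world")
print(m.group(1))  # hello
print(m.group(0))  # hello (the whole match, which here is the same text)
```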
We can use this concept multiple times in the same regex:
`([^\s]*)\s---\s(.*)`
This regex means:
- match everything up to the first whitespace character and put it inside group(1)
- match a single whitespace character
- match 3 dashes
- match a single whitespace character
- match everything from this point until the end of the line and put it inside group(2)
So, if we apply this regex to the string `hi --- how are you?`, group(1) will be `hi` and group(2) will be `how are you?`.
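The same example in Python. The last lines show one way to "use later" what you captured: in a `re.sub` replacement string, `\1` and `\2` refer back to the groups, so here they swap the two parts around.

```python
import re

pattern = r"([^\s]*)\s---\s(.*)"
m = re.match(pattern, "hi --- how are you?")
print(m.group(1))  # hi
print(m.group(2))  # how are you?

# \1 and \2 in the replacement are replaced with the captured groups.
swapped = re.sub(pattern, r"\2 (\1)", "hi --- how are you?")
print(swapped)  # how are you? (hi)
```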
- Regex101 -> my favourite regex tester
- Rubular -> another regex tester
- Regex Primer: Part 1 -> this is the tutorial which I followed when I was learning regular expressions
- Regex Primer: Part 2 -> as above, but the second part
Francesco Andreuzzi, Italy