
Regular Expressions

Francesco edited this page Nov 1, 2017 · 8 revisions

What is a regular expression?

A regular expression (or "regex") is a sequence of characters that describes a search pattern. It is used to find the parts of a text that fit that pattern.

For example, this is a regular expression:
[df]og
which matches the words dog and fog.
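A quick way to try this is Python's built-in re module (Python is an assumption; this page does not name a language). re.fullmatch returns a match object only when the whole string fits the pattern, and None otherwise.

```python
import re

# [df]og: one character that is "d" or "f", followed by "og"
pattern = re.compile(r"[df]og")

for word in ["dog", "fog", "log"]:
    # fullmatch succeeds only when the entire word fits the pattern
    result = pattern.fullmatch(word)
    print(word, "->", "match" if result else "no match")
# dog -> match
# fog -> match
# log -> no match
```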

You can combine different symbols to match different sets of characters.

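What do we mean by "match"?

A regex "matches" a text when at least one part of the text fits the pattern. Each part that fits is called a match, so a text can contain zero, one or more matches.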

For example, let's take this text:
"There's a dog running in the fog."

regex 1:
dog
matches: 1
This regex matches the text.

regex 2:
[df]og
matches: 2
This regex matches the text.

regex 3:
rain
matches: 0
This regex doesn't match the text.
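The three match counts above can be reproduced with a short sketch, assuming Python's re module. re.findall returns every non-overlapping match as a list, so the length of that list is the number of matches.

```python
import re

text = "There's a dog running in the fog."

# re.findall returns every non-overlapping match as a list,
# so its length is the number of matches
counts = {regex: len(re.findall(regex, text)) for regex in ["dog", "[df]og", "rain"]}

for regex, n in counts.items():
    verdict = "matches" if n > 0 else "doesn't match"
    print(f"{regex}: {n} match(es), so the regex {verdict} the text")
```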

Square brackets []

Square brackets match exactly one character, chosen from the characters listed inside them: you can read them as an "OR" relation between what's inside.

For example
[ab]
matches "a" or "b".

This regex:
file[01234]
will match

  • file0
  • file1
  • file2
  • file3
  • file4
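Here is a small check of both bracket examples, assuming Python's re module; the file names in the list are made up for illustration. Note that file12 does not match: the brackets stand for exactly one character, so only one digit may follow "file".

```python
import re

# [ab] matches exactly one character: "a" or "b"
letters = re.findall(r"[ab]", "abc")
print(letters)  # ['a', 'b']

# Hypothetical file names, some matching and some not
names = ["file0", "file3", "file4", "file5", "fileA", "file12"]

# file[01234] is "file" followed by exactly one of the digits 0 to 4
matching = [n for n in names if re.fullmatch(r"file[01234]", n)]
print(matching)  # ['file0', 'file3', 'file4']
```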

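You can also exclude characters by putting ^ right after the opening bracket: [^...] matches any single character that is not listed inside.

For instance, this regex:

[^\s]*

means: a sequence of zero or more characters that are not whitespace (\s stands for any whitespace character, such as a space, a tab or a newline).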
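A minimal sketch of the negated class, assuming Python's re module. It uses + (one or more) instead of *, because [^\s]* can also match an empty string, and re.findall would then report extra empty matches between the words.

```python
import re

text = "hello big\tworld"  # words separated by a space and a tab

# [^\s] is any single character that is NOT whitespace;
# + keeps only non-empty runs of such characters, i.e. the words
words = re.findall(r"[^\s]+", text)
print(words)  # ['hello', 'big', 'world']
```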

Quantifiers

A quantifier is a symbol that tells how many times you want to match the character right before it. For example, in hello? the ? applies only to the final o.

Symbol | Times            | Example   | Example matches
-------|------------------|-----------|-----------------------------------
?      | 0 or 1 times     | hello?    | "hell" or "hello"
+      | 1 or more times  | hello+    | "hello", "helloo", "hellooo", ...
*      | 0 or more times  | hello*    | "hell", "hello", "helloo", ...
{n}    | exactly n times  | hel{2}o   | "hello"
{n,m}  | n to m times     | hel{0,3}o | "heo", "helo", "hello" or "helllo"
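Every row of the table can be checked with the sketch below, assuming Python's re module. The non-matching strings (marked False) are extra examples chosen for illustration; re.fullmatch is used so that the whole string must fit the pattern, not just a part of it.

```python
import re

# pattern -> {string: should it fully match?}
cases = {
    r"hello?": {"hell": True, "hello": True, "helloo": False},
    r"hello+": {"hell": False, "hello": True, "hellooo": True},
    r"hello*": {"hell": True, "hello": True, "helloo": True},
    r"hel{2}o": {"helo": False, "hello": True, "helllo": False},
    r"hel{0,3}o": {"heo": True, "helllo": True, "hellllo": False},
}

for pattern, examples in cases.items():
    for text, expected in examples.items():
        # fullmatch: the whole string must fit the pattern
        actual = re.fullmatch(pattern, text) is not None
        assert actual == expected, (pattern, text)
        print(f"{pattern!r:14} {text!r:12} {actual}")
```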

Dot (.)

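A dot matches any single character (by default, most regex engines exclude the newline character).

For example, this regex:
.{4} will match any sequence of exactly 4 characters, such as "dogs", "a do" or "1234".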
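The sketch below shows the dot in Python's re module (an assumed choice of language). The re.DOTALL flag is Python-specific; other languages have their own way of letting the dot match newlines.

```python
import re

# "." matches any one character except a newline,
# so .{4} is any 4 characters in a row
chunks = re.findall(r".{4}", "dog runs")
print(chunks)  # ['dog ', 'runs']

# "." does not match "\n" by default...
print(re.fullmatch(r".{4}", "abc\n"))  # None

# ...but it does with the re.DOTALL flag
print(re.fullmatch(r".{4}", "abc\n", re.DOTALL) is not None)  # True
```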

Matching spaces

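It's recommended to use the symbol \s instead of a plain space or a newline character to match blank spaces. \s matches any whitespace character (a space, a tab, a newline, and so on), so the regex keeps working when words are separated by tabs or line breaks instead of single spaces.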
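To see the difference, here is a small sketch assuming Python's re module, where the two words are separated by a tab instead of a space.

```python
import re

text = "hello\tworld"  # a tab, not a space, between the words

# A literal space in the pattern only matches the space character itself
print(re.fullmatch(r"hello world", text))  # None

# \s matches any whitespace character, including the tab
print(re.fullmatch(r"hello\sworld", text) is not None)  # True

# \s+ splits on any run of whitespace, whatever its kind
parts = re.split(r"\s+", "one two\tthree\nfour")
print(parts)  # ['one', 'two', 'three', 'four']
```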

Matching digits

In order to match a digit you can use the symbol \d instead of [0123456789].
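A minimal example, assuming Python's re module. One caveat specific to Python 3: on str patterns, \d also matches non-ASCII decimal digits (for example Arabic-Indic digits), so it is equivalent to [0123456789] only for plain ASCII text.

```python
import re

# \d matches one digit; \d+ matches a run of one or more digits
numbers = re.findall(r"\d+", "file1, file23 and file456")
print(numbers)  # ['1', '23', '456']

# On ASCII text, \d and [0123456789] find the same matches
assert re.findall(r"\d", "a1b2") == re.findall(r"[0123456789]", "a1b2")
```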

Capturing groups

A capturing group is a part of a regex enclosed in parentheses ( ). It saves the text matched by that part so that you can use it later. This explanation may not tell you exactly how powerful they are, but trust me, they're powerful.

Let's start with an example:

([^\s]*)

Applied from the start of a string, this regex puts the first word (every non-whitespace character up to the first whitespace) into group(1).

For example, if you apply it to the string hello world, group(1) will be hello.
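The group(1) notation lines up with Python's Match.group() method, so here is a sketch in Python (the choice of language is an assumption). re.match only tries the pattern at the start of the string, which is why the captured text is the first word.

```python
import re

# re.match tries the pattern only at the beginning of the string
m = re.match(r"([^\s]*)", "hello world")

# group(1) is the text captured by the first pair of parentheses
print(m.group(1))  # hello
```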

A regex can contain several capturing groups. They are numbered from left to right, starting from 1.

([^\s]*)\s---\s(.*)

This regex means:

  1. match every non-whitespace character up to the first whitespace and put it inside group(1)
  2. match a single whitespace character
  3. match 3 dashes
  4. match a single whitespace character
  5. match everything from this point to the end of the line and put it inside group(2)

So, if we apply this regex to the string hi --- how are you?, group(1) will be hi and group(2) will be how are you?.
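Both groups can be read back with a short sketch, assuming Python's re module. The second string, with only two dashes, is an extra example showing that the whole pattern must fit for any group to be filled.

```python
import re

pattern = re.compile(r"([^\s]*)\s---\s(.*)")

m = pattern.match("hi --- how are you?")
print(m.group(1))  # hi
print(m.group(2))  # how are you?
print(m.groups())  # ('hi', 'how are you?')

# With only two dashes the pattern does not fit, so there is no match at all
print(pattern.match("hi -- how are you?"))  # None
```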

Great sites
