A pure Swift implementation of a Regular Expression Engine
Version 2 is a second attempt that uses DFAs instead of NFAs to get grep-like performance.
To avoid recompiling a pattern every time it is used, create a `Regex` instance once and reuse it:
// Compile the expression
let regex = try! Regex(pattern: "[a-zA-Z]+")
let string = "RegEx is tough, but useful."
// Search for matches
let words = regex.match(string)
/*
words = [
RegexMatch(match: "RegEx", groups: []),
RegexMatch(match: "is", groups: []),
RegexMatch(match: "tough", groups: []),
RegexMatch(match: "but", groups: []),
RegexMatch(match: "useful", groups: []),
]
*/
If compilation overhead is not a concern, use the `=~` operator to match a string directly against a pattern:
let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []
/*
fourLetterWords = [
RegexMatch(match: "beer", groups: []),
RegexMatch(match: "very", groups: []),
RegexMatch(match: "nice", groups: []),
]
*/
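
`=~` is not a standard Swift operator, so it has to be declared before it can be used. The sketch below shows one way such an operator could be declared so that `input =~ pattern ?? []` parses as `(input =~ pattern) ?? []`. It is not this library's implementation: it assumes Foundation's `NSRegularExpression` as a stand-in engine, returns plain strings instead of `RegexMatch` values, and the precedence group name `RegexMatchPrecedence` is hypothetical.

```swift
import Foundation

// Hypothetical precedence group: binds tighter than `??`, so that
// `input =~ pattern ?? []` groups as `(input =~ pattern) ?? []`.
precedencegroup RegexMatchPrecedence {
    higherThan: NilCoalescingPrecedence
    associativity: left
}

infix operator =~ : RegexMatchPrecedence

/// Returns every match of `pattern` in `input`, or nil if the pattern is invalid.
/// Stand-in engine: NSRegularExpression, not the DFA engine described here.
func =~ (input: String, pattern: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let wholeInput = NSRange(input.startIndex..., in: input)
    return regex.matches(in: input, range: wholeInput).compactMap { result in
        Range(result.range, in: input).map { String(input[$0]) }
    }
}

let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []
print(fourLetterWords)
```

Returning an optional keeps the call site free of `try`, at the cost of hiding why a pattern failed to compile.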
By default the Global flag is active. To change which flags are active, add a `/` at the start of the pattern and `/<flags>` at the end. The available flags are:
- `g` Global - Allows multiple matches
- `i` Case Insensitive - Case insensitive matching
- `m` Multiline - `^` and `$` also match the beginning and end of a line
// Global and Case Insensitive search
let regex = try! Regex(pattern: "/\\w+/ig")
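
To illustrate the `/<pattern>/<flags>` syntax, here is a minimal sketch of how a pattern string could be split into its body and flags. The `RegexFlags` type and the `splitPattern` function are hypothetical names, not part of this library. The sketch also assumes that a delimited pattern with no flag letters activates no flags, which the text above does not specify.

```swift
/// Hypothetical flag set; the names are illustrative only.
struct RegexFlags: OptionSet {
    let rawValue: Int
    static let global = RegexFlags(rawValue: 1 << 0)
    static let caseInsensitive = RegexFlags(rawValue: 1 << 1)
    static let multiline = RegexFlags(rawValue: 1 << 2)
}

/// Splits "/body/flags" into its parts. A string without the leading "/"
/// is treated as a bare pattern with only the Global flag active.
/// Returns nil when an unknown flag letter is found.
func splitPattern(_ source: String) -> (pattern: String, flags: RegexFlags)? {
    guard source.hasPrefix("/"),
          let closing = source.lastIndex(of: "/"),
          closing != source.startIndex else {
        return (pattern: source, flags: RegexFlags.global)
    }
    let body = source[source.index(after: source.startIndex)..<closing]
    var flags: RegexFlags = []
    for letter in source[source.index(after: closing)...] {
        switch letter {
        case "g": flags.insert(.global)
        case "i": flags.insert(.caseInsensitive)
        case "m": flags.insert(.multiline)
        default: return nil
        }
    }
    return (pattern: String(body), flags: flags)
}

if let parsed = splitPattern("/\\w+/ig") {
    print(parsed.pattern)                          // \w+
    print(parsed.flags.contains(.caseInsensitive)) // true
}
```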

Pattern | Description | Supported |
---|---|---|
`.` | `[^\n\r]` | |
`[^]` | `[\s\S]` | |
`\w` | `[A-Za-z0-9_]` | |
`\W` | `[^A-Za-z0-9_]` | |
`\d` | `[0-9]` | |
`\D` | `[^0-9]` | |
`\s` | `[\ \r\n\t\v\f]` | |
`\S` | `[^\ \r\n\t\v\f]` | |
`[ABC]` | Any in the set | |
`[^ABC]` | Any not in the set | |
`[A-Z]` | Any in the range inclusively | |

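
The shorthand classes in the table are defined in terms of sets and ranges, so one representation can cover all of them. The following is a minimal sketch of such a representation, assuming a hypothetical `CharacterClass` type (not this library's API) that stores inclusive ranges plus a negation bit.

```swift
/// Hypothetical character class: a list of inclusive ranges, optionally negated.
struct CharacterClass {
    var ranges: [ClosedRange<Character>]
    var isNegated = false

    func matches(_ character: Character) -> Bool {
        let inSet = ranges.contains { $0.contains(character) }
        // A negated class matches exactly when the character is NOT in the set.
        return inSet != isNegated
    }
}

// \w is [A-Za-z0-9_]; a single character is a one-element range.
let word = CharacterClass(ranges: ["A"..."Z", "a"..."z", "0"..."9", "_"..."_"])
// \D is [^0-9].
let nonDigit = CharacterClass(ranges: ["0"..."9"], isNegated: true)

print(word.matches("x"))      // true
print(word.matches("-"))      // false
print(nonDigit.matches("7"))  // false
```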

Pattern | Description | Supported |
---|---|---|
`^` | Beginning of string | |
`$` | End of string | |
`\b` | Word boundary | |
`\B` | Not word boundary | |

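
A word boundary is a position, not a character: it sits wherever a `\w` character is next to a non-`\w` character or to either end of the input. The sketch below spells that rule out, assuming the ASCII definition of `\w` (`[A-Za-z0-9_]`). The function names are hypothetical.

```swift
/// True for [A-Za-z0-9_], the ASCII definition of \w.
func isWordCharacter(_ character: Character) -> Bool {
    return character == "_"
        || (character.isASCII && (character.isLetter || character.isNumber))
}

/// True when `index` (0...text.count) lies between a word character and a
/// non-word character, or between a word character and an end of the input.
func isWordBoundary(_ text: [Character], at index: Int) -> Bool {
    let before = index > 0 && isWordCharacter(text[index - 1])
    let after = index < text.count && isWordCharacter(text[index])
    return before != after
}

let sample = Array("it's")
print(isWordBoundary(sample, at: 0))  // true: start of input before "i"
print(isWordBoundary(sample, at: 1))  // false: between "i" and "t"
print(isWordBoundary(sample, at: 2))  // true: between "t" and "'"
```

`\B` is simply the negation of the same test.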

Pattern | Description | Supported |
---|---|---|
`\0` | Octal escaped character | |
`\00` | Octal escaped character | |
`\000` | Octal escaped character | |
`\xFF` | Hex escaped character | |
`\uFFFF` | Unicode escaped character | |
`\cA` | Control character | |
`\t` | Tab | |
`\n` | Newline | |
`\v` | Vertical tab | |
`\f` | Form feed | |
`\r` | Carriage return | |
`\0` | Null | |
`\.` | `.` | |
`\\` | `\` | |
`\+` | `+` | |
`\*` | `*` | |
`\?` | `?` | |
`\^` | `^` | |
`\$` | `$` | |
`\{` | `{` | |
`\}` | `}` | |
`\[` | `[` | |
`\]` | `]` | |
`\(` | `(` | |
`\)` | `)` | |
`\/` | `/` | |
`\\|` | `\|` | |

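
As an illustration of the escapes in the table, the sketch below decodes the text that follows a backslash into the character it stands for. The `decodeEscape` function is hypothetical and is not taken from this library. It assumes upper-case letters for `\cA` through `\cZ`, and it rejects letters and digits that are not listed in the table, so class shorthands such as `\w` and back references are out of scope.

```swift
/// Decodes the text after a backslash, e.g. "x41" for \x41.
/// Returns nil for anything that is not a single-character escape.
func decodeEscape(_ body: String) -> Character? {
    guard let kind = body.first else { return nil }
    let rest = body.dropFirst()
    let isHex = rest.allSatisfy { $0.isHexDigit }
    let codePoint: UInt32?
    switch kind {
    case "x" where rest.count == 2 && isHex, "u" where rest.count == 4 && isHex:
        codePoint = UInt32(rest, radix: 16)
    case "c" where rest.count == 1:
        // \cA is U+0001 ... \cZ is U+001A ("A" is ASCII 65).
        guard let ascii = rest.first?.asciiValue, (65...90).contains(ascii) else { return nil }
        codePoint = UInt32(ascii) - 64
    case "0"..."7" where body.count <= 3:
        // One to three octal digits; "\0" therefore decodes to the null character.
        codePoint = UInt32(body, radix: 8)
    default:
        guard rest.isEmpty else { return nil }
        let named: [Character: Character] = [
            "t": "\t", "n": "\n", "v": "\u{0B}", "f": "\u{0C}", "r": "\r",
        ]
        if let character = named[kind] { return character }
        // Escaped punctuation such as \. or \/ stands for itself.
        return (kind.isLetter || kind.isNumber) ? nil : kind
    }
    guard let value = codePoint, let scalar = Unicode.Scalar(value) else { return nil }
    return Character(scalar)
}

print(decodeEscape("x41") == "A")   // true
print(decodeEscape("101") == "A")   // true: octal 101 is 65
print(decodeEscape(".") == ".")     // true
```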

Pattern | Description | Supported |
---|---|---|
`(ABC)` | Capture group | |
`(<name>ABC)` | Named capture group | |
`\1` | Back reference | |
`\'name'` | Named back reference | |
`(?:ABC)` | Non-capturing group | |
`(?=ABC)` | Positive lookahead | |
`(?!ABC)` | Negative lookahead | |
`(?<=ABC)` | Positive lookbehind | |
`(?<!ABC)` | Negative lookbehind | |


Pattern | Description | Supported |
---|---|---|
`+` | One or more | |
`*` | Zero or more | |
`?` | Optional | |
`{n}` | n | |
`{,}` | Same as `*` | |
`{,n}` | n or less | |
`{n,}` | n or more | |
`{n,m}` | n to m | |

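
All of the brace forms in the table reduce to a minimum count and an optional maximum count. The sketch below shows one way to parse them, assuming a hypothetical `Repetition` type and `parseBounds` function that are not part of this library. A `nil` maximum means unbounded.

```swift
/// Hypothetical result type: `max == nil` means "no upper bound".
struct Repetition: Equatable {
    let min: Int
    let max: Int?
}

/// Parses "{n}", "{,}", "{,n}", "{n,}" and "{n,m}".
/// Returns nil for malformed input or when the maximum is below the minimum.
func parseBounds(_ quantifier: String) -> Repetition? {
    guard quantifier.hasPrefix("{"), quantifier.hasSuffix("}") else { return nil }
    let inner = quantifier.dropFirst().dropLast()
    let parts = inner.split(separator: ",", omittingEmptySubsequences: false)
    switch parts.count {
    case 1:
        // {n}: exactly n repetitions.
        guard let count = Int(parts[0]), count >= 0 else { return nil }
        return Repetition(min: count, max: count)
    case 2:
        // A missing lower bound defaults to 0, a missing upper bound to "unbounded".
        guard let lower = parts[0].isEmpty ? 0 : Int(parts[0]), lower >= 0 else { return nil }
        if parts[1].isEmpty { return Repetition(min: lower, max: nil) }
        guard let upper = Int(parts[1]), upper >= lower else { return nil }
        return Repetition(min: lower, max: upper)
    default:
        return nil
    }
}

print(parseBounds("{,}") == Repetition(min: 0, max: nil))  // true: same as *
print(parseBounds("{2,4}") == Repetition(min: 2, max: 4))  // true
```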

Pattern | Description | Supported |
---|---|---|
`+?` | One or more | |
`*?` | Zero or more | |
`??` | Optional | |
`{n}?` | n | |
`{,n}?` | n or less | |
`{n,}?` | n or more | |
`{n,m}?` | n to m | |


Pattern | Description | Supported |
---|---|---|
`\|` | Everything before or everything after | |


Pattern | Description | Supported |
---|---|---|
`i` | Case insensitive | |
`g` | Global | |
`m` | Multiline | |

The pipeline is similar to the previous version:
- Lexer (String input to Tokens)
- Parser (Tokens to NFA)
- Compiler (NFA to DFA)
- Optimizer (Simplify DFA (e.g. `char(a), char(b)` -> `string(ab)`) for better performance)
- Engine (Matches an input String using the DFA)
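
The Compiler and Engine stages can be sketched with the textbook powerset (subset) construction. The code below is an illustration under stated assumptions, not this library's implementation: the `Edge`, `NFA`, `DFA` and `determinize` names are hypothetical, the NFA is built by hand instead of by a lexer and parser, and the optimizer stage is left out. Each DFA state stands for a set of NFA states, which is why matching needs only one table lookup per input character.

```swift
/// One NFA transition; a nil label is an epsilon move.
struct Edge {
    let label: Character?
    let target: Int
}

struct NFA {
    var transitions: [Int: [Edge]]
    var start: Int
    var accepting: Set<Int>

    /// All states reachable from `states` through epsilon moves only.
    func epsilonClosure(of states: Set<Int>) -> Set<Int> {
        var closure = states
        var pending = Array(states)
        while let state = pending.popLast() {
            for edge in transitions[state] ?? [] where edge.label == nil {
                if closure.insert(edge.target).inserted { pending.append(edge.target) }
            }
        }
        return closure
    }

    /// States reachable from `states` by consuming `character`.
    func step(from states: Set<Int>, on character: Character) -> Set<Int> {
        var next = Set<Int>()
        for state in states {
            for edge in transitions[state] ?? [] where edge.label == character {
                next.insert(edge.target)
            }
        }
        return epsilonClosure(of: next)
    }
}

/// Deterministic automaton; state 0 is always the start state.
struct DFA {
    var transitions: [Int: [Character: Int]] = [:]
    var accepting: Set<Int> = []

    /// Engine: one dictionary lookup per character, no backtracking.
    func accepts(_ input: String) -> Bool {
        var state = 0
        for character in input {
            guard let next = transitions[state]?[character] else { return false }
            state = next
        }
        return accepting.contains(state)
    }
}

/// Compiler: powerset construction, numbering each reachable set of NFA states.
func determinize(_ nfa: NFA) -> DFA {
    var dfa = DFA()
    let startSet = nfa.epsilonClosure(of: [nfa.start])
    var ids: [Set<Int>: Int] = [startSet: 0]
    var worklist = [startSet]
    while let current = worklist.popLast() {
        let id = ids[current]!
        if !current.isDisjoint(with: nfa.accepting) { dfa.accepting.insert(id) }
        var labels = Set<Character>()
        for state in current {
            for edge in nfa.transitions[state] ?? [] {
                if let label = edge.label { labels.insert(label) }
            }
        }
        for label in labels {
            let target = nfa.step(from: current, on: label)
            if ids[target] == nil {
                let newID = ids.count
                ids[target] = newID
                worklist.append(target)
            }
            dfa.transitions[id, default: [:]][label] = ids[target]!
        }
    }
    return dfa
}

// Hand-built NFA for the pattern ab|ac.
let nfa = NFA(
    transitions: [
        0: [Edge(label: nil, target: 1), Edge(label: nil, target: 4)],
        1: [Edge(label: "a", target: 2)],
        2: [Edge(label: "b", target: 3)],
        4: [Edge(label: "a", target: 5)],
        5: [Edge(label: "c", target: 6)],
    ],
    start: 0,
    accepting: [3, 6]
)
let dfa = determinize(nfa)
print(dfa.accepts("ab"), dfa.accepts("ac"), dfa.accepts("a"))  // true true false
```

In the NFA, reading `a` from the start leaves two possible states. The DFA folds them into a single state with one outgoing `a` transition, which is the point of the conversion.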
Swift treats `\r\n` as a single `Character`, because the pair forms one grapheme cluster. Use `\n\r` when the two need to be separate characters.
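
This behaviour can be checked with the standard library alone. A short demonstration:

```swift
let crlf = "\r\n"
let lfcr = "\n\r"

print(crlf.count)                 // 1: CR followed by LF is one Character
print(crlf.unicodeScalars.count)  // 2: but still two Unicode scalars
print(lfcr.count)                 // 2: LF followed by CR stays two Characters
```

An engine that walks a `String` by `Character` therefore sees `\r\n` as one input symbol, while walking `unicodeScalars` would see two.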
- regexr.com - Regex testing
- swtch.com - Implementing Regular Expressions
- Powerset construction - NFA to DFA
- Minimization