Just Say No to Regex

Ben Yu edited this page Aug 7, 2024 · 24 revisions

String manipulation (matching, extraction, splitting, removals, replacements etc.) in Java traditionally resorts to two approaches.

For the simplest cases (such as taking the part before or after a delimiter, or between two delimiters), it takes an input.indexOf(myChar) call followed by an input.substring(startIndex, endIndex) call. Along the way, some remember to check for an index of -1, and some just feel lucky and don't bother.
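As a reminder of what that traditional approach looks like, here is a minimal plain-JDK sketch. The class and method names (IndexOfDemo, before, between) are made up for illustration and are not part of any library; the point is the -1 checks that are easy to forget.

```java
import java.util.Optional;

final class IndexOfDemo {
  // Returns the part of input before the first delimiter, or empty if absent.
  static Optional<String> before(String input, char delimiter) {
    int index = input.indexOf(delimiter);
    if (index < 0) {
      return Optional.empty(); // forgetting this check makes substring() throw
    }
    return Optional.of(input.substring(0, index));
  }

  // Returns the text between the first `open` and the next `close` after it.
  static Optional<String> between(String input, String open, String close) {
    int start = input.indexOf(open);
    if (start < 0) {
      return Optional.empty();
    }
    int contentStart = start + open.length(); // skip past the opening delimiter
    int end = input.indexOf(close, contentStart);
    if (end < 0) {
      return Optional.empty();
    }
    return Optional.of(input.substring(contentStart, end));
  }
}
```

Even in this small sketch there are two index checks and one offset computation per lookup, which is where the off-by-one mistakes tend to creep in.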

For anything more complex, there's regex.

But regex in Java is in a sad state:

Luckily, you don't need regex as much as you may have thought!

In this page I'll try to give a few examples so hopefully you can see where I'm going.

Regex Alternative

Imagine you need to find the ChromeOS version from the device model number that looks like "Linux,CrOS,eve|x86_64,EVE D6B-A6B-C4C-F8N-P8A-A36|10863.0.0". In summary, the device model string is in the format of {OS}|{hardware}|{OS-version}.

Being a regex wizard, you may come up with a regex pattern like "^\\w+,CrOS,[^|]+\\|[^|]+\\|([0-9\\.]+)". But it's not exactly easy to read, is it (at least for the regex muggles)?
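For reference, wiring that pattern up with java.util.regex might look like the sketch below. The class and method names (RegexVersionDemo, osVersion) are hypothetical; only the pattern string comes from the text above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexVersionDemo {
  // The pattern quoted above; group 1 is the trailing OS version.
  private static final Pattern DEVICE_MODEL =
      Pattern.compile("^\\w+,CrOS,[^|]+\\|[^|]+\\|([0-9\\.]+)");

  // Returns the OS version, or empty if the input is not a CrOS device model.
  static Optional<String> osVersion(String deviceModel) {
    Matcher matcher = DEVICE_MODEL.matcher(deviceModel);
    return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
  }
}
```

It works, but the reader has to decode the escapes and remember that group 1 is the version.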

Let's just say no to regex. Try the following:

// The sample version "10863.0.0" is not a valid int, so keep it as a String.
String version = new StringFormat("{...},CrOS,{...}|{hardware}|{version}")
    .parseOrThrow(deviceModel, (hardware, v) -> v);
  • {hardware} and {version} are placeholders whose values are passed to the lambda.
  • {...} is a wildcard placeholder that is not passed to the lambda.
  • All other characters (such as , and |) are literal.

The code is intuitive to read. And StringFormat does no backtracking.

More Examples

Need to split around a pattern?

Substring.consecutive(Character::isWhitespace)
    .repeatedly()
    .split(...);

Need to replace some patterns?

Substring.between("<password>", "</password>")
    .repeatedly()
    .replaceAllFrom(input, pwd -> "***");

Want string substitution?

String template = "{who} is going to {where}";
Map<String, String> substitutions = Map.of(
  "who", "Arya",
  "where", "Braavos"
);

// Matches all {placeholder} syntaxes
Substring.RepeatingPattern placeholders =
    Substring.word()
        .immediatelyBetween("{", INCLUSIVE, "}", INCLUSIVE)
        .repeatedly();

// Returns "Arya is going to Braavos"
String result = placeholders.replaceAllFrom(
    template,
    // Skip the braces to turn {who} to "who",
    // then look up the map to get "Arya".
    placeholder -> substitutions.get(placeholder.skip(1, 1).toString()));

Fiddling with Indexes? Or Not

Did we also talk about the simple cases where you may be used to using indexOf()? Fiddling with indexes is prone to off-by-one errors and makes for unreadable code. Instead, consider using either StringFormat like:

new StringFormat("'{quoted}'").scan(input, quoted -> quoted);

Or Substring like:

Substring.between('\'', '\'').repeatedly().from(input);
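For contrast, here is roughly the index bookkeeping those one-liners replace, written against the plain JDK. This is a sketch with made-up names (QuotedDemo, quoted), and it is not claimed to match the library's behavior in every edge case.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedDemo {
  // Collects the text inside each pair of single quotes, left to right.
  static List<String> quoted(String input) {
    List<String> results = new ArrayList<>();
    int from = 0;
    while (true) {
      int open = input.indexOf('\'', from);
      if (open < 0) {
        break; // no more opening quotes
      }
      int close = input.indexOf('\'', open + 1);
      if (close < 0) {
        break; // unmatched opening quote
      }
      results.add(input.substring(open + 1, close)); // +1 skips the opening quote
      from = close + 1; // resume after the closing quote
    }
    return results;
  }
}
```

Each +1 in the loop is a chance to be off by one, which is the kind of detail the declarative versions keep out of sight.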

Life will be easier without regexes, my friend.