StringFormat Explained

Ben Yu edited this page Nov 27, 2023 · 33 revisions

Parsing

This class isn't a "Swiss Army knife" the way regex is, but it is a lot simpler to use and produces more readable code for simple, mundane string parsing and manipulation tasks.

Without any explanation, can you guess what the following code does?

Optional<LogFile> log =
    new StringFormat("/home/{usr}/log/{year}/{month}/{day}/job-{shard_id}.log")
        .parse(
            logFileName,
            (usr, year, month, day, shardId) ->
                LogFile.builder()
                    .setUser(usr)
                    .setDate(parseInt(year), parseInt(month), parseInt(day))
                    .setShard(shardId)
                    .build());

Yes, just trust your intuition: it does exactly what it looks like it does!

(Starting from v6.7, there is also a convenient parseOrThrow() method that throws if the input can't be parsed, with a reasonably informative error message.)
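For comparison, here is a rough sketch of the same parsing written with plain java.util.regex. It is not how StringFormat is implemented. It assumes each placeholder matches any run of characters up to the next literal piece of the format, and the LogFile record and the RegexLogFileParser class are hypothetical stand-ins invented for this illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLogFileParser {
  // Hypothetical stand-in for the LogFile type used in the example above.
  record LogFile(String user, int year, int month, int day, String shardId) {}

  // Each (.*?) plays the role of one {placeholder}; the '.' before "log" must be escaped.
  private static final Pattern LOG_FILE =
      Pattern.compile("/home/(.*?)/log/(.*?)/(.*?)/(.*?)/job-(.*?)\\.log");

  static Optional<LogFile> parse(String logFileName) {
    Matcher m = LOG_FILE.matcher(logFileName);
    if (!m.matches()) {
      return Optional.empty();
    }
    // Groups are positional: nothing ties group(2) to "year" except this comment.
    return Optional.of(
        new LogFile(
            m.group(1),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4)),
            m.group(5)));
  }

  public static void main(String[] args) {
    System.out.println(parse("/home/alice/log/2023/11/27/job-7.log"));
  }
}
```

The regex version has to escape metacharacters by hand and refer to captures by index, which is the kind of noise the named placeholders are meant to remove.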

Sometimes you need to search the input string for a sub-pattern that may occur zero, one or multiple times. Use the scan() method for these cases. For example, if the input string contains multiple breakpoint specs:

List<Breakpoint> breakpoints =
    new StringFormat("breakpoint: {line={line}, color={color}}")
        .scan(inputString, Breakpoint::new)
        .collect(toList());

Both the parse() and scan() methods have overloads that support 1 to 6 placeholders.

You can also post-filter to ignore matches that don't satisfy a post-condition. For example, if you want to ignore invalid breakpoint specs, just return null for the invalid matches:

List<Breakpoint> breakpoints =
    new StringFormat("breakpoint: {line={line}, color={color}}")
        .scan(
            inputString,
            (line, color) ->
                isNumeric(line) && isValidColor(color) ? new Breakpoint(line, color) : null)
        .collect(toList());
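The scan-and-filter idea can be sketched with plain java.util.regex as well. This is only an illustration under assumptions: the input is assumed to contain specs that look like `breakpoint: {line=12, color=red}`, and the Breakpoint record, the set of valid colors and the toBreakpoint() helper are all hypothetical.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexBreakpointScanner {
  // Hypothetical stand-in for the Breakpoint type used in the example above.
  record Breakpoint(String line, String color) {}

  // Assumed input shape: "breakpoint: {line=12, color=red}".
  private static final Pattern SPEC =
      Pattern.compile("breakpoint: \\{line=(.*?), color=(.*?)\\}");

  // Hypothetical list of acceptable colors.
  private static final Set<String> VALID_COLORS = Set.of("red", "green", "blue");

  static List<Breakpoint> scan(String input) {
    return SPEC.matcher(input)
        .results() // one MatchResult per occurrence: zero, one or many (Java 9+)
        .map(m -> toBreakpoint(m.group(1), m.group(2)))
        .filter(Objects::nonNull) // drop the matches that were mapped to null
        .collect(Collectors.toList());
  }

  // Returns null for an invalid spec, mirroring the post-filter convention above.
  static Breakpoint toBreakpoint(String line, String color) {
    boolean numeric = !line.isEmpty() && line.chars().allMatch(Character::isDigit);
    return numeric && VALID_COLORS.contains(color) ? new Breakpoint(line, color) : null;
  }
}
```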

Compile-time Safety

If you use Bazel as your build tool, the compile-time checks are provided out of the box.

If you use Maven, we strongly recommend adding both ErrorProne and the mug-errorprone plugin to your annotationProcessor paths. For example:

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <configuration>
            <annotationProcessorPaths>
              <path>
                <groupId>com.google.errorprone</groupId>
                <artifactId>error_prone_core</artifactId>
                <version>2.23.0</version>
              </path>
              <path>
                <groupId>com.google.mug</groupId>
                <artifactId>mug-errorprone</artifactId>
                <version>6.7</version>
              </path>
            </annotationProcessorPaths>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>

This plugin checks for common programming errors, including:

  • The number of lambda parameters doesn't match the number of format placeholders
  • The names of the lambda parameters don't match the placeholders

With the compile-time checks, you can safely define StringFormat objects as private class constants and reference them many lines away.

Formatting

The compile-time checks make the StringFormat.format() method a safer alternative to String.format() (it's faster too). For example:

private static final StringFormat JOB_ID_FORMAT =
    new StringFormat("{project_id}@{location}:{job}");

// 200 lines later
  .setJobId(JOB_ID_FORMAT.format(projectId, location, job));

Compared to String.format(), the benefits are:

  • The format string is more human readable with the placeholder names.
  • The StringFormat can be defined as a class constant and safely reused across the file, because the compile-time check ensures that the format arguments match the placeholder names. You can't pass the wrong number of arguments or pass them in the wrong order!
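To make the contrast concrete, here is a small JDK-only sketch of the mistakes that String.format() lets through the compiler. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.MissingFormatArgumentException;

class StringFormatPitfall {
  // Bug on purpose: projectId and location are swapped.
  // Plain javac accepts it, and the result is silently wrong.
  static String jobId(String projectId, String location, String job) {
    return String.format("%s@%s:%s", location, projectId, job);
  }

  // Bug on purpose: three %s but only two arguments.
  // Plain javac accepts it; it fails only when this line runs.
  static String tooFewArgs(String projectId, String location) {
    return String.format("%s@%s:%s", projectId, location);
  }

  public static void main(String[] args) {
    System.out.println(jobId("my-project", "us-east1", "job1"));
    try {
      tooFewArgs("my-project", "us-east1");
    } catch (MissingFormatArgumentException e) {
      System.out.println("failed at runtime: " + e.getMessage());
    }
  }
}
```

Because "%s" carries no name, nothing in the format string says which argument belongs where; the named placeholders are what give a compile-time check something to compare against.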

By combining parsing and formatting, you can round-trip between POJOs and their string formats.
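The round-trip idea can be illustrated with a toy, JDK-only template class. This is a minimal sketch and not Mug's StringFormat: ToyTemplate and its methods are hypothetical, it has no compile-time checking, and it assumes placeholders match lazily up to the next literal piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy template (NOT Mug's StringFormat) that shows the round-trip idea only. */
class ToyTemplate {
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{[^{}]*\\}");

  private final List<String> literals = new ArrayList<>(); // text around the placeholders
  private final Pattern parser;

  ToyTemplate(String template) {
    StringBuilder regex = new StringBuilder();
    Matcher m = PLACEHOLDER.matcher(template);
    int start = 0;
    while (m.find()) {
      String literal = template.substring(start, m.start());
      literals.add(literal);
      // Literal text is quoted; each placeholder becomes a lazy capture group.
      regex.append(Pattern.quote(literal)).append("(.*?)");
      start = m.end();
    }
    String tail = template.substring(start);
    literals.add(tail);
    regex.append(Pattern.quote(tail));
    parser = Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  /** Fills the placeholders with the arguments, in order. */
  String format(Object... args) {
    if (args.length != literals.size() - 1) {
      throw new IllegalArgumentException("expected " + (literals.size() - 1) + " arguments");
    }
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < args.length; i++) {
      out.append(literals.get(i)).append(args[i]);
    }
    return out.append(literals.get(args.length)).toString();
  }

  /** Returns the placeholder values in order, or empty if the input doesn't match. */
  Optional<List<String>> parse(String input) {
    Matcher m = parser.matcher(input);
    if (!m.matches()) {
      return Optional.empty();
    }
    List<String> values = new ArrayList<>();
    for (int i = 1; i <= m.groupCount(); i++) {
      values.add(m.group(i));
    }
    return Optional.of(values);
  }

  public static void main(String[] args) {
    ToyTemplate jobId = new ToyTemplate("{project_id}@{location}:{job}");
    String formatted = jobId.format("my-project", "us-east1", "job1");
    System.out.println(formatted);
    System.out.println(jobId.parse(formatted));
  }
}
```

Formatting and then parsing with the same template recovers the original values, as long as the values themselves don't contain the literal separators.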