StringFormat Explained

Ben Yu edited this page Dec 2, 2023 · 33 revisions

Parsing

StringFormat is not a "Swiss Army Knife" like regex, but it is much simpler to use and produces more readable code for simple, mundane string parsing and manipulation tasks.

Without any explanation, can you guess what the following code does?

Optional<LogFile> log =
    new StringFormat("/home/{usr}/log/{year}/{month}/{day}/job-{shard_id}.log")
        .parse(
            logFileName,
            (usr, year, month, day, shardId) ->
                LogFile.builder()
                   .setUser(usr)
                   .setDate(parseInt(year), parseInt(month), parseInt(day))
                   .setShard(shardId)
                   .build());

Yes, just trust your intuition: it does exactly what it looks like it does!

(Starting from v7.0, there is a convenient parseOrThrow() method that throws if the input can't be parsed, with a reasonably informative error message.)
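For comparison, here is a rough sketch of the same path template written with plain java.util.regex. It is not how StringFormat works internally, and the `[^/]+` groups are an assumption about what each placeholder should match; StringFormat's exact matching rules may differ. The class name `LogPathRegexDemo` and its `describe()` helper are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex-based approximation of the log path template, for comparison only. */
class LogPathRegexDemo {
  // Each {placeholder} becomes a named capturing group. Java group names cannot
  // contain '_', so {shard_id} has to be renamed to "shard".
  private static final Pattern LOG_PATH = Pattern.compile(
      "/home/(?<usr>[^/]+)/log/(?<year>[^/]+)/(?<month>[^/]+)/(?<day>[^/]+)"
          + "/job-(?<shard>[^/]+)\\.log");

  /** Returns a short description if the whole path matches, or empty otherwise. */
  static Optional<String> describe(String path) {
    Matcher m = LOG_PATH.matcher(path);
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of(
        m.group("usr") + " " + m.group("year") + "-" + m.group("month") + "-" + m.group("day")
            + " shard " + m.group("shard"));
  }

  public static void main(String[] args) {
    System.out.println(describe("/home/alice/log/2023/12/02/job-7.log"));
    System.out.println(describe("/var/log/syslog"));
  }
}
```

The literal parts of the path get buried under escaping and group syntax, which is the readability cost the template form avoids.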

Sometimes you need to search the input string for a sub-pattern that may occur zero, one or multiple times. The scan() method covers these use cases. For example, if the input string contains multiple breakpoint specs:

List<Breakpoint> breakpoints =
    new StringFormat("breakpoint: {line={line}, color={color}}")
        .scan(inputString, Breakpoint::new)
        .collect(toList());

Both the parse() and scan() methods have overloads that support from 1 to 6 placeholders.

You can also post-filter to ignore matches that don't satisfy a post-condition. For example, if you want to ignore invalid breakpoint specs, just return null for the invalid matches:

List<Breakpoint> breakpoints =
    new StringFormat("breakpoint: {line={line}, color={color}}")
        .scan(
            inputString,
            (line, color) ->
                isNumeric(line) && isValidColor(color) ? new Breakpoint(line, color) : null)
        .collect(toList());

Compile-time Safety

If you use Bazel as your build tool, the compile-time check is provided out of the box.

If you use Maven, we strongly recommend adding both ErrorProne and the mug-errorprone plugin to your annotationProcessor paths. For example:

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <configuration>
            <annotationProcessorPaths>
              <path>
                <groupId>com.google.errorprone</groupId>
                <artifactId>error_prone_core</artifactId>
                <version>2.23.0</version>
              </path>
              <path>
                <groupId>com.google.mug</groupId>
                <artifactId>mug-errorprone</artifactId>
                <version>7.0</version>
              </path>
            </annotationProcessorPaths>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>

This plugin checks against common programming errors including:

  • The number of lambda parameters doesn't match the number of format placeholders
  • The names of the lambda parameters don't match the placeholders

With the compile-time checks, you can safely define StringFormat objects as private class constants and reference them many lines away.

Formatting

The compile-time checks make the StringFormat.format() method a safer alternative to String.format() (it's faster too). For example:

private static final StringFormat JOB_ID_FORMAT =
    new StringFormat("{project_id}@{location}:{job}");

// 200 lines later
  .setJobId(JOB_ID_FORMAT.format(projectId, location, job));

Compared to String.format(), the benefits are:

  • The format string is more human readable with the placeholder names.
  • The StringFormat can be defined as a class constant and safely reused across the file, because the compile-time check ensures that the format arguments match the placeholder names. You can't pass the wrong number of arguments or pass them in the wrong order!
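To make the contrast concrete, the sketch below shows the kind of mistakes that String.format() accepts at compile time. It uses only the JDK; the class name `PositionalFormatPitfalls` and the sample values are made up for illustration.

```java
import java.util.MissingFormatArgumentException;

/** Demonstrates mistakes that String.format() reveals only at runtime, if at all. */
class PositionalFormatPitfalls {
  private static final String JOB_ID_PATTERN = "%s@%s:%s";  // project, location, job

  /** Formats positionally: nothing ties each argument to its meaning. */
  static String jobId(Object... args) {
    return String.format(JOB_ID_PATTERN, args);
  }

  /** Returns true if passing too few arguments is caught only at runtime. */
  static boolean tooFewArgsFailAtRuntime() {
    try {
      jobId("my-project", "us-east1");
      return false;
    } catch (MissingFormatArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(jobId("my-project", "us-east1", "job-42"));  // intended order
    System.out.println(jobId("us-east1", "my-project", "job-42"));  // swapped, still compiles
    System.out.println(tooFewArgsFailAtRuntime());
  }
}
```

The swapped call silently produces a wrong id, and the short call fails only when that line executes. Named placeholders plus the compile-time check are meant to catch both before the code runs.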

Combining the parsing and formatting capabilities, you can round-trip between POJOs and their string formats.
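The round-trip idea can be sketched with a tiny stand-in class built on the JDK alone. `MiniTemplate` below is hypothetical and is not Mug's implementation: it has no compile-time checks, and its lazy `(.+?)` matching is an assumption rather than StringFormat's documented behavior. It only shows why one template can serve both directions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not Mug's StringFormat. */
final class MiniTemplate {
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\w+\\}");

  private final List<String> fragments = new ArrayList<>();  // literal text around placeholders
  private final Pattern regex;

  MiniTemplate(String template) {
    StringBuilder re = new StringBuilder();
    Matcher m = PLACEHOLDER.matcher(template);
    int end = 0;
    while (m.find()) {
      String literal = template.substring(end, m.start());
      fragments.add(literal);
      re.append(Pattern.quote(literal)).append("(.+?)");  // one group per placeholder
      end = m.end();
    }
    String tail = template.substring(end);
    fragments.add(tail);
    regex = Pattern.compile(re.append(Pattern.quote(tail)).toString());
  }

  /** Interleaves the literal fragments with the arguments, in order. */
  String format(Object... args) {
    if (args.length != fragments.size() - 1) {
      throw new IllegalArgumentException("wrong number of arguments");
    }
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < args.length; i++) {
      out.append(fragments.get(i)).append(args[i]);
    }
    return out.append(fragments.get(args.length)).toString();
  }

  /** Returns the placeholder values if the whole input matches the template. */
  Optional<List<String>> parse(String input) {
    Matcher m = regex.matcher(input);
    if (!m.matches()) {
      return Optional.empty();
    }
    List<String> values = new ArrayList<>();
    for (int i = 1; i <= m.groupCount(); i++) {
      values.add(m.group(i));
    }
    return Optional.of(values);
  }

  public static void main(String[] args) {
    MiniTemplate jobIdFormat = new MiniTemplate("{project_id}@{location}:{job}");
    String id = jobIdFormat.format("my-project", "us-east1", "job-42");
    System.out.println(id);
    System.out.println(jobIdFormat.parse(id));
  }
}
```

Formatting and then parsing with the same template recovers the original values, as long as the values themselves don't contain the separators used in the template.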

Templating

Suppose you have a SQL library that supports injection-safe parameterization. It may look like this:

public class Query {
  private String sql;

  /** Don't allow users to pass in arbitrary unsafe SQL. */
  private Query(String unsafeSql) {...}

  /** Users have to use compile-time string literals. No dynamic values. */
  public static Query create(@CompileTimeConstant String sql) {...}

  /** Dynamic parameters are provided through bind() */
  public Query bind(String paramName, Object paramValue) {...}

  /** Append trusted sql snippet. */
  public Query append(TrustedSql snippet) {...}

  public Result execute(Db db) {...}
}

The usual usage pattern looks like this:

Query query = Query.create("select name from Students where id = @id")
    .bind("id", idString);

The API lets you define parameters in a template and then pass parameter values by chaining bind() calls.

It still leaves a few things to be desired:

  1. Sometimes the SQL can be long and complex. It'd be nice to extract the query template as a constant. But in doing so, the template with its placeholder names ends up far away from the bind() calls. If you mistype a placeholder name, or pass too few parameters, you get a runtime error.

  2. Occasionally it's desirable to also parameterize by table names, column names or even sub-queries.

For item 2, the usual workaround is to use the append() method (similar to StringBuilder):

private static final TrustedSql STUDENTS_TABLE = TrustedSql.fromFlag(studentsTableFlag);

Query getStudentName = Query.create("select name from ")
    .append(STUDENTS_TABLE)
    .append(" where id = @id")
    .bind("id", studentId);

But the SQL gets fragmented and becomes harder to read.

Let's see if we can use StringFormat to help address these issues. We'll use the StringFormat.template() SPI to provide the same template syntax used by StringFormat, but plug in our custom rules.

public class Query {
  ...

  public static StringFormat.To<Query> template(@CompileTimeConstant String sqlTemplate) {
    return StringFormat.template(
        sqlTemplate,
        // For template("select * from {tbl} where id = {id};").with(tableName, id)
        //     fragments = ["select * from ", " where id = ", ";"],
        //     placeholders = [`{tbl}`: tableName, `{id}`: id]
        (List<String> fragments, BiStream<Substring.Match, Object> placeholders) -> {
          Iterator<String> it = fragments.iterator();
          BiStream.Builder<String, Object> parameters = BiStream.builder();
          String sqlString = placeholders.collect(new StringBuilder(), (builder, placeholder, value) -> {
              builder.append(it.next());  // the literal fragment preceding this placeholder
              if (value instanceof TrustedSql) {  // trusted, just add to the sql string
                builder.append(value);
              } else {
                // translate "{id}" to "@id" in the sql, and bind the value under "id".
                String paramName = placeholder.skip(1, 1).toString();
                builder.append("@").append(paramName);
                parameters.add(paramName, value);
              }
            })
            .append(it.next())  // append the last ";"
            .toString();
          // Create the Query and bind all parameter values
          return parameters.build().collect(new Query(sqlString), Query::bind);
        });
  }
}
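The interleaving logic inside that callback can be sketched without Mug, using plain lists. Everything here is hypothetical and for illustration only: `Trusted` stands in for TrustedSql, placeholder names are passed as a separate list, and the bound parameters are collected into a map instead of being passed to Query.bind().

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of interleaving template fragments with placeholder values. */
class TemplateInterleavingSketch {
  /** Hypothetical stand-in for TrustedSql: text that is safe to inline. */
  static final class Trusted {
    final String sql;

    Trusted(String sql) {
      this.sql = sql;
    }
  }

  /**
   * Builds the sql text. Trusted values are inlined; every other value is
   * replaced by "@name" and recorded in paramsOut for binding later.
   * Expects fragments.size() == names.size() + 1.
   */
  static String render(
      List<String> fragments, List<String> names, List<?> values, Map<String, Object> paramsOut) {
    StringBuilder sql = new StringBuilder();
    for (int i = 0; i < names.size(); i++) {
      sql.append(fragments.get(i));  // literal text before the i-th placeholder
      Object value = values.get(i);
      if (value instanceof Trusted) {
        sql.append(((Trusted) value).sql);
      } else {
        sql.append('@').append(names.get(i));
        paramsOut.put(names.get(i), value);
      }
    }
    return sql.append(fragments.get(names.size())).toString();  // trailing fragment
  }

  public static void main(String[] args) {
    Map<String, Object> params = new LinkedHashMap<>();
    String sql = render(
        List.of("select * from ", " where id = ", ";"),
        List.of("tbl", "id"),
        List.of(new Trusted("Students"), 123),
        params);
    System.out.println(sql);
    System.out.println(params);
  }
}
```

The key point is that there is always one more fragment than there are placeholders, so each placeholder is preceded by its fragment and the last fragment is appended after the loop.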

We can use this method to parameterize by both table names and values:

private static final StringFormat.To<Query> GET_NAME_BY_ID =
    Query.template("select name from {table} where id = {id}");
private static final TrustedSql STUDENTS_TABLE = TrustedSql.fromFlag(studentsTableFlag);
private static final TrustedSql TEACHERS_TABLE = TrustedSql.fromFlag(teachersTableFlag);

// 200 lines later
Query getStudentName = GET_NAME_BY_ID.with(STUDENTS_TABLE, studentId);
Query getTeacherName = GET_NAME_BY_ID.with(TEACHERS_TABLE, teacherId);

What we have accomplished:

  • Retain the safety provided by the original Query API.
  • Parameterize by table name (or any other part of the query) without compromising SQL readability.
  • Define the query templates as class constants with StringFormat's compile-time safety to ensure parameter correctness.
  • Light-weight syntax without having to chain the bind() calls.