[SPARK-48678][CORE] Performance optimizations for SparkConf.get(ConfigEntry)

### What changes were proposed in this pull request?

This PR proposes two micro-optimizations for `SparkConf.get(ConfigEntry)`:

1. Avoid costly `Regex.replaceAllIn` for variable substitution: if the config value does not contain the substring `${`, then it cannot possibly contain any variables, so we can skip the regex evaluation entirely in such cases.
2. Avoid Scala collections operations, including `List.flatten` and `AbstractIterable.mkString`, in the common case where a configuration does not define a prepended configuration key (a simplified, self-contained sketch of this refactoring follows below).
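
A minimal, self-contained sketch of the second optimization (the real change is in `ConfigEntry.readString` and appears in the diff further below; the helper names here are hypothetical): both formulations return the same result for every combination of inputs, but the pattern match does no collection work in the common case where nothing is prepended.

```scala
object ReadStringSketch {
  private val prependSeparator = " "

  // Original shape: builds a Seq, flattens it, and calls mkString even when
  // there is nothing to prepend.
  def viaCollections(prepended: Option[String], value: Option[String]): Option[String] = {
    val values = Seq(prepended, value).flatten
    if (values.nonEmpty) Some(values.mkString(prependSeparator)) else None
  }

  // Optimized shape: handles the four Option combinations directly, so the common
  // "no prepended key" case allocates no intermediate collections.
  def viaPatternMatch(prepended: Option[String], value: Option[String]): Option[String] = {
    (prepended, value) match {
      case (Some(p), Some(v)) => Some(s"$p$prependSeparator$v")
      case _ => value.orElse(prepended)
    }
  }

  def main(args: Array[String]): Unit = {
    val cases: Seq[(Option[String], Option[String])] = Seq(
      (None, None), (Some("a"), None), (None, Some("b")), (Some("a"), Some("b")))
    cases.foreach { case (p, v) =>
      assert(viaCollections(p, v) == viaPatternMatch(p, v))
    }
    println("both formulations agree on all four cases")
  }
}
```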

### Why are the changes needed?

Improve performance.

This is primarily motivated by unit testing and benchmarking scenarios, but it will also slightly benefit production queries.

Spark tries to avoid excessive configuration reading in hot paths (e.g. via changes like #46979). If we do accidentally introduce such read paths, though, then this PR's optimizations will greatly reduce the associated performance penalty.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Correctness should be covered by existing unit tests.

To measure performance, I did some manual benchmarking by running

```
val conf = new SparkConf()
conf.set("spark.network.crypto.enabled", "true")
```

followed by

```
conf.get(NETWORK_CRYPTO_ENABLED)
```

10 million times in a loop.

On my laptop, the optimized code achieves ~7.5x the throughput of the original.
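
For reference, a minimal sketch of such a benchmark loop. This exact harness is not part of the PR; it assumes `NETWORK_CRYPTO_ENABLED` refers to `org.apache.spark.internal.config.Network.NETWORK_CRYPTO_ENABLED`, lives under the `org.apache.spark` package so the `private[spark]` accessors are visible, and uses illustrative warm-up and timing details.

```scala
package org.apache.spark

import org.apache.spark.internal.config.Network.NETWORK_CRYPTO_ENABLED

object SparkConfGetBenchmark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.set("spark.network.crypto.enabled", "true")

    // Warm-up pass so the JIT has compiled the hot path before timing starts.
    (1 to 1000000).foreach(_ => conf.get(NETWORK_CRYPTO_ENABLED))

    val iterations = 10000000  // 10 million reads, as in the description above
    var hits = 0L              // accumulate the result so the loop is not optimized away
    val start = System.nanoTime()
    var i = 0
    while (i < iterations) {
      if (conf.get(NETWORK_CRYPTO_ENABLED)) hits += 1
      i += 1
    }
    val elapsedSec = (System.nanoTime() - start) / 1e9
    println(f"$iterations reads ($hits hits) in $elapsedSec%.2f s, " +
      f"${iterations / elapsedSec}%.0f reads/sec")
  }
}
```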

We can also compare flamegraphs from a `while(true)` configuration-reading loop, which show a clear difference in hotspots before and after this change:

**Before**:

![image](https://github.com/apache/spark/assets/50748/a60cec03-2400-46a5-95f5-f33b88a4872a)

**After**:

![image](https://github.com/apache/spark/assets/50748/10a45575-148b-4f5f-a431-b8036fe59866)

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Github Copilot

Closes #47049 from JoshRosen/SPARK-48678-sparkconf-perf-optimizations.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
JoshRosen authored and HyukjinKwon committed Jun 23, 2024
1 parent 84d278c commit 4b37eb8
Showing 2 changed files with 11 additions and 9 deletions.
**ConfigEntry.scala**

```diff
@@ -89,14 +89,14 @@ private[spark] abstract class ConfigEntry[T] (
   def defaultValueString: String
 
   protected def readString(reader: ConfigReader): Option[String] = {
-    val values = Seq(
-      prependedKey.flatMap(reader.get(_)),
-      alternatives.foldLeft(reader.get(key))((res, nextKey) => res.orElse(reader.get(nextKey)))
-    ).flatten
-    if (values.nonEmpty) {
-      Some(values.mkString(prependSeparator))
-    } else {
-      None
+    // SPARK-48678: performance optimization: this code could be expressed more succinctly
+    // using flatten and mkString, but doing so adds lots of Scala collections perf. overhead.
+    val maybePrependedValue: Option[String] = prependedKey.flatMap(reader.get)
+    val maybeValue: Option[String] = alternatives
+      .foldLeft(reader.get(key))((res, nextKey) => res.orElse(reader.get(nextKey)))
+    (maybePrependedValue, maybeValue) match {
+      case (Some(prependedValue), Some(value)) => Some(s"$prependedValue$prependSeparator$value")
+      case _ => maybeValue.orElse(maybePrependedValue)
     }
   }
```
**ConfigReader.scala**

```diff
@@ -84,7 +84,9 @@ private[spark] class ConfigReader(conf: ConfigProvider) {
   def substitute(input: String): String = substitute(input, Set())
 
   private def substitute(input: String, usedRefs: Set[String]): String = {
-    if (input != null) {
+    // SPARK-48678: performance optimization: skip the costly regex processing
+    // if the string cannot possibly contain a variable reference:
+    if (input != null && input.contains("${")) {
       ConfigReader.REF_RE.replaceAllIn(input, { m =>
         val prefix = m.group(1)
         val name = m.group(2)
```
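
As a standalone illustration of the fast path: every reference that `REF_RE` can match necessarily contains the literal characters `${`, so a value without that substring can never match and the regex pass can be skipped outright. The sketch below is a simplified stand-in, not the exact Spark implementation; the regex and resolver are illustrative.

```scala
import scala.util.matching.Regex

object SubstituteSketch {
  // Illustrative stand-in for ConfigReader.REF_RE: matches ${name} or ${prefix:name}.
  private val RefRe: Regex = """\$\{(?:(\w+?):)?(\S+?)\}""".r

  def substitute(input: String, resolve: String => Option[String]): String = {
    // Fast path: without "${" there can be no variable reference, so skip the regex entirely.
    if (input != null && input.contains("${")) {
      RefRe.replaceAllIn(input, m =>
        Regex.quoteReplacement(resolve(m.group(2)).getOrElse(m.matched)))
    } else {
      input
    }
  }

  def main(args: Array[String]): Unit = {
    val vars = Map("user" -> "alice")
    println(substitute("hello, ${user}", vars.get))           // "hello, alice"
    println(substitute("plain value with no refs", vars.get)) // returned as-is; regex never runs
  }
}
```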
