This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Add spark session extension for Hyperspace #504

Merged
merged 6 commits into from
Nov 15, 2021

Conversation

paryoja
Contributor

@paryoja paryoja commented Oct 12, 2021

What is the context for this pull request?

What changes were proposed in this pull request?

Added a way to enable Hyperspace through Spark's session extension mechanism (SparkSessionExtensions).

Does this PR introduce any user-facing change?

Yes. Users can now enable Hyperspace as follows:

spark-shell -c spark.sql.extensions=com.microsoft.hyperspace.HyperspaceSparkSessionExtension

or

val spark = SparkSession
       .builder()
       .appName("...")
       .master("...")
       .config("spark.sql.extensions", "com.microsoft.hyperspace.HyperspaceSparkSessionExtension")
       .getOrCreate()
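
As a hedged aside: because the class is loaded through `spark.sql.extensions`, it must be a `SparkSessionExtensions => Unit` function with a no-argument constructor, so it should also be possible to pass an instance to Spark's `SparkSession.Builder.withExtensions`. The sketch below assumes Hyperspace is on the classpath; the object name, app name, and master URL are placeholders, not part of this PR.

```scala
import org.apache.spark.sql.SparkSession
import com.microsoft.hyperspace.HyperspaceSparkSessionExtension

// Hypothetical wrapper object, used only to make the sketch compilable.
object WithExtensionsExample {
  def buildSession(): SparkSession =
    SparkSession
      .builder()
      .appName("hyperspace-extension-example") // placeholder name
      .master("local[*]")                      // placeholder master URL
      // Programmatic alternative to setting spark.sql.extensions by class name.
      .withExtensions(new HyperspaceSparkSessionExtension())
      .getOrCreate()
}
```

The config-based form is usually more convenient for `spark-shell` and `spark-submit`, since it needs no code change in the application.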

How was this patch tested?

Manually with spark-shell. If an automated test is required, I will add it.

@paryoja paryoja changed the title Add spark session extension for Hyperspace [WIP] Add spark session extension for Hyperspace Oct 18, 2021
@paryoja paryoja closed this Oct 29, 2021
@paryoja paryoja deleted the feature/exetension branch October 29, 2021 04:47
@paryoja paryoja restored the feature/exetension branch October 29, 2021 04:48
@paryoja paryoja reopened this Oct 29, 2021
@paryoja paryoja changed the title [WIP] Add spark session extension for Hyperspace Add spark session extension for Hyperspace Oct 29, 2021
@paryoja paryoja force-pushed the feature/exetension branch 2 times, most recently from 923872a to 84b2969 Compare November 4, 2021 03:25
@sezruby sezruby added the enhancement New feature or request label Nov 4, 2021
@sezruby sezruby linked an issue Nov 4, 2021 that may be closed by this pull request
1 task
add configure to control enabling hyperspace
add dummy rule to avoid different behavior of Extensions / apply hyperspace
add test for hyperspace extension
Collaborator

@sezruby sezruby left a comment


LGTM, thanks @paryoja!
@clee704 Could you have another look and approve the PR?

override def apply(extensions: SparkSessionExtensions): Unit = {
  extensions.injectOptimizerRule { sparkSession =>
    // Enable Hyperspace to leverage indexes.
    sparkSession.enableHyperspace()

According to the new model, enableHyperspace exists only for backward compatibility. I think it's better to factor the rule insertion code out of that method into a separate method, and invoke that method both here and from enableHyperspace.

Contributor Author

@paryoja paryoja Nov 10, 2021


@clee704 Do you mean this part?

      if (!sparkSession.sessionState.experimentalMethods.extraOptimizations.contains(
          ApplyHyperspace)) {
        sparkSession.sessionState.experimentalMethods.extraOptimizations ++=
          ApplyHyperspace :: Nil
      }
      if (!sparkSession.sessionState.experimentalMethods.extraStrategies.contains(
          BucketUnionStrategy)) {
        sparkSession.sessionState.experimentalMethods.extraStrategies ++=
          BucketUnionStrategy :: Nil
      }

Where should I put this code? The package object hyperspace is mostly a customer-facing interface, so I'm not sure whether it is OK to create a function there, like this:

package object hyperspace {

  /**
   * Hyperspace-specific implicit class on SparkSession.
   */
  implicit class Implicits(sparkSession: SparkSession) {

    def enableHyperspace(): SparkSession = {
      HyperspaceConf.setHyperspaceApplyEnabled(sparkSession, true)
      addOptimizationsIfNeeded()
      sparkSession
    }

    private def addOptimizationsIfNeeded(): Unit = {
      if (!sparkSession.sessionState.experimentalMethods.extraOptimizations.contains(
          ApplyHyperspace)) {
        sparkSession.sessionState.experimentalMethods.extraOptimizations ++=
          ApplyHyperspace :: Nil
      }
      if (!sparkSession.sessionState.experimentalMethods.extraStrategies.contains(
          BucketUnionStrategy)) {
        sparkSession.sessionState.experimentalMethods.extraStrategies ++=
          BucketUnionStrategy :: Nil
      }
    }
  }
}


You can put addOptimizationsIfNeeded() in a companion object of HyperspaceSparkSessionExtension and call it both from here and from enableHyperspace().
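
A minimal sketch of the layout suggested above, written against Spark's public APIs only. All names here (`ExampleRule`, `ExampleSessionExtension`, `NoOpRule`, `ExampleImplicits`, `enableExample`) are hypothetical stand-ins rather than the code merged in this PR; `ExampleRule` plays the role of `ApplyHyperspace`. The extension's rule builder registers the real rule as a side effect and returns a rule that does nothing, in the spirit of the "add dummy rule" commit listed earlier.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical stand-in for ApplyHyperspace; it leaves the plan unchanged.
object ExampleRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Hypothetical extension class, loadable through spark.sql.extensions.
class ExampleSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule { sparkSession =>
      // Register the real rule through the shared helper...
      ExampleSessionExtension.addOptimizationsIfNeeded(sparkSession)
      // ...and return a no-op rule to satisfy the builder's return type.
      ExampleSessionExtension.NoOpRule
    }
  }
}

object ExampleSessionExtension {
  // Rule that returns the plan as-is.
  object NoOpRule extends Rule[LogicalPlan] {
    override def apply(plan: LogicalPlan): LogicalPlan = plan
  }

  // Idempotent: appends ExampleRule only if it is not registered yet.
  def addOptimizationsIfNeeded(sparkSession: SparkSession): Unit = {
    val experimental = sparkSession.sessionState.experimentalMethods
    if (!experimental.extraOptimizations.contains(ExampleRule)) {
      experimental.extraOptimizations ++= ExampleRule :: Nil
    }
  }
}

// Backward-compatible entry point that reuses the same helper.
object ExampleImplicits {
  implicit class Implicits(sparkSession: SparkSession) {
    def enableExample(): SparkSession = {
      ExampleSessionExtension.addOptimizationsIfNeeded(sparkSession)
      sparkSession
    }
  }
}
```

With this layout both entry points share one registration path, so calling either of them repeatedly should not register the rule more than once.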

Contributor Author


@clee704 Can you check whether I implemented it as you expected?

src/main/scala/com/microsoft/hyperspace/package.scala (outdated)
@paryoja paryoja requested a review from clee704 November 12, 2021 02:46
   *
   * @param sparkSession Spark session that will use Hyperspace
   */
  def addOptimizationsIfNeeded(sparkSession: SparkSession): Unit = {
Collaborator


I think it's better to put this function in package.scala.

@sezruby sezruby merged commit d8c4b79 into microsoft:master Nov 15, 2021
Development

Successfully merging this pull request may close these issues.

[FEATURE REQUEST]: Enable hyperspace with SparkSessionExtention
3 participants