Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: add Scala example #346

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions examples/.scalafmt.conf
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
version = "3.7.3"
runner.dialect = scala213
48 changes: 45 additions & 3 deletions examples/README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,52 @@
## Delta Sharing examples
In this folder there are examples taken from the delta.io/delta-sharing quickstart guide and docs. They are available in Python and can be run if the prerequisites are satisfied.
The profile file from the open, example Delta Sharing Server is downloaded and located in this folder.

The profile file(open-datasets.share) from the open, example Delta Sharing Server is downloaded and located in this folder.

### Python examples
In this folder there are examples taken from the delta.io/delta-sharing quickstart guide and docs. They are available in Python and can be run if the prerequisites are satisfied.


### Scala example
In addition, there is a Scala example(`main.scala`). You can easily run the example using [Scala-CLI](https://scala-cli.virtuslab.org/).

```sh
curl -sSLf https://virtuslab.github.io/scala-cli-packages/scala-setup.sh | sh
```

```sh
scala-cli run main.scala
```

When using Java 9 or later, remove comment-out from the lines(L17~18) as shown below to add java options

```diff
- ///> using javaOptions "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
+ //> using javaOptions "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
- ///> using javaOptions "--add-opens=java.base/sun.security.action=ALL-UNNAMED"
+ //> using javaOptions "--add-opens=java.base/sun.security.action=ALL-UNNAMED"
```
to avoid the following exception.

```
Exception in thread "main" java.lang.IllegalAccessError:
class org.apache.spark.storage.StorageUtils$ (in unnamed module @0xa803f94)
cannot access class sun.nio.ch.DirectBuffer
```



### Prerequisites
* For Python examples, Python3.6+, Delta-Sharing Python Connector, PySpark need to be installed, see [the project docs](https://github.com/delta-io/delta-sharing) for details.
* For Scala example
* [Scala-CLI](https://scala-cli.virtuslab.org/), which downloads Scala compiler and all the dependencies. See the installation guide at the official website → https://scala-cli.virtuslab.org/install
* Java 8 or later

### Instructions

Python
* To run the example of PySpark in Python run `spark-submit --packages io.delta:delta-sharing-spark_2.12:0.6.2 ./python/quickstart_spark.py`
* To run the example of pandas DataFrame in Python run `python3 ./python/quickstart_pandas.py`
* To run the example of pandas DataFrame in Python run `python3 ./python/quickstart_pandas.py`

Scala
- To run the example of Scala with Spark, run `scala-cli run main.scala`
- To enable editor support(completion, jump to definition, etc.) for the Scala example, setup [Metals](https://scalameta.org/metals/)(Scala language server) and run `scala-cli setup-ide .`
62 changes: 62 additions & 0 deletions examples/main.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
//> using scala "2.12.18"

//> using dep "org.apache.spark::spark-core:3.4.0"
//> using dep "org.apache.spark::spark-sql:3.4.0"
//> using dep "org.apache.spark::spark-catalyst:3.4.0"
//> using dep "io.delta::delta-sharing-spark:0.7.0"
//> using dep "io.delta::delta-core:2.4.0"
//> using dep "org.json4s::json4s-ast:3.5.3"
//> using dep "org.codehaus.jackson:jackson-mapper-asl:1.9.13"
//> using dep "com.fasterxml.jackson.core:jackson-core:2.8.8"
//> using dep "org.json4s::json4s-core:3.5.3"
//> using dep "org.json4s::json4s-jackson:3.5.3"
//> using dep "org.apache.httpcomponents:httpclient:4.5.14"

// When using Java 9+, remove the first slash character from the line 17 and 18

///> using javaOptions "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED",
///> using javaOptions "--add-opens=java.base/sun.security.action=ALL-UNNAMED"

// hack to violate encapsulation to use DeltaSharingRestClient
package io.delta.sharing.spark

import org.apache.spark.sql.SparkSession
import scala.collection.JavaConverters._
import java.sql.Timestamp
import java.nio.file.Path
import org.apache.hadoop.conf.Configuration

object App {
def main(args: Array[String]): Unit = {
val share = "open-datasets.share"
val conf = new Configuration()
val provider: DeltaSharingProfileProvider =
new io.delta.sharing.spark.DeltaSharingFileProfileProvider(
conf,
share
)
val client = new io.delta.sharing.spark.DeltaSharingRestClient(provider)
println(client.listAllTables().mkString("\n"))

val spark = SparkSession
.builder()
.appName("example")
.master("local[*]")
.getOrCreate()
import spark.implicits._
val result = spark.read
.format("deltaSharing")
.load(
share + "/#delta_sharing.default.owid-covid-data"
)
.where($"iso_code" === "USA")
.select(
$"iso_code",
$"total_cases",
$"human_development_index"
)
result.show(10)

}

}