What is the bug?
Performed A/B testing comparing OpenSearch index data ingestion from Databricks using elasticsearch-spark-30_2.12-8.6.0.jar versus opensearch-spark-30_2.12-1.0.1.jar. Ingestion through the OpenSearch Spark connector took 2-3 times longer than through the Elasticsearch Spark connector.
How can one reproduce the bug?
Test 1: Create 10 separate OpenSearch indices (same schema) with Parent/Child records. Run the insert or update operations into all 10 indices in parallel from Databricks, first using the Elasticsearch Spark connector, and record the timings. Then repeat with the OpenSearch Spark connector and record the timings.
Test 2: Create one OpenSearch index. Run insert/update operations from Databricks using the Elasticsearch Spark connector and record the timings. Then repeat with the OpenSearch Spark connector and record the timings.
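For anyone trying to reproduce this, the two tests could be timed with a small harness along these lines. This is only a sketch, not the code used for the attached timings: `time_write`, `time_parallel_writes`, and `slowdown` are hypothetical helper names, and the writer callable is assumed to wrap whatever DataFrame write is being measured.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def time_write(write_fn: Callable[[str], None], index: str) -> float:
    """Return the wall-clock seconds taken by one write into `index`."""
    start = time.perf_counter()
    write_fn(index)
    return time.perf_counter() - start


def time_parallel_writes(
    write_fn: Callable[[str], None], indices: List[str]
) -> Dict[str, float]:
    """Test 1: write into every index concurrently and time each write."""
    with ThreadPoolExecutor(max_workers=len(indices)) as pool:
        timings = list(pool.map(lambda ix: time_write(write_fn, ix), indices))
    return dict(zip(indices, timings))


def slowdown(baseline: List[float], candidate: List[float]) -> float:
    """Ratio of median candidate time to median baseline time."""
    return statistics.median(candidate) / statistics.median(baseline)


# Hypothetical writer for a Databricks notebook. `df`, the host, and the
# option keys are assumptions; check them against each connector's docs.
# def write_with_es(index: str) -> None:
#     (df.write.format("org.elasticsearch.spark.sql")
#        .option("es.nodes", "my-host")
#        .mode("append")
#        .save(index))
```

Running `time_parallel_writes` once per connector over the same 10 index names covers Test 1, and calling `time_write` repeatedly on a single index covers Test 2. Passing the two lists of timings to `slowdown` gives a single ratio to compare against the reported 2-3x.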
What is the expected behavior?
The insert/update timings should match or be similar.
What is your host/environment?
OpenSearch 2.11, Databricks 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
Both jars below are hosted in S3 buckets.
elasticsearch-spark-30_2.12-8.6.0.jar
opensearch-spark-30_2.12-1.0.1.jar
Do you have any screenshots?
Yes
Test Timings and configs.docx