Performance Improvement compared to Elasticsearch #500

Open
susasidharan opened this issue Jul 31, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@susasidharan

What is the bug?

Performed A/B testing comparing data ingestion into an OpenSearch index from Databricks using elasticsearch-spark-30_2.12-8.6.0.jar vs opensearch-spark-30_2.12-1.0.1.jar. With the OpenSearch Spark connector, the timings were 2-3 times those of the Elasticsearch Spark connector.

How can one reproduce the bug?

Test 1: Create 10 separate OpenSearch indices (same schema) with parent/child records. Run the insert or update operations into the 10 indices in parallel from Databricks using the Elasticsearch Spark connector first and record the timings. Then repeat with the OpenSearch Spark connector and record the timings.
Test 2: Create one OpenSearch index. Run the insert/update operations from Databricks using the Elasticsearch Spark connector and record the timings. Then repeat with the OpenSearch Spark connector and record the timings.
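
For anyone trying to reproduce this, here is a minimal sketch of a timing harness for the two tests above. It is not the script used for the attached timings. The helper names (`time_parallel_writes`, `slowdown`) and the `write_fn` callback are hypothetical; the Spark write inside `write_fn` is left to the caller, and the data source name for the OpenSearch connector shown in the comment is an assumption that should be checked against the connector's documentation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def time_parallel_writes(write_fn, index_names, max_workers=10):
    """Call write_fn(index_name) for each index in parallel threads.

    Returns a dict mapping each index name to its elapsed wall-clock seconds.
    """
    def timed(index_name):
        start = time.perf_counter()
        write_fn(index_name)
        return index_name, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(timed, index_names))


def slowdown(baseline, candidate):
    """Ratio of the mean candidate timing to the mean baseline timing."""
    return mean(candidate.values()) / mean(baseline.values())


# Hypothetical usage on a cluster, one run per connector jar:
#
#   def write_fn(index_name):
#       # "org.elasticsearch.spark.sql" is the Elasticsearch connector's data source;
#       # for the OpenSearch connector, "org.opensearch.spark.sql" is an assumption.
#       df.write.format("org.elasticsearch.spark.sql").mode("append").save(index_name)
#
#   es_timings = time_parallel_writes(write_fn, [f"idx_{i}" for i in range(10)])
```

Test 1 corresponds to passing 10 index names; Test 2 corresponds to passing a single index name. Comparing the two result dicts with `slowdown(es_timings, os_timings)` gives the ratio described in the report.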

What is the expected behavior?

The insert/update timings should match or be similar.

What is your host/environment?

OpenSearch 2.11, Databricks 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
Both jars below are hosted in S3 buckets.
elasticsearch-spark-30_2.12-8.6.0.jar
opensearch-spark-30_2.12-1.0.1.jar

Do you have any screenshots?

Yes
Test Timings and configs.docx

@susasidharan susasidharan added bug Something isn't working untriaged labels Jul 31, 2024
@dblock dblock removed the untriaged label Aug 19, 2024
@dblock
Member

dblock commented Aug 19, 2024

Catch All Triage - 1, 2, 3

@Pallavi-AWS
Member

@anirudha will you be able to help out on this? Thanks.
