
Correct the delete by query endpoint to match the OpenSearch API #350

Merged

Conversation

@Xtansia (Collaborator) commented Nov 1, 2023

Description

Use the correct delete by query endpoint as available in OpenSearch.
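
For reference, the delete by query API in OpenSearch is exposed as POST /&lt;index&gt;/_delete_by_query with the query in the request body. An illustrative request (the index name and query are placeholders, not taken from this PR):

    POST /my-index/_delete_by_query
    {
      "query": {
        "match_all": {}
      }
    }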

Issues Resolved

Closes #348

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Xtansia marked this pull request as ready for review on November 1, 2023 04:25
@Xtansia changed the title from "Correct the delete by query endpoint to match OpenSearch's" to "Correct the delete by query endpoint to match the OpenSearch API" on Nov 1, 2023
@harshavamsi (Collaborator) commented:

LGTM!

@harshavamsi merged commit c29174f into opensearch-project:main on Nov 2, 2023
18 checks passed
@Xtansia deleted the fix/correct-delete-by-query branch on December 11, 2023 23:45
@borgoat commented Mar 4, 2024

Hello! Any chance this makes it into a release? I'd rather avoid having to build from source...

@Xtansia (Collaborator, Author) commented Mar 4, 2024

> Hello! Any chance this makes it into a release? I'd rather avoid having to build from source...

Hi @borgoat, I'll look into kicking off a release for this sometime this week

@Xtansia (Collaborator, Author) commented Mar 18, 2024

> Hello! Any chance this makes it into a release? I'd rather avoid having to build from source...

@borgoat v1.1.0, which includes this PR, should now be on Maven; please create a ticket if you run into any further issues.

@borgoat commented Apr 4, 2024

Hello @Xtansia

It seems this fixed part of it, but broke something else: the delete by query itself now works.

However, the delete method then goes on to attempt another scan-and-delete, which fails without much info.

Shouldn't it early-return here [1] when successful?
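
For illustration, the early return being suggested would look roughly like this (a hypothetical Java sketch with simplified names, not the actual RestRepository code):

    // Hypothetical sketch of the suggested control flow; the method and helper names are
    // simplified placeholders and do not match the actual RestRepository implementation.
    void delete(String index, String query) {
        if (clusterSupportsDeleteByQuery()) {
            deleteByQuery(index, query); // server-side POST /<index>/_delete_by_query
            return;                      // early return: skip the scan-and-delete fallback below
        }
        // Legacy fallback: scroll over the matching documents and bulk-delete them.
        scanAndDelete(index, query);
    }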

Stacktrace
ERROR [main] glue.ProcessLauncher (Logging.scala:logError(98)): Exception in User Class
org.opensearch.hadoop.rest.OpenSearchHadoopInvalidRequest: null
null
	at org.opensearch.hadoop.rest.RestClient.checkResponse(RestClient.java:476) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.opensearch.hadoop.rest.RestClient.execute(RestClient.java:435) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.opensearch.hadoop.rest.RestClient.execute(RestClient.java:429) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.opensearch.hadoop.rest.RestClient.execute(RestClient.java:409) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.opensearch.hadoop.rest.RestRepository.scroll(RestRepository.java:325) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.opensearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:104) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
>>	at org.opensearch.hadoop.rest.RestRepository.delete(RestRepository.java:431) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.opensearch.spark.sql.OpenSearchRelation.insert(DefaultSource.scala:559) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.opensearch.spark.sql.DefaultSource.createRelation(DefaultSource.scala:107) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:591) ~[spark-catalyst_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:96) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:83) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:81) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:124) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239) ~[spark-sql_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
	at com.yeekatee.analytics.spark.OpenSearchBatchLoadJob$.uploadFinancialInstruments(OpenSearchBatchLoadJob.scala:189) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at com.yeekatee.analytics.spark.OpenSearchBatchLoadJob$.jobDefinition(OpenSearchBatchLoadJob.scala:82) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at com.yeekatee.analytics.spark.OpenSearchBatchLoadJob$.main(OpenSearchBatchLoadJob.scala:63) ~[ed98faf975f157b85bdd1c6d9ecc5a8a9cef0694f14c8c2c9831c7668c8d526a.jar:0.1.0]
	at GlueApp$.main(b2038cf13549728447ed4627151b61b75bf6e895991eebb0348f40d929dafe19.scala:3) ~[b2038cf13549728447ed4627151b61b75bf6e895991eebb0348f40d929dafe19.scala.jar:?]
	at GlueApp.main(b2038cf13549728447ed4627151b61b75bf6e895991eebb0348f40d929dafe19.scala) ~[b2038cf13549728447ed4627151b61b75bf6e895991eebb0348f40d929dafe19.scala.jar:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_402]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_402]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_402]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_402]
	at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke(ProcessLauncher.scala:65) ~[AWSGlueSparkResourceManager-1.0.jar:?]
	at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke$(ProcessLauncher.scala:65) ~[AWSGlueSparkResourceManager-1.0.jar:?]
	at com.amazonaws.services.glue.ProcessLauncher$$anon$2.invoke(ProcessLauncher.scala:212) ~[AWSGlueSparkResourceManager-1.0.jar:?]
	at com.amazonaws.services.glue.ProcessLauncher.launch(ProcessLauncher.scala:400) ~[AWSGlueSparkResourceManager-1.0.jar:?]
	at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:45) ~[AWSGlueSparkResourceManager-1.0.jar:?]
	at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala) ~[AWSGlueSparkResourceManager-1.0.jar:?]

Footnotes

  1. https://github.com/opensearch-project/opensearch-hadoop/blob/4072a46541990f25345103decc213835b6c7be9f/mr/src/main/java/org/opensearch/hadoop/rest/RestRepository.java#L383-L387

@Xtansia (Collaborator, Author) commented Apr 5, 2024

@borgoat It's unfortunate that there's an issue in the construction of exception messages for invalid requests, which means the message is unhelpful (null \n null) in this instance. I've made a PR to improve this in #438; that should help with debugging your issue.

The fact it's given that error message means OpenSearch is reporting a 4xx client error status code but without an actual error response body, which is interesting. Are you able to share the opensearch.* Spark settings you're using and some details about your cluster, such as its version and whether it's self-hosted or managed? If you have access to the cluster logs, they may also shed some light on what's failing, if you can find any entries relating to the failed request.

@borgoat commented Apr 5, 2024

Good point! This is an Amazon OpenSearch Service managed cluster; the version is OpenSearch_2_9_R20230928-P3. I'll try with 2.11 now, we've been meaning to upgrade for a while anyway...

The configuration doesn't have much to it, just what I needed to make this work in Glue:

    conf
      .set("spark.sql.caseSensitive", "true")
      .set(ConfigurationOptions.OPENSEARCH_NODES, args(ArgOpenSearchNode))
      .set(ConfigurationOptions.OPENSEARCH_PORT, "443")
      .set(ConfigurationOptions.OPENSEARCH_NET_USE_SSL, "true")
      .set(ConfigurationOptions.OPENSEARCH_NODES_WAN_ONLY, "true")
      .set(ConfigurationOptions.OPENSEARCH_AWS_SIGV4_ENABLED, "true")
      .set(ConfigurationOptions.OPENSEARCH_AWS_SIGV4_REGION, "eu-west-1")
      .set(ConfigurationOptions.OPENSEARCH_MAPPING_ID, "id")

The OpenSearch error logs are empty, I couldn't find anything useful there.

I don't know if it's relevant for reproducing the error, but I should mention that the target index is actually an alias to another index.

@borgoat commented Apr 5, 2024

Some more info: I'm now testing against 2.11, and against the index itself rather than the alias - still the same behaviour.

It seems the return code is a 403 when starting the scan:

[screenshot: response showing a 403 status code when starting the scan]

I'm using AdministratorAccess right now to rule out any potential IAM misconfiguration.

@Xtansia (Collaborator, Author) commented Apr 8, 2024

@borgoat The fact you're getting a 403 means it is a permissions issue. Potentially the IAM credentials you're running as aren't correctly mapped to an OpenSearch role/user. If they were correctly mapped but the internal OpenSearch role itself was missing permissions, it should result in something like OpenSearchHadoopRemoteException: security_exception: no permissions for [indices:data/read/search] and User [name=foobar, backend_roles=[], requestedTenant=null] rather than the null null case.
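
For example, with fine-grained access control the job's IAM role usually needs to be mapped to an OpenSearch role via the security plugin's role mappings; a rough illustration (the role name and ARN are placeholders):

    PUT _plugins/_security/api/rolesmapping/my_writer_role
    {
      "backend_roles": [
        "arn:aws:iam::123456789012:role/my-glue-job-role"
      ]
    }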

@borgoat commented Apr 12, 2024

Hey @Xtansia, I tried to debug this further.

First, I tried the exact same _search query, in Postman, with the same IAM credentials, and got a successful response.

I then extracted the query params and headers generated by the opensearch-hadoop client, and ran the query again. Now here's the interesting bit:

{
    "message": "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\n\nThe Canonical String for this request should have been\n'GET\n/financial-instruments/_search\n_source=false&scroll=10m&size=500&sort=_doc\nhost:search-opensearchdomai-8nj4cmsbjotp-2ddjfrdiqgaipkggf4fn4iaokm.eu-west-1.es.amazonaws.com\nx-amz-content-sha256:cf6106deb8138e08a46406374390e2b4fb91ebae2aacc09780869022112363b9\nx-amz-date:20240412T075528Z\nx-amz-security-token:IQoJb3JpZ2luX2VjECAaCWV1LXdlc3QtMSJIMEYCIQD3FP0nlI7nxzwhEDJz8OaXQuBhpndAkkAIqZB4hrQVigIhAKYhACI5tvZe72eXd2d7uUdbrJV7cOO4/eBAvjlI3TFdKqYDCFkQARoMMjc1MjE0NzE2OTkzIgwchQ9IIRtFi48dPQ4qgwMtwZig+TD2HueIA7eI6b+sZSz9SQPhDHb8bHYKcQiLXmmqJii1AijTsmfnsh84UroqTR3e8HE/DjG6fhsjJiL0rriWqAGXzhkcC3Uyo/UWfOiGscwxYdBvFZW0RoJ4ye7MFACZvOmcupg36CCBjJNYsbxlXLsjwkZ6MEiOJySZxsqak1mKd6HixJ0Y7d2gPUfGBCXLgvv1V9iZ1v3FSRqKgXYBHXP038DP5aREJYYzRVeFncLUW+67EqWO3QwX2gr0T4/yerU09j2HDJjDAc5mzgD/sNjuF5JXqsylukIiuq+HSn1cfXDKdUe/O4pFc0n8Kev8sjnFSLVSekTMu9qtPS4slzV5rKNGuLnzxrgfsQwtBnw+RTCQE0zTjRwgFsA3EKJalDQhkstHrtovqBTprpaftkRffdQYEl87MtgYbjbCjxjfF4tNZzZLImh7cDvrCX/C/W1a2Ayfeg7lsiCZKQ5deE7DshdySwF4N1dENYy661kCGo5AWXPMkrLwuRmoOU8w/sXjsAY6pQECjD85EaqOMGrXJudf1HH0+8YQ46AmZSsPaNYP/jYGFxaI1jDepBFDkTTstBaAU7jfZa+7xsmmOhvPpR+N4oS2r2T4lM1jrSs5gBmqYlMcjxqsEUaEUiQo5OjedN7WbXkBHD2W9nJshEjIHLwec9J1ae/tQdFXlywW/kta+BfE1Bll0lZmEEMQstI5hqrx9FnlmZHuAbr81EgIdAWg46hjW82mjSs=\n\nhost;x-amz-content-sha256;x-amz-date;x-amz-security-token\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'\n\nThe String-to-Sign should have been\n'AWS4-HMAC-SHA256\n20240412T075528Z\n20240412/eu-west-1/es/aws4_request\n0d1dc2559389fba18bbe116edb45e899daf7c441190406ac8b543c2969b474b2'\n"
}

It seems the request produced by this client is not canonical, and therefore the signature is invalid.

And indeed, looking at the way the params are appended here, [1] they are not alphabetically sorted:

        sb.append("/_search?scroll=10m&_source=false&size=");
        sb.append(batchSize);
        sb.append("&sort=_doc");

> You must also sort the parameters in the canonical query string alphabetically by key name. [2]

I think this could be it!
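
For reference, SigV4's canonical query string is built from the parameters sorted by key; a minimal Java sketch of that step (illustrative only, not the signer's actual implementation):

    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.TreeMap;

    public class CanonicalQueryString {
        // Builds a SigV4-style canonical query string: parameters URL-encoded and sorted by key.
        // (Strict SigV4 percent-encoding differs slightly from URLEncoder, but not for these values.)
        static String canonicalize(Map<String, String> params) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                        .append('=')
                        .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            // The parameters opensearch-hadoop sends on the scroll request.
            Map<String, String> params = Map.of(
                    "scroll", "10m",
                    "_source", "false",
                    "size", "500",
                    "sort", "_doc");
            // Prints: _source=false&scroll=10m&size=500&sort=_doc
            System.out.println(canonicalize(params));
        }
    }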

Footnotes

  1. https://github.com/opensearch-project/opensearch-hadoop/blob/4072a46541990f25345103decc213835b6c7be9f/mr/src/main/java/org/opensearch/hadoop/rest/RestRepository.java#L404-L406

  2. https://docs.aws.amazon.com/IAM/latest/UserGuide/create-signed-request.html#create-canonical-request

@Xtansia (Collaborator, Author) commented Apr 15, 2024

@borgoat The signer internally sorts the params when building the canonical request, so it's not that. I did, however, figure out what was happening: it turns out that if the AWS SDK's signer sees a POST request with query params but no body content, it moves the query params into the body when calculating the signature, resulting in a mismatch with what the service expects:
[screenshot: side-by-side comparison of the canonical request the signer calculated and the one the service expected]
The left is what the signer calculated, the right is what the service said it expected. Notice the query params are nowhere to be seen, as they've been treated as body content, and as a result the body SHA-256 also differs from the expected SHA-256 of an empty string (e3b0c...).
I'm currently putting together a PR to fix this: #443
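
To see why the body hash matters here: the SHA-256 of an empty body is the well-known e3b0c44... value, so hashing the query string as if it were the body yields a different digest and therefore a different signature. A small Java illustration:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    public class BodyHash {
        static String sha256Hex(String s) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        }

        public static void main(String[] args) throws Exception {
            // SHA-256 of an empty body:
            // e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
            System.out.println(sha256Hex(""));
            // Hashing the query string as if it were the body gives a different digest,
            // so the computed signature no longer matches what the service expects.
            System.out.println(sha256Hex("_source=false&scroll=10m&size=500&sort=_doc"));
        }
    }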

@Xtansia (Collaborator, Author) commented Apr 30, 2024

@borgoat The fix for this is now released in v1.2.0, could you please confirm this solves your issue?

@borgoat commented May 10, 2024

> @borgoat The fix for this is now released in v1.2.0, could you please confirm this solves your issue?

Yes, it seems 1.2.0 fixed this. Thanks!
