Correct the delete by query endpoint to match the OpenSearch API #350
Signed-off-by: Thomas Farr <[email protected]>
Branch updated from a280dfe to 8b41ab5.
LGTM!
Hello! Any chance this makes it into a release? I'd rather avoid having to build from source...
Hi @borgoat, I'll look into kicking off a release for this sometime this week.
@borgoat v1.1.0, which includes this PR, should now be on Maven. Please create a ticket if you run into any further issues.
Hello @Xtansia, it seems this fixed part of the problem but broke something else. Delete by query now works; however, the delete method then goes on to attempt another scan-and-delete, which fails without much information. Shouldn't it return early here when delete by query succeeds?
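The early return being asked about can be sketched as follows. This is a hypothetical illustration of the control flow only: `DeleteFlow`, `delete`, and the two `BooleanSupplier` callbacks are invented names, not opensearch-hadoop's actual code. The server-side delete by query runs first, and the client-side scroll-and-delete is only reached if that attempt reports failure.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names are illustrative, not opensearch-hadoop's API.
final class DeleteFlow {
    /**
     * Tries the server-side delete by query first and only falls back to the
     * client-side scroll-and-delete when that attempt reports failure.
     * Returns a label describing which path completed the delete.
     */
    static String delete(BooleanSupplier deleteByQuery, BooleanSupplier scrollAndDelete) {
        if (deleteByQuery.getAsBoolean()) {
            return "delete_by_query"; // early return: nothing left to delete
        }
        // Fallback path, reached only when delete by query did not succeed.
        return scrollAndDelete.getAsBoolean() ? "scroll_and_delete" : "failed";
    }
}
```

When the first callback returns `true`, the fallback supplier is never invoked, which is the behaviour the question proposes.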
@borgoat Unfortunately, there's an issue in how exception messages are constructed for invalid requests, which makes this message unhelpful. However, the fact that it produced that error message at all means OpenSearch is reporting a client error (a 4xx status).
Good point! This is an Amazon OpenSearch Service managed cluster (version …). The configuration doesn't have much to it, just what I needed to make this work in Glue:
conf
.set("spark.sql.caseSensitive", "true")
.set(ConfigurationOptions.OPENSEARCH_NODES, args(ArgOpenSearchNode))
.set(ConfigurationOptions.OPENSEARCH_PORT, "443")
.set(ConfigurationOptions.OPENSEARCH_NET_USE_SSL, "true")
.set(ConfigurationOptions.OPENSEARCH_NODES_WAN_ONLY, "true")
.set(ConfigurationOptions.OPENSEARCH_AWS_SIGV4_ENABLED, "true")
.set(ConfigurationOptions.OPENSEARCH_AWS_SIGV4_REGION, "eu-west-1")
.set(ConfigurationOptions.OPENSEARCH_MAPPING_ID, "id")
The OpenSearch error logs are empty; I couldn't find anything useful there. I don't know whether it's relevant to reproducing the error, but I should mention that the target index is actually an alias for one other index.
@borgoat Getting a 403 means this is a permissions issue. Possibly the IAM credentials you're running as aren't correctly mapped to an OpenSearch role or user. If they were mapped correctly but the internal OpenSearch role itself were missing permissions, it should result in something like …
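A rough triage of HTTP statuses along the lines discussed in this thread might look like the sketch below; `StatusTriage` and its labels are hypothetical. One caveat from later in the thread: with SigV4 signing, Amazon OpenSearch Service can also answer with 403 when the request signature does not match, so a 403 does not always mean a missing role mapping.

```java
// Hypothetical helper illustrating how a response status narrows down the cause.
final class StatusTriage {
    static String classify(int status) {
        if (status == 401 || status == 403) {
            // Authentication/authorization failures. With SigV4 this includes
            // signature mismatches as well as unmapped IAM credentials.
            return "auth";
        }
        if (status >= 400 && status < 500) {
            return "client error"; // the request itself was rejected as invalid
        }
        if (status >= 500 && status < 600) {
            return "server error";
        }
        return "other";
    }
}
```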
Hey @Xtansia, I tried to debug this further. First, I tried the exact same … Then I extracted the query params and headers generated by the opensearch-hadoop client and ran the query again. Now here's the interesting bit:
{
"message": "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\n\nThe Canonical String for this request should have been\n'GET\n/financial-instruments/_search\n_source=false&scroll=10m&size=500&sort=_doc\nhost:search-opensearchdomai-8nj4cmsbjotp-2ddjfrdiqgaipkggf4fn4iaokm.eu-west-1.es.amazonaws.com\nx-amz-content-sha256:cf6106deb8138e08a46406374390e2b4fb91ebae2aacc09780869022112363b9\nx-amz-date:20240412T075528Z\nx-amz-security-token:IQoJb3JpZ2luX2VjECAaCWV1LXdlc3QtMSJIMEYCIQD3FP0nlI7nxzwhEDJz8OaXQuBhpndAkkAIqZB4hrQVigIhAKYhACI5tvZe72eXd2d7uUdbrJV7cOO4/eBAvjlI3TFdKqYDCFkQARoMMjc1MjE0NzE2OTkzIgwchQ9IIRtFi48dPQ4qgwMtwZig+TD2HueIA7eI6b+sZSz9SQPhDHb8bHYKcQiLXmmqJii1AijTsmfnsh84UroqTR3e8HE/DjG6fhsjJiL0rriWqAGXzhkcC3Uyo/UWfOiGscwxYdBvFZW0RoJ4ye7MFACZvOmcupg36CCBjJNYsbxlXLsjwkZ6MEiOJySZxsqak1mKd6HixJ0Y7d2gPUfGBCXLgvv1V9iZ1v3FSRqKgXYBHXP038DP5aREJYYzRVeFncLUW+67EqWO3QwX2gr0T4/yerU09j2HDJjDAc5mzgD/sNjuF5JXqsylukIiuq+HSn1cfXDKdUe/O4pFc0n8Kev8sjnFSLVSekTMu9qtPS4slzV5rKNGuLnzxrgfsQwtBnw+RTCQE0zTjRwgFsA3EKJalDQhkstHrtovqBTprpaftkRffdQYEl87MtgYbjbCjxjfF4tNZzZLImh7cDvrCX/C/W1a2Ayfeg7lsiCZKQ5deE7DshdySwF4N1dENYy661kCGo5AWXPMkrLwuRmoOU8w/sXjsAY6pQECjD85EaqOMGrXJudf1HH0+8YQ46AmZSsPaNYP/jYGFxaI1jDepBFDkTTstBaAU7jfZa+7xsmmOhvPpR+N4oS2r2T4lM1jrSs5gBmqYlMcjxqsEUaEUiQo5OjedN7WbXkBHD2W9nJshEjIHLwec9J1ae/tQdFXlywW/kta+BfE1Bll0lZmEEMQstI5hqrx9FnlmZHuAbr81EgIdAWg46hjW82mjSs=\n\nhost;x-amz-content-sha256;x-amz-date;x-amz-security-token\ne3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'\n\nThe String-to-Sign should have been\n'AWS4-HMAC-SHA256\n20240412T075528Z\n20240412/eu-west-1/es/aws4_request\n0d1dc2559389fba18bbe116edb45e899daf7c441190406ac8b543c2969b474b2'\n"
}
It seems the request produced by this client is not in canonical form, so the signature is invalid. And indeed, looking at the way the params are appended here, they are not sorted alphabetically:
sb.append("/_search?scroll=10m&_source=false&size=");
sb.append(batchSize);
sb.append("&sort=_doc");
I think this could be it!
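One concrete detail in the error message: the last line of the expected canonical request, `e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855`, is the hex-encoded SHA-256 of an empty payload, while the `x-amz-content-sha256` header carries `cf6106deb8138e08a46406374390e2b4fb91ebae2aacc09780869022112363b9`. The signed payload hash and the body the service hashed therefore seem to disagree. The minimal sketch below (the `PayloadHash` class is a hypothetical name) computes a payload hash the way SigV4 specifies it, as lowercase hex SHA-256 of the body bytes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: SigV4 places the lowercase hex SHA-256 of the request
// body on the last line of the canonical request.
final class PayloadHash {
    static String sha256Hex(byte[] payload) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(payload);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x on a Byte prints its unsigned value, e.g. -1 -> "ff".
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

`PayloadHash.sha256Hex(new byte[0])` yields the `e3b0c442...` value from the canonical string, so any request whose signed hash differs from that was signed as if it carried a body.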
@borgoat The signer sorts the params internally when building the canonical request, so it's not that. I did, however, figure out what was happening: it turns out that if the AWS SDK's signer sees a …
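The point that the order of the `sb.append` calls doesn't matter can be checked against the error message: the service's expected canonical string lists the query as `_source=false&scroll=10m&size=500&sort=_doc`, sorted by parameter name, even though the client appends `scroll` first. A simplified sketch of that sorting step follows. `CanonicalQuery` is a hypothetical name, and the sketch assumes unique parameter names whose names and values need no URI-encoding (full SigV4 also percent-encodes both and orders duplicate names by value).

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical sketch of the SigV4 canonical query string step.
// Assumes unique parameter names and no characters that need percent-encoding.
final class CanonicalQuery {
    static String canonicalize(String rawQuery) {
        // TreeMap orders keys by String.compareTo (UTF-16 code units), which
        // matches byte order for ASCII names like these: '_' (0x5F) < 's' (0x73).
        Map<String, String> sorted = new TreeMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            sorted.put(name, value);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (out.length() > 0) {
                out.append('&');
            }
            out.append(e.getKey()).append('=').append(e.getValue());
        }
        return out.toString();
    }
}
```

Feeding it the client's order, `scroll=10m&_source=false&size=500&sort=_doc`, produces the same ordering as the canonical string in the error.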
@borgoat The fix for this is now released in v1.2.0. Could you please confirm that it solves your issue?
Yes, it seems 1.2.0 fixed this. Thanks!
Description
Use the correct delete by query endpoint as available in OpenSearch.
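For reference, OpenSearch exposes delete by query as `POST /<index>/_delete_by_query`, with the query in the request body. The sketch below only builds that request line and an example `match_all` body; `DeleteByQueryRequest` and its methods are hypothetical and are not the code changed in this PR.

```java
// Hypothetical helper; not the code changed in this PR.
final class DeleteByQueryRequest {
    static final String METHOD = "POST";

    // Path for OpenSearch's delete by query API: /<index>/_delete_by_query
    static String path(String index) {
        return "/" + index + "/_delete_by_query";
    }

    // Example body: match_all deletes every document in the target index or alias.
    static String matchAllBody() {
        return "{\"query\":{\"match_all\":{}}}";
    }
}
```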
Issues Resolved
Closes #348
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.