Snapshot creation with wait_for_completion: response time longer than snapshot duration #65661

Closed
supasteev0 opened this issue Dec 1, 2020 · 1 comment
Labels
>bug needs:triage Requires assignment of a team area label

Comments

@supasteev0

Elasticsearch version:
7.10.0

Plugins installed:
repository-s3

JVM version:
openjdk version "15.0.1" 2020-10-20

Description of the problem including expected versus actual behavior:
When I create a snapshot with the wait_for_completion=true query parameter, the response is returned much later than the snapshot's reported duration.
The response arrives after about 3 minutes, while the snapshot itself takes only 1 or 2 seconds (400 ms in the run shown under the steps to reproduce).

The snapshot repository is an S3 repository.
The ES cluster is hosted on Kubernetes.
I see the same behavior with ES 7.9 on AWS, so I guess this is not related to the hosting platform.

Steps to reproduce:

$ ES_HOST=http://localhost:9200
$ INDEX=test
$ printf -v BODY '{"indices":"%s"}' "$INDEX"
$ time curl -XPUT "${ES_HOST}/_snapshot/test/test?wait_for_completion=true" \
-H 'Content-Type: application/json' \
-d "$BODY"

{"snapshot":{"snapshot":"test","uuid":"7CLDN7wSQxGA4QOo-RURkA","version_id":7100099,"version":"7.10.0","indices":["test"],"data_streams":[],"include_global_state":true,"state":"SUCCESS","start_time":"2020-12-01T09:31:44.898Z","start_time_in_millis":1606815104898,"end_time":"2020-12-01T09:31:45.298Z","end_time_in_millis":1606815105298,"duration_in_millis":400,"failures":[],"shards":{"total":2,"failed":0,"successful":2}}}
real	3m1,495s
user	0m0,034s
sys	0m0,008s

Is this because the snapshot data transfer to the S3 bucket actually takes longer than the reported snapshot duration?
I'm currently working on a cron job that snapshots cluster A and restores to cluster B, and I need to know whether I can rely on wait_for_completion or should manage the wait time myself (meaning I would trigger the restore as soon as the snapshot state is SUCCESS).

Thank you for your help

@supasteev0 supasteev0 added >bug needs:triage Requires assignment of a team area label labels Dec 1, 2020
@original-brownbear
Member

Hi @supasteev0

This is not a bug but actually a feature. Your snapshot repository likely contains some snapshots that were taken by an ES version older than 7.6. This means your repository does not yet have the benefits of the repository format changes introduced in #46250, so we rely on an artificial wait to work around certain S3 file system consistency limitations (#51074). Deleting all snapshots taken by versions before 7.6 from the repository will make the artificial wait go away.

-> It's fair to interpret the wait time as the actual time the snapshot takes for the purposes of the cron job, but ideally I would recommend just deleting the older snapshots or starting from a fresh repository to work around this issue.
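
To find which snapshots would need deleting, one option is to list the repository contents and look at each snapshot's "version" field. The following is only a rough sketch: ES_HOST and REPO mirror the values from the reproduction steps, the helper names is_pre_7_6 and old_snapshot_names are hypothetical, and the grep-based parsing assumes compact JSON in which each snapshot object has "snapshot" followed by "version". A real script would be better off with a proper JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: print the names of snapshots taken by a version older than 7.6.
# ES_HOST and REPO are illustrative values; helper names are hypothetical.

ES_HOST=${ES_HOST:-http://localhost:9200}
REPO=${REPO:-test}

# Succeeds (exit 0) when a version string such as "7.5.2" is older than 7.6.
is_pre_7_6() {
  local major minor
  major=${1%%.*}      # text before the first dot
  minor=${1#*.}       # text after the first dot ...
  minor=${minor%%.*}  # ... up to the next dot
  [ "$major" -lt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -lt 6 ]; }
}

# Reads a get-snapshot JSON response on stdin, prints names of pre-7.6 snapshots.
# Brittle by design: relies on "snapshot" and "version" appearing in pairs.
old_snapshot_names() {
  grep -oE '"(snapshot|version)":"[^"]*"' | cut -d'"' -f4 | paste - - |
    while read -r name version; do
      if is_pre_7_6 "$version"; then echo "$name"; fi
    done
}

# Usage (not run here):
#   curl -s "${ES_HOST}/_snapshot/${REPO}/_all" | old_snapshot_names
```

Each name printed could then be removed with the delete snapshot API (DELETE on /_snapshot/REPO/NAME), after checking that the snapshot is really no longer needed.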

I'll close this for now since it doesn't look like a bug.
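
For the "manage the wait time myself" alternative raised in the question, a minimal sketch could start the snapshot without wait_for_completion and poll the get snapshot API until the state leaves IN_PROGRESS. The 5-second interval, the variable defaults, and the helper names extract_state and wait_for_snapshot are assumptions for illustration. The thread does not say whether SUCCESS becomes visible before the artificial wait ends, so on a repository that still holds pre-7.6 snapshots this may not be a safe replacement for wait_for_completion.

```shell
#!/usr/bin/env bash
# Sketch only: poll the snapshot state instead of blocking on wait_for_completion.
# ES_HOST, REPO, SNAPSHOT and the polling interval are illustrative assumptions.

ES_HOST=${ES_HOST:-http://localhost:9200}
REPO=${REPO:-test}
SNAPSHOT=${SNAPSHOT:-test}

# Print the first "state" value found in a get-snapshot JSON response.
# Hypothetical helper; uses grep/sed to avoid depending on a JSON parser.
extract_state() {
  printf '%s' "$1" |
    grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' |
    head -n 1 |
    sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Poll until the snapshot reaches a final state.
# Returns 0 on SUCCESS, 1 on any other final or unreadable state.
wait_for_snapshot() {
  local state
  while true; do
    state=$(extract_state "$(curl -s "${ES_HOST}/_snapshot/${REPO}/${SNAPSHOT}")")
    case "$state" in
      SUCCESS) return 0 ;;
      IN_PROGRESS) sleep 5 ;;
      *) echo "snapshot ended in state: ${state:-unknown}" >&2; return 1 ;;
    esac
  done
}

# Usage (not run here):
#   wait_for_snapshot && echo "snapshot done, restore can start"
```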
