[ML] testGetDeploymentStats_WithStartedStoppedDeployments failure #82327

Closed
benwtrent opened this issue Jan 6, 2022 · 5 comments
Labels
:ml Machine learning Team:ML Meta label for the ML team >test-failure Triaged test failures from CI

Comments

@benwtrent
Member

Repro line:

./gradlew ':x-pack:plugin:ml:qa:native-multi-node-tests:javaRestTest' --tests "org.elasticsearch.xpack.ml.integration.PyTorchModelIT.testGetDeploymentStats_WithStartedStoppedDeployments" -Dtests.seed=5E2AFE7B6589657 -Dbuild.snapshot=false -Dtests.locale=ar-TN -Dtests.timezone=America/Campo_Grande -Druntime.java=17 -Dlicense.key="x-pack/license-tools/src/test/resources/public.key"

Reproduces locally?:
yes
Applicable branches:
master
Failure excerpt:

org.elasticsearch.xpack.ml.integration.PyTorchModelIT > testGetDeploymentStats_WithStartedStoppedDeployments FAILED
    org.elasticsearch.client.ResponseException: method [POST], host [http://127.0.0.1:64466], URI [/_ml/trained_models/foo/deployment/_infer], status line [HTTP/1.1 500 Internal Server Error]
    {"error":{"root_cause":[{"type":"status_exception","reason":"Error in inference process: [inference native process died unexpectedly with failure [[2:3] [pytorch_result] unknown field [request_id]]]"}],"type":"status_exception","reason":"Error in inference process: [inference native process died unexpectedly with failure [[2:3] [pytorch_result] unknown field [request_id]]]"},"status":500}
        at __randomizedtesting.SeedInfo.seed([5E2AFE7B6589657:157DC45ADCA763B5]:0)
        at app//org.elasticsearch.client.RestClient.convertResponse(RestClient.java:346)
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:312)
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:287)
        at app//org.elasticsearch.xpack.ml.integration.PyTorchModelIT.infer(PyTorchModelIT.java:752)
        at app//org.elasticsearch.xpack.ml.integration.PyTorchModelIT.testGetDeploymentStats_WithStartedStoppedDeployments(PyTorchModelIT.java:360)
@benwtrent benwtrent added >test-failure Triaged test failures from CI :ml Machine learning labels Jan 6, 2022
@benwtrent benwtrent self-assigned this Jan 6, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jan 6, 2022
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent
Member Author

OK, when -Dbuild.snapshot=false is set, the ml-cpp build provided is WAY too old. It doesn't include commit elastic/ml-cpp@3a628c1.

In fact, it's from commit 6dde25e6ff38a3.
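
To illustrate the kind of failure in the excerpt above, here is a minimal, generic sketch of a strict parser rejecting a field it does not know about, which is what tends to happen when the two sides of a protocol are built from different versions. This is not the Elasticsearch or ml-cpp code: `parse_strict` and the field names in `KNOWN_FIELDS` are hypothetical, and only `pytorch_result` and `request_id` come from the error message in this issue.

```python
# Hypothetical set of fields the reader accepts. These names are made up
# for illustration; the real schema is not shown in this thread.
KNOWN_FIELDS = {"inference", "time_ms"}


def parse_strict(name, doc, known_fields):
    """Return a copy of doc, failing on the first field not in known_fields."""
    for field in doc:
        if field not in known_fields:
            # Mirrors the shape of the message seen in the failure excerpt.
            raise ValueError(f"[{name}] unknown field [{field}]")
    return dict(doc)


if __name__ == "__main__":
    # A message written by a build that disagrees with the reader's schema.
    message = {"request_id": "1", "inference": []}
    try:
        parse_strict("pytorch_result", message, KNOWN_FIELDS)
    except ValueError as err:
        print(err)
```

A strict reader like this fails fast on version skew instead of silently ignoring fields, which is why a mismatched native build shows up as a hard error rather than a wrong result.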

@droberts195
Contributor

@imotov
Contributor

imotov commented Jan 6, 2022

This just failed in the release tests in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request+release-tests/32/consoleText. I'm not sure whether that was before or after the fix was applied.

@droberts195
Contributor

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request+release-tests/32/ says Build #32 (06-Jan-2022 20:05:49).

The upload of the artifacts that fixed this happened in https://elasticsearch-ci.elastic.co/job/elastic+machine-learning+main+combine-artifacts/90/, which was Build #90 (06-Jan-2022 21:55:01).

So that explains why the correct artifacts were not available at the required time.

I believe this is fixed now.
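
The timing argument above can be checked with a short sketch. It assumes both Jenkins timestamps are shown in the same time zone (the thread does not say) and that the default C locale is in effect so that `%b` matches `Jan`. The variable names are made up for illustration.

```python
from datetime import datetime

# Format used by the Jenkins build pages quoted above, e.g. "06-Jan-2022 20:05:49".
FMT = "%d-%b-%Y %H:%M:%S"

# Build #32 of the release-tests job (the run that failed).
release_test_started = datetime.strptime("06-Jan-2022 20:05:49", FMT)
# Build #90 of the combine-artifacts job (the upload that carried the fix).
artifacts_uploaded = datetime.strptime("06-Jan-2022 21:55:01", FMT)

# Positive gap means the test run started before the fixed artifacts existed.
gap = artifacts_uploaded - release_test_started
fix_available_at_test_start = artifacts_uploaded <= release_test_started

if __name__ == "__main__":
    print(f"gap: {gap}")
    print(f"fix available when test started: {fix_available_at_test_start}")
```

Under those assumptions the release test run started 1 hour 49 minutes 12 seconds before the upload, so it could not have picked up the fixed artifacts.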
