AssertionError: Katib Experiment was not successful #101

Open
mvlassis opened this issue Aug 8, 2024 · 4 comments
Labels
bug Something isn't working

Comments

Contributor

mvlassis commented Aug 8, 2024

Bug Description

This issue was encountered in the deploy-cfk-to-eks (1.8) action in the bundle-kubeflow repository. The full logs can be found here.

The katib-integration test in test_notebooks.py fails with an AssertionError. This is the relevant excerpt from the logs:

-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running katib-integration.ipynb...
ERROR    test_notebooks:test_notebooks.py:58 Cell In[8], line 8, in assert_experiment_succeeded(client, experiment)
      1 @retry(
      2     wait=wait_exponential(multiplier=2, min=1, max=10),
      3     stop=stop_after_attempt(30),
      4     reraise=True,
      5 )
      6 def assert_experiment_succeeded(client, experiment):
      7     """Wait for the Katib Experiment to complete successfully."""
----> 8     assert client.is_experiment_succeeded(name=experiment), f"Katib Experiment was not successful."
AssertionError: Katib Experiment was not successful.
FAILED

Because the error occurred during a GitHub Actions run, I couldn't access the deployment to investigate further.

Note that this issue did not occur during a previous run of the GitHub action, which can be found here. It's not clear whether the issue is reproducible or just intermittent.
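
One possible factor, offered as an assumption rather than a diagnosis: the retry decorator in the traceback caps each wait at 10 seconds (`max=10`) and stops after 30 attempts, so the notebook only waits a bounded amount of time for the Experiment. The sketch below estimates that upper bound without importing tenacity. The helper name `max_total_wait_seconds` is hypothetical, and the calculation assumes one sleep between consecutive attempts and ignores the time spent inside each `is_experiment_succeeded` call.

```python
def max_total_wait_seconds(attempts: int, max_wait: float) -> float:
    """Upper bound on total sleep time across retries.

    Assumption: there is one sleep between consecutive attempts
    (none after the last one), and every sleep is capped at max_wait.
    """
    return (attempts - 1) * max_wait


# Values taken from the decorator in the traceback:
# stop_after_attempt(30) and wait_exponential(..., max=10).
budget = max_total_wait_seconds(attempts=30, max_wait=10)
print(f"Sleep budget is at most {budget} seconds (~{budget / 60:.1f} minutes)")
```

If the Experiment occasionally needs longer than this on a freshly created cluster, the assertion would fail intermittently in the way described here. The logs above do not confirm or rule this out.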

To Reproduce

From the main page of the bundle-kubeflow repository, go to Actions, select the "Create EKS cluster, deploy CKF and run bundle test" action, and run it with the following options:

  • Comma-separated list of bundle versions e.g. "1.7","1.8": 1.8
  • Kubernetes version to be used for the AKS cluster: Leave empty
  • Branch to run the UATs from e.g. main or track/1.8: Leave empty

Environment

This job tries to deploy the UATs, using the following configuration from the dependencies.yaml file found here:

  • K8S_VERSION: "1.29"
  • JUJU_VERSION: "3.4"
  • JUJU_VERSION_WITH_PATCH: "3.4.4"
  • UATS_BRANCH: "track/1.8"

Relevant Log Output

-------------------------------- live log call ---------------------------------
INFO     test_notebooks:test_notebooks.py:44 Running katib-integration.ipynb...
ERROR    test_notebooks:test_notebooks.py:58 Cell In[8], line 8, in assert_experiment_succeeded(client, experiment)
      1 @retry(
      2     wait=wait_exponential(multiplier=2, min=1, max=10),
      3     stop=stop_after_attempt(30),
      4     reraise=True,
      5 )
      6 def assert_experiment_succeeded(client, experiment):
      7     """Wait for the Katib Experiment to complete successfully."""
----> 8     assert client.is_experiment_succeeded(name=experiment), f"Katib Experiment was not successful."
AssertionError: Katib Experiment was not successful.
FAILED

Additional Context

No response

mvlassis added the bug label Aug 8, 2024

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6112.

This message was autogenerated

Member

misohu commented Aug 14, 2024

@orfeas-k can you rerun it and make sure it's gone?

@orfeas-k
Contributor

I reran the CI here: https://github.com/canonical/bundle-kubeflow/actions/runs/10388848456/job/28765995378. It looks like it succeeds, which means we are dealing with an intermittent issue.
