
The workflow did not pull the remote image to create a docker container when it was running. #271

Closed
lanzhixi opened this issue Apr 7, 2024 · 7 comments
Labels
bug Something isn't working


@lanzhixi
Contributor

lanzhixi commented Apr 7, 2024

I successfully imported the piece I made into the platform and created the workflow, but an error occurred at runtime. The REST log did not report any error message. However, after some troubleshooting, I found that no Docker container was created for my piece (even though the image was published successfully).
So I copied ml_domino_pieces and only changed REGISTRY_NAME in config.toml to my own account name (just to make sure the image could be published successfully). Nothing else was modified. After importing it into the platform, the result was the same: no Docker container is created when running the workflow.
No errors were reported anywhere in the process, but the Docker container did not run and the remote image was not pulled. What could be the reason? Are there any other steps required for a piece I made to be imported and work properly?
Thanks for your answer!

@vinicvaz
Collaborator

vinicvaz commented Apr 7, 2024

Hey @lanzhixi, sorry you are facing issues creating pieces. I'll try to help you debug what is happening.

  1. In which environment are you running Domino? Are you running it locally using Docker Compose?
  2. When you run the workflow, what happens with the piece and with your workflow? Do they go to the failed state, or do they get stuck in running or some other state? Can you send me a screenshot of your workflow after running it?
  3. Your ml_domino_pieces is just a copy of ours, right? If so, can you make it public for a while so I can try to reproduce the problem?

@lanzhixi
Contributor Author

lanzhixi commented Apr 8, 2024

Hey @vinicvaz, thank you for the reminder; I will structure my questions better next time.
In my latest attempt, I successfully ran the piece I made. The cause was that I'm used to naming things with uppercase letters, and to make the image build succeed I filled in everything in lowercase in config.toml. This probably caused the platform to not find the corresponding image based on the repository name at all, which is why it looked like the remote image was never pulled.
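Docker image references only allow lowercase letters in their repository path components, which is why the build only succeeded with lowercase names. A mismatch like this can be caught before publishing. Below is a minimal sketch, assuming the image is published as `ghcr.io/<REGISTRY_NAME>/<repository>:<tag>` (the shape seen later in this thread); `check_image_path` is a hypothetical helper, not part of Domino, and the regex follows the Docker image reference rules for a single path component.

```python
from __future__ import annotations

import re

# One path component of a Docker image name: lowercase letters and digits,
# optionally joined by ".", "_", "__" or one or more "-".
PATH_COMPONENT = re.compile(r"^[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*$")


def check_image_path(registry_name: str, repository_name: str) -> list[str]:
    """Return problems with the names used to build the image path.

    Hypothetical helper: assumes the image is published as
    ghcr.io/<registry_name>/<repository_name>:<tag>.
    """
    problems = []
    for label, value in (("REGISTRY_NAME", registry_name),
                         ("repository name", repository_name)):
        if PATH_COMPONENT.match(value):
            continue
        message = f"{label} {value!r} is not a valid image path component"
        if PATH_COMPONENT.match(value.lower()):
            # Only the case is wrong: the same lowercase name must be used everywhere.
            message += f"; use {value.lower()!r} consistently everywhere"
        problems.append(message)
    return problems
```

An empty list means both names can be used as-is; anything else points at the name that the build and the platform would disagree on.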

@luiztauffer
Copy link
Member

Thanks for spotting the issue, @lanzhixi! Can you confirm you're now able to run your Pieces?

@vinicvaz vinicvaz self-assigned this Apr 8, 2024
@vinicvaz vinicvaz added the bug Something isn't working label Apr 8, 2024
@lanzhixi
Contributor Author

lanzhixi commented Apr 9, 2024

@luiztauffer, @vinicvaz
Thank you for your continued attention. I can confirm that, as long as I avoid my earlier mistake, I can create a piece, import it into Domino, and run it successfully. This is so cool!
However, in my recent attempts I have run into some other problems (the image is pulled normally, but the piece fails while running), and I would like to ask for your help.
[screenshot]

After successfully using the template, my confidence increased and I started creating my own lightGBMPiece. However, the lightGBMPiece I wrote fails when running. After adding tests, I found that the problem was not in the code itself. Could you help me take a look? (I think I may have missed something, but I checked it many times and it follows the same format as the example pieces.)
[screenshot]

Thanks!

@vinicvaz
Collaborator

vinicvaz commented Apr 9, 2024

Hey @lanzhixi, thanks for reporting this.
Yes, you are right: the tests will not run for your piece, and there are a few reasons why:

  1. I think there is a missing instruction in your LightGBM Dockerfile: you should include `RUN apt-get update && apt-get install -y libgomp1`. I tried to run it locally after building and got the error:
[ImportError: libgomp.so.1: cannot open shared object file: No such file or directory](https://stackoverflow.com/questions/43764624/importerror-libgomp-so-1-cannot-open-shared-object-file-no-such-file-or-direc)

This command should fix it.
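To confirm the fix before pushing again, you can check that the OpenMP runtime loads inside the built image. A minimal sketch, assuming you run it with the image's own Python interpreter (for example through `docker run`); `libgomp_available` is a hypothetical helper, and `libgomp.so.1` is the library named in the error above.

```python
import ctypes


def libgomp_available() -> bool:
    """Return True if the GNU OpenMP runtime (libgomp.so.1) can be loaded."""
    try:
        ctypes.CDLL("libgomp.so.1")
    except OSError:
        # Same root cause as "ImportError: libgomp.so.1: cannot open shared object file"
        return False
    return True


if __name__ == "__main__":
    print("libgomp.so.1 found" if libgomp_available() else "libgomp.so.1 missing: install libgomp1")
```

If this reports the library as missing, the `apt-get install -y libgomp1` line did not end up in the image that runs the piece.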

  2. I'm not sure, but I think you also need to fix some things in your code, like the missing evaluation data, which I think is required for early stopping.

  3. The third thing is our fault. The way Domino runs tests in the GitHub environment is still a bit tricky, so I'll try to explain what happens and why.
    First, let's imagine a scenario with multiple pieces and a lot of dependency conflicts in the same repository. In this scenario, installing all the pieces' dependencies in the GitHub Actions environment would be impossible, so to avoid that we build the Docker images in the GitHub Actions environment and run each image independently, with each container listening for requests from the tests. Basically, we separate the test environment from the piece code environment: the tests run in the GitHub Actions root environment, and the piece code runs in a Docker container inside that root environment.
    We basically do that in 3 steps:

  • First, we build all the images and save a map from each piece name to its corresponding image, for example `LightGBMTrainPiece: ghcr.io/lanzhixi/piece_test:0.1.5-group0`.
  • Then, based on the piece name defined in the test, we run your built Docker image, starting a really tiny HTTP server in your piece container. This HTTP server listens for requests from the `piece_dry_run` function, passes them to your `piece_function`, and returns the results to the test function.
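To make the HTTP-server step more concrete, here is a minimal, self-contained sketch of the idea using only Python's standard library. It is not Domino's actual implementation: `piece_function` (here just summing numbers), `dry_run`, the JSON payload shape, and the `{"ok": ..., "output": ...}` response format are all hypothetical. In the real setup the server side runs inside the piece container and the request comes from `piece_dry_run` in the test.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def piece_function(input_data: dict) -> dict:
    """Hypothetical stand-in for a piece's logic: sums a list of numbers."""
    return {"result": sum(input_data.get("values", []))}


class PieceRequestHandler(BaseHTTPRequestHandler):
    """Container side: JSON input in, piece output back as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        try:
            body, status = {"ok": True, "output": piece_function(payload)}, 200
        except Exception as exc:  # send piece errors back to the caller as a 500
            body, status = {"ok": False, "error": str(exc)}, 500
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_piece_server(port: int = 0) -> HTTPServer:
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PieceRequestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def dry_run(url: str, input_data: dict) -> dict:
    """Test side: send the piece input and decode the JSON response.

    A 500 response raises urllib.error.HTTPError instead of returning.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(input_data).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_piece_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    print(dry_run(url, {"values": [1, 2, 3]}))  # {'ok': True, 'output': {'result': 6}}
    server.shutdown()
    server.server_close()
```

Because the test only exchanges JSON with the container, the piece's dependencies never need to be installed in the test environment, which is also why internal functions cannot be mocked from the test.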

This is the way we've found to run the pieces in their isolated environments. The problems with it are:

  • We can't mock internal functions
  • We can't read files from outside the piece build. I guess this is one of the problems you're facing. You can add your test file (iris.csv) to your build context (saving it in the piece folder should be enough), and in your test you can use the relative path to it, or even the absolute path in the container, like `/home/domino/pieces_repository/pieces/LightGBMTrainPiece/iris.csv`.
  • It is hard to debug
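For the file-path limitation, here is a minimal sketch of how piece code could accept either form of path. `resolve_dataset_path` is a hypothetical helper; `PurePosixPath` is used because the container runs Linux, and in a real `piece.py` you might pass `Path(__file__).resolve().parent` as `piece_dir`. Either way, it only works if `iris.csv` was saved in the piece folder so that it is part of the build context.

```python
from pathlib import PurePosixPath

# Absolute piece folder inside the piece container, as mentioned in this thread.
CONTAINER_PIECE_DIR = PurePosixPath(
    "/home/domino/pieces_repository/pieces/LightGBMTrainPiece"
)


def resolve_dataset_path(
    path_str: str, piece_dir: PurePosixPath = CONTAINER_PIECE_DIR
) -> PurePosixPath:
    """Use absolute paths as-is; resolve relative ones against the piece folder.

    Files outside the piece folder are not in the image build context,
    so they will not exist inside the container whichever form is used.
    """
    path = PurePosixPath(path_str)
    return path if path.is_absolute() else piece_dir / path
```

With this, a test can send `"iris.csv"` or the full container path and the piece reads the same file.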

Additional information / Alternative Solution

  • You can always test your piece in your local environment before pushing it to GitHub. This helps you debug things like code errors (but not Docker image problems like point 1).
  • You can always develop your piece function as a separate function, or even in a Jupyter notebook, and after making sure the code is right, bring it into the Domino framework and add tests to it.
  • You can skip a test in a specific environment using the `skip_envs` decorator:

```python
from domino.testing.utils import skip_envs


@skip_envs('github')
def test_my_piece():
    # Runs locally, but is skipped in the GitHub Actions environment.
    ...
```

This might be useful for tests that you can't run in GitHub Actions yet but want to run locally, like here.
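If it helps to see what such a decorator does, here is a minimal sketch of the general idea. It is not Domino's implementation: the `PIECE_TESTS_ENVIRONMENT` variable and the `skip_envs_sketch` name are hypothetical, and the real `skip_envs` may detect the environment differently. It raises `unittest.SkipTest`, which the standard `unittest` runner (and, to my knowledge, pytest) reports as a skipped test.

```python
import functools
import os
import unittest

# Hypothetical variable name; Domino's real skip_envs may detect the environment differently.
TESTS_ENV_VAR = "PIECE_TESTS_ENVIRONMENT"


def skip_envs_sketch(*envs: str):
    """Skip the decorated test when the current environment is one of `envs`."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            current_env = os.environ.get(TESTS_ENV_VAR, "local")
            if current_env in envs:
                raise unittest.SkipTest(f"skipped in {current_env!r} environment")
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


@skip_envs_sketch("github")
def example_test():
    return "ran"
```

Setting the variable to `github` in CI and leaving it unset locally gives the behavior described above: the test runs on your machine and is reported as skipped in GitHub Actions.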

I know this is a lot of information to take in, so if you have any questions, feel free to post them here.

@lanzhixi
Contributor Author

Hey @vinicvaz, thank you very much for your detailed answer, it's very useful to me!
I successfully ran my piece after modifying the Dockerfile.
[Screenshot 2024-04-10 172303]
However, I had to add the `skip_envs` decorator to the test file to skip the test; otherwise the test still reports an error.
[screenshot]
That's all for now.
I will keep improving the LightGBM-related pieces, and if it's useful, I'll be happy to submit them to the ml_domino_pieces repository.
Thanks!

@luiztauffer
Member

Awesome job, @lanzhixi! Let us know how your ML pieces evolve; we would be happy to integrate them into existing repos or just list them in the open source gallery!

I suggest you also take a look at how to display results from your piece in the GUI; it can be very useful for ML pieces such as LightGBM: https://docs.domino-workflows.io/pieces/create_pieces#piecepy
Example of a piece producing a Plotly image: https://github.com/Tauffer-Consulting/ml_domino_pieces/blob/main/pieces/PCAInferencePiece/piece.py
