Kedro-MLflow on AWS batch causes every node to be logged as a separate run #432
Comments
Hi @hugocool, this is a common feature request, and it is already partially possible. First, I want to insist on the fact that logging each node in a separate run on AWS Batch is a feature, not a bug 😄 AWS Batch nodes are orchestrator nodes and they don't have the same purpose as Kedro nodes. That said, your request is valid: you may want to propagate an mlflow run id through different orchestrator nodes. Some good news: there are a couple of options.
Both solutions are valid and quite easy to set up. You are not the first to ask for configuration overriding at runtime through CLI args (see #395), but I am quite reluctant to add extra API when I think kedro will enable it natively.
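To make this concrete, here is a minimal sketch of what "propagating a run id" means on the mlflow side; how the id reaches the container (env var, CLI arg, config) is exactly what the rest of this thread discusses, and the placeholder value is hypothetical:

```python
import mlflow

# Hypothetical: the id of the run created by the orchestrator, however it is propagated.
shared_run_id = "<run id created by the orchestrator>"

# Attaching to an existing run instead of creating a new one: everything logged
# inside this block lands in the shared run.
with mlflow.start_run(run_id=shared_run_id):
    mlflow.log_metric("node_duration_s", 1.23)
```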
Would it not make sense to use a kedro run parameter override? So basically I need to override the custom kedro run command sent to each docker container on Batch; one could probably make this work through the configloader. Also, I don't know off the top of my head how to push this run id to kedro-mlflow.
And then try to pass it as a global, but I can't get that to work. Do you have an example of how to override the `mlflow.tracking.run.id`? I would love to contribute a full working solution, and incorporate it into a kedro-aws extension that is compatible with kedro-mlflow!
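For illustration only, a project-level hook along these lines could turn a runtime parameter into the active run; the `mlflow_run_id` parameter name and the hook itself are hypothetical, not part of kedro-mlflow's actual API:

```python
import mlflow
from kedro.framework.hooks import hook_impl


class ResumeMlflowRunHook:
    """Hypothetical hook: resume the run whose id is passed at runtime,
    e.g. via `kedro run --params=mlflow_run_id:<id>` (invented parameter name).
    Register it in settings.py through the HOOKS tuple."""

    @hook_impl
    def before_pipeline_run(self, run_params):
        run_id = (run_params.get("extra_params") or {}).get("mlflow_run_id")
        if run_id:
            # Attach this container's logging to the run created by the orchestrator.
            mlflow.start_run(run_id=run_id)

    @hook_impl
    def after_pipeline_run(self):
        if mlflow.active_run():
            mlflow.end_run()
```

Note that kedro-mlflow registers its own hook to manage the run lifecycle, so in practice such a hook would have to cooperate with it rather than both calling `start_run` independently.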
If AWS Batch has some unique ID environment variable injected into every container (like a run ID, but specific to the AWS Batch service itself), you can follow the same idea we have for kedro-sagemaker, where we first add a node to the pipeline to "start mlflow run", which adds a tag to the mlflow run.
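A rough sketch of that idea, assuming AWS Batch injects an identifier such as `AWS_BATCH_JOB_ID` into each container (whether that particular id is shared across all containers of one pipeline run is discussed below) and using `batch_id` as an arbitrary tag name:

```python
import os

import mlflow

# Assumption: some AWS-Batch-provided id is available in the container's environment.
batch_id = os.environ.get("AWS_BATCH_JOB_ID", "local-debug")


def start_or_resume_run(experiment_name: str = "my-pipeline") -> mlflow.ActiveRun:
    """Resume the run tagged with this batch id if it exists, otherwise create and tag one."""
    mlflow.set_experiment(experiment_name)
    existing = mlflow.search_runs(filter_string=f"tags.batch_id = '{batch_id}'")
    if len(existing) > 0:
        return mlflow.start_run(run_id=existing.iloc[0]["run_id"])
    run = mlflow.start_run()
    mlflow.set_tag("batch_id", batch_id)
    return run
```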
Yes, @marrrcin's suggestion is likely the best way to do it: as explained, you need to start mlflow yourself in the container (e.g. by setting the run id manually).
Thanks @marrrcin for the suggestion, I had not thought of that! So the main difference between the two suggestions would be the way the mlflow run id is communicated between the orchestrator and the AWS Batch containers.

In one approach, it is communicated through the docker run command for the container running in AWS Batch, so the Kedro runner (AWSBatchRunner) could set it when submitting the job. The approach mentioned by @marrrcin, if I understand correctly, leverages an identifier issued by AWS Batch that is shared across all the containers resulting from a single run, which could serve as a lookup key.

One other thing to take into account is that with the recent addition of support for Prefect 2.0, there is now also the possibility of using the Prefect AWS Batch job. Since I want to migrate to a single open source orchestrator for as many projects as possible, I would like the method to work for Prefect as well.
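For the first approach (the runner propagating the id itself), a sketch of what the submission side might look like with boto3; the job name, queue and job definition are placeholders, and this is not the actual AWSBatchRunner code:

```python
import boto3
import mlflow

client = boto3.client("batch")

# The orchestrator creates the run that every node should share.
run = mlflow.start_run(run_name="aws-batch-pipeline")

client.submit_job(
    jobName="kedro-node-example",      # placeholder
    jobQueue="my-job-queue",           # placeholder
    jobDefinition="my-kedro-job-def",  # placeholder
    containerOverrides={
        "command": ["kedro", "run", "--node=example_node"],
        "environment": [
            # mlflow.start_run() inside the container picks up MLFLOW_RUN_ID,
            # so the node's logging is attached to the orchestrator's run.
            {"name": "MLFLOW_RUN_ID", "value": run.info.run_id},
            {"name": "MLFLOW_TRACKING_URI", "value": mlflow.get_tracking_uri()},
        ],
    },
)
# The orchestrator is responsible for ending the run once all jobs have finished.
```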
I'm closing the issue in favor of #395. I hope we can make it work after the 0.19 release.
Description
When running kedro pipelines on AWS Batch with kedro-mlflow, every node is logged as a separate run. This is because the pipeline is executed on Batch by running each node in its own docker container with a separate docker run command, i.e. `kedro run --node=...`.

Context
This is undesirable since these are not separate runs, simply individual nodes, and it quickly pollutes your mlflow tracking server. Therefore each kedro run command issued to Batch should be made aware of the already existing `active_run` this node is part of.

Possible Implementation
While the changes to the batchrunner should be implemented in the deployment pattern, kedro-mlflow should allow one to pass a `mlflow_run_id` CLI kwarg which sets the `run_id`. I'm currently implementing a solution using the configloader, a custom cloudrunner and changes to the batchrunner CLI.
I'm curious whether there is a better/more minimal alternative.
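Programmatically, the requested behaviour would amount to something like the following sketch (the `mlflow_run_id` parameter name is the one proposed above, not an existing kedro-mlflow option, and the exact `KedroSession` signature depends on the kedro version):

```python
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Rough equivalent of `kedro run --node=example_node --params=mlflow_run_id:<id>`.
bootstrap_project(Path.cwd())
with KedroSession.create(
    extra_params={"mlflow_run_id": "<run id from the orchestrator>"}
) as session:
    session.run(node_names=["example_node"])  # the single node this Batch container executes
```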
Possible Alternatives
- Setting an environment variable? (see the sketch after this list)
- Overriding the `run_id` with the git commit? (although this is difficult on Batch, since the container would need to be made aware of the git commit)
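For the environment-variable alternative, note that mlflow's `start_run` already honours `MLFLOW_RUN_ID`, so a container-side sketch could be as small as:

```python
import os

import mlflow

# If the orchestrator injects MLFLOW_RUN_ID into the container's environment,
# mlflow.start_run() resumes that run; otherwise a fresh run is created
# (convenient when debugging a single node locally).
print("MLFLOW_RUN_ID =", os.environ.get("MLFLOW_RUN_ID"))

with mlflow.start_run():
    mlflow.log_param("resumed_from_env", os.environ.get("MLFLOW_RUN_ID") is not None)
```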