I can't mount a volume in my container locally #3835
I just realized that my pipeline was running in a different namespace than the notebook server (kubeflow and not kubeflow-user). Maybe that's what caused the error? Indeed, the pvc doesn't exist in the kubeflow namespace... But I'm still a little bit lost on how to proceed. I'm just starting with Kubernetes/Kubeflow and the beginning is quite rough 😄
The pvc needs to be in the same namespace as the pipeline.
The easiest way is to just install KFP from Google Cloud Marketplace: https://console.cloud.google.com/marketplace/details/google-cloud-ai-platform/kubeflow-pipelines
If you're on Windows or Mac OS X, it should be pretty easy to just install Docker Desktop with Kubernetes and then use the Kubeflow Pipelines Standalone deployment. Installing KFP using Kubeflow tools (kfctl, MiniKF) is not the most up-to-date and supported deployment option...
Can you give us some feedback on how to make the initial experience better? TBH, you seem to have chosen a pretty hard and rough path to learn KFP: a very specialized installation option (MiniKF) for a different product (KF vs KFP (yes, I know this is confusing)), coupled with a very specialized community-contributed feature (VolumeOp) based on an advanced Kubernetes feature (PVCs). The road would be smoother if you started with the official KFP documentation and the core features before moving to some advanced and specialized options. Check the two linked tutorials - they should give you a good jump-start on your pipeline and component building.

I wonder whether you really need volumes. Volumes are advanced Kubernetes concepts and are specific to Kubernetes, which means that you might be reducing your pipeline's portability if you were to depend on them. KFP has great data passing support, so you do not need to care about storage methods. Please check the following two tutorials:
https://github.com/kubeflow/pipelines/blob/fd5778d/samples/tutorials/Data%20passing%20in%20python%20components.ipynb
https://github.com/Ark-kun/kfp_samples/blob/ae1a5b6/2019-10%20Kubeflow%20summit/106%20-%20Creating%20components%20from%20command-line%20programs/106%20-%20Creating%20components%20from%20command-line%20programs.ipynb
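[Editor's note: a hedged sketch of the "components from command-line programs" style referenced in the second tutorial, assuming the KFP v1 SDK. The component name, image, and shell commands are illustrative, not taken from the tutorials.]

```python
# Sketch only: a component defined from a command-line program.
# The system resolves the inputPath/outputPath placeholders to local file paths
# and moves the data between steps, so no volume is needed.
from kfp import components

get_first_line_op = components.load_component_from_text('''
name: Get first line
inputs:
- {name: input_file}
outputs:
- {name: output_file}
implementation:
  container:
    image: alpine
    command:
    - sh
    - -ec
    - |
      mkdir -p "$(dirname "$1")"
      head -1 "$0" > "$1"
    - {inputPath: input_file}
    - {outputPath: output_file}
''')
```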
Hello @vlagache! Thank you for the detailed description in the issue and for using MiniKF. As you mentioned, the problem is indeed that pods cannot mount PVCs living in different namespaces. In the latest MiniKF (and KF) release, there is no support for multi-user pipelines, i.e., pipelines running in the namespace of the user (e.g., kubeflow-user). However, the question remains: what can you do now to solve this?
Here are the steps you can follow to take a snapshot and get the RokURL of your data volume:
Finally - and unrelated to your issue - in the latest MiniKF you can use the Volumes manager to populate PVCs in the user's namespace (instead of creating a Notebook Server, as you did).
Thank you for your answers @Ark-kun and @elikatsis, I will take the time to read and understand everything and get back to you if I have any more questions; thanks again for your quick answers. Indeed, I had not understood that there were both KF and KFP. Just to explain a little bit about the path I've taken while learning Kubeflow...
And regarding your comment about volumes, I was thinking of doing this to import data into my pipeline. To process data in a component, doesn't that data need to be in a volume mounted on that component? So I'm going to continue with the new information you gave me and I'll get back to you if I have new problems, thanks again.
Mounting a volume is not necessary. The preferred model for Argo and KFP is that the system moves the data for you, so you do not have to care about the storage. This has many benefits - portability, simpler component code, data immutability guarantees, caching, etc. Although you can use the data from volumes, doing this opts you out of some system guarantees and can interfere with the operation of some services. For example, caching either does not activate or activates when you do not want it. The data system does not know anything about the data you have in the volumes. Volumes also act as global variables in some sense, so they bring all the problems that global variables entail. The system-passed data is strictly scoped, in contrast.

Creating components that process data: I've linked the two tutorials (python and command-line) that should be enough to cover the topic. You use InputPath and OutputPath annotations in the python case and inputPath/outputPath placeholders in the command-line case. Your component only needs to work with the locally available files that the system gives you or takes from you.

Importing the data: You just need a component with an output that downloads or extracts the data, saves it as a file and lets the system store it in the artifact repository. Two examples:

You can also check this sample pipeline that imports data and processes it: https://github.com/kubeflow/pipelines/blob/2d26a4c/components/XGBoost/_samples/sample_pipeline.py
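[Editor's note: a minimal sketch of the InputPath/OutputPath pattern described above, assuming the KFP v1 SDK. The URL, component names, and file handling are purely illustrative.]

```python
# Hedged sketch of system-managed data passing with lightweight Python components.
from kfp.components import func_to_container_op, InputPath, OutputPath

def download_csv(url: str, csv_path: OutputPath('CSV')):
    """Importer: download a CSV and let the system store it as an artifact."""
    import urllib.request
    urllib.request.urlretrieve(url, csv_path)

def print_first_line(csv_path: InputPath('CSV')):
    """Processor: read the file that the system has made available locally."""
    with open(csv_path) as f:
        print(f.readline())

# Turn the functions into reusable pipeline components.
download_csv_op = func_to_container_op(download_csv)
print_first_line_op = func_to_container_op(print_first_line)
```

The system copies whatever `download_csv` writes to `csv_path` into the artifact store and hands `print_first_line` a local copy of it, so no volume or PVC is involved.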
Ok, thanks for your answer @Ark-kun, I'll read all about it. I already started by installing KFP Standalone on a Kubernetes cluster that runs with Docker Desktop, as you recommended (with this Readme: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize). It's actually much easier, and I now see the difference between KFP and KF 😄 I'm going to go ahead and try out all the code you provided me. Is it important for you that I close the issue, or can I leave it open if I have more questions?
Great to hear that =) There is also documentation here https://www.kubeflow.org/docs/pipelines/ although some parts may be a bit outdated - I'm working on updating it.
Not very important. You can leave it open for some time.
I deleted my last question about kfp_endpoint. I was asking what the value of kfp_endpoint should be for local use, i.e. what to pass as the host.
It seems that it works when I remove "host" and execute my python file from the console. I get an error in the console, but the run executes fine, so I will finally be able to move forward, thank you very much :)
The error is in French, but says "The process can't access the file because this file is used by another process:".
I'm really glad to hear that!
When KFP is deployed to GKE, it's the main URL of the UX. If you use port-forwarding, it's the IP address and port. There are some other options...
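[Editor's note: a hedged sketch of the port-forwarding case, not code from the thread; the port-forward command and port number are illustrative.]

```python
# Sketch: pointing the KFP SDK at a port-forwarded standalone deployment.
# Assumes: kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
import kfp

client = kfp.Client(host='http://localhost:8080')

# From a notebook server running inside the cluster, omitting `host` often
# works too, which matches what was observed above.
# client = kfp.Client()

print(client.list_experiments())
```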
Maybe you have something like an antivirus which locks the file. I'll try to create a fix...
@Ark-kun, what is the best practice for mounting a directory with a dataset for a computer vision training task? Download -> Preprocessing -> Training: how do I make the data flow between the components?
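[Editor's note: not a reply from the maintainers - just a hedged sketch of that Download -> Preprocess -> Train flow in the data-passing style recommended earlier in this thread, assuming the KFP v1 SDK; the component bodies are placeholders.]

```python
# Sketch only: Download -> Preprocess -> Train with system-managed data passing.
# The real download/preprocessing/training logic is elided; only the wiring matters.
from kfp import dsl
from kfp.components import func_to_container_op, InputPath, OutputPath

@func_to_container_op
def download_dataset(url: str, images_path: OutputPath('Directory')):
    import pathlib
    pathlib.Path(images_path).mkdir(parents=True, exist_ok=True)
    # ... fetch and unpack the dataset into images_path ...

@func_to_container_op
def preprocess_images(images_path: InputPath('Directory'),
                      processed_path: OutputPath('Directory')):
    import shutil
    shutil.copytree(images_path, processed_path)  # placeholder for real preprocessing

@func_to_container_op
def train_model(processed_path: InputPath('Directory')):
    print('training on data in', processed_path)  # placeholder for real training

@dsl.pipeline(name='cv-training-sketch')
def cv_pipeline(dataset_url: str):
    download_task = download_dataset(url=dataset_url)
    preprocess_task = preprocess_images(download_task.output)
    train_model(preprocess_task.output)
```

Each step's output directory is captured by the system and handed to the next step as a local path, so no shared volume or PVC is required.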
What steps did you take:
Hello everyone,
I'm starting with Kubeflow and I'm trying to build a simple pipeline. Right now I'm just trying to use a .csv file that's in a volume. I tried to reproduce what I saw here: #477, but I use MiniKF, where I created a notebook server with a data volume. My csv file is in this data volume.
According to this: #783, I've been looking for information about my volume (copass-vol).
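[Editor's note: a simplified sketch of the #477 pattern being reproduced, added for clarity; it is not the reporter's exact code, and the mount path and csv file name are illustrative.]

```python
# Sketch: mounting the existing notebook data volume (a PVC) into a pipeline step.
from kfp import dsl

@dsl.pipeline(name='read-csv-from-volume')
def read_csv_pipeline():
    # Reference the PVC created by the Notebook Server.
    vol = dsl.PipelineVolume(pvc='copass-vol')
    dsl.ContainerOp(
        name='print-first-line',
        image='alpine',
        command=['sh', '-c', 'head -n 1 /mnt/data/my_file.csv'],
        pvolumes={'/mnt/data': vol},
    )
```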
What happened:
I still get the same error message when I run the pipeline (I tried several values for pvc_name: copass-vol, copass-vol-t72cyk1yp, kubeflow-user/copass-vol-t72cyk1yp):
This step is in Pending state with this message: Unschedulable: persistentvolumeclaim "copass-vol" not found
What did you expect to happen:
See the first line of my csv in the run logs of my pipeline 😄. There's probably something I don't understand about the volumes.
Environment:
MiniKF, https://www.kubeflow.org/docs/started/workstation/getting-started-minikf/
KFP version: Build commit: ca58b22
KFP SDK version:
When I execute the following command in the terminal of my Notebook Server:
pip list | grep kfp
the answer is empty, yet I manage to import kfp into my python code. It also says that
Yet python --version returns 3.6.9:
jovyan@copass-0:~$ python --version
Python 3.6.9
Thank you for your answers. Have a nice evening.