Describe the problem
I'm deploying Determined.ai on our local self-hosted k8s cluster. There is a NAS inside the cluster used as shared storage. However, at the moment, Determined doesn't seem to support using the local NAS directly for checkpoint storage. I have to manually mount the NAS on each device in the cluster using NFS and then set the checkpoint storage for Determined to shared_fs.
Describe the solution you'd like
Support for one of the common NAS protocols for checkpoint storage, such as NFS, SMB, or WebDAV.
Describe alternatives you've considered
For a k8s cluster configured with NFS as a storage class (see this), using a Persistent Volume as checkpoint storage is an alternative.
Additional context
No response
I think this is a good feature request, and it's not the first time we've heard it. I'll see if I can get somebody on our k8s team to take a closer look.
I have to manually mount the NAS on each device in the cluster using nfs and then set the checkpoint storage for determined to shared_fs.
In the meantime, I think there is a better workaround than what you're currently doing. If you set up the pod with the NAS/SMB/WebDAV/whatever storage, you can mount that storage to /determined_shared_fs/my_storage (the my_storage name is arbitrary, but /determined_shared_fs is not). Then you can configure the shared_fs in your experiment to be something like:
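A minimal sketch of such a configuration, assuming the usual `host_path` and `storage_path` fields of a `shared_fs` checkpoint store. The `host_path` value is only a placeholder for any directory that exists on every node, and `my_storage` is the arbitrary name chosen above:

```yaml
# Experiment config fragment (sketch, not a verified setup).
checkpoint_storage:
  type: shared_fs
  # Placeholder: any directory that exists on each node.
  host_path: /any/valid/directory/at/all
  # Subdirectory under /determined_shared_fs inside the container;
  # must match the name used for the NAS mount point.
  storage_path: my_storage
```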
What our system will do under that configuration is mount each node's /any/valid/directory/at/all to /determined_shared_fs. Inside that directory, k8s will have mounted your NAS/SMB/WebDAV/whatever storage to /determined_shared_fs/my_storage, and our Python libraries will automatically use the full /determined_shared_fs/my_storage path for all checkpoint information, so the extra mount from the node's filesystem is effectively ignored.
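For the pod side, here is a sketch of how the NAS might be mounted through the experiment's `environment.pod_spec`, assuming the NAS is exposed as an NFS export. The server address, export path, and volume name are hypothetical placeholders, and the container name assumes the task container is called `determined-container`; adjust these for your cluster:

```yaml
# Experiment config fragment (sketch under the assumptions above).
environment:
  pod_spec:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - name: determined-container   # assumed name of the task container
          volumeMounts:
            - name: nas-checkpoints    # hypothetical volume name
              # Must sit under /determined_shared_fs and match storage_path.
              mountPath: /determined_shared_fs/my_storage
      volumes:
        - name: nas-checkpoints
          nfs:
            server: nas.example.internal   # hypothetical NAS address
            path: /exports/checkpoints     # hypothetical export path
```

With a mount like this, the NAS is attached by k8s for each task pod, so there should be no need to mount it manually on every node.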