Add support for Python 3.8, 3.9 #61
This pull request has been linked to Shortcut Story #132053: [aks-clusters][Platform] necessary update.
```
ipaddress==1.0.23
msrest==0.6.21
```
Some info about this removal, since it was explicitly added in sc-93743: some exceptions were moved from msrest into azure-mgmt-core, and we ran into that issue because we were pinning azure-mgmt-core.
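For context, a hedged sketch of what that exception migration looks like from calling code (assumption: `SdkError` and `describe_error` are illustrative names, not the plugin's; track-1 SDKs raised `msrest.exceptions.CloudError`, while track-2 management SDKs raise exception types from `azure-core`, which `azure-mgmt-core` depends on):

```python
# Illustrative only: pick whichever exception base the installed SDK generation provides.
try:
    from azure.core.exceptions import HttpResponseError as SdkError  # track-2 (azure-core)
except ImportError:
    try:
        from msrest.exceptions import CloudError as SdkError  # legacy track-1 (msrest)
    except ImportError:
        SdkError = Exception  # neither SDK installed (e.g. when running this sketch)

def describe_error(exc):
    """Short, SDK-agnostic description of a caught error."""
    return f"{type(exc).__name__}: {exc}"

print(describe_error(ValueError("quota exceeded")))  # ValueError: quota exceeded
```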
OK for me globally
Tested on DSS 12.0 (kit from daily build, revision fd444867f6d5b88f698073369de5327457ed38ba) with the plugin version packaged from this PR.
TL;DR
- globally OK
- ⚠️ We should probably open cards for these two points about the Resize cluster macro:
  - default node pool name is not the correct one (`None` instead of `nodepool0`); probably not related to this PR
  - node pools can't be deleted with the macro, even when providing the node pool name; probably not related to this PR
Tested
- check where GPU machines are available (type-wise and quota-wise)
```shell
az vm list-skus --all --output table --size Standard_NC --location westus
```
- compare with Dataiku quotas
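A small sketch of automating that comparison instead of eyeballing the table output (hypothetical: the sample JSON stands in for `az vm list-skus --output json`, whose entries have the same `name`/`capabilities` shape):

```python
# Filter SKU entries for GPU-capable machine types.
import json

sample = json.loads("""[
  {"name": "Standard_NC6s_v3", "locations": ["westus"],
   "capabilities": [{"name": "GPUs", "value": "1"}]},
  {"name": "Standard_B8ms", "locations": ["westus"],
   "capabilities": [{"name": "vCPUs", "value": "8"}]}
]""")

def gpu_skus(skus):
    """Yield SKU names that report at least one GPU."""
    for sku in skus:
        caps = {c["name"]: c["value"] for c in sku.get("capabilities", [])}
        if int(caps.get("GPUs", 0)) > 0:
            yield sku["name"]

print(list(gpu_skus(sample)))  # ['Standard_NC6s_v3']
```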
- create a new resource group in the `westus` region
- perform FM setup and deploy FM
- create an elastic fleet and deploy the design node in DSS 11.4.2
- create a container registry in the same region as DSS (`asouilleuxregistry2.azurecr.io`)

Administration > Settings > Containerized execution
- create a new Containerized execution config
  - Image registry URL=asouilleuxregistry2.azurecr.io ; Image pre-push hook=Enable push to ACR ; Custom limits=nvidia.com/gpu=1
- package plugin from PR and upgrade the AKS plugin with this new version
- rebuild containerized execution base images
```shell
# download spark and hadoop archives from https://downloads.dataiku.com/preview/dss/12.0.0-dev21/
# upload them to the machine
rsync -Pav dataiku-dss-*.tar.gz $DSS_TARGET:~/
ssh -i $SSH_KEY $DSS_TARGET
sudo cp dataiku-dss-hadoop-standalone-libs-generic-hadoop3-12.0.0-dev21.tar.gz /opt/dataiku-dss-11.4.3/
sudo cp dataiku-dss-spark-standalone-12.0.0-dev21-3.3.1-generic-hadoop3.tar.gz /opt/dataiku-dss-11.4.3/
# install hadoop and spark support
sudo su dataiku
cd /data/dataiku/dss_data
./bin/dssadmin install-hadoop-integration -standalone generic-hadoop3 -standaloneArchive /opt/dataiku-dss-11.4.3/dataiku-dss-hadoop-standalone-libs-generic-hadoop3-12.0.0-dev21.tar.gz
./bin/dssadmin install-spark-integration -standaloneArchive /opt/dataiku-dss-11.4.3/dataiku-dss-spark-standalone-12.0.0-dev21-3.3.1-generic-hadoop3.tar.gz
# restart DSS instance
./bin/dss restart
# build container exec images
./bin/dssadmin build-base-image --type container-exec --without-r --with-py39 --with-cuda --cuda-version 11.2
# restart DSS instance
./bin/dss restart
```
Plugins > ADD PLUGIN
- update the plugin with the version built from this PR
Plugins > Installed > AKS clusters > Code environment > CHANGE
- create a new code env py39 with Python 3.9
Administration > Clusters > CREATE AKS CLUSTER
- create cluster asouilleux-cluster with GPUs ✅
  - Node pools
    - Machine type=Standard_NC6s_v3 ; disk size=0 ; Default number of nodes=1 ; Enable nodes autoscaling=ticked ; Min number of nodes=1 ; Max number of nodes=2 ; Availability zones=unticked ; GPU=ticked
  - Advanced options
    - Service CIDR=10.1.0.0/16 ; DNS IP=10.1.0.10 ; Load balancer SKU=Standard
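The Service CIDR / DNS IP pair in the advanced options has to be consistent (AKS requires the DNS service IP to fall inside the service CIDR); a quick stdlib sanity check using the values from this test:

```python
# Verify the DNS service IP lies inside the service CIDR used above.
import ipaddress

service_cidr = ipaddress.ip_network("10.1.0.0/16")
dns_ip = ipaddress.ip_address("10.1.0.10")

print(dns_ip in service_cidr)  # True
```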
Administration > Settings > Containerized execution
Default settings > Default cluster
- change value to asouilleux-cluster
Resources for Kubernetes containers > Custom limits
- add nvidia.com/gpu=1
PUSH BASE IMAGES
Administration > Code Envs > NEW PYTHON ENV
- create a new code env with Python 3.9
Packages to install
- add tensorflow==2.11.0
Containerized execution
Build for > Selected container configurations > azure
SAVE AND UPDATE
- create project sc132053
- create a Python Notebook
  - Code env=py39
  - Containerized exec=azure
- open Python notebook and check the kernel starts ✅
- execute the following ✅

```python
import tensorflow as tf
tf.config.list_physical_devices('GPU')
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```
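A hedged variant of the same check that degrades gracefully when TensorFlow is missing or no GPU is attached (e.g. on the non-GPU cluster tested below):

```python
# Variant of the notebook cell above that doesn't assume TensorFlow is importable.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices('GPU')
except ImportError:
    gpus = []  # TensorFlow not installed in this environment

print(f"{len(gpus)} GPU(s) visible")
```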
- stop the cluster asouilleux-cluster ✅
- start a cluster asouilleux-cluster2 with minimal options
  - Node pools
    - Machine type=Standard_B8ms ; disk size=0 ; Default number of nodes=1 ; Enable nodes autoscaling=ticked ; Min number of nodes=1 ; Max number of nodes=2 ; Availability zones=unticked
  - Advanced options
    - Service CIDR=10.1.0.0/16 ; DNS IP=10.1.0.10 ; Load balancer SKU=Standard
Administration > Settings > Containerized execution
Resources for Kubernetes containers > Custom limits
- remove nvidia.com/gpu=1
Default settings > Default cluster
- change value to asouilleux-cluster2
Administration > Clusters > asouilleux-cluster > Actions
Run kubectl command
✅Delete finished pods
✅Delete all pods
✅Delete finished jobs
✅Inspect node pools
✅ Resize cluster
  - ⚠️ default node pool name is not the correct one (not reproducible: `None` instead of `nodepool0`); probably not related to this PR
  - ⚠️ node pools can't be deleted with the macro, even when providing the node pool name; probably not related to this PR
  - only resizing the node pool works ✅
✅ Test network connectivity
- stop cluster ✅
- change cluster settings
  - Identity assumed by cluster components
    - Identity type=Managed identities ; Control plane user identity=[DSS identity] ; Kubelet user identity=[DSS identity]
- start cluster and test cluster connectivity with macro ✅
- stop cluster ✅
- change cluster settings
  - Identity assumed by cluster components
    - Identity type=Managed identities ; AKS managed identity=ticked ; Assign permissions for Vnet=ticked ; AKS managed Kubelet identity=ticked ; Assign permissions for ACR=asouilleuxregistry2
- in Azure portal
- give ownership on asouilleuxregistry2 to asouilleux-dss-id
- give ownership on asouilleux-fm-westus-vn to asouilleux-dss-id
- start cluster and test cluster connectivity with macro ✅
- stop cluster ✅
- create a service principal in Azure

```shell
az ad sp create-for-rbac --name asouilleuxClusterServicePrincipal
```

- change cluster settings
  - Identity assumed by cluster components
    - Identity type=Service principal ; Application (client) ID=[appId] ; Password=[password]
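`az ad sp create-for-rbac` prints a JSON document whose `appId` and `password` fields are what the cluster settings form asks for; a small sketch of extracting them (hypothetical: the sample output stands in for the real, secret values):

```python
# Parse the service-principal creation output and pull out the two fields
# needed by the "Service principal" identity settings.
import json

sp_output = json.loads("""{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "asouilleuxClusterServicePrincipal",
  "password": "<redacted>",
  "tenant": "00000000-0000-0000-0000-000000000000"
}""")

client_id, client_secret = sp_output["appId"], sp_output["password"]
print(client_id)  # 00000000-0000-0000-0000-000000000000
```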
- start cluster and test cluster connectivity with macro ✅
- stop cluster ✅
OK for me, see comment above
[sc-132053]
Basic check on 2.7, 3.6, 3.7, 3.8, 3.9.
On 3.9 I also tested
I haven't tested the legacy option (and I don't plan to).
It would be nice to test GPU, but I did my setup in FranceCentral and I'm not able to provision a GPU node there.
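The interpreter matrix from this review, encoded as a quick sketch (assumption: the plugin itself does no version gating; this just records the set that was checked):

```python
# Interpreter versions exercised in this review (2.7 and 3.6-3.9).
import sys

TESTED = {(2, 7), (3, 6), (3, 7), (3, 8), (3, 9)}

def was_tested(version=sys.version_info):
    """Return True if the given (major, minor, ...) version was in the test matrix."""
    return tuple(version[:2]) in TESTED

print(was_tested((3, 9, 0)))  # True
```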