
General AI ML #2

Merged
merged 20 commits into general-ai-ml from general-ai-ml-wip on Apr 16, 2024

Conversation

ryan-tribex
Collaborator

Feedback received from @huiwen99:

  1. We noticed that 1.07_How-to-train-a-model.ipynb is using the TensorFlow Keras framework, but as we expect subsequent modules to be in torch (and since our POC is in torch), shall we standardise the training curriculum notebooks to also use the PyTorch framework?
  2. Consider moving the 1.07 notebook from "Introduction to Deep Learning" to "Practical", after the introduction to common libraries. This might give participants a better understanding of where/how to use the libraries mentioned.
  3. In 1.08 Intro to common libraries.ipynb, the optimizer section for torch is missing.
  4. In 1.09 Suggested resources and tools.ipynb, maybe we can include Docker under tools for model deployment.

re 1: It might just be practical to do so anyway, given the environment complications that can arise when setting up both TensorFlow and torch in the same environment (if participants don't properly use an environment manager).
re 2: I think this makes sense too. So the necessary changes would be:

  • 1.08 Intro to Transformer Architecture can become 1.07
  • 1.08 Common libraries was mislabelled (it was originally 1.09) but this is fine since it is now 1.08
  • 1.07 How to train a model can become 1.09
  • 1.09 Resources can become 1.10
re 3: Not sure what you mean by this @huiwen99, do you mean the torch.optim part of the torch library? Or the torch-optimizer package?
re 4: Agree, we'll be giving participants a deeper dive into Docker in later units anyhow.

@huiwen99
Collaborator

huiwen99 commented Apr 5, 2024

  1. Yup I meant the torch.optim library. I see it in the updated code now so all is good for that notebook!
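
For reference, a minimal sketch of the torch.optim usage in question (the model, data, and hyperparameters are illustrative placeholders, not the notebook's actual code):

    import torch
    import torch.nn as nn

    # toy model and data, purely illustrative
    model = nn.Linear(10, 2)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.randn(32, 10)
    y = torch.randint(0, 2, (32,))

    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass + loss
    loss.backward()                # backpropagate
    optimizer.step()               # update parameters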

@huiwen99
Collaborator

huiwen99 commented Apr 9, 2024

For Unit 2:

  1. 2.04_Handling_clas_imbalances.ipynb
    (a) Under "Exploring the Impact of Class Imbalance" section, should include how class imbalance can affect model performance as mentioned in the paragraph. Best if the trained model (without class imbalance techniques) has performance that is significantly lower than a trained model with class imbalance techniques.
    (b) Under "Handling Imbalance with Class Weighting" section, is the header "Accounting for Class Imbalance in the Deep Learning Model" redundant? The paragraph below seems like an explanation for the code above. Same for "Incorporating SMOTE for Class Imbalance in Deep Learning" under "Handling Class Imbalance using SMOTE".
    (c) Would it be possible to plot a graph to illustrate how SMOTE works? Something like in this tutorial?
  2. 2.09_Transfer_Learning_vs_Finetuning.ipynb
    (a) Should this notebook be named "Fine-tuning pre-trained models on custom datasets" instead?
    (b) Under "Transfer Learning by Freezing Layers of BERT" section, elaborate further that because most layers are fixed, the model isn't able to learn much from the custom data, therefore the accuracy is poor (stuck at 50% for every epoch).
    (c) Might be good to highlight the accuracy difference between transfer learning and finetuning (50% vs 100%) and recommend finetuning for custom task and datasets.

@huiwen99
Collaborator

huiwen99 commented Apr 11, 2024

For Unit 4:

  1. 4.01_Pruning_Quantization_KD.ipynb
    (a) Under each section, the original model and the new model look the same when printed. Can we show the impact of each method more clearly? E.g. by printing model size or inference speed.
    (b) We can also add that using these techniques might have an impact on accuracy, so we would need to weigh the trade-offs.
  2. 4.05_Installing_Docker_and_basic_commands.ipynb
    (a) Replace "hackathon" with "competition".
    (b) Under basic commands, include docker compose as participants are likely going to use it. If possible, can we have another submodule teaching how to use docker compose?
    (c) The docker build command should be docker build -t <image-name> . instead.
  3. 4.06_Simple_Webapp_using_Docker.ipynb
    (a) Add the different REST API calls (i.e. differences between get, post, delete, etc.) so that participants understand them better.
  4. 4.07_Writing_a_Dockerfile.ipynb
    (a) The example Dockerfile shown did not include EXPOSE 5000 but the following breakdown explanations included it.
    (b) Under the explanation for the base image, might be good to link to other images commonly used when dockerizing ML models. Suggest using a fixed base version of the nvcr image that will be used in the competition.
  5. 4.08_Running_Containers_and_API_Inference.ipynb
    (a) The testing of API in Step 7 is unclear if participants wish to try different inputs (might not necessarily be simply numbers). We suggest including Postman in the module to test the API.

@WaseemSheriff
Collaborator

For Unit 2:

  1. 2.04_Handling_clas_imbalances.ipynb
    (a) Under "Exploring the Impact of Class Imbalance" section, should include how class imbalance can affect model performance as mentioned in the paragraph. Best if the trained model (without class imbalance techniques) has performance that is significantly lower than a trained model with class imbalance techniques.
  • Updated the examples to illustrate the performance improvement from class imbalance techniques

(b) Under "Handling Imbalance with Class Weighting" section, is the header "Accounting for Class Imbalance in the Deep Learning Model" redundant? The paragraph below seems like an explanation for the code above. Same for "Incorporating SMOTE for Class Imbalance in Deep Learning" under "Handling Class Imbalance using SMOTE".

  • Removed the two redundant headers
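
(For context, a minimal sketch of the class-weighting approach this section covers; the weights and loss setup are illustrative, not the notebook's actual values:)

    import torch
    import torch.nn as nn

    # inverse-frequency weights for a 2-class problem where class 1 is rare
    # (e.g. a 90%/10% split) -- numbers are illustrative only
    class_weights = torch.tensor([1.0, 9.0])
    criterion = nn.CrossEntropyLoss(weight=class_weights)

    # mistakes on the minority class now cost ~9x more, so the model can no
    # longer score well by simply predicting the majority class every time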

(c) Would it be possible to plot a graph to illustrate how SMOTE works? Something like in this tutorial?

  • Good point, added; something like the sketch below
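
A minimal sketch of that kind of plot, using a toy dataset rather than the notebook's data (requires the imbalanced-learn package):

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # toy imbalanced dataset, illustrative only
    X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                               n_redundant=0, weights=[0.9, 0.1], random_state=42)
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(X[:, 0], X[:, 1], c=y, alpha=0.5)
    ax1.set_title("Before SMOTE")
    ax2.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.5)
    ax2.set_title("After SMOTE (synthetic minority samples added)")
    plt.show()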
  2. 2.09_Transfer_Learning_vs_Finetuning.ipynb
    (a) Should this notebook be named "Fine-tuning pre-trained models on custom datasets" instead?
  • Renamed the file. My initial thought was that the purpose was to demo fine-tuning vs transfer learning hands-on, hence the previous name.

(b) Under "Transfer Learning by Freezing Layers of BERT" section, elaborate further that because most layers are fixed, the model isn't able to learn much from the custom data, therefore the accuracy is poor (stuck at 50% for every epoch).

  • I've added an explanation on this below the results
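
(For context, the freezing step looks roughly like this; a minimal sketch using transformers with bert-base-uncased, which may differ from the notebook's exact setup:)

    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # freeze the entire BERT encoder; only the classification head stays trainable
    for param in model.bert.parameters():
        param.requires_grad = False

    # with so few trainable parameters, the model cannot adapt its
    # representations to the custom data, hence the accuracy plateau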

(c) Might be good to highlight the accuracy difference between transfer learning and finetuning (50% vs 100%) and recommend finetuning for custom task and datasets.

  • I've added an explanation comparing the two approaches and the recommendation.

  • Other changes: converted 2.04 from TF to PyTorch. The rest of Unit 2 will also be changed to PyTorch in a future commit.

Hi @huiwen99, thanks for the feedback. I've made the updates and commented them next to your original comments above.

@WaseemSheriff
Collaborator

For Unit 4:

  1. 4.01_Pruning_Quantization_KD.ipynb
    (a) Under each section, the original model and the new model look the same when printed. Can we show the impact of each method more clearly? E.g. by printing model size or inference speed.
  • I've updated the examples to illustrate the impact of the techniques and difference in the models in terms of size
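
(A minimal sketch of one way to show the size difference, using dynamic quantization on a toy model; the notebook's own models and measurements may differ:)

    import io
    import torch
    import torch.nn as nn

    def model_size_mb(model):
        # serialise the state dict to an in-memory buffer and count the bytes
        buffer = io.BytesIO()
        torch.save(model.state_dict(), buffer)
        return buffer.getbuffer().nbytes / 1e6

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    print(f"original:  {model_size_mb(model):.2f} MB")
    print(f"quantized: {model_size_mb(quantized):.2f} MB")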

(b) We can also add that using these techniques might have an impact on accuracy, so we would need to weigh the trade-offs.

  • After each technique, added a short section called "Tradeoffs" that discusses this
  2. 4.05_Installing_Docker_and_basic_commands.ipynb
    (a) Replace "hackathon" with "competition".
  • Updated

(b) Under basic commands, include docker compose as participants are likely going to use it. If possible, can we have another submodule teaching how to use docker compose?

  • Added
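
(For reference, the kind of minimal compose file participants would write; the service name, port, and file name are placeholders:)

    # compose.yaml -- hypothetical single-service setup
    services:
      webapp:
        build: .
        ports:
          - "8000:8000"

Brought up and torn down with docker compose up -d and docker compose down.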

(c) The docker build command should be docker build -t <image-name> . instead.

  • Updated
  3. 4.06_Simple_Webapp_using_Docker.ipynb
    (a) Add the different REST API calls (i.e. differences between get, post, delete, etc.) so that participants understand them better.
  • A new section "Understanding REST API Calls" has been included at the end of the notebook for this
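
(A minimal sketch of the distinction, assuming a Flask-style app like the notebook's webapp; the route and payloads are illustrative:)

    from flask import Flask, request, jsonify

    app = Flask(__name__)
    items = {}

    @app.route("/items/<name>", methods=["GET", "POST", "DELETE"])
    def items_endpoint(name):
        if request.method == "GET":      # read an existing resource
            return jsonify(items.get(name, "not found"))
        if request.method == "POST":     # create/update: send data to the server
            items[name] = request.get_json()
            return jsonify({"created": name}), 201
        if request.method == "DELETE":   # remove the resource
            items.pop(name, None)
            return jsonify({"deleted": name})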
  4. 4.07_Writing_a_Dockerfile.ipynb
    (a) The example Dockerfile shown did not include EXPOSE 5000 but the following breakdown explanations included it.
  • Updated, and changed the port to 8000 instead
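
(For reference, the rough shape of the corrected example; the file names and base image are placeholders, not necessarily the notebook's:)

    FROM python:3.10-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .
    # expose the app port, matching the breakdown explanation
    EXPOSE 8000
    CMD ["python", "app.py"]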

(b) Under the explanation for the base image, might be good to link to other images commonly used when dockerizing ML models. Suggest using a fixed base version of the nvcr image that will be used in the competition.

  • Included this information. In addition to nvcr, added links to the official PyTorch and GCP Vertex AI images
  5. 4.08_Running_Containers_and_API_Inference.ipynb
    (a) The testing of API in Step 7 is unclear if participants wish to try different inputs (might not necessarily be simply numbers). We suggest including Postman in the module to test the API.
  • Makes sense, I have added two additional sections for testing the API: one using the requests library and the other using Postman
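
(A minimal sketch of the requests-based test, assuming the container serves on localhost:8000; the endpoint and payload are placeholders:)

    import requests

    # hypothetical endpoint and payload -- match these to the actual API
    response = requests.post(
        "http://localhost:8000/predict",
        json={"inputs": [1.0, 2.5, 3.7]},
    )
    print(response.status_code)
    print(response.json())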

Hi @huiwen99, thanks for the feedback. I've made the updates and commented them next to your original comments above.

@huiwen99
Collaborator

Thanks @WaseemSheriff! Mostly looks good for this unit now.
Just a few more points:

  1. Under 4.2.1_Installing_Docker_and_basic_commands.ipynb, the docker build command is missing a period, and should be docker build -t <image-name> .
  2. Under 4.3.1_Writing_a_Dockerfile.ipynb, can we specify the use of LINUX/ARM64 architecture for the nvidia images?
  3. Under 4.3.2_Running_Containers_and_API_Inference.ipynb, the definition of the environment variable in the Dockerfile should be ENV MODEL_NAME=MyModel
  4. For 4.4.1_Deploy_pre-trained_model_on_GCP.ipynb, it is unclear how to run the notebook/code on GCP. Can we include a step-by-step guide with screenshots on this?

@ryan-tribex
Collaborator Author

ryan-tribex commented Apr 16, 2024

Some replies to @huiwen99's questions here:

  1. Under 4.2.1_Installing_Docker_and_basic_commands.ipynb, the docker build command is missing a period, and should be docker build -t <image-name> .

Good catch, obvious typo, fixed.

  2. Under 4.3.1_Writing_a_Dockerfile.ipynb, can we specify the use of LINUX/ARM64 architecture for the nvidia images?

I strongly disagree with point 2 for a couple of reasons:

  1. The finals environment is an amd64 machine (as are most CPUs)
  2. The cloud environment is actually amd64 not arm64
  3. Setting arm64 would impede local testing on anything except Apple Silicon Macs, and even those have Rosetta for x86 compatibility
  4. It's also worth noting that the robomaster python library that we'd eventually need to use for the robotics autonomy doesn't have wheels compiled for arm64, which essentially forces us onto amd64 unless they give better instructions on how to compile their wheels. I found a workaround for this previously to enable using the robomaster SDK on Apple Silicon, but it's far from ideal.

So, if anything, if we wish to prescribe a CPU architecture, we should consider mandating that images be built for amd64 instead!
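
If we do mandate a CPU architecture, the build command just needs the platform flag, e.g.:

    docker build --platform linux/amd64 -t <image-name> .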

  3. Under 4.3.2_Running_Containers_and_API_Inference.ipynb, the definition of the environment variable in the Dockerfile should be ENV MODEL_NAME=MyModel

Similar to 1, straightforward fix, changed.

  4. For 4.4.1_Deploy_pre-trained_model_on_GCP.ipynb, it is unclear how to run the notebook/code on GCP. Can we include a step-by-step guide with screenshots on this?

These notebooks are actually running on the GCP JupyterLab instance, as the screenshots in the notebook indicate. But there will be additional notebooks/guides on how to get things running on GCP, which should be in the 3.2.* notebooks. Would that address this concern?

@huiwen99
Collaborator

My bad, yeah, let's stick to the amd64 version.
As for the GCP notebook, the screenshots in the notebook are still broken image links for me, so I'm unable to view them. But if there is going to be a tutorial on running code on GCP in 3.2.*, then my concern would be addressed.

Thank you so much!

@ryan-tribex
Collaborator Author

Oh, the image links are broken in the GitHub notebook preview; they're fine if you check out the branch locally and view the notebook.

Example: [screenshot]

@ryan-tribex
Collaborator Author

In that case, I think that should be everything for the general unit (pending 3.2.* which depends on the dev environment), so I'll close and merge this

ryan-tribex changed the title from "General AI ML Unit 1" to "General AI ML" on Apr 16, 2024
ryan-tribex merged commit acae882 into general-ai-ml on Apr 16, 2024
ryan-tribex added a commit that referenced this pull request Apr 16, 2024
* initial commit

* initial commit

* General AI ML (#2)

* initial commit

* add Unit 1 files

* update Unit 1 files

* add Unit 2 files

* updates from Unit 1 feedback

* updates from Unit 1 feedback

* feat: rename fine-tuning notebook

* Unit 2 updates

* add Unit 4 files

* add Unit 4 files

* feat: add line about cloud docker

* feat: gcp boilerplate docker

* feat: update imports

* updates from Unit 2 feedback

* updates from Unit 2 feedback

* updates from Unit 4 feedback

* add Unit 4 files

* fix: 4.4.1 typos

* update file numbering system

* fix: typos

---------

Co-authored-by: waseem-ga <[email protected]>

---------

Co-authored-by: waseem-ga <[email protected]>
Co-authored-by: WaseemSheriff <[email protected]>
ryan-tribex deleted the general-ai-ml-wip branch on April 20, 2024