
General AI ML #2

Merged
merged 20 commits into general-ai-ml from general-ai-ml-wip on Apr 16, 2024

Conversation

ryan-tribex
Collaborator

Feedback received from @huiwen99:

  1. We noticed that 1.07_How-to-train-a-model.ipynb is using the TensorFlow Keras framework, but as we expect subsequent modules to be in torch (and since our POC is in torch), shall we standardise the training curriculum notebooks to also use the PyTorch framework?
  2. Consider moving the 1.07 notebook from "Introduction to Deep Learning" to "Practical", after the introduction to common libraries. This might give participants a better understanding of where/how to use the libraries mentioned.
  3. In 1.08 Intro to common libraries.ipynb, the optimizer section for torch is missing.
  4. In 1.09 Suggested resources and tools.ipynb, maybe we can include Docker under tools for model deployment.

re 1: It might just be practical to do so anyway, given the environment complications that can arise when setting up both TensorFlow and torch in the same environment (if participants don't properly use an environment manager).
re 2: I think this makes sense too. So the necessary changes would be:

  • 1.08 Intro to Transformer Architecture can become 1.07
  • 1.08 Common libraries was mislabelled (it was originally 1.09) but this is fine since it is now 1.08
  • 1.07 How to train a model can become 1.09
  • 1.09 Resources can become 1.10
re 3: Not sure what you mean by this @huiwen99, do you mean the torch.optim part of the torch library? Or the torch-optimizer package?
re 4: Agree, we'll be giving participants a deeper dive into Docker in later units anyhow.

@huiwen99
Collaborator

huiwen99 commented Apr 5, 2024

  1. Yup I meant the torch.optim library. I see it in the updated code now so all is good for that notebook!
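
For reference, a minimal sketch of the torch.optim usage in question (the model, data, and hyperparameters are illustrative placeholders, not the notebook's actual code):

    import torch
    import torch.nn as nn

    # toy model and data, purely illustrative
    model = nn.Linear(10, 2)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.randn(32, 10)
    y = torch.randint(0, 2, (32,))

    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass + loss
    loss.backward()                # backpropagate
    optimizer.step()               # update parameters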

@huiwen99
Collaborator

huiwen99 commented Apr 9, 2024

For Unit 2:

  1. 2.04_Handling_clas_imbalances.ipynb
    (a) Under "Exploring the Impact of Class Imbalance" section, should include how class imbalance can affect model performance as mentioned in the paragraph. Best if the trained model (without class imbalance techniques) has performance that is significantly lower than a trained model with class imbalance techniques.
    (b) Under "Handling Imbalance with Class Weighting" section, is the header "Accounting for Class Imbalance in the Deep Learning Model" redundant? The paragraph below seems like an explanation for the code above. Same for "Incorporating SMOTE for Class Imbalance in Deep Learning" under "Handling Class Imbalance using SMOTE".
    (c) Would it be possible to plot a graph to illustrate how SMOTE works? Something like in this tutorial?
  2. 2.09_Transfer_Learning_vs_Finetuning.ipynb
    (a) Should this notebook be named "Fine-tuning pre-trained models on custom datasets" instead?
    (b) Under "Transfer Learning by Freezing Layers of BERT" section, elaborate further that because most layers are fixed, the model isn't able to learn much from the custom data, therefore the accuracy is poor (stuck at 50% for every epoch).
    (c) Might be good to highlight the accuracy difference between transfer learning and finetuning (50% vs 100%) and recommend finetuning for custom task and datasets.

@huiwen99
Collaborator

huiwen99 commented Apr 11, 2024

For Unit 4:

  1. 4.01_Pruning_Quantization_KD.ipynb
    (a) Under each section, the original model and the new model look the same when printed. Can we show the impact of each method more clearly? E.g. by printing model size or inference speed.
    (b) We can also add that using these techniques might have an impact on accuracy, so we would need to weigh the trade-offs.
  2. 4.05_Installing_Docker_and_basic_commands.ipynb
    (a) Replace "hackathon" with "competition".
    (b) Under basic commands, include docker compose as participants are likely going to use it. If possible, can we have another submodule teaching how to use docker compose?
    (c) The docker build command should be docker build -t <image-name> . instead.
  3. 4.06_Simple_Webapp_using_Docker.ipynb
    (a) Add the different REST API calls (i.e. differences between get, post, delete, etc.) so that participants understand them better.
  4. 4.07_Writing_a_Dockerfile.ipynb
    (a) The example Dockerfile shown did not include EXPOSE 5000 but the following breakdown explanations included it.
    (b) Under the explanation for the base image, might be good to link to other images commonly used when dockerizing ML models. Suggest using a fixed base version of the nvcr image that will be used in the competition.
  5. 4.08_Running_Containers_and_API_Inference.ipynb
    (a) The testing of API in Step 7 is unclear if participants wish to try different inputs (might not necessarily be simply numbers). We suggest including Postman in the module to test the API.

@WaseemSheriff
Collaborator

For Unit 2:

  1. 2.04_Handling_clas_imbalances.ipynb
    (a) Under "Exploring the Impact of Class Imbalance" section, should include how class imbalance can affect model performance as mentioned in the paragraph. Best if the trained model (without class imbalance techniques) has performance that is significantly lower than a trained model with class imbalance techniques.
  • Updated the examples to illustrate the performance improvement from class imbalance techniques

(b) Under "Handling Imbalance with Class Weighting" section, is the header "Accounting for Class Imbalance in the Deep Learning Model" redundant? The paragraph below seems like an explanation for the code above. Same for "Incorporating SMOTE for Class Imbalance in Deep Learning" under "Handling Class Imbalance using SMOTE".

  • Removed the two redundant headers
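
(For context, a minimal sketch of the class-weighting approach this section covers; the weights and loss setup are illustrative, not the notebook's actual values:)

    import torch
    import torch.nn as nn

    # inverse-frequency weights for a 2-class problem where class 1 is rare
    # (e.g. a 90%/10% split) -- numbers are illustrative only
    class_weights = torch.tensor([1.0, 9.0])
    criterion = nn.CrossEntropyLoss(weight=class_weights)

    # mistakes on the minority class now cost ~9x more, so the model can no
    # longer score well by simply predicting the majority class every time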

(c) Would it be possible to plot a graph to illustrate how SMOTE works? Something like in this tutorial?

  • Good point, added; something like the sketch below
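
A minimal sketch of that kind of plot, using a toy dataset rather than the notebook's data (requires the imbalanced-learn package):

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # toy imbalanced dataset, illustrative only
    X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                               n_redundant=0, weights=[0.9, 0.1], random_state=42)
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(X[:, 0], X[:, 1], c=y, alpha=0.5)
    ax1.set_title("Before SMOTE")
    ax2.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.5)
    ax2.set_title("After SMOTE (synthetic minority samples added)")
    plt.show()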
  2. 2.09_Transfer_Learning_vs_Finetuning.ipynb
    (a) Should this notebook be named "Fine-tuning pre-trained models on custom datasets" instead?
  • Renamed the file. My initial thought was that the purpose was to demo fine-tuning vs transfer learning hands-on, hence the previous name.

(b) Under "Transfer Learning by Freezing Layers of BERT" section, elaborate further that because most layers are fixed, the model isn't able to learn much from the custom data, therefore the accuracy is poor (stuck at 50% for every epoch).

  • I've added an explanation on this below the results
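
(For context, the freezing step looks roughly like this; a minimal sketch using transformers with bert-base-uncased, which may differ from the notebook's exact setup:)

    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # freeze the entire BERT encoder; only the classification head stays trainable
    for param in model.bert.parameters():
        param.requires_grad = False

    # with so few trainable parameters, the model cannot adapt its
    # representations to the custom data, hence the accuracy plateau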

(c) Might be good to highlight the accuracy difference between transfer learning and finetuning (50% vs 100%) and recommend finetuning for custom task and datasets.

  • I've added an explanation comparing the two approaches and the recommendation.

  • Other changes: converted 2.04 from TF to PyTorch. The rest of Unit 2 will also be changed to PyTorch in a future commit.

Hi @huiwen99, thanks for the feedback. I've made the updates and commented them next to your original comments above.

@WaseemSheriff
Collaborator

For Unit 4:

  1. 4.01_Pruning_Quantization_KD.ipynb
    (a) Under each section, the original model and the new model look the same when printed. Can we show the impact of each method more clearly? E.g. by printing model size or inference speed.
  • I've updated the examples to illustrate the impact of the techniques and difference in the models in terms of size
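
(A minimal sketch of one way to show the size difference, using dynamic quantization on a toy model; the notebook's own models and measurements may differ:)

    import io
    import torch
    import torch.nn as nn

    def model_size_mb(model):
        # serialise the state dict to an in-memory buffer and count the bytes
        buffer = io.BytesIO()
        torch.save(model.state_dict(), buffer)
        return buffer.getbuffer().nbytes / 1e6

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    print(f"original:  {model_size_mb(model):.2f} MB")
    print(f"quantized: {model_size_mb(quantized):.2f} MB")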

(b) We can also add that using these techniques might have an impact on accuracy, so we would need to weigh the trade-offs.

  • After each technique, added a short section called "Tradeoffs" that discusses this
  2. 4.05_Installing_Docker_and_basic_commands.ipynb
    (a) Replace "hackathon" with "competition".
  • Updated

(b) Under basic commands, include docker compose as participants are likely going to use it. If possible, can we have another submodule teaching how to use docker compose?

  • Added
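
(For reference, the kind of minimal compose file participants would write; the service name, port, and file name are placeholders:)

    # compose.yaml -- hypothetical single-service setup
    services:
      webapp:
        build: .
        ports:
          - "8000:8000"

Brought up and torn down with docker compose up -d and docker compose down.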

(c) The docker build command should be docker build -t <image-name> . instead.

  • Updated
  3. 4.06_Simple_Webapp_using_Docker.ipynb
    (a) Add the different REST API calls (i.e. differences between get, post, delete, etc.) so that participants understand them better.
  • A new section "Understanding REST API Calls" has been included at the end of the notebook for this
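
(A minimal sketch of the distinction, assuming a Flask-style app like the notebook's webapp; the route and payloads are illustrative:)

    from flask import Flask, request, jsonify

    app = Flask(__name__)
    items = {}

    @app.route("/items/<name>", methods=["GET", "POST", "DELETE"])
    def items_endpoint(name):
        if request.method == "GET":      # read an existing resource
            return jsonify(items.get(name, "not found"))
        if request.method == "POST":     # create/update: send data to the server
            items[name] = request.get_json()
            return jsonify({"created": name}), 201
        if request.method == "DELETE":   # remove the resource
            items.pop(name, None)
            return jsonify({"deleted": name})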
  4. 4.07_Writing_a_Dockerfile.ipynb
    (a) The example Dockerfile shown did not include EXPOSE 5000 but the following breakdown explanations included it.
  • Updated, and changed the port to 8000 instead
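
(For reference, the rough shape of the corrected example; the file names and base image are placeholders, not necessarily the notebook's:)

    FROM python:3.10-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .
    # expose the app port, matching the breakdown explanation
    EXPOSE 8000
    CMD ["python", "app.py"]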

(b) Under the explanation for the base image, might be good to link to other images commonly used when dockerizing ML models. Suggest using a fixed base version of the nvcr image that will be used in the competition.

  • Included this information. In addition to nvcr, added links to the official PyTorch and GCP Vertex AI images
  5. 4.08_Running_Containers_and_API_Inference.ipynb
    (a) The testing of API in Step 7 is unclear if participants wish to try different inputs (might not necessarily be simply numbers). We suggest including Postman in the module to test the API.
  • Makes sense, I have added two additional sections for testing the API: one using the requests library and the other using Postman
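
(A minimal sketch of the requests-based test, assuming the container serves on localhost:8000; the endpoint and payload are placeholders:)

    import requests

    # hypothetical endpoint and payload -- match these to the actual API
    response = requests.post(
        "http://localhost:8000/predict",
        json={"inputs": [1.0, 2.5, 3.7]},
    )
    print(response.status_code)
    print(response.json())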

Hi @huiwen99, thanks for the feedback. I've made the updates and commented them next to your original comments above.

@huiwen99
Collaborator

Thanks @WaseemSheriff! Mostly looks good for this unit now.
Just a few more points:

  1. Under 4.2.1_Installing_Docker_and_basic_commands.ipynb, the docker build command is missing a period, and should be docker build -t <image-name> .
  2. Under 4.3.1_Writing_a_Dockerfile.ipynb, can we specify the use of LINUX/ARM64 architecture for the nvidia images?
  3. Under 4.3.2_Running_Containers_and_API_Inference.ipynb, the definition of the environment variable in the Dockerfile should be ENV MODEL_NAME=MyModel
  4. For 4.4.1_Deploy_pre-trained_model_on_GCP.ipynb, it is unclear how to run the notebook/code on GCP. Can we include a step-by-step guide with screenshots on this?

@ryan-tribex
Collaborator Author

ryan-tribex commented Apr 16, 2024

Some replies to @huiwen99's questions here:

  1. Under 4.2.1_Installing_Docker_and_basic_commands.ipynb, the docker build command is missing a period, and should be docker build -t <image-name> .

Good catch, obvious typo, fixed.

  2. Under 4.3.1_Writing_a_Dockerfile.ipynb, can we specify the use of LINUX/ARM64 architecture for the nvidia images?

I strongly disagree with point 2 for a couple of reasons:

  1. The finals environment is an amd64 machine (as are most CPUs)
  2. The cloud environment is actually amd64 not arm64
  3. Setting arm64 would impede local testing on anything except Apple Silicon Macs, and even those have Rosetta for x86 compatibility
  4. It's also worth noting that the robomaster python library that we'd eventually need to use for the robotics autonomy doesn't have wheels compiled for arm64, which essentially forces us onto amd64 unless they give better instructions on how to compile their wheels. I found a workaround for this previously to enable using the robomaster SDK on Apple Silicon, but it's far from ideal.

So, if anything, if we wish to prescribe a CPU architecture, we should consider mandating that images be built for amd64 instead!
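
If we do mandate a CPU architecture, the build command just needs the platform flag, e.g.:

    docker build --platform linux/amd64 -t <image-name> .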

  3. Under 4.3.2_Running_Containers_and_API_Inference.ipynb, the definition of the environment variable in the Dockerfile should be ENV MODEL_NAME=MyModel

Similar to 1, straightforward fix, changed.

  4. For 4.4.1_Deploy_pre-trained_model_on_GCP.ipynb, it is unclear how to run the notebook/code on GCP. Can we include a step-by-step guide with screenshots on this?

These notebooks are actually running on the GCP JupyterLab instance, as the screenshots in the notebook indicate. But there will be additional notebooks/guides on how to get things running on GCP, which should be in the 3.2.* notebooks. Would that address this concern?

@huiwen99
Collaborator

My bad, yeah, let's stick to the amd64 version.
As for the GCP notebook, the screenshots in the notebook are still broken image links for me, so I'm unable to view them. But if there is going to be a tutorial on running code on GCP in 3.2.*, then my concern would be addressed.

Thank you so much!

@ryan-tribex
Collaborator Author

Oh, the image links are broken in the GitHub notebook preview; they're fine if you check out the branch locally and view the notebook.

Example: [screenshot]

@ryan-tribex
Collaborator Author

In that case, I think that should be everything for the general unit (pending 3.2.* which depends on the dev environment), so I'll close and merge this

ryan-tribex changed the title from "General AI ML Unit 1" to "General AI ML" on Apr 16, 2024
ryan-tribex merged commit acae882 into general-ai-ml on Apr 16, 2024
ryan-tribex added a commit that referenced this pull request Apr 16, 2024
* initial commit

* initial commit

* General AI ML (#2)

* initial commit

* add Unit 1 files

* update Unit 1 files

* add Unit 2 files

* updates from Unit 1 feedback

* updates from Unit 1 feedback

* feat: rename fine-tuning notebook

* Unit 2 updates

* add Unit 4 files

* add Unit 4 files

* feat: add line about cloud docker

* feat: gcp boilerplate docker

* feat: update imports

* updates from Unit 2 feedback

* updates from Unit 2 feedback

* updates from Unit 4 feedback

* add Unit 4 files

* fix: 4.4.1 typos

* update file numbering system

* fix: typos

---------

Co-authored-by: waseem-ga <[email protected]>

---------

Co-authored-by: waseem-ga <[email protected]>
Co-authored-by: WaseemSheriff <[email protected]>
ryan-tribex deleted the general-ai-ml-wip branch on April 20, 2024