Add Fine-Tune BERT LLM Example #2021
Conversation
Is it fine to use spaces in the filename?
For Notebook files that should be fine, but if we want, we can name the files without spaces.
Happy to see it, thanks @andreyvelich for driving this. Maybe we can make a roadmap for the examples, and I'd like to contribute too. One question from my personal perspective: should we dedicate ourselves to the notebook format, or could we also support pure Python code with markdown or something like that?
+1
@@ -0,0 +1,683 @@
{
Line #8. `num_workers=3, # Number of PyTorch workers to use.`
Should we replace this with 1, aligned with the above comment:
"At least 1 GPU on your Kubernetes cluster to fine-tune BERT model on 1 worker."
Or should we update the above comment?
Sure, let me change it to 3.
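For context, the alignment being discussed is simple arithmetic: the notebook's GPU requirement should equal the worker count times the GPUs each worker requests. A tiny sketch (the names here are illustrative, not from the notebook):

```python
# Illustrative check of the num_workers / GPU-requirement alignment discussed
# above. NUM_WORKERS and GPUS_PER_WORKER are hypothetical names, not the
# notebook's actual variables.
NUM_WORKERS = 3       # one PyTorch worker pod per replica
GPUS_PER_WORKER = 1   # each worker requests a single GPU

def required_gpus(num_workers: int, gpus_per_worker: int) -> int:
    """Total GPUs the cluster must provide for the PyTorchJob."""
    return num_workers * gpus_per_worker

# With num_workers=3 and 1 GPU per worker, the requirement line should say
# "At least 3 GPUs", which is the fix agreed on in this thread.
print(required_gpus(NUM_WORKERS, GPUS_PER_WORKER))
```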
@@ -0,0 +1,683 @@
{
Nice catch, however an LLM can easily recognize sentences without a full stop 😃
Also, I believe this sentence is grammatically correct, isn't it?
"This is one of the best restaurants I've ever been to"
That makes sense.
Also, Grammarly says this is fine as well :)
That's a good idea. Do you want to start an issue to identify the examples that we want to add (e.g. text generation, speech recognition, and more)?
I think there are pros and cons to using Notebook files vs Python files. For example, it's easier to implement E2E tests for Python files, but Notebooks are usually more Data Scientist friendly. I am open to your suggestions @tenzen-y @kuizhiqing @johnugeorge.
Anyway, I think we should have a dedicated issue.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: andreyvelich, tenzen-y
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/assign @kuizhiqing @johnugeorge @deepanker13 for the final review
/lgtm
We will add a similar one using the Train API.
* Add Fine-Tune BERT LLM Example
* Add 3 GPUs in Notebook requirements

Signed-off-by: Andrey Velichkevich <[email protected]>
I added an example to fine-tune the BERT model with the Yelp review dataset.
This example is based on this tutorial: https://huggingface.co/docs/transformers/en/training
I used `split_dataset_by_node` to distribute data across the PyTorchJob workers.
Also, I moved our SDK examples into these folders: `text-classification`, `image-classification`, `language-modeling`, similar to the HuggingFace transformers examples: https://github.com/huggingface/transformers/tree/main/examples/pytorch
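The sharding idea behind `split_dataset_by_node` can be sketched in plain Python: each worker (rank) receives a contiguous, non-overlapping slice of the dataset. This is an illustration of the concept under that assumption, not the library's actual implementation:

```python
# Pure-Python sketch of distributing a map-style dataset across PyTorchJob
# workers: each rank gets a contiguous shard, and every item is assigned to
# exactly one rank. `shard_for_rank` is a hypothetical helper, not part of
# the `datasets` library.
def shard_for_rank(items, rank, world_size):
    """Return the contiguous slice of `items` assigned to `rank`."""
    n = len(items)
    # Ranks below `remainder` get one extra item so all items are covered.
    base, remainder = divmod(n, world_size)
    start = rank * base + min(rank, remainder)
    end = start + base + (1 if rank < remainder else 0)
    return items[start:end]

data = list(range(10))
shards = [shard_for_rank(data, r, 3) for r in range(3)]
print(shards)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

In the notebook itself, each worker would call the real `split_dataset_by_node(dataset, rank=RANK, world_size=WORLD_SIZE)` with the rank and world size provided by the PyTorchJob environment.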
I think we should refactor our Training Operator examples directory in future PRs to make it more ML engineer and data scientist friendly.
Regardless of whether users submit jobs with the Kubeflow Python SDK or with `kubectl` and YAMLs, they should be able to find the appropriate AI/ML training use cases.
From my perspective, in the long term we should guide our users more towards the Python SDK to submit Kubeflow Jobs.
Please let me know what you think @johnugeorge @gaocegege @terrytangyuan @kuizhiqing @deepanker13 @Jeffwan @droctothorpe @tenzen-y @StefanoFioravanzo
/hold for review