
Tuning Palm on Vertex AI #395

Merged: 7 commits into master on Sep 25, 2023
Conversation

@BenoitDherin (Collaborator) commented Sep 14, 2023

Tuning Vertex PaLM requires the newest SDK, so I bumped google-cloud-aiplatform from 1.26.0 to 1.33.1. This has had no side effect on any of the already installed packages (see the pip list diff below).

The notebook also requires the evaluate library from Hugging Face to evaluate the tuned model's output. Importing it the standard way has the side effect of installing new packages (harmless), but it also upgrades the dill library to a version that is incompatible with the version Apache Beam requires, which is an issue. This applies to both the latest and the previous version of evaluate, because dill is not pinned in their setup.py.

To solve this, I created a separate requirements-without-deps.txt that lets us install packages without installing or upgrading their dependencies (which also means we need to list those dependencies by hand in the same file). make install now installs both what's in requirements.txt (with dependencies) and what's in requirements-without-deps.txt (without their dependencies).

To test the notebook, run make install from this branch. You shouldn't see any conflicts.

The diff between our previous setup and the new one is given below:

53a54
> datasets                                 2.14.5
66a68
> evaluate                                 0.4.0
93c95
< google-cloud-aiplatform                  1.26.0
---
> google-cloud-aiplatform                  1.33.1
131a134
> huggingface-hub                          0.17.2
201a205
> multiprocess                             0.70.15
210a215
> nltk                                     3.8.1
293a299
> responses                                0.18.0
297a304
> rouge-score                              0.1.2
382a390
> xxhash                                   3.3.0


@BenoitDherin BenoitDherin marked this pull request as draft September 14, 2023 23:26
@BenoitDherin BenoitDherin self-assigned this Sep 20, 2023
@BenoitDherin BenoitDherin marked this pull request as ready for review September 20, 2023 23:16
@takumiohym (Collaborator) commented Sep 21, 2023
google-cloud-aiplatform is included in requirements.txt, but a newer version is required? If so, please add information about the version requirement in the markdown.

(Or I think it's fine to update requirements.txt itself if it doesn't affect other dependencies.)

Also, it seems sequence-evaluate is not very widely used (6 stars). How about switching to a more popular one, like the Hugging Face evaluate library?

https://github.com/huggingface/evaluate/tree/main

https://medium.com/@sthanikamsanthosh1994/understanding-bleu-and-rouge-score-for-nlp-evaluation-1ab334ecadcb
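To make the suggestion concrete: in the notebook the metrics would come from the evaluate library (e.g. evaluate.load("rouge") / evaluate.load("bleu")), but as a toy illustration of what ROUGE-1 measures, here is a pure-Python sketch of unigram-overlap F1 (this helper is my own, not part of any library):

```python
from collections import Counter


def rouge1_f(prediction: str, reference: str) -> float:
    """Toy ROUGE-1 F1: F-measure over unigram overlap between texts."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

BLEU is precision-oriented over n-grams with a brevity penalty, while ROUGE is recall-oriented; that asymmetry is worth spelling out in the notebook descriptions.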


@BenoitDherin (Collaborator, Author) replied:

Good catch! Done!

@takumiohym (Collaborator) commented Sep 21, 2023

I think this bigquery magic is a simpler way to do the same thing.

%%bigquery df

https://notebook.community/GoogleCloudPlatform/python-docs-samples/notebooks/tutorials/bigquery/BigQuery%20query%20magic
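For reference, a cell using the magic could look like this (the project, dataset, table, and column names below are placeholders):

```
%%bigquery df
SELECT input_text, output_text
FROM `my-project.my_dataset.my_table`
LIMIT 1000
```

The query result lands in the df DataFrame without any explicit BigQuery client code.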


@BenoitDherin (Collaborator, Author) replied:

Great! Done!

@takumiohym (Collaborator) commented Sep 21, 2023

Could you include an explanation of the key name conventions (input_text and output_text), since they seem to be fixed names?


@BenoitDherin (Collaborator, Author) replied:

Done! I also added a description of the JSONL format needed to tune.
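For illustration, a tuning file in that JSONL format could be produced like this (the example texts are placeholders; the fixed keys input_text and output_text are what the tuning job expects):

```python
import json

# Placeholder rows; each record must use the fixed keys
# "input_text" and "output_text" expected by the tuning job.
rows = [
    {"input_text": "Question: What is Vertex AI?",
     "output_text": "A managed ML platform on Google Cloud."},
    {"input_text": "Question: What is PaLM?",
     "output_text": "A family of large language models."},
]

# JSONL: one JSON object per line.
with open("tune_data.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```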

@takumiohym (Collaborator) commented Sep 21, 2023

Maybe better to include the expected training time.

(More than a couple of hours?)

Also, adding a note that the job runs on Vertex AI Pipelines would be helpful.


@BenoitDherin (Collaborator, Author) replied:

Good catch! For 100 steps it takes about 1 hour; for 500, about 3.5 hours for me. Done!
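For context, here is a sketch of the tuning configuration under discussion. The GCS URI and model name are hypothetical, and the (commented-out) call shape follows the vertexai.preview.language_models API around SDK 1.33.x, so treat it as illustrative rather than definitive:

```python
# Illustrative tuning configuration; timings observed in this PR were
# roughly 1 hour for 100 steps and about 3.5 hours for 500 steps.
tuning_params = {
    "training_data": "gs://my-bucket/tune_data.jsonl",  # hypothetical GCS URI
    "train_steps": 100,
    "tuning_job_location": "europe-west4",  # tuning region used in the notebook
    "tuned_model_location": "us-central1",
}

# The actual call (which runs as a Vertex AI Pipelines job) would look
# roughly like:
# from vertexai.preview.language_models import TextGenerationModel
# model = TextGenerationModel.from_pretrained("text-bison@001")
# model.tune_model(**tuning_params)
```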

@takumiohym (Collaborator) commented Sep 21, 2023

It seems us-central1 is now also an option (though it uses a different environment):

https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-text-models

Also, if the training is in europe-west4 , maybe the bucket should also be in the same region as a best practice? (although the data is small in this case)


@BenoitDherin (Collaborator, Author) replied:

Great! It seems to work with everything in us-central1. Done!

@BenoitDherin (Collaborator, Author) commented Sep 21, 2023

Unfortunately, the LLM tuning part of the pipeline just failed. It doesn't seem to work yet unless the region is set to europe-west4 (as opposed to us-central1). This is certainly temporary, but I'll keep it this way for now.

@takumiohym (Collaborator) commented Sep 21, 2023

Blue -> BLEU (usually written in upper case)

Rouge -> ROUGE

I think BLEU and ROUGE should be elaborated a bit more; the descriptions look almost the same.


@BenoitDherin (Collaborator, Author) replied:

Good catch! Done!

@takumiohym (Collaborator) commented Sep 21, 2023

Blue -> BLEU

Rouge -> ROUGE


@BenoitDherin (Collaborator, Author) replied:

Done!

@takumiohym (Collaborator) commented Sep 21, 2023

Same comment as the one at the top: I think it's better to use a more popular library (which will usually be better maintained).


@BenoitDherin (Collaborator, Author) replied:

Totally agree. Good catch. Done!

@takumiohym (Collaborator) commented Sep 21, 2023

It seems the original notebook specifies 100 training steps, but do you think we should use 500?

Given that training takes a long time, I think it's better to keep it small if it gives a reasonable result.


@BenoitDherin (Collaborator, Author) replied:

Reverted to 100. Agreed. Done!

@BenoitDherin BenoitDherin merged commit 1b72928 into master Sep 25, 2023
4 checks passed
@BenoitDherin BenoitDherin deleted the genai-palm-tuning-benoit branch September 25, 2023 16:51
@BenoitDherin BenoitDherin restored the genai-palm-tuning-benoit branch September 25, 2023 16:51
@BenoitDherin BenoitDherin deleted the genai-palm-tuning-benoit branch September 25, 2023 21:44