
Tuning Palm on Vertex AI #395

Merged: 7 commits into master on Sep 25, 2023
Conversation

@BenoitDherin (Collaborator) commented Sep 14, 2023

Tuning Vertex PaLM requires the newest SDK, so I bumped google-cloud-aiplatform from 1.26.0 to 1.33.1. This has had no side effect on any of the already installed packages (see the pip list diff below).

The notebook also requires the evaluate library from Hugging Face to evaluate the tuned model's output. Importing it the standard way has the side effect of installing new packages (harmless), but it also upgrades the dill library to a version that is incompatible with the version Apache Beam requires, which is an issue. This applies to both the latest and the previous version of evaluate, because dill is not pinned in their setup.py.

To solve this, I created a separate requirements-without-deps.txt that lets us install packages without installing or upgrading their dependencies (which also means we need to list those dependencies by hand in the same file). make install now installs both what's in requirements.txt (with dependencies) and what's in requirements-without-deps.txt (without their dependencies).

To test the notebook, run make install from this branch. You shouldn't see any conflicts.

The diff between our previous setup and the new one is given below:

53a54
> datasets                                 2.14.5
66a68
> evaluate                                 0.4.0
93c95
< google-cloud-aiplatform                  1.26.0
---
> google-cloud-aiplatform                  1.33.1
131a134
> huggingface-hub                          0.17.2
201a205
> multiprocess                             0.70.15
210a215
> nltk                                     3.8.1
293a299
> responses                                0.18.0
297a304
> rouge-score                              0.1.2
382a390
> xxhash                                   3.3.0


@BenoitDherin BenoitDherin marked this pull request as draft September 14, 2023 23:26
@BenoitDherin BenoitDherin self-assigned this Sep 20, 2023
@BenoitDherin BenoitDherin marked this pull request as ready for review September 20, 2023 23:16
@takumiohym (Collaborator) commented Sep 21, 2023
google-cloud-aiplatform is included in requirements.txt, but a newer version is required? If so, please add information about the version requirement in the markdown.

(Or I think it's fine to update requirements.txt itself if it doesn't affect other dependencies.)

Also, it seems sequence-evaluate is not very widely used (6 stars). How about switching to a more popular one, like the Hugging Face evaluate library?

https://github.com/huggingface/evaluate/tree/main

https://medium.com/@sthanikamsanthosh1994/understanding-bleu-and-rouge-score-for-nlp-evaluation-1ab334ecadcb
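To make the suggestion concrete: in the notebook the metrics would come from the evaluate library (e.g. evaluate.load("rouge") / evaluate.load("bleu")), but as a toy illustration of what ROUGE-1 measures, here is a pure-Python sketch of unigram-overlap F1 (this helper is my own, not part of any library):

```python
from collections import Counter


def rouge1_f(prediction: str, reference: str) -> float:
    """Toy ROUGE-1 F1: F-measure over unigram overlap between texts."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

BLEU is precision-oriented over n-grams with a brevity penalty, while ROUGE is recall-oriented; that asymmetry is worth spelling out in the notebook descriptions.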


@BenoitDherin (Collaborator, Author) replied:

Good catch! Done!

@takumiohym (Collaborator) commented Sep 21, 2023

I think this bigquery magic is a simpler way to do the same thing.

%%bigquery df

https://notebook.community/GoogleCloudPlatform/python-docs-samples/notebooks/tutorials/bigquery/BigQuery%20query%20magic
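For reference, a cell using the magic could look like this (the project, dataset, table, and column names below are placeholders):

```
%%bigquery df
SELECT input_text, output_text
FROM `my-project.my_dataset.my_table`
LIMIT 1000
```

The query result lands in the df DataFrame without any explicit BigQuery client code.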


@BenoitDherin (Collaborator, Author) replied:

Great! Done!

@takumiohym (Collaborator) commented Sep 21, 2023

Could you include an explanation of the key name conventions (input_text and output_text), since they seem to be fixed names?


@BenoitDherin (Collaborator, Author) replied:

Done! I also added a description of the JSONL format needed to tune.
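For illustration, a tuning file in that JSONL format could be produced like this (the example texts are placeholders; the fixed keys input_text and output_text are what the tuning job expects):

```python
import json

# Placeholder rows; each record must use the fixed keys
# "input_text" and "output_text" expected by the tuning job.
rows = [
    {"input_text": "Question: What is Vertex AI?",
     "output_text": "A managed ML platform on Google Cloud."},
    {"input_text": "Question: What is PaLM?",
     "output_text": "A family of large language models."},
]

# JSONL: one JSON object per line.
with open("tune_data.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```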

@takumiohym (Collaborator) commented Sep 21, 2023

Maybe better to include the expected training time.

(More than a couple of hours?)

Also, adding a note that the job runs on Vertex AI Pipelines would be helpful.


@BenoitDherin (Collaborator, Author) replied:

Good catch! For 100 steps it takes about 1 hour; for 500, about 3.5 hours for me. Done!
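For context, here is a sketch of the tuning configuration under discussion. The GCS URI and model name are hypothetical, and the (commented-out) call shape follows the vertexai.preview.language_models API around SDK 1.33.x, so treat it as illustrative rather than definitive:

```python
# Illustrative tuning configuration; timings observed in this PR were
# roughly 1 hour for 100 steps and about 3.5 hours for 500 steps.
tuning_params = {
    "training_data": "gs://my-bucket/tune_data.jsonl",  # hypothetical GCS URI
    "train_steps": 100,
    "tuning_job_location": "europe-west4",  # tuning region used in the notebook
    "tuned_model_location": "us-central1",
}

# The actual call (which runs as a Vertex AI Pipelines job) would look
# roughly like:
# from vertexai.preview.language_models import TextGenerationModel
# model = TextGenerationModel.from_pretrained("text-bison@001")
# model.tune_model(**tuning_params)
```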

@takumiohym (Collaborator) commented Sep 21, 2023

It seems us-central1 is now also an option (though it uses a different environment):

https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-text-models

Also, if the training is in europe-west4 , maybe the bucket should also be in the same region as a best practice? (although the data is small in this case)


@BenoitDherin (Collaborator, Author) replied:

Great! It seems to work with everything in us-central1. Done!

@BenoitDherin (Collaborator, Author) commented Sep 21, 2023

Unfortunately, the LLM tuning part of the pipeline just failed. It doesn't seem to work yet unless the region is set to europe-west4 (as opposed to us-central1). This is certainly temporary, but I'll keep it this way for now.

@takumiohym (Collaborator) commented Sep 21, 2023

Blue -> BLEU (usually written in upper case)

Rouge -> ROUGE

I think BLEU and ROUGE should be elaborated a bit more; the descriptions look almost the same.


@BenoitDherin (Collaborator, Author) replied:

Good catch! Done!

@takumiohym (Collaborator) commented Sep 21, 2023

Blue -> BLEU

Rouge -> ROUGE


@BenoitDherin (Collaborator, Author) replied:

Done!

@takumiohym (Collaborator) commented Sep 21, 2023

Same comment as the one at the top: I think it's better to use a more popular library (which will usually be better maintained).


@BenoitDherin (Collaborator, Author) replied:

Totally agree. Good catch. Done!

@takumiohym (Collaborator) commented Sep 21, 2023

It seems the original notebook specifies 100 training steps, but do you think we should use 500?

Given that training takes a long time, I think it's better to keep it small if it gives a reasonable result.


@BenoitDherin (Collaborator, Author) replied:

Reverted to 100. Agreed. Done!

@BenoitDherin BenoitDherin merged commit 1b72928 into master Sep 25, 2023
4 checks passed
@BenoitDherin BenoitDherin deleted the genai-palm-tuning-benoit branch September 25, 2023 16:51
@BenoitDherin BenoitDherin restored the genai-palm-tuning-benoit branch September 25, 2023 16:51
@BenoitDherin BenoitDherin deleted the genai-palm-tuning-benoit branch September 25, 2023 21:44