Causal Classification Problem Type #449

psinger · 2023-10-16T12:34:04Z

New Problem Type for supporting Causal Classification.

Workflow:

Answer column contains an integer-encoded binary or multiclass target
Problem Type is Causal Classification Modeling
Metrics are AUC and Accuracy
Number of classes needs to be selected via cfg
Pushing to HF and subsequent loading with the additional head are supported
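
As a rough illustration of the Accuracy metric in the workflow above, the following plain-Python sketch scores integer-encoded targets against per-class logits. The function name `accuracy_from_logits`, the one-logit-per-class layout, and the example values are assumptions made for illustration; this is not the code from this PR.

```python
from typing import Sequence


def accuracy_from_logits(
    logits: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Fraction of rows whose highest-scoring class equals the integer target."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    if not targets:
        raise ValueError("need at least one sample")
    correct = 0
    for row, target in zip(logits, targets):
        # Index of the largest logit is the predicted class (assumed layout).
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == target)
    return correct / len(targets)


# Hypothetical three-class example: two of three rows are predicted correctly.
example_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 0.9, 0.4]]
example_targets = [0, 2, 2]
example_accuracy = accuracy_from_logits(example_logits, example_targets)
```

The sketch assumes 0-indexed targets, which is why the number of classes set via cfg has to cover the largest label value.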

llm_studio/app_utils/hugging_face_utils.py

llm_studio/src/datasets/text_causal_language_modeling_ds.py

model_cards/text_causal_classification_experiment_summary_card_template.md

llm_studio/src/losses/text_causal_classification_modeling_losses.py

maxjeblick

Thanks a lot for adding the classification problem type! The code looks good (I've left some minor comments), and training also works as expected.

As the setup (loss, metric, number of classes) is currently manual, I'd suggest adding some sanity checks (that raise an exception) in the code to catch user input errors. At some later point, we can potentially add automatic handling of the settings, based on the input data. I've run into the following issues:

I used a dataset where the target values are [1, 2, 3, 4, 5]. num_classes needs to be set to 6, as the classes are 0-indexed.
I forgot to change the metric and loss accordingly. With a mismatched metric, the experiment fails only once it reaches 100%.
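
A minimal sketch of the kind of sanity check suggested above, assuming integer targets and a `num_classes` setting. The helper names `find_invalid_targets` and `check_targets` are hypothetical and do not come from LLM Studio.

```python
from typing import List, Sequence


def find_invalid_targets(targets: Sequence[int], num_classes: int) -> List[int]:
    """Return the sorted distinct target values that fall outside [0, num_classes)."""
    return sorted({t for t in targets if not 0 <= t < num_classes})


def check_targets(targets: Sequence[int], num_classes: int) -> None:
    """Raise a descriptive error before training if any target is out of range."""
    invalid = find_invalid_targets(targets, num_classes)
    if invalid:
        raise ValueError(
            f"Targets {invalid} are outside the range 0..{num_classes - 1}; "
            f"num_classes={num_classes} is too small or the labels are not 0-indexed."
        )


# The dataset from the comment above: labels 1..5 need num_classes=6.
targets = [1, 2, 3, 4, 5]
invalid_with_five = find_invalid_targets(targets, 5)  # label 5 does not fit
invalid_with_six = find_invalid_targets(targets, 6)   # every label fits
```

Running such a check when the experiment starts would surface the mismatch immediately instead of at the end of training.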

Apart from that, we probably want to show the predicted class in the validation prediction insights and also add it to validation_predictions.csv.
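
One possible shape for this, sketched with the standard library only: derive the predicted class from each row of logits and write it as an extra column. The column names `text` and `predicted_class`, the function name, and the sample rows are assumptions; the actual layout of validation_predictions.csv is not shown in this thread.

```python
import csv
import io
from typing import Sequence


def predictions_to_csv(
    texts: Sequence[str], logits: Sequence[Sequence[float]]
) -> str:
    """Serialize each input text with its predicted class index as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "predicted_class"])  # hypothetical column names
    for text, row in zip(texts, logits):
        # Predicted class is the index of the largest logit.
        predicted = max(range(len(row)), key=row.__getitem__)
        writer.writerow([text, predicted])
    return buffer.getvalue()


# Hypothetical two-row example.
csv_text = predictions_to_csv(
    ["good movie", "bad movie"], [[0.1, 0.9], [1.2, -0.3]]
)
```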

Some smaller things (can also be addressed later):

Tokenized input text also shows the classification label.
We probably want to hide the Chat Window (funny conversation though :D )

psinger · 2023-10-19T09:43:40Z

Thanks @maxjeblick - will address.

I discussed with @pascal-pfeiffer that we can expect users to provide the correct inputs here, but I agree that the exceptions should be clearer.

llm_studio/python_configs/text_causal_classification_modeling_config.py

maxjeblick

LGTM, thanks a lot for the implementation!

I've left a small comment about ConfigNLPCausalClassificationPrediction, apart from that it looks good!

samvelkoch · 2023-11-05T10:29:23Z

Hello. I played a bit with LLM Studio in H2O Aquarium while working on the H2O predict LLM Competition. Models fine-tuned for Causal Classification problems have no classification_head.pth among the model files, whether downloaded or pushed to HF. Any ideas what I have missed in the settings or why this happens? Thank you!

maxjeblick · 2023-11-05T11:13:00Z

Hi, @samvelkoch, I'll have a look into this.
Could you create a new issue for better tracking? Thanks!

samvelkoch · 2023-11-05T11:17:27Z

> Hi, @samvelkoch, I'll have a look into this. Could you create a new issue for better tracking? Thanks!

Thanks for the quick reply. I posted it here since I'm not sure whether it is a real issue or just my lack of experience with LLM Studio. Could you please check? I'll open an issue now.

psinger and others added 5 commits October 16, 2023 12:33

first implementation

d34cddb

Merge branch 'main' into psi/classification

31999bf

updates

4d07203

format

c3cbec5

cfg

e70f88b

psinger marked this pull request as ready for review October 17, 2023 15:46

maxjeblick reviewed Oct 18, 2023

llm_studio/app_utils/hugging_face_utils.py (outdated, resolved)

maxjeblick reviewed Oct 18, 2023

llm_studio/src/datasets/text_causal_language_modeling_ds.py (outdated, resolved)

maxjeblick reviewed Oct 18, 2023

model_cards/text_causal_classification_experiment_summary_card_template.md (resolved)

maxjeblick reviewed Oct 18, 2023

llm_studio/src/losses/text_causal_classification_modeling_losses.py (resolved)

maxjeblick suggested changes Oct 19, 2023

feedback

25244e5

psinger requested a review from maxjeblick October 19, 2023 12:25

psinger and others added 2 commits October 19, 2023 12:33

import

88e232e

Merge branch 'main' into psi/classification

2b21dc0

maxjeblick reviewed Oct 23, 2023

llm_studio/python_configs/text_causal_classification_modeling_config.py (resolved)

maxjeblick approved these changes Oct 23, 2023

psinger added 2 commits October 23, 2023 12:51

readme

8a1108f

Merge branch 'psi/classification' of github.com:h2oai/h2o-llmstudio i…

2f1f0bb

…nto psi/classification

psinger merged commit 08475e3 into main Oct 23, 2023
5 checks passed

psinger deleted the psi/classification branch October 23, 2023 12:59
