Add --no_balance flag to not balance datasets #287
base: main
Conversation
Run with `elk elicit gpt2 imdb --no_balance True --disable_cache --max_examples 100 100 --num_gpus 1 --max_inlp_iter 4` and it seems to work. I added some comments, though.
```diff
@@ -65,6 +65,9 @@ class Extract(Serializable):
     binarize: bool = False
     """Whether to binarize the dataset labels for multi-class datasets."""

+    no_balance: bool = False
```
Why not just make it `balance: bool = True`?
That would also avoid having `balance=not cfg.no_balance`.
Because it would be unclear how to use the flag to disable balancing from the CLI. `--balance False` or something is weirder than `--no_balance`.
`--balance False` does not seem weirder than `--no_balance True` to me. But okay, it's fine with me.
Yeah I think I agree with you now
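For reference, a minimal sketch of the two designs discussed above. The field names come from the diff; `Serializable` is replaced with a plain dataclass, and `balance_dataset` is a hypothetical stand-in for wherever `balance=` is actually passed:

```python
from dataclasses import dataclass


# Design in this PR: a negated flag, set on the CLI as --no_balance True.
@dataclass
class Extract:
    no_balance: bool = False
    """Whether to skip balancing the dataset by label (docstring assumed)."""


# The call site then has to negate it:
#     balance_dataset(ds, balance=not cfg.no_balance)


# Suggested alternative: a positive flag, set on the CLI as --balance False.
@dataclass
class ExtractAlt:
    balance: bool = True
    """Whether to balance the dataset by label."""


# The call site passes it straight through:
#     balance_dataset(ds, balance=cfg.balance)
```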
```diff
@@ -212,12 +212,11 @@ def inlp(
     p = y.float().mean()
     H = -p * torch.log(p) - (1 - p) * torch.log(1 - p)

-    if max_iter is not None:
-        d = min(d, max_iter)
+    max_iter = max_iter or d
```
That's just some refactoring which has nothing to do with the balancing, I guess?
Right, I also added a `max_iter` flag, and this was a necessary refactoring.
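A minimal sketch of what the refactoring buys, assuming `inlp` loops up to `max_iter` times; the loop body here is illustrative, not elk's actual implementation:

```python
import torch


def inlp_sketch(x: torch.Tensor, y: torch.Tensor, max_iter: int | None = None):
    d = x.shape[-1]  # feature dimension

    # Old form capped the iteration count only when max_iter was given:
    #     if max_iter is not None:
    #         d = min(d, max_iter)
    # New form: max_iter defaults to d when the caller passes None, so the
    # loop can always range over max_iter directly. Two side effects of the
    # idiom: max_iter=0 also falls back to d, and max_iter > d is no longer
    # clamped (assuming the old loop ranged over d).
    max_iter = max_iter or d

    for _ in range(max_iter):
        pass  # fit a linear probe, then project its direction out of x
```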
```diff
@@ -6,7 +6,7 @@


 def train_supervised(
-    data: dict[str, tuple], device: str, mode: str
+    data: dict[str, tuple], device: str, mode: str, max_inlp_iter: int | None = None
```
That's a new feature, not related to the balancing either, right?
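For context, a sketch of how the new argument would likely thread through. Only the `train_supervised` signature comes from the diff; the bodies, the `"inlp"` mode name, and the data layout are guesses:

```python
import torch


def inlp(x: torch.Tensor, y: torch.Tensor, max_iter: int | None = None):
    """Stub standing in for elk's INLP routine (see the inlp diff above)."""
    max_iter = max_iter or x.shape[-1]


def train_supervised(
    data: dict[str, tuple], device: str, mode: str, max_inlp_iter: int | None = None
):
    # Hypothetical body: forward the new cap to INLP when that mode is chosen.
    x, y = data["train"]
    if mode == "inlp":
        return inlp(x.to(device), y.to(device), max_iter=max_inlp_iter)
```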