detect task type #337

donglihe-hub · 2023-11-03T13:51:31Z

What does this PR do?

Align with nn
Reproduce results

(Align with nn) Models now remember the task type (multiclass=True or False) after training, and that task type is then used during testing.
Raise an error when users try to train binary/multiclass models with "tree".
Raise an error when more than 10% of the instances in the training dataset are unlabeled. (Commit 85854d3 in this PR later changed this to a warning.)
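The two checks described above can be illustrated with a minimal sketch. The function names (`detect_multiclass`, `unlabeled_ratio`, `check_unlabeled`) and the `strict` flag are hypothetical and are not taken from the LibMultiLabel code base. The sketch also assumes that a dataset counts as multiclass when every labeled instance has exactly one label, which may differ from the rule this PR actually implements.

```python
import warnings


def detect_multiclass(label_sets):
    """Hypothetical helper: True if every labeled instance has exactly one label.

    Unlabeled instances (empty label lists) are ignored here; that is an
    assumption of this sketch, not a documented behavior.
    """
    labeled = [labels for labels in label_sets if len(labels) > 0]
    return len(labeled) > 0 and all(len(labels) == 1 for labels in labeled)


def unlabeled_ratio(label_sets):
    """Fraction of instances that carry no label at all."""
    if not label_sets:
        return 0.0
    return sum(1 for labels in label_sets if len(labels) == 0) / len(label_sets)


def check_unlabeled(label_sets, threshold=0.1, strict=False):
    """Warn (or raise, if strict) when the unlabeled ratio exceeds the threshold."""
    ratio = unlabeled_ratio(label_sets)
    if ratio > threshold:
        message = (
            f"{ratio:.1%} of the training instances are unlabeled "
            f"(threshold: {threshold:.0%})."
        )
        if strict:
            raise ValueError(message)
        warnings.warn(message)
    return ratio
```

With `strict=False` the check only warns, which matches the final commit in this PR; `strict=True` mirrors the behavior described in the list above.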

1. Reproduction Results (Using the Code from This PR):

| Linear | ECtHR (A) (unlabeled) | ECtHR (B) (unlabeled) | SCOTUS | EUR-LEX | LEDGAR | UNFAIR-ToS (unlabeled) |
| --- | --- | --- | --- | --- | --- | --- |
| one-vs-rest | 69.6/54.6 | 75.6/68.8 | 78.1/68.9 | 72.0/55.4 | 86.4/80.0 | 70.1/72.3 |
| threshold | 74.1/68.4 | 77.2/73.6 | 79.1/71.8 | 74.6/62.3 | 86.2/79.5 | 75.7/77.1 |
| cost-sensitive | 72.9/63.6 | 76.8/72.2 | 78.3/71.5 | 73.4/60.5 | 86.2/80.1 | 74.4/75.4 |

Metric format: Micro-F1/Macro-F1

Results in Yuchen's Paper

2. Causes of the Differences in UNFAIR-ToS

(Will not fix) There is a large gap in the UNFAIR-ToS results because LibMultiLabel currently does not support unlabeled samples at test time.

~~(TODO) Will check if Yuchen's method is reasonable. Will figure out a workaround in the latest LibMultiLabel.~~

Currently, multiclass datasets trained with train_{1vsrest, cost_sensitive, etc.} are not recognized as multiclass datasets. As a result, Micro-F1 does not equal P@1.
With this PR, the task type is determined using the function from common_utils.
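To see why Micro-F1 should equal P@1 on a multiclass dataset, here is a small self-contained sketch. It does not use LibMultiLabel's metric code; `micro_f1` and `precision_at_1` are illustrative helpers written for this example. When exactly one label is predicted per instance, every wrong prediction counts as one false positive and one false negative, so Micro-F1 reduces to accuracy, which is P@1. If the same data is decoded in a multi-label fashion (for example, by thresholding scores), an instance may receive zero or several labels and the two metrics can diverge.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-instance label sets."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def precision_at_1(true_sets, top1_labels):
    """Fraction of instances whose top-ranked label is a true label."""
    hits = sum(1 for t, p in zip(true_sets, top1_labels) if p in t)
    return hits / len(true_sets)


# Toy multiclass data: each instance has exactly one true label.
y_true = [{0}, {1}, {2}, {1}]
top1 = [0, 1, 1, 1]  # top-ranked label per instance

# Multiclass decoding: predict exactly the top-ranked label.
multiclass_pred = [{label} for label in top1]
assert micro_f1(y_true, multiclass_pred) == precision_at_1(y_true, top1)

# Multi-label decoding: the third instance gets no label at all,
# so Micro-F1 no longer matches P@1.
multilabel_pred = [{0}, {1}, set(), {1}]
assert micro_f1(y_true, multilabel_pred) != precision_at_1(y_true, top1)
```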

* (A bit more) The task type can either be detected automatically (set "is_multilabel" to "auto" in config file) or be provided by users (Li-Chung will explain the motivation)

Test CLI & API (`bash tests/autotest.sh`)

Test APIs used by main.py.

Test Pass
- (Copy and paste the last outputted line here.)
Not Applicable (i.e., the PR does not include API changes.)

Check API Document

If any new APIs are added, please check if the description of the APIs is added to API document.

API document is updated (linear, nn)
Not Applicable (i.e., the PR does not include API changes.)

Test quickstart & API (`bash tests/docs/test_changed_document.sh`)

If any APIs in quickstarts or tutorials are modified, please run this test to check if the current examples can run correctly after the modified APIs are released.

donglihe-hub requested review from cjlin1, Eleven1Liu, henryyang42, JamesLYC88 and Gordon119 as code owners November 3, 2023 13:51

detect task type

e65dc32

donglihe-hub force-pushed the task branch from a278bc8 to e65dc32 Compare November 3, 2023 13:52

rewrite the logics of task type detection

8326176

donglihe-hub force-pushed the task branch from aecb787 to b5bab98 Compare November 29, 2023 13:46

fix arg missing issue

8fa186e

donglihe-hub force-pushed the task branch 2 times, most recently from 09ca71a to d0f433e Compare November 30, 2023 19:42

rewrite for reproduction purpose only

173ee1b

donglihe-hub force-pushed the task branch from d0f433e to 173ee1b Compare November 30, 2023 19:44

donglihe-hub mentioned this pull request Nov 30, 2023

Allow Users to Specify Data Type (is_multilabel) in Config File #341

Draft

4 tasks

is_multilabel is renamed multiclass

b856915

donglihe-hub force-pushed the task branch from abfb587 to b856915 Compare December 1, 2023 12:52

donglihe-hub added 2 commits December 1, 2023 17:24

raise error when unlabeled ratio > 0.1

05d51f9

add default value for multiclass

d8d6cd4

donglihe-hub force-pushed the task branch from dfb5f06 to d8d6cd4 Compare December 1, 2023 13:43

warn unlabeled ratio instead of raising error

85854d3

cjlin1 approved these changes Dec 7, 2023


cjlin1 merged commit ec603a8 into ASUS-AICS:master Dec 7, 2023
1 check passed

donglihe-hub deleted the task branch December 11, 2023 07:14
