
Issue250 support backend param option in train and learn commands #289

Conversation

@juhoinkinen (Member) commented Jul 2, 2019

This closes #250.

@juhoinkinen (Member, Author) commented Jul 2, 2019

A few questions about the way backend parameters are handled.

Here https://github.com/NatLibFi/Annif/blob/master/annif/backend/backend.py#L42 the backend params are first taken from the object's params attribute, then updated with any params coming from the CLI option, and finally passed to the _suggest call. However, for limit at least the TFIDF and fastText backends use the original params attribute, not the passed params variable:
https://github.com/NatLibFi/Annif/blob/master/annif/backend/tfidf.py#L54
https://github.com/NatLibFi/Annif/blob/master/annif/backend/fasttext.py#L116

Also, the params for the actual fastText model are taken from the attribute:
https://github.com/NatLibFi/Annif/blob/master/annif/backend/fasttext.py#L101

However, the chunksize is taken from the passed params:
https://github.com/NatLibFi/Annif/blob/master/annif/backend/mixins.py#L23

Is the intention that limit should not be overridden by --backend-param, or can that be changed to come from the passed params?

Overall, is passing the params variable around necessary, since updating the object's params attribute in place would seem simpler? E.g. at https://github.com/NatLibFi/Annif/blob/master/annif/backend/backend.py#L44. (Or would this affect non-CLI usage?)
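
To illustrate, a simplified sketch of the merging pattern I mean (not the actual backend.py code, just the shape of it):

```python
# Simplified sketch of the merging done in the base suggest method (illustrative only):
def suggest(self, text, project, params=None):
    beparams = dict(self.params)   # parameters from projects.cfg
    if params:
        beparams.update(params)    # overrides coming from the --backend-param CLI option
    return self._suggest(text, project, params=beparams)

# ...but e.g. the TFIDF and fastText _suggest implementations then read
# self.params['limit'] instead of the merged params that were passed in,
# so the CLI override has no effect for that setting.
```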

codecov bot commented Jul 2, 2019

Codecov Report

Merging #289 into master will decrease coverage by <.01%.
The diff coverage is 99.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #289      +/-   ##
==========================================
- Coverage   99.37%   99.37%   -0.01%     
==========================================
  Files          59       59              
  Lines        3532     3656     +124     
==========================================
+ Hits         3510     3633     +123     
- Misses         22       23       +1
Impacted Files Coverage Δ
annif/cli.py 99.54% <100%> (+0.02%) ⬆️
annif/backend/fasttext.py 98.79% <100%> (+0.01%) ⬆️
annif/backend/dummy.py 100% <100%> (ø) ⬆️
annif/project.py 100% <100%> (ø) ⬆️
tests/test_backend_tfidf.py 100% <100%> (ø) ⬆️
tests/test_backend_pav.py 100% <100%> (ø) ⬆️
annif/backend/vw_multi.py 97.48% <100%> (ø) ⬆️
annif/backend/nn_ensemble.py 100% <100%> (ø) ⬆️
tests/test_backend_omikuji.py 100% <100%> (ø) ⬆️
tests/test_backend_nn_ensemble.py 100% <100%> (ø) ⬆️
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ec36c8...5b4bb73.

lgtm-com bot commented Jul 2, 2019

This pull request introduces 2 alerts when merging f23417c into 2ea1130 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@osma (Member) commented Jul 3, 2019

You found some genuine inconsistencies there :)

A few thoughts about how parameter handling in backends should be done, and a comparison with the current (somewhat broken) situation.

  • in backend classes, self.params should contain the parameters defined in projects.cfg (as is already the case) - perhaps this field could be renamed to something a bit more specific, e.g. self.config_params
  • the suggest, train and learn methods should take a params argument (only suggest has it currently); this would be used to pass parameters set via CLI options
  • the backends should not use self.params (or self.config_params if renamed) directly; instead, the passed params argument should override parameters from the config file (as is done here, but perhaps there could be a helper method that does this; a sketch follows after this list)
  • backend code should not assume that a particular parameter always exists; for example, params['limit'] should be changed to params.get('limit', self.DEFAULT_LIMIT); see Default values for configuration settings #273 for some thoughts on this
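
A minimal sketch of what such a helper could look like (the names config_params, _merged_params and DEFAULT_PARAMS below are illustrative, not existing Annif API):

```python
class AnnifBackend:
    # Hypothetical per-backend defaults; see #273 for the discussion on default values.
    DEFAULT_PARAMS = {'limit': 100}

    def __init__(self, backend_id, config_params, datadir):
        self.backend_id = backend_id
        self.config_params = config_params    # parameters from projects.cfg
        self.datadir = datadir

    def _merged_params(self, params=None):
        """Combine defaults, projects.cfg parameters and per-call overrides."""
        merged = dict(self.DEFAULT_PARAMS)
        merged.update(self.config_params)
        if params:
            merged.update(params)             # CLI (or later REST) overrides win
        return merged

    def suggest(self, text, project, params=None):
        return self._suggest(text, project, params=self._merged_params(params))

    def train(self, corpus, project, params=None):
        self._train(corpus, project, params=self._merged_params(params))
```

With this, backend code would only ever see the merged dict, so params['limit'] style lookups always find a value.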

More generally, I think it would be helpful if backend classes explicitly defined the backend-specific (hyper)parameters they need. This information could be used e.g. for hyperparameter optimization (see #240). fasttext, vw_multi and vw_ensemble already sort of do this in their own way, but the parameter definitions should be standardized and probably hyperopt would need more information about the parameters - for example the minimum and maximum values. This can be left for later, but it's probably useful to keep in mind when parameter handling code is modified.
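
Purely as a sketch of what such explicit declarations might look like (nothing like HyperParam exists in Annif at this point; the names, defaults and ranges below are made up for illustration):

```python
from collections import namedtuple

# Hypothetical declaration of backend-specific hyperparameters with ranges,
# which a hyperparameter optimizer (see #240) could later consume.
HyperParam = namedtuple('HyperParam', 'name type default min max')

class FastTextBackend(AnnifBackend):
    HYPER_PARAMS = [
        HyperParam('lr', float, 0.25, 0.01, 1.0),
        HyperParam('epoch', int, 5, 1, 50),
        HyperParam('dim', int, 100, 50, 300),
        HyperParam('chunksize', int, 1, 1, 20),
    ]
```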

lgtm-com bot commented Jul 4, 2019

This pull request introduces 1 alert when merging 53bf269 into 2ea1130 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Jul 5, 2019

This pull request introduces 1 alert when merging f4fc1d0 into 2ea1130 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Jul 10, 2019

This pull request introduces 1 alert when merging 954a1e7 into 26540e0 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Jul 10, 2019

This pull request introduces 1 alert when merging 3ec8d7f into 26540e0 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Dec 12, 2019

This pull request introduces 1 alert when merging 3d35f33 into 6f4cbef - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

lgtm-com bot commented Dec 12, 2019

This pull request introduces 1 alert when merging 272f453 into 6f4cbef - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

@osma osma self-requested a review December 16, 2019 15:45

@osma (Member) left a comment

Sorry about the conflict, most likely caused by adding the collapse_every_n_layers hyperparameter to Omikuji in PR #371. It should be simple to fix.

This looks very good. My only issue is the cli_params parameter name used for the AnnifProject methods. The parameters could come from a source other than the CLI (for example, REST API methods could provide a way to override some parameters in the future), so I suggest renaming it to something more neutral such as be_params.
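
In other words, just to illustrate the rename (the rest of the signatures here are assumed, not the actual ones):

```python
class AnnifProject:
    def train(self, corpus, be_params=None):   # instead of cli_params
        ...

    def learn(self, corpus, be_params=None):
        ...
```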

lgtm-com bot commented Dec 16, 2019

This pull request introduces 1 alert and fixes 1 when merging 74c8690 into 3ec36c8 - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

fixed alerts:

  • 1 for Unused import

lgtm-com bot commented Dec 17, 2019

This pull request introduces 1 alert and fixes 1 when merging fc967b9 into 3ec36c8 - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

fixed alerts:

  • 1 for Unused import

@osma osma added this to the 0.45 milestone Dec 17, 2019

lgtm-com bot commented Dec 17, 2019

This pull request introduces 1 alert and fixes 1 when merging 5b4bb73 into 3ec36c8 - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

fixed alerts:

  • 1 for Unused import

@juhoinkinen juhoinkinen merged commit dbcb7be into master Dec 17, 2019
@juhoinkinen juhoinkinen deleted the issue250-support-backend-param-option-in-train-and-learn-commands branch January 28, 2020 12:11

Successfully merging this pull request may close these issues.

Support --backend-param option in train command