
Issue250 support backend param option in train and learn commands #289

Conversation

@juhoinkinen (Member) commented Jul 2, 2019

This closes #250.

@juhoinkinen (Member, Author) commented Jul 2, 2019

A few questions about the way backend parameters are handled.

Here https://github.com/NatLibFi/Annif/blob/master/annif/backend/backend.py#L42 the backend params are first taken from the object's params attribute, then updated with any params coming from the CLI option, and finally passed to the _suggest call. However, for limit at least the TFIDF and fastText backends use the original params attribute, not the passed params variable:
https://github.com/NatLibFi/Annif/blob/master/annif/backend/tfidf.py#L54
https://github.com/NatLibFi/Annif/blob/master/annif/backend/fasttext.py#L116

Also, the params for the actual fastText model are taken from the attribute:
https://github.com/NatLibFi/Annif/blob/master/annif/backend/fasttext.py#L101

However, the chunksize is taken from the passed params:
https://github.com/NatLibFi/Annif/blob/master/annif/backend/mixins.py#L23

Is the intention that limit should not be overridden by --backend-param, or can that be changed to come from the passed params?

Overall, is passing the params variable around necessary, since updating the object's params attribute in place would seem simpler? E.g. at https://github.com/NatLibFi/Annif/blob/master/annif/backend/backend.py#L44. (Or would this affect non-CLI usage?)
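
To illustrate, a simplified sketch of the merging pattern I mean (not the actual backend.py code, just the shape of it):

```python
# Simplified sketch of the merging done in the base suggest method (illustrative only):
def suggest(self, text, project, params=None):
    beparams = dict(self.params)   # parameters from projects.cfg
    if params:
        beparams.update(params)    # overrides coming from the --backend-param CLI option
    return self._suggest(text, project, params=beparams)

# ...but e.g. the TFIDF and fastText _suggest implementations then read
# self.params['limit'] instead of the merged params that were passed in,
# so the CLI override has no effect for that setting.
```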

codecov bot commented Jul 2, 2019

Codecov Report

Merging #289 into master will decrease coverage by <.01%.
The diff coverage is 99.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #289      +/-   ##
==========================================
- Coverage   99.37%   99.37%   -0.01%     
==========================================
  Files          59       59              
  Lines        3532     3656     +124     
==========================================
+ Hits         3510     3633     +123     
- Misses         22       23       +1
Impacted Files Coverage Δ
annif/cli.py 99.54% <100%> (+0.02%) ⬆️
annif/backend/fasttext.py 98.79% <100%> (+0.01%) ⬆️
annif/backend/dummy.py 100% <100%> (ø) ⬆️
annif/project.py 100% <100%> (ø) ⬆️
tests/test_backend_tfidf.py 100% <100%> (ø) ⬆️
tests/test_backend_pav.py 100% <100%> (ø) ⬆️
annif/backend/vw_multi.py 97.48% <100%> (ø) ⬆️
annif/backend/nn_ensemble.py 100% <100%> (ø) ⬆️
tests/test_backend_omikuji.py 100% <100%> (ø) ⬆️
tests/test_backend_nn_ensemble.py 100% <100%> (ø) ⬆️
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ec36c8...5b4bb73.

lgtm-com bot commented Jul 2, 2019

This pull request introduces 2 alerts when merging f23417c into 2ea1130 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@osma (Member) commented Jul 3, 2019

You found some genuine inconsistencies there :)

A few thoughts about how parameter handling in backends should be done, and a comparison with the current (somewhat broken) situation.

  • in backend classes, self.params should contain the parameters defined in projects.cfg (as is already the case) - perhaps this field could be renamed to something a bit more specific, e.g. self.config_params
  • the suggest, train and learn methods should take a params argument (only suggest has it currently); this would be used to pass parameters set via CLI options
  • the backends should not use self.params (or self.config_params if renamed) directly; instead, the passed params argument should override parameters from the config file (as is done here, but perhaps there could be a helper method that does this; a sketch follows after this list)
  • backend code should not assume that a particular parameter always exists; for example, params['limit'] should be changed to params.get('limit', self.DEFAULT_LIMIT); see Default values for configuration settings #273 for some thoughts on this
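
A minimal sketch of what such a helper could look like (the names config_params, _merged_params and DEFAULT_PARAMS below are illustrative, not existing Annif API):

```python
class AnnifBackend:
    # Hypothetical per-backend defaults; see #273 for the discussion on default values.
    DEFAULT_PARAMS = {'limit': 100}

    def __init__(self, backend_id, config_params, datadir):
        self.backend_id = backend_id
        self.config_params = config_params    # parameters from projects.cfg
        self.datadir = datadir

    def _merged_params(self, params=None):
        """Combine defaults, projects.cfg parameters and per-call overrides."""
        merged = dict(self.DEFAULT_PARAMS)
        merged.update(self.config_params)
        if params:
            merged.update(params)             # CLI (or later REST) overrides win
        return merged

    def suggest(self, text, project, params=None):
        return self._suggest(text, project, params=self._merged_params(params))

    def train(self, corpus, project, params=None):
        self._train(corpus, project, params=self._merged_params(params))
```

With this, backend code would only ever see the merged dict, so params['limit'] style lookups always find a value.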

More generally, I think it would be helpful if backend classes explicitly defined the backend-specific (hyper)parameters they need. This information could be used e.g. for hyperparameter optimization (see #240). fasttext, vw_multi and vw_ensemble already sort of do this in their own way, but the parameter definitions should be standardized and probably hyperopt would need more information about the parameters - for example the minimum and maximum values. This can be left for later, but it's probably useful to keep in mind when parameter handling code is modified.
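
Purely as a sketch of what such explicit declarations might look like (nothing like HyperParam exists in Annif at this point; the names, defaults and ranges below are made up for illustration):

```python
from collections import namedtuple

# Hypothetical declaration of backend-specific hyperparameters with ranges,
# which a hyperparameter optimizer (see #240) could later consume.
HyperParam = namedtuple('HyperParam', 'name type default min max')

class FastTextBackend(AnnifBackend):
    HYPER_PARAMS = [
        HyperParam('lr', float, 0.25, 0.01, 1.0),
        HyperParam('epoch', int, 5, 1, 50),
        HyperParam('dim', int, 100, 50, 300),
        HyperParam('chunksize', int, 1, 1, 20),
    ]
```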

lgtm-com bot commented Jul 4, 2019

This pull request introduces 1 alert when merging 53bf269 into 2ea1130 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Jul 5, 2019

This pull request introduces 1 alert when merging f4fc1d0 into 2ea1130 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Jul 10, 2019

This pull request introduces 1 alert when merging 954a1e7 into 26540e0 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Jul 10, 2019

This pull request introduces 1 alert when merging 3ec8d7f into 26540e0 - view on LGTM.com

new alerts:

  • 1 for Conflicting attributes in base classes

lgtm-com bot commented Dec 12, 2019

This pull request introduces 1 alert when merging 3d35f33 into 6f4cbef - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

lgtm-com bot commented Dec 12, 2019

This pull request introduces 1 alert when merging 272f453 into 6f4cbef - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

@osma osma self-requested a review December 16, 2019 15:45

@osma (Member) left a comment

Sorry about the conflict, most likely caused by adding the collapse_every_n_layers hyperparameter to Omikuji in PR #371. It should be simple to fix.

This looks very good. My only issue is the cli_params parameter name used for the AnnifProject methods. The parameters could come from a source other than the CLI (for example, REST API methods could provide a way to override some parameters in the future), so I suggest renaming it to something more neutral such as be_params.
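
In other words, just to illustrate the rename (the rest of the signatures here are assumed, not the actual ones):

```python
class AnnifProject:
    def train(self, corpus, be_params=None):   # instead of cli_params
        ...

    def learn(self, corpus, be_params=None):
        ...
```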

lgtm-com bot commented Dec 16, 2019

This pull request introduces 1 alert and fixes 1 when merging 74c8690 into 3ec36c8 - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

fixed alerts:

  • 1 for Unused import

lgtm-com bot commented Dec 17, 2019

This pull request introduces 1 alert and fixes 1 when merging fc967b9 into 3ec36c8 - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

fixed alerts:

  • 1 for Unused import

@osma osma added this to the 0.45 milestone Dec 17, 2019

lgtm-com bot commented Dec 17, 2019

This pull request introduces 1 alert and fixes 1 when merging 5b4bb73 into 3ec36c8 - view on LGTM.com

new alerts:

  • 1 for Non-iterable used in for loop

fixed alerts:

  • 1 for Unused import

@juhoinkinen juhoinkinen merged commit dbcb7be into master Dec 17, 2019
@juhoinkinen juhoinkinen deleted the issue250-support-backend-param-option-in-train-and-learn-commands branch January 28, 2020 12:11

Successfully merging this pull request may close these issues.

Support --backend-param option in train command