Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update dependencies v0.53 #499

Merged
merged 7 commits into from
Jun 21, 2021
Merged

Update dependencies v0.53 #499

merged 7 commits into from
Jun 21, 2021

Conversation

juhoinkinen
Copy link
Member

@juhoinkinen juhoinkinen commented Jun 16, 2021

Dependency updates for Annif v0.53.

All pinned packages are updated to newest available, with two exceptions due to compatibility issues:

A notably update is upgrading joblib to 1.0.1 from 0.17.0, maybe that helps the Annif models to remain compatible for longer in future.

Also installs optional python-Levenshtein package to get rid of warnings about missing it.
Filters warnings about not-installed python-Levenshtein package emitted by Gensim 4.0.1.

During installation there are messages like Using legacy 'setup.py install' for <somepkg>, since package 'wheel' is not installed. I suppose by installing the wheel package the messages would go away, but I wonder if it would do anything else good.

There are some negligible warnings (about PyYAML version during installation and many DeprecationWarnings from tests), but one from TensorFlow could be more serious:

/home/local/jmminkin/git/Annif/venv/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py:497: CustomMaskWarning: Custom mask layers require a config and must override get_config. When loading, the custom mask layer must be passed to the custom_objects argument.
    category=CustomMaskWarning)

For now I could not find out what exactly that is about, but it could be related to the error seen when trying to use old models in Portainer (for ai.dev.finto.fi); the container fails to start due to:

File "/usr/local/lib/python3.8/site-packages/tensorflow/python/keras/layers/core.py", line 1021, in from_config
code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)

However at kj-kk when I eval the old yso-fi model I see the usual UserWarnings from sklearn, and models seem to work.

@juhoinkinen juhoinkinen added this to the 0.53 milestone Jun 16, 2021
@juhoinkinen juhoinkinen requested a review from osma June 16, 2021 06:11
@codecov
Copy link

codecov bot commented Jun 16, 2021

Codecov Report

Merging #499 (1b3b786) into master (0fe6557) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #499      +/-   ##
==========================================
+ Coverage   99.48%   99.50%   +0.01%     
==========================================
  Files          78       78              
  Lines        5669     5672       +3     
==========================================
+ Hits         5640     5644       +4     
+ Misses         29       28       -1     
Impacted Files Coverage Δ
annif/backend/tfidf.py 98.85% <100.00%> (+0.04%) ⬆️
annif/backend/stwfsa.py 100.00% <0.00%> (+1.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fe6557...1b3b786. Read the comment docs.

@osma
Copy link
Member

osma commented Jun 16, 2021

Also installs optional python-Levenshtein package to get rid of warnings about missing it.

That's a bit of a waste, because Annif isn't really using the gensim.similarities.levenshtein module. I guess another option would be to filter the warning. I wonder why Gensim decided to do it like this.

During installation there are messages like Using legacy 'setup.py install' for , since package 'wheel' is not installed. I suppose by installing the wheel package the messages would go away, but I wonder if it would do anything else good.

That might speed up installing those packages with wheels available. Other than that, it shouldn't make a difference.

There are some negligible warnings (about PyYAML version during installation and many DeprecationWarnings from tests), but one from TensorFlow could be more serious:

I'm guessing this could be related to the Lambda layer used in nn_ensemble. This warning is shown when custom layers are serialized. It's a little bit unfortunate. There is this note in the documentation for Lambda layers that explains why serialization of Lambda layers is fragile and it would perhaps be better to use a custom subclass.

For now I could not find out what exactly that is about, but it could be related to the error seen when trying to use old models in Portainer (for ai.dev.finto.fi); the container fails to start due to:

This could be another symptom of serialization/deserialization issues with Lambda layers. Keras seems to use the marshal module for serializing functions; it is not compatible across Python versions. The Docker image just switched from 3.7 to 3.8 so perhaps that could have triggered this (and not the TF update in this PR)?

@osma
Copy link
Member

osma commented Jun 16, 2021

That's a bit of a waste, because Annif isn't really using the gensim.similarities.levenshtein module. I guess another option would be to filter the warning. I wonder why Gensim decided to do it like this.

I see now that Gensim has recently removed the dependency on the Levenshtein library altogether in this PR, but this change is not yet included in a release.

I suggest that we avoid introducing an additional, useless dependency and instead filter the warning by importing like this (in the tfidf backend and possibly elsewhere):

import warnings
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    import gensim.similarities

@sonarcloud
Copy link

sonarcloud bot commented Jun 16, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@juhoinkinen juhoinkinen marked this pull request as ready for review June 16, 2021 11:08
@juhoinkinen
Copy link
Member Author

Updated PR description for filtering Gensim warnings.

@osma
Copy link
Member

osma commented Jun 16, 2021

I found some more information about the CustomMaskWarning error. It appears to be a bug in TensorFlow 2.5.0 and it's related to the Add layer, not the Lambda layer that I first suspected.

I've created a draft PR #500 that replaces the Lambda layer with a custom MeanLayer. But it doesn't actually solve the CustomMaskWarning error. It might improve compatibility of saved nn_ensemble models and potentially fix the ValueError: bad marshal data (unknown type code) seen on the Docker infrastructure, but that needs more testing and is out of scope for the 0.53 release.

Copy link
Member

@osma osma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good enough; we can't get rid of the CustomMaskWarning unless we downgrade TensorFlow, but that would also lose support for Python 3.9.

@juhoinkinen juhoinkinen merged commit 13cd15b into master Jun 21, 2021
@juhoinkinen juhoinkinen deleted the update-dependencies-v0.53 branch June 21, 2021 08:09
@osma osma mentioned this pull request Aug 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants