-
-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print error and stop the script if mandatory Metadata key is not set #1806
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1806 +/- ##
==========================================
- Coverage 70.90% 70.90% -0.01%
==========================================
Files 23 24 +1
Lines 2609 2622 +13
Branches 593 594 +1
==========================================
+ Hits 1850 1859 +9
- Misses 653 657 +4
Partials 106 106
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
@pavel-karatsiuba It should not be an exception. An exception is a unamanged unexcpected situation in a software. It should be handled properly. That means as early as possible and A proper error message should be given to the user so he understands what is missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see many weaknesses with your implementation:
- The checking is done pretty late, it get all the page details downloaded before checking this (which is useless)
- I know this is not strictly the topic of this ticket... but have you remarked that if the
Description
comes from the Mediawiki, then nothing stops it to be more than 80 charaters?! I kind of expect that this works fine as well. - If error, the MWoffliner exits with code
0
which is wrong. Remark that if I a give a wrong--customZimDescription
on the command line then it fails with exit code2
(which is correct) - If there is an error then I see something like this (not only I would just prefer to stop pretty quickly but on the top of this the messages are misleading):
[info] [2023-03-12T13:56:24.324Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Ccoordinates&rdlimit=max&rdnamespace=0&colimit=max&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=Interlingua]
[log] [2023-03-12T13:56:24.967Z] Got [407 / 407] articles chunk from namespace [0]
[info] [2023-03-12T13:56:24.967Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Ccoordinates&rdlimit=max&rdnamespace=0&colimit=max&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=Nsoro]
[log] [2023-03-12T13:56:25.585Z] Got [413 / 820] articles chunk from namespace [0]
[log] [2023-03-12T13:56:25.586Z] A total of [1206] articles has been found in namespace [0]
[log] [2023-03-12T13:56:25.586Z] Got ArticleIDs in 2.09 seconds
[log] [2023-03-12T13:56:25.586Z] Total articles found in Redis: 1207
[warn] [2023-03-12T13:56:25.586Z] Couldn't find strings file for [bm], falling back to [en]
[log] [2023-03-12T13:56:25.587Z] Doing dump
[log] [2023-03-12T13:56:25.587Z] Writing zim to [/home/kelson/code/mwoffliner/out/wikipedia_bm_all_2023-03.zim]
[error] [2023-03-12T13:56:25.587Z] Metadata "{key}" is required
[log] [2023-03-12T13:56:25.587Z] Finished dump
[log] [2023-03-12T13:56:25.587Z] Closing HTTP agents...
[log] [2023-03-12T13:56:25.588Z] All dumping(s) finished with success.
[log] [2023-03-12T13:56:25.588Z] Flushing Redis DBs
[info] [2023-03-12T13:56:25.588Z] Finished running mwoffliner after [5s]
[log] [2023-03-12T13:56:25.588Z] Exiting with code [0]
[log] [2023-03-12T13:56:25.589Z] Deleting temporary directory [/dev/shm/mwoffliner-1678629383338]
You need to rethink the architecture of this. First you need to consider that ZIM metadata come AFAIK either from the mediawiki metadata (one short HTTP request at start) or from the command line. Considering this I could propose to:
- Isolate the string metadata constraints in a form of a const table (mandatory, max length, ...)
- Use this table in the sanitizing of command line arguments (when necessary)
- Isolate the metadata sanitizing in a dedicated module (
mwoffliner.lib.ts
is already quite big) - Reuse the string metadata constraints (created earlier) to check if all the metadata are correct
- Exit in a similar manner if wrong input via command line or just wrong metadata.
@kelson42 If I add sanitizing the Description as we made it for the command line parameter, then many downloads will fail. Maybe I need to trim the Description to 80 characters? |
@pavel-karatsiuba obviously you can... even if is is a super minor improvment.... if an improvment at all (strongly suspect Mediawiki to do it already). Won't change anything related to not passing the metadata sanitation. |
@pavel-karatsiuba Any news here? Please rebase/fix conflict before doing anything else. |
3ae267e
to
ebafe88
Compare
@kelson42 Ready to review. With the last change, I fixed the test because the new sanitizing rules do not allow Description higher than 80 characters. |
ebafe88
to
ea927ca
Compare
ea927ca
to
2a0c288
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use this table in the sanitizing of command line arguments (when necessary)
This is not complete, for example max size of LongDescription
is not checked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still work left... See https://github.com/openzim/zim-tools/pull/339/files, this is the kind of result I was expected here as well.
ea67452
to
72977cb
Compare
…rbose-option Added value of "verbose" parameter into the log if it is invalid.
…ror and stop script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If the mandatory Metadata key is not set then print Error and stop the script
Fixes: #1805