Fix semantics about number of entry/article/front article/...( #514
Thanks @mgautierfr for the detailed explanation and proposal. This sounds OK. I'm mostly interested in the last case (new namespace scheme, with listing), as this is what we are about to create now, and it sounds reasonable. I guess extending the hints later may have an impact, but it's not for the near future, so we'll see when the time comes.
If we have a specific title index, we must iterate on it. We must not iterate on a (wrong) subset of the entries. Related to #514
@mgautierfr This looks good, even if it is indeed starting to get quite complex. I have one worry: redirect handling. If I understand properly, redirects and articles are treated identically. One count usage that is top of my mind is the numbers communicated in the Kiwix library (number of media, number of articles). To me it looks like what is needed there is the number of front articles without redirects... and it seems impossible to get that number right?
From the zim file format itself, no.
With the current zim library there is no real distinction between the kinds of articles in a zim file:

- `File::getCountArticles()` returns the number of articles (all entries) in the zim file.
- `File::getNamespaceCount(char ns)` returns the number of articles in the specified namespace.

On top of that, kiwix-lib has:

- `Reader::getGlobalCount()`, returning `File::getCountArticles()`.
- `Reader::getArticleCount()`, which parses the `M/Counter` metadata to return the number of html articles (if `M/Counter` is not available, it returns the number of articles in the `A` namespace).
- `Reader::getMediaCount()`, which parses the `M/Counter` metadata to return the number of image/video articles.

With the "new" zim library (in master, to be released) it is a bit more complex.
For now we only have `Archive::getEntryCount()`, which returns the number of user entries (on old zim files, this is all entries; on new zim files, it is the entries in the `C` namespace).

We also have three ways to iterate on entries:

- `iterByPath`, which iterates on all user entries.
- `iterByTitle`, which iterates on all user entries.
- `iterEfficient`, which iterates on ALL entries.

And on top of that, kiwix-lib has:

- `Reader::getGlobalCount()`, returning `Archive::getEntryCount()`.
- `Reader::getArticleCount()`, which parses the `M/Counter` metadata to return the number of html articles (no fallback if `M/Counter` is not available).
- `Reader::getMediaCount()`, which parses the `M/Counter` metadata to return the number of image/video articles.

Parsing the `M/Counter` metadata is still possible, but we have no specific API for that.

With the recent changes, we add a specific listing in the zim file to reference "front articles" (entries to be displayed to the user as "real" entries, as opposed to "resource" entries). While those front articles are always html content for now, this is not enforced and we shouldn't assume it.
Searching/iterating by title is (or should be) done only on those front articles, which may (and usually will) be a subset of all "user" entries.

Random article selection and suggestions also operate on those front articles.
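Since the new library has no dedicated API for `M/Counter`, a reader that still wants the kiwix-lib style numbers has to parse the metadata itself. Below is a minimal C++ sketch, assuming the value is a `;`-separated list of `mimetype=count` pairs (for example `text/html=3;image/png=2`). `parseCounter`, `htmlCount` and `mediaCount` are hypothetical helpers, not libzim or kiwix-lib functions, and the prefix rules used to classify mimetypes are an assumption of this sketch. It also illustrates why an html count is only a proxy for the number of front articles: nothing in `M/Counter` ties the two together.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helpers, not libzim or kiwix-lib functions.
// Assumed M/Counter format: "mimetype=count" pairs separated by ';',
// e.g. "text/html=3;image/png=2". Mimetypes containing ';' are not handled.
using CounterMap = std::map<std::string, std::size_t>;

CounterMap parseCounter(const std::string& value) {
  CounterMap counts;
  std::istringstream in(value);
  std::string item;
  while (std::getline(in, item, ';')) {
    const std::size_t eq = item.rfind('=');
    if (eq == std::string::npos || eq == 0) continue;  // no mimetype: skip
    const std::string mime = item.substr(0, eq);
    const std::string num = item.substr(eq + 1);
    // Skip items whose count is empty or not a plain decimal number.
    if (num.empty() || num.find_first_not_of("0123456789") != std::string::npos)
      continue;
    // Repeated mimetypes are summed.
    counts[mime] += static_cast<std::size_t>(std::stoull(num));
  }
  return counts;
}

bool startsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Approximates the "number of html articles" (assumed rule: text/html*).
std::size_t htmlCount(const CounterMap& counts) {
  std::size_t n = 0;
  for (const auto& kv : counts)
    if (startsWith(kv.first, "text/html")) n += kv.second;
  return n;
}

// Approximates the "number of image/video articles" (assumed rule: image/*, video/*).
std::size_t mediaCount(const CounterMap& counts) {
  std::size_t n = 0;
  for (const auto& kv : counts)
    if (startsWith(kv.first, "image/") || startsWith(kv.first, "video/"))
      n += kv.second;
  return n;
}
```

For instance, `parseCounter("text/html=3;image/png=2;video/webm=1;text/css=4")` yields an html count of 3 and a media count of 3, while malformed items are silently skipped rather than reported.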
On top of that, please remember that it is totally valid to have a zim file using the new namespace scheme (all user entries in the `C` namespace) but without a specific front article listing. We (openzim) will probably never generate such files, but we must be prepared to read them.

We need to define an API that provides coherent values, together with a definition of those values, and make the API adapt to the zim file version we have so that it returns values coherent with their definition.
I propose (but I'm really open to any suggestion):

- `getAllEntryCount()`: returns the number of all entries (not only user entries).
- `getEntryCount()`: returns the number of user entries (all entries on old zim files, `C` entries on zim files using the new namespace scheme).
- `getArticleCount()`: returns the number of entries accessible through their titles. Technically, this is the number of entries in the title listing (specific or not):
  - On old zim files (old namespace scheme, no specific listing), this is the same as `getEntryCount()`.
  - On zim files with the new namespace scheme but no specific listing, this will be the count of ALL entries (which will be greater than `getEntryCount()`).
  - On zim files with the new namespace scheme and a specific listing, this will be the number of entries in the specific listing (smaller than `getEntryCount()`).

  What is important is that it is the number of entries you get if you iterate on the range returned by `Archive::iterByTitle` (which is presently buggy for zim files with a specific listing).
- `hasSpecificTitleListing()`: tells whether the zim file has a specific title listing or not.

The three ways to iterate on entries would become:

- `iterByPath`, which iterates on all user entries.
- `iterByTitle`, which iterates on all front articles (as said before, this could be more than all user entries).
- `iterEfficient`, which iterates on ALL entries.

Or...
For old zim files:

- `iterByPath`, `iterByTitle` and `iterEfficient` iterate on all entries (but in different orders).
- `getAllEntryCount`, `getEntryCount` and `getArticleCount` return the same thing (the number of all entries).

For zim files with the new namespace scheme but no specific listing:

- `iterByPath`, `iterByTitle` and `iterEfficient` iterate on all user entries (`C` namespace), in different orders.
- `getAllEntryCount` returns the number of all entries.
- `getEntryCount` and `getArticleCount` return the number of all user entries.

For zim files with the new namespace scheme and a specific listing:

- `iterByPath` and `iterEfficient` iterate on all user entries (`C` namespace), in different orders.
- `iterByTitle` iterates on front articles (those listed in the specific listing).
- `getAllEntryCount` returns the number of all entries.
- `getEntryCount` returns the number of all user entries.
- `getArticleCount` returns the number of front articles.
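To make the three cases of the "Or..." variant concrete, here is a minimal C++ sketch of them as an in-memory model. `EntryModel` and `ArchiveModel` are hypothetical types, not libzim classes: only the method names mirror the proposal, and the namespace check (`'C'` for user entries) plus the two boolean flags stand in for however the real library detects the namespace scheme and the listing. Each count is derived from the matching iteration, so `getArticleCount()` always equals the size of the `iterByTitle()` range, the property stressed in the first proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model types used only to illustrate the semantics above;
// they are not libzim classes. Iteration order is ignored: each iter*()
// returns the set of entries the real iterator would visit.
struct EntryModel {
  char ns;            // namespace, e.g. 'C' for user content (new scheme)
  bool frontArticle;  // listed in the specific front-article listing
};

class ArchiveModel {
 public:
  ArchiveModel(std::vector<EntryModel> entries, bool newNamespaceScheme,
               bool specificTitleListing)
      : entries_(std::move(entries)),
        newScheme_(newNamespaceScheme),
        specificListing_(specificTitleListing) {
    // Old-scheme archives never carry a front-article listing.
    assert(newScheme_ || !specificListing_);
  }

  bool hasSpecificTitleListing() const { return specificListing_; }

  std::vector<EntryModel> iterByPath() const { return userEntries(); }
  std::vector<EntryModel> iterEfficient() const { return userEntries(); }

  // Front articles if a specific listing exists, else all user entries.
  std::vector<EntryModel> iterByTitle() const {
    if (!specificListing_) return userEntries();
    std::vector<EntryModel> out;
    for (const auto& e : entries_)
      if (isUserEntry(e) && e.frontArticle) out.push_back(e);
    return out;
  }

  std::size_t getAllEntryCount() const { return entries_.size(); }
  std::size_t getEntryCount() const { return iterByPath().size(); }
  // By construction, always equals the size of the iterByTitle() range.
  std::size_t getArticleCount() const { return iterByTitle().size(); }

 private:
  // Old scheme: every entry is a user entry. New scheme: only 'C' entries.
  bool isUserEntry(const EntryModel& e) const {
    return !newScheme_ || e.ns == 'C';
  }

  std::vector<EntryModel> userEntries() const {
    std::vector<EntryModel> out;
    for (const auto& e : entries_)
      if (isUserEntry(e)) out.push_back(e);
    return out;
  }

  std::vector<EntryModel> entries_;
  bool newScheme_;
  bool specificListing_;
};
```

Under the first proposal, the only behavioral difference would be the new-scheme-without-listing case, where `iterByTitle()` (and therefore `getArticleCount()`) covers ALL entries; in this model that amounts to returning `entries_` instead of `userEntries()` in that branch.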