Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finding local .zim files takes a very long time ~15 minutes #3646

Closed
tigran123 opened this issue Dec 30, 2023 · 34 comments · Fixed by #3674
Closed

Finding local .zim files takes a very long time ~15 minutes #3646

tigran123 opened this issue Dec 30, 2023 · 34 comments · Fixed by #3674
Assignees
Milestone

Comments

@tigran123
Copy link

tigran123 commented Dec 30, 2023

Describe the bug

On Android 12 running on Samsung Galaxy Note10+ phone it took 14 minutes 5 seconds to find just two .zim files.

Expected behavior

Should take a few seconds. Actually, reading a directory and discovering those files should take 2-3ms, but assuming x1000 application overhead it should still complete in a couple of seconds, no more. Also, allowing one more second for reading and parsing the metadata I would say the maximum allowed time for this is 3 seconds. However, if the .zim file architecture is wrong, i.e. if the file is compressed (the 'z' in the name gives this hint) AND the metadata is stored inside the archive, then it would take about 15 minutes to access 100GB files, which is very bad, because this sort of processing should not depend on the filesize. To test this theory I would have to try with the smaller .zim files.

Steps to reproduce the behavior:

  1. Copy two zim files to /Android/media/kiwix. I used WikiPedia English (102GB latest) and WikiPedia Russian (33GB latest).
  2. Go to Android Settings and stop Kiwix app.
  3. While in the Settings clear cache and data of the Kiwix app
  4. Start Kiwix app and give it appropriate permissions
  5. Go to Kiwix and select Settings from the menu
  6. Point to External in Settings and select /Android/media/kiwix directory (making sure that it does point to External device first)
  7. Give the appropriate permission by clicking on "Use this directory" button
  8. Go to the Library tab of Kiwix and on another device start the timer
  9. Observe that the two .zim files will be discovered in 14 minutes and 5 seconds. And this is on a very fast Note10+ smartphone with a very fast 1TB microsd card, so on a lower-end configuration it will take even longer.

Screenshots

Environment

This problem was also mentioned in #3604

@tigran123 tigran123 changed the title Finding local .zim files take a very long time ~15 minutes Finding local .zim files takes a very long time ~15 minutes Dec 30, 2023
@kelson42 kelson42 added this to the 3.10.0 milestone Dec 30, 2023
@kelson42
Copy link
Collaborator

I can only support this issue, really wonder myself. Maybe there a legit explanation, then very interested in it.

@tigran123
Copy link
Author

I have just tested this issue on Samsung Galaxy Tab S4 running Android 10 and here it works fine -- it discovers these files in "just" 1 minute and 46 seconds. Still a bit slow, but acceptable. This tablet has the same 1TB microsd and a slower CPU than Note10+ (and 1/3 of RAM -- 4GB vs 12GB, though it is probably irrelevant).

@kelson42
Copy link
Collaborator

kelson42 commented Dec 30, 2023

@tigran123 Do you have a lot of other kinds of files (on the SD card) beside the two ZIM files?

@tigran123
Copy link
Author

tigran123 commented Dec 30, 2023

Yes, my books library (30,000 books) and Audio library (100,000 files), but they are all in other subdirectories (/Android/media/Books and /Android/media/Audio) and I specifically pointed Kiwix to /Android/media/kiwix, so it shouldn't go looking in other directories outside /Android/media/kiwix. If it does -- that is a bug.

But all these other files are present on both the tablet and the phone (I always have my library close, being a "bookworm" :)

@kelson42
Copy link
Collaborator

kelson42 commented Dec 30, 2023

@tigran123 OK, we have found I guess why this is that slow. But I will leave @MohitMaliFtechiz give the full answer. Hopefully we can better filter by filename extensions...

@MohitMaliFtechiz Sorry for my ignorance but I have a few questions:

  • Are the locations scanned for ZIM files somehow related to storage location setting? If yes, how?
  • Do we have any visual indicator that a scanning is ongoing?

@tigran123
Copy link
Author

If you think that it is the presence of 150k other files that slowed Kiwix down, then I suspect that you are mistaken, because I vaguely remember that when I was setting up this Note10+ smartphone the ZIM files for Kiwix were the first thing that I copied to the microsd, i.e. the books and classical music were NOT yet there -- the microsd was essentially empty. And, as far as I remember, Kiwix was extremely slow to find those ZIM files.

Anyway, it is easy to verify this -- tomorrow I'll just take a small microsd card 200GB in size and will copy just the ZIM files to it -- let's see how fast Kiwix will find them. My guess is -- in 14 minutes. Well, if I am mistaken then this is a very serious bug in Kiwix -- it certainly shouldn't go looking in places where it is not supposed to, i.e. outside the directory explicitly designated by the user.

@tigran123
Copy link
Author

tigran123 commented Dec 30, 2023

Actually, on my tablet the Books and Audio live in /storage/XXXX-YYYY/Android/data/com.termux/{Books,Audio} which is outside /Android/media and maybe that is why Kiwix finds the ZIM files on the tablet relatively quickly. This assumes that it has a bug -- it ignores the explicit location and just scans the whole /Android/media recursively. So, thinking about it, maybe my statement in the previous message is erroneous. Only the experiment will show...

@tigran123
Copy link
Author

tigran123 commented Dec 31, 2023

As promised, I performed the experiment and confirm that on a 200GB microsd with only the two ZIM files present in /Android/media/kiwix and nothing else (except that Android automatically creates some empty directories) Kiwix finds the ZIM files almost instantly -- in about 10-15 seconds.

So this means there is a bug in Kiwix -- it ignores the storage setting pointing it directly to /Android/media/kiwix and attempts to scan /Android/media recursively, but if there are millions of files in there it would never complete and would take ages to find the two ZIM files. Even after finding those files it continues to scan which is why I said above "it never completes". It may complete after 30 minutes or so, but who has the patience to wait that long...

So, I suggest that you fix Kiwix app to honour the user-specified location, IF it is specified. But if nothing is specified, then, yes, it may proceed to scan recursively the whole /Android/media as it currently does.

@kelson42
Copy link
Collaborator

Point to External in Settings and select /Android/media/kiwix directory (making sure that it does point to External device first)
Give the appropriate permission by clicking on "Use this directory" button

Can someone explain to me how this is achieved? In my Settings I can only should internal or external... nothing else.

@tigran123
Copy link
Author

tigran123 commented Dec 31, 2023

When you touch "External" there appears a UI dialog which allows you to select a specific directory and then you have to touch a button at the bottom which says "use this folder". This is a very useful and correct policy, but unfortunately Kiwix appears to ignore this selection. In that dialog you can even change the device, i.e. select internal even though you have selected external at the previous step, but we are not talking about this feature -- we are talking about a specific subdirectory within a given mounted storage filesystem.

@kelson42
Copy link
Collaborator

kelson42 commented Dec 31, 2023

Wow, I don't have this feature on my phone. I guess this is non-playstore-version only. Anyway, this is a feature I don't remember to have validated at all (I might have forgotten)... that said I understand it could be useful even if pretty hacky.

My main concern is the mix which seems to be made between different concepts:
1 - Place where to save ZIM files if Kiwix download them (definitly the case)
2 - Place where a user should put its third party downloaded ZIM files (does it should be the same as #1)
3 - Place(s) to scan automatically (I guess we should scan many locations! Not only #1 or/and #2, or maybe #1 and #2 are not scannable).

@MohitMaliFtechiz Therefore my questions to #3646 (comment)

@tigran123
Copy link
Author

Yes, this UI chooser only appears in the version installed from apk downloaded from your website. And only on Android 12, not on Android 10 (not sure about Android 11 as I don't have any devices that run it).

@tigran123
Copy link
Author

Some programs (e.g. ebookdroid) allow the user to select the list of locations to scan, so that one can specifically avoid some locations which are known to have millions of irrelevant files to prevent the problems we are discussing. It would be good if Kiwix could do the same. Or, at the very least, allow the user to set a single location to be used for all the three purposes you described above. Even this minimalistic approach is good enough, imho.

@MohitMaliFtechiz
Copy link
Collaborator

Wow, I don't have this feature on my phone. I guess this is non-playstore-version only. Anyway, this is a feature I don't remember to have validated at all (I might have forgotten)... that said I understand it could be useful even if pretty hacky.

My main concern is the mix which seems to be made between different concepts: 1 - Place where to save ZIM files if Kiwix download them (definitly the case) 2 - Place where a user should put its third party downloaded ZIM files (does it should be the same as #1) 3 - Place(s) to scan automatically (I guess we should scan many locations! Not only #1 or/and #2, or maybe #1 and #2 are not scannable).

@MohitMaliFtechiz Therefore my questions to #3646 (comment)

@kelson42, For the feature that @tigran123 asking we need to fix the #3636 after that we can do it.

If a user selects a directory then.

  • For the first point => We can store the ZIM files inside that folder if Fetch allows it to be stored in that folder, otherwise, we can store the zim file inside the app-specific directory.
  • For the second point => User can place externally downloaded ZIM files in that folder.
  • For the third point => On Play Store version we can provide the directory picker option it will help users to put their externally downloaded ZIM files in that folder and our application will read those ZIM files from there. For now user needs to copy the ZIM files into the app-specific directory which will be deleted when the app is uninstalled or app data is cleared. The non-playstore version will work like it is working now, it will scan the full storage.

@kelson42
Copy link
Collaborator

kelson42 commented Jan 1, 2024

@MohitMaliFtechiz not asking how it should be, asking how it is today.

@MohitMaliFtechiz
Copy link
Collaborator

For now, we are scanning the whole storage and showing ZIM files on the library screen. On Android 11 and above we are taking the MANAGE_EXTERNAL_STORGAE permission for scanning the storage

  • For internal storage in the non-playstore version we are downloading the content in /storage/Kiwix folder
  • For the external storage the android limits our access so for downloading the content in external storage we are showing a directory picker to the user in which we are saving the ZIM files.
  • In playstore version we are downloading the ZIm files inside the app-specific directory because here with fetch we don't have the permission to store content outside our app-specific directory. And for the externally downloaded zim files user needs to copy those files inside the app-specific directory so it can show in the library screen.

@kelson42
Copy link
Collaborator

kelson42 commented Jan 1, 2024

@MohitMaliFtechiz You still don't answer the questions asked at #3646 (comment). Please stop explaining why things are like this or talk about permissions, because there the concern I have so far is conceptual.

@tigran123
Copy link
Author

tigran123 commented Jan 1, 2024

@MohitMaliFtechiz Thank you, I understand now that you have to deal with the various versions of Android and this makes it more difficult to implement and more confusing to explain to others.

Now, for older versions of Android I have no complaints about the current behaviour. As all versions of Android up to and including 10 allow other apps to access a particular app-specific data directory on external storage (in my case Termux, i.e. /storage/XXXX-YYYY/Android/data/com.termux) then there is no issue there. I just store Books and Audio as "Termux's private data" and it is visible to all other apps with no problem, and I can do rsync to/from a Linux host serving as a "library centre".

But for newer versions of Android (11+) I would like to make the following suggestion. Please consider to implement it, if it is not too hard. This is what I suggest. If the user did NOT specify a specific directory via UI chooser, then the current behaviour is acceptable -- no changes required. But if the user DID point to a specific directory, then would it be possible to kill the current scanning thread (presumably trying to scan the whole storage, both internal and external) and restart it but with the parameter telling it to search JUST THIS user-specified directory (recursively, of course) and nothing else? If this is implemented, then the problem of the current ticket will go away. Thank you.

Just to clarify: I cannot move Books and Audio outside /Android/media/ on Android 12 because then either this data won't be visible to book-reading and music-playing apps OR it will be impossible to write to this data from Termux via rsync. Even within /Android/media/ for Termux rsync to work properly I had to pass --inplace option on Android 12 (see the problem described and this workaround provided here: termux/termux-packages#18755

@MohitMaliFtechiz
Copy link
Collaborator

Extremely sorry about that!!

@tigran123 OK, we have found I guess why this is that slow. But I will leave @MohitMaliFtechiz give the full answer. Hopefully we can better filter by filename extensions...

It is slow because it scans the full storage to filter the ZIM files by their extension. We should try to improve this scanning process.

File(directory).walk().filter { it.extension.isAny(*zimFileExtensions) }.toList()

arrayOf("%." + zimFileExtensions[0], "%." + zimFileExtensions[1]),

Are the locations scanned for ZIM files somehow related to the storage location setting? If yes, how?

No, scanning is not related to the storage setting, scanning worked independently.

Do we have any visual indicator that a scanning is ongoing?

Yes, while scanning the storage, there is a loading progress showing on the library screen, it hides when the storage scanning is complete.

@tigran123
Copy link
Author

tigran123 commented Jan 1, 2024

Are the locations scanned for ZIM files somehow related to the storage location setting? If yes, how?

No, scanning is not related to the storage setting, scanning worked independently.

Hmm, I see. Well, in that case I guess what I am suggesting in the previous message also implies using this user-specified directory as a place where to look for side-loaded ZIM files. I think it makes perfect sense to do this, don't you? And for the older versions of Android (where no UI directory picker appears) the user's selection of "Internal/External" should imply some default hardcoded location like /Android/media/kiwix on the particular device (i.e. internal or external), for example.

@tigran123
Copy link
Author

Do we have any visual indicator that a scanning is ongoing?

Yes, while scanning the storage, there is a loading progress showing on the library screen, it hides when the storage scanning is complete.

No, that does not appear to be the case on my system (both Android 10 and 12). All I see is a rotating circle, but it does NOT indicate progress in any way, just the status "busy". Here is the screenshot:

Screenshot_20240101_122319_Kiwix

@MohitMaliFtechiz
Copy link
Collaborator

@tigran123 yes, It's the circular loading progressBar that we are showing while scanning the storage.

@tigran123
Copy link
Author

tigran123 commented Jan 1, 2024

@tigran123 yes, It's the circular loading progressBar that we are showing while scanning the storage.

Unfortunately, it cannot serve as a progress bar, because it does not show the progress at all. It is making 360-degree rotations indefinitely, so how am I to decode the progress from these rotations? There is no way to do this. I repeat: it is just a BUSY status indicator, not a progress bar. My screenshot may be confusing you because obviously it can show an instantaneous picture (it is not a video) -- which would look like a "circular loading progress bar". But obviously I am assuming that you are not looking at a screenshot, but know exactly what I am talking about -- the circular string inside a circle (like a snake) is making 360 degree (FULL CIRCLE!) rotations infinitely many times. So it cannot serve as a progress bar. If it was slowly moving and the completion would correspond to 360 degree, then, yes, it could serve as a progress indicator. But that is certainly NOT what actually happens.

@MohitMaliFtechiz
Copy link
Collaborator

@tigran123 I concur with your perspective. This implementation was established quite some time ago, likely with a specific purpose in mind. One possible reason is that we perform a comprehensive scan of storage across various locations, both internal and external. Consequently, we traverse through every conceivable path in search of ZIM files, employing different methods to check for their existence. For instance, distinct methods are used to examine app-specific directories in internal and external storage. We amalgamated all potential paths and methods for scanning storage, which explains why there is no progress indicator displayed on the page. Instead, we showcase a continuous rotating circle until the scanning process is complete.

@kelson42 kelson42 removed this from the 3.10.0 milestone Jan 2, 2024
@kelson42 kelson42 added this to the 3.11.0 milestone Jan 2, 2024
@Jaifroid
Copy link
Member

Jaifroid commented Jan 2, 2024

I know we've discussed this several times in different issues, but this just reinforces for me that this "scanning" behaviour is a cause of a lot of problems for the Kiwix Android app (not least that it is disallowed by the Play Store), and should IMHO be considered a legacy behaviour. Nowadays users may have tens of thousands of photos, etc., and scanning everything is no-longer acceptable behaviour for users from a privacy perspective, as well as being increasingly impractical.

If #3628 can be achieved, maybe it's time to think about retiring this full device-scanning code.

@tigran123
Copy link
Author

@Jaifroid Ok, it would seem that you are suggesting to use the file-picker to point to a specific .zim file, is that correct? But is it short-lived (the comment mentioned "file descriptor" which is short-lived, but I wasn't sure if it referred to the file-picker or not) or can the reference to .zim file be reused in subsequent sessions? What I am asking is this: currently, if I click on the + button, then I can point to a specific .zim file and it is opened correctly. And when I save bookmarks they refer to it in subsequent sessions also correctly. However, for some reason it is not added to the list in the Library tab. Is this an oversight or is there a reason for this?

So, before retiring the full device-scanning code one has to carefully and clearly formulate what it is going to be replaced with.

@Jaifroid
Copy link
Member

Jaifroid commented Jan 2, 2024

@tigran123 Android has persistent permissions (if the permission is stored by the app in the correct way), so choosing a folder to scan for ZIMs contained in it (and optionally any subfolders) needn't be temporary. I don't think it should matter that the file is being accessed via file descriptors. Once the app has been granted permission by the user, the permission should be storable.

Regarding bookmarks, I don't know the current mechanism, but I envisage they should be stored in the app's database (or possibly its own scoped file system) independently of the location of ZIM archives. If a user opens a bookmark for a ZIM the app can no longer find, the user should be prompted to open the correct location.

All of above is just my tuppence. I'm not an Android dev here (I work on the PWA).

@tigran123
Copy link
Author

tigran123 commented Jan 2, 2024

@tigran123 Android has persistent permissions (if the permission is stored by the app in the correct way), so choosing a folder to scan for ZIMs contained in it (and optionally any subfolders) needn't be temporary. I don't think it should matter that the file is being accessed via file descriptors. Once the app has been granted permission by the user, the permission should be storable.

Ok, then it is by omission that Kiwix does not add the .zim file (added via + button) to the Library list -- it should so so.

If a user opens a bookmark for a ZIM the app can no longer find, the user should be prompted to open the correct location.

Currently this does not happen -- instead, just an error message is shown about the missing .zim file.

@MohitMaliFtechiz
Copy link
Collaborator

If #3628 can be achieved, maybe it's time to think about retiring this full device-scanning code.

@Jaifroid Yes, we are thinking about it, I am waiting for #3628 to complete.

Ok, then it is by omission that Kiwix does not add the .zim file (added via + button) to the Library list -- it should so so.

@tigran123 Thanks for your suggestion, we are planning to implement this in #3628.

@kelson42
Copy link
Collaborator

kelson42 commented Jan 2, 2024

Ok, then it is by omission that Kiwix does not add the .zim file (added via + button) to the Library list -- it should so so.\n\n@tigran123 Thanks for your suggestion, we are planning to implement this in #3628.

@MohitMaliFtechiz please open a dedicated ticket and fix quickly in a dedicated PR. This has nothing to do with the playstore and is a bug.

@kelson42
Copy link
Collaborator

kelson42 commented Jan 17, 2024

@MohitMaliFtechiz Can we do anything to speed-up the scanning? Better inform the user about time left? Run this scanning at more pertinent time?

@kelson42 kelson42 modified the milestones: 3.11.0, 3.10.0 Jan 17, 2024
@tigran123
Copy link
Author

The best way to speed up scanning is by NOT doing any scanning at all. If the interface with a plus button + is provided properly, i.e. allow the user to point to a .zim file and do NOT forget to add that file to the list, then this solution would be absolutely perfect. Much better than wasting battery on scanning which will always take ages (because people who use Kiwix would have millions or at least many hundreds of thousands of files on their micro sd cards, which soon will be available in 2TB).

@kelson42
Copy link
Collaborator

kelson42 commented Jan 17, 2024

@MohitMaliFtechiz Do we have a way to make this scanning:

  • only on demand
  • "block" the app during the operation
  • Propose a kind of progress bar (even if this is only an estimation)
    ?

Question to speed-up the overall operation remains.

@MohitMaliFtechiz
Copy link
Collaborator

MohitMaliFtechiz commented Jan 17, 2024

Edited: @tigran123, It is likely a good option; however, rather than disabling the scanning feature, we should consider enhancing it. The scanning process could be significantly expedited if users are given the option to designate a specific folder for all their ZIM files. Additionally, users could manually select additional ZIM files via the "+" button, which would then be automatically added to the library screen.

@MohitMaliFtechiz Can we do anything to speed-up the scanning? Better inform the user about the time left? Run this scanning at more pertinent time?

Yes, we can show the remaining time of scanning. And also I will try to improve the scanning process.

Only on demand

Yes we can do this by disabling the default scanning when opening the LocalLibraryFragment. We can only do scanning when the user tries to refresh the data.

Propose a kind of progress bar (even if this is only an estimation)?

It would be good to show a progress bar to show the remaining progress of the scanning process that will show the remaining time of the scanning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants