Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massively improve findFile performance #2

Merged
merged 1 commit into from
May 4, 2024

Conversation

raxod502
Copy link

@raxod502 raxod502 commented Apr 30, 2024

Please see mihonapp/mihon#705 for a technical discussion of the issue being solved here. The approach taken is as follows:

  1. Remove support for the ignoreCase parameter to findFile. This parameter is not present in the upstream DocumentFile API, and is not actually needed by Mihon as far as I am aware. An accompanying pull request against Mihon (Massively improve findFile performance mihonapp/mihon#728) removes the usage of this unneeded parameter.
  2. By removing ignoreCase, we can massively optimize the runtime of findFile (by as much as 1,000x when checking file existence in a directory with many existing files). Instead of listing all files, fetching details for each one, and then inspecting the results - we simply construct the document ID for the specific file we wish to check, and perform the check directly.
  3. I updated the permissions on gradlew since they were set incorrectly - it should be an executable script. This is unrelated but is an easy fix and is necessary if you wish to test the change locally.

DCO statement: The contribution was created in whole by Radian LLC, which attests its right to submit it to the project under the terms of the Apache License.

@raxod502
Copy link
Author

An alternative approach using SAF primitives more conventionally would look like this:

        String curDocumentId = DocumentsContract.getDocumentId(mUri);
        Uri childrenUri = DocumentsContract.buildChildDocumentsUriUsingTree(mUri, curDocumentId);

        Bundle bundle = new Bundle();

        String selection = DocumentsContract.Document.COLUMN_DISPLAY_NAME + " = ?";
        bundle.putString(ContentResolver.QUERY_ARG_SQL_SELECTION, selection);

        String[] selectionArgs = { displayName };
        bundle.putStringArray(ContentResolver.QUERY_ARG_SQL_SELECTION_ARGS, selectionArgs);

        ContentResolver resolver = mContext.getContentResolver();
        String[] columns = {
            DocumentsContract.Document.COLUMN_DOCUMENT_ID,
            DocumentsContract.Document.COLUMN_DISPLAY_NAME,
        };

        Cursor cursor = resolver.query(childrenUri, columns, bundle, null);
        while (cursor.moveToNext()) {
            String documentId = cursor.getString(0);
            String documentName = cursor.getString(1);

            if (!documentName.equals(displayName)) {
                // wtf?
                continue;
            }

            Uri documentUri = DocumentsContract.buildDocumentUriUsingTree(mUri, documentId);
            return new TreeDocumentFile(this, mContext, documentUri, documentName);
        }

        return null;

In addition to being incredibly obtuse, this doesn't actually work: although documentation suggests that the returned cursor should respect the provided selection arguments, it in fact iterates through all documents in the directory, defeating the entire point. Once I identified the approach taken in this PR, I abandoned further debugging.

return new TreeDocumentFile(this, mContext, uri.uri, displayName);
}
String documentId = DocumentsContract.getDocumentId(mUri);
documentId += "/" + displayName;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was rather wondering about that. I will adjust the code so that it checks the provider that is being used, and if it is not ExternalStorageProvider then it can fall back to the slower implementation that is more generic.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reconsidering, I think maybe we just leave a comment and assume it will work on most providers, in case some OEMs use different package names for ExternalStorageProvider.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added #3 to document this, per your suggestion.

@FooIbar
Copy link

FooIbar commented Apr 30, 2024

An alternative approach using SAF primitives more conventionally would look like this:

It does not work because DocumentsProvider will only extract the sort order from the queryArgs parameter.
https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/provider/DocumentsProvider.java;l=619;drc=074bb2dc367bde42b837a8de4d4e88f26c074c32

@arkon
Copy link

arkon commented May 2, 2024

Remove support for the ignoreCase parameter to findFile. This parameter is not present in the upstream DocumentFile API, and is not actually needed by Mihon as far as I am aware.

It was explicitly added because we often saw different behaviour depending on actual filesystems with regards to handling case insensitivity. That inconsistent behavior probably still exists, so I imagine removing it will actually break something, so I'm not entirely sure this is the way to go.

@FooIbar
Copy link

FooIbar commented May 2, 2024

Remove support for the ignoreCase parameter to findFile. This parameter is not present in the upstream DocumentFile API, and is not actually needed by Mihon as far as I am aware.

It was explicitly added because we often saw different behaviour depending on actual filesystems with regards to handling case insensitivity. That inconsistent behavior probably still exists, so I imagine removing it will actually break something, so I'm not entirely sure this is the way to go.

Are you referring to something like https://git.mihon.tech/tachiyomi/tachiyomi/issues/5381?
I think this is due to the underlying fs is case-insensitive while DocumentFile/UniFile was performing a case-sensitive check in findFile, see https://issuetracker.google.com/issues/232290071.
It will be case-insensitive with the proposed changes, so should not be an issue.

@raxod502
Copy link
Author

raxod502 commented May 3, 2024

Setting aside whether this PR actually would change behavior, in what circumstance would case sensitivity matter? Is there ever a situation in which we would be checking whether a file exists, and a file does exist but only with different casing, and we would want that to return true?

I looked at https://git.mihon.tech/tachiyomi/tachiyomi/issues/5381, but it's not clear to me what the actual issue was. Did the name of an existing manga change capitalization, and it was unable to create a new folder with the new capitalization because the filesystem is case-insensitive-but-case-preserving (the worst kind)?

If there are concrete cases where the case insensitivity was required, maybe I can adjust the library to fix those problems without needing to do case insensitive directory scans in all cases, which is just asking for trouble.

@FooIbar
Copy link

FooIbar commented May 3, 2024

Let me try to explain https://git.mihon.tech/tachiyomi/tachiyomi/issues/5381.

  1. Source changed its name form Mangasee -> MangaSee.
  2. Tachiyomi try to create the MangaSee folder
  3. Before creating the folder, UniFile will check if it exists through findFile. It's a case-sensitive check, so will return false.
  4. UniFile create the new folder. FileSystemProvider, which is the super class of ExternalStorageProvider, will generate a unique file name by calling FileUtils.buildUniqueFile in createDocument.
  5. FileUtils.buildUniqueFile will check if the file exists. Since the fs is case-insensitive, it will return true and add file counter suffix.
  6. MangaSee (1) created.

@arkon
Copy link

arkon commented May 3, 2024

@FooIbar's explanation covers it.

maybe I can adjust the library to fix those problems without needing to do case insensitive directory scans in all cases

I'm not sure you really can without overhauling how file checks are done in general since it's just relying on names (but also just assuming that case insensitivity is sort of acceptable).

It might be nice if it's possible to actually check if the case sensitivity problem is applicable at runtime and only doing the slower method if that's the case?

If you want to just yolo and see what happens, you could also just have your change in TreeDocumentFile only apply when ignoreCase is false, have overloads for createFile, then you can potentially play around with that?

@FooIbar
Copy link

FooIbar commented May 3, 2024

have your change in TreeDocumentFile only apply when ignoreCase is false

The proposed implementation is essentially ignoreCase = true (for tachiyomi's use cases).
TreeDocumentFile.exists will eventually call File.exists , so it has the same case sensitivity as the fs (i.e. case-insensitive in external storage).

@arkon
Copy link

arkon commented May 4, 2024

I'm not actually sure it'll meet all use cases, but I'm open to let folks try. 🤷

@arkon arkon merged commit e0def6b into tachiyomiorg:main May 4, 2024
1 check passed
@raxod502 raxod502 deleted the rr-improve-findfile-performance branch May 4, 2024 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants