Can we have a test to ensure UTF-8 compatibility for the files app and WebDAV? #3190

Closed
butonic opened this issue Apr 30, 2013 · 18 comments

@butonic
Member

butonic commented Apr 30, 2013

I had a strange error when the files were stored on an NFS share that, in the end, did not store file names as UTF-8. Everything seemed to work fine except for files containing ß. äöü were no problem.

The acceptance tests use the files from the demo ownCloud. Could we add a file with the name ß äöü ÄÖÜ èéȩ w͢͢͝h͡o͢͡ ̸͢k̵͟n̴͘ǫw̸̛s͘ ̀́w͘͢ḩ̵a҉̡͢t ̧̕h́o̵r͏̵rors̡ ̶͡͠lį̶e͟͟ ̶͝in͢ ͏t̕h̷̡͟e ͟͟d̛a͜r̕͡k̢̨ ͡h̴e͏a̷̢̡rt́͏ ̴̷͠ò̵̶f̸ u̧͘ní̛͜c͢͏o̷͏d̸͢e̡͝ .txt which becomes 'ß äöü ÄÖÜ èéȩ w͢͢͝h͡o͢͡ ̸͢k̵͟n̴͘ǫw̸̛s͘ ̀́w͘͢ḩ̵a҉̡͢t ̧̕h́o̵r͏̵rors̡ ̶͡͠lį̶e͟͟ ̶͝in͢ ͏t̕h̷̡͟e ͟͟d̛a͜r̕͡k̢̨ ͡h̴e͏a̷̢̡rt́͏ ̴̷͠ò̵̶f̸ u̧͘ní̛͜c͢͏o̷͏d̸͢e̡͝ .txt' containing the name as a string as well?

Where would we put a PHPUnit test for this? In the files app?

@danimo explained to me that the client always sends NFC. How do we deal with filenames entered in the UI as NFD? Can this happen?

Links:
http://en.wikipedia.org/wiki/Unicode_equivalence
http://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
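
To make the NFC/NFD question concrete, here is a minimal sketch in Python (used purely for illustration; ownCloud itself is PHP). It shows that the same visible name has two different code point sequences, and one possible way to test whether a name is already in NFC. The helper name `is_nfc` is an assumption made for this example, not an existing ownCloud function.

```python
import unicodedata

# "äöü.txt" written with precomposed characters, i.e. in NFC form.
name_nfc = "\u00e4\u00f6\u00fc.txt"

# The same visible name in NFD: each umlaut becomes a base letter
# followed by U+0308 COMBINING DIAERESIS.
name_nfd = unicodedata.normalize("NFD", name_nfc)


def is_nfc(name: str) -> bool:
    """Return True if `name` is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", name) == name


if __name__ == "__main__":
    print(len(name_nfc), len(name_nfd))   # code point counts differ
    print(name_nfc == name_nfd)           # plain comparison treats them as different names
    print(is_nfc(name_nfc), is_nfc(name_nfd))
```

Both strings render identically, but a byte-wise or code-point-wise comparison sees two different file names, which is why an NFD name entered somewhere could end up next to an NFC name from the client.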

@butonic
Member Author

butonic commented Apr 30, 2013

Pulling in @icewind1991 @DeepDiver1975 @Raydiation @bartv2 @karlitschek here for an opinion on how to test the UTF-8 capability of the files app and WebDAV.

@danimo
Contributor

danimo commented Apr 30, 2013

Ok, let's handle this case by case:

  1. Storage backends (even local ones, as they could be NFS mounts) need to be checked for UTF-8 safety. Storing and reading back a sophisticated string like the above sounds like a good approach. This should be complemented with documentation in the admin manual. Also, we have hints that NFSv4 is needed in order to store UTF-8 over NFS on NFS servers running Windows. I do not have more information than that. Several hardware vendors, e.g. Dell, are known to run their NFS servers on Windows Server, so this could actually be a pretty common problem.
  2. The problem of Unicode equivalence lies not at the storage-system level but at the file level. In general, most storage backends will allow both forms, or even both forms in one filename, and in fact there is little we can do about NFD "leaking in" from somewhere. Dropbox advises the users of their APIs to normalize to NFC, and while this is simple with HTTP (the RFC advises on NFC form for percent encoding), it cannot be enforced on NFS, Samba, or whatever. All that can be done is to check for NFC-formedness when reading the file name from the backend. The backend setup process is the wrong place to check for this. Recommended reading on this topic: Unicode Equivalence.
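
The store-and-read-back check from item 1, combined with the NFC check on read from item 2, could look roughly like the sketch below. It is written in Python for illustration only (an actual ownCloud test would be PHP), and the function name `roundtrip_filename`, the result keys, and the shortened test name are all assumptions made for this example.

```python
import os
import tempfile
import unicodedata

# Shortened stand-in for the tricky name from the first comment (NFC form).
TRICKY_NAME = "\u00df \u00e4\u00f6\u00fc \u00c4\u00d6\u00dc \u00e8\u00e9\u0229.txt"


def roundtrip_filename(directory: str, name: str) -> dict:
    """Create a file called `name`, list the directory, and report what came back."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(name)  # the file contains its own name as a string

    listed = os.listdir(directory)
    with open(path, encoding="utf-8") as handle:
        content_intact = handle.read() == name
    os.remove(path)

    wanted = unicodedata.normalize("NFC", name)
    return {
        # The backend returned exactly the code points we stored.
        "exact_match": name in listed,
        # The backend returned an equivalent name, possibly in another form.
        "equivalent_match": any(
            unicodedata.normalize("NFC", entry) == wanted for entry in listed
        ),
        # Every listed name is already NFC (the check on read from item 2).
        "all_nfc": all(
            unicodedata.normalize("NFC", entry) == entry for entry in listed
        ),
        "content_intact": content_intact,
    }


if __name__ == "__main__":
    # Point this at a directory on the backend under test, e.g. an NFS mount.
    with tempfile.TemporaryDirectory() as scratch:
        print(roundtrip_filename(scratch, TRICKY_NAME))
```

A backend that mangles names would show `equivalent_match` as false (or fail outright when creating the file), while a backend that silently converts to NFD would show `equivalent_match` true but `exact_match` and `all_nfc` false.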

@bantu

bantu commented May 13, 2013

@danimo You can get rid of the UTF8 dependency on the filesystem by using randomly generated ASCII/Hex characters as filesystem filenames and putting the original filename into the DB, which surely supports UTF8. This will also get rid of any problems regarding max filename or max pathname length etc.
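
A minimal sketch of that idea, in Python with SQLite purely for illustration: the table name `file_map` and the helpers `open_mapping`, `register` and `resolve` are hypothetical, not part of ownCloud.

```python
import secrets
import sqlite3


def open_mapping(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table mapping physical names to original names."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_map ("
        " physical_name TEXT PRIMARY KEY,"
        " logical_name TEXT NOT NULL UNIQUE)"
    )
    return conn


def register(conn: sqlite3.Connection, logical_name: str) -> str:
    """Pick a random hex name for the filesystem and remember the original name."""
    physical_name = secrets.token_hex(16)  # 32 ASCII hex characters
    conn.execute(
        "INSERT INTO file_map (physical_name, logical_name) VALUES (?, ?)",
        (physical_name, logical_name),
    )
    conn.commit()
    return physical_name


def resolve(conn: sqlite3.Connection, physical_name: str):
    """Return the original name for a physical name, or None if unknown."""
    row = conn.execute(
        "SELECT logical_name FROM file_map WHERE physical_name = ?",
        (physical_name,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_mapping()
    on_disk = register(conn, "\u00df \u00e4\u00f6\u00fc.txt")
    print(on_disk, "->", resolve(conn, on_disk))
```

The filesystem only ever sees fixed-length ASCII names, so its encoding and name-length limits no longer matter; the cost is that the names on disk are meaningless without the database.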

@danimo
Contributor

danimo commented May 13, 2013

@bantu This is sort of what we do on Windows, but it has several disadvantages, the most important being that the database must never get lost.

@bantu

bantu commented May 13, 2013

@danimo Don't you think that is a reasonable assumption?

@DeepDiver1975
Member

The current implementation for Windows can be extended to store the mapping of real file/folder names to their physical names in a second storage, e.g. a hidden file within each directory.

In case the database is lost we can recreate the names on the initial scan.
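
As a rough sketch of what such a fallback could look like (Python for illustration only; the sidecar file name `.name_map.json` and both helper functions are assumptions, not the existing Windows implementation):

```python
import json
import os
import tempfile

SIDECAR = ".name_map.json"  # hypothetical hidden mapping file, one per directory


def write_sidecar(directory: str, mapping: dict) -> None:
    """Write {physical_name: real_name} for one directory, replacing the old file atomically."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".name_map.", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # json.dump escapes non-ASCII by default, so the sidecar itself is pure ASCII.
        json.dump(mapping, handle)
    os.replace(tmp_path, os.path.join(directory, SIDECAR))


def rebuild_mapping(root: str) -> dict:
    """Walk `root` and recover {physical_path: real_name} from all sidecar files."""
    recovered = {}
    for current, _dirs, files in os.walk(root):
        if SIDECAR not in files:
            continue
        with open(os.path.join(current, SIDECAR), encoding="utf-8") as handle:
            for physical_name, real_name in json.load(handle).items():
                recovered[os.path.join(current, physical_name)] = real_name
    return recovered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        write_sidecar(root, {"0a1b2c": "\u00e4\u00f6\u00fc.txt"})
        print(rebuild_mapping(root))
```

The sidecar is only rewritten when a file in that directory is created, deleted or renamed, and the rebuild step is what an initial scan after a database loss would run.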

PRs are welcome ...

@bantu

bantu commented May 13, 2013

@DeepDiver1975 Yeah no, a hidden file probably won't do it, at least not on a system with many concurrent write accesses. It also scales poorly across multiple servers. Just use the database and implement periodic SQL dumps instead?

@DeepDiver1975
Member

At least not on a system with many concurrent write accesses.

That should not be an issue, as this file will only be written to when a file is created, deleted or renamed.
And there should be one file in each subfolder for sure.

@bantu

bantu commented May 13, 2013

@DeepDiver1975 Kind of how SVN did it (and failed)? What's wrong with using the DB? It seems to be a DB you are looking for.

@DeepDiver1975
Member

Kind of how SVN did it

yes

(and failed)

hmm - as long as the files are not accessible by users directly I don't see an issue with that.

What's wrong with using my suggestion?

Nothing - with the exception that database backup strategies are the responsibility of the system operator.
We as a community cannot do much here I'd say.

@bantu

bantu commented May 13, 2013

Well, you require a DB - so why not use it?

@DeepDiver1975
Member

Well, you require a DB - so why not use it?

Maybe I was not precise enough: the main data storage will be the database. The hidden file is just the fallback.

@bantu

bantu commented May 13, 2013

@DeepDiver1975 Oh, yeah, that wasn't clear. Must have missed it. Sorry.

@DeepDiver1975
Member

Must have missed it. Sorry.

no prob

@PVince81
Contributor

Is this still relevant after so many months? 😄

@timw4mail

This is still relevant for #10625 and CIFS. I'm not sure about WebDAV.

@butonic
Member Author

butonic commented Sep 8, 2014

filecache test has been added with #10244

@RobinMcCorkell
Member

Closing due to inactivity

@lock lock bot locked as resolved and limited conversation to collaborators Aug 9, 2019