Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve normalise #12

Open
ndmitchell opened this issue Oct 19, 2014 · 17 comments
Open

Improve normalise #12

ndmitchell opened this issue Oct 19, 2014 · 17 comments

Comments

@ndmitchell
Copy link
Contributor

When reviewing the Shake normalise function, I realised the implementation I had was all wrong. I should backport the appropriate Shake tests to the FilePath library and see if that also needs improvements.

@thomie
Copy link
Contributor

thomie commented Oct 27, 2014

I had a brief look at the tests in https://github.com/ndmitchell/shake/blob/master/Test/FilePath.hs#L33-L62

Notable discrepancy:
Shake.FilePath: norm "a/." === "a"
System.Filepath: normalise "a/." == "a/"

Special handling of trailing dots was added to System.FilePath in 3815668.

ndmitchell added a commit that referenced this issue Oct 27, 2014
@ndmitchell
Copy link
Contributor Author

Looking at the trailing slash behaviour of normalise, I'm not certain what the right answer should be. Is going from trailing slash to not too much change for normalise?

@thomie
Copy link
Contributor

thomie commented Oct 27, 2014

Since normalise "a/" == "a/", I think normalise "a/." should also be "a/", not "a". I don't quite understand the Shake version, maybe it should be changed to keep the trailing slash.

@ndmitchell
Copy link
Contributor Author

Quite possibly, I'd like the Shake and non-Shake versions to end up equivalent where they overlap - and some changes to the Shake version is fine.

@ndmitchell
Copy link
Contributor Author

So I think my underlying uneasiness all comes down to what a/. should be represented as. Essentially, you can chose to preserve the presence/absence of a trailing slash (and go for a), or preserve knowledge that it's a directory (and go for a/). What worries me is we don't have a function to test if something appears to be a directory or not, but we do have functions to talk about trailing slashes - perhaps that was a design flaw at the very beginning.

In Shake I jumped for preserving the presence/absence of a trailing slash, in FilePath we seem to be preserving directory information. I'm still trying to figure out in my head which is better.

@ndmitchell
Copy link
Contributor Author

There are lots of things here that aren't quite right, comparing to the Shake version (which also has a few minor trailing slash and drive issues). A few examples on Windows.normalise:

  • ///// is \\\\\ (literally, not escaped, so 5 instances of \) - should probably be \
  • /a: is a:, should probably be unchanged, but this breaks the idempotence - since a: renormalises to A:.
  • /a:/ is a:/ - should probably be \a:\.

I think the "right" solution is probably to rethink the whole library to be based around:

data Lexeme = Drive Char | UNC String | Separator Char | Path String

parse :: String -> [Lexeme]

display :: [Lexeme] -> String
display = concatMap $ \x -> case x of
    Drive x -> [x,':'] -- x will be an ASCII character
    UNC x -> '\\':'\\':x -- x will be a UNC name, not containing any pathSeparators
    Separator x -> x -- x will be a member of pathSeparators
    Path x -> x -- x will not contain any pathSeparators

Where we rely on the property display . parse == id. Writing normalise is then pretty trivial, and things like changing combine as per #8 would become feasible.

Does anyone (e.g. @thomie) think that seem sensible? Or a bad idea? Or too disruptive?

@thomie
Copy link
Contributor

thomie commented Nov 21, 2014

Those are all invalid filepaths:

*System.FilePath.Windows> isValid "/////"
False
*System.FilePath.Windows> isValid "/a:"
False
*System.FilePath.Windows> isValid "/a:/"
False

I also get a different result for this one:

*System.FilePath.Windows> normalise "/////"
"\\\\\\\\\\"

But I like your idea. splitDrive is particularly brittle at the moment. We could take it even further: reject invalid filepaths from being parsed. Not sure what the implication would be, but can't we make filepath a wrapper around system-filepath?

@ndmitchell
Copy link
Contributor Author

By \\\\\ I meant it printed \\\\\\\\\\ - I was removing the escape characters to print it literally.

Since the filepath library is one of the GHC Core libraries it's very important that we don't break reverse compatibility other than for bugs, so we'll have to support invalid paths forever (I personally think that's a good thing, in certain circumstances things which would otherwise be invalid can be useful to do some subset of operations on). We also can't depend on system-filepath, since that isn't a Core library (it also does a few things I'm not a fan of). I'm not suggesting we expose the parse/display functions or Lexeme data type.

@thomie
Copy link
Contributor

thomie commented Nov 21, 2014

By \\ I meant it printed \\\ - I was removing the escape characters to print it literally.

Ok, I was just confused for some reason. Still, it's not a valid filepath, so normalise can do whatever.

Even core libraries can be changed, after an accepted proposal on the libraries list, but I see what you're saying.

It would be nice if the section "Should FilePath by an abstract data type?" from the old readme would come back, with an explanation why invalid paths can sometimes be a good thing.

@ndmitchell
Copy link
Contributor Author

Agreed, I should resurrect that section, but make it up to date (before it was all about proposals, and before system-filepath existed).

@ndmitchell
Copy link
Contributor Author

I've updated the README with an abstract filepath section. Work still needed on normalise.

@ndmitchell ndmitchell changed the title Investigate normalise Improve normalise Dec 23, 2015
@ndmitchell
Copy link
Contributor Author

Idea is to do this after #53.

@Rufflewind
Copy link
Member

Should C:\\\\ be normalised to C:\? Windows API (e.g. CreateFile) seems treat them the same.

@ndmitchell
Copy link
Contributor Author

@Rufflewind - yes, probably. That said, there are plenty of holes in normalise that could do with being addressed...

@hasufell
Copy link
Member

hasufell commented Nov 4, 2021

> normalise "\\??????????????\\D:\\lol"
"D:\\lol"
> normalise "\\abc\\D:\\lol"
"D:\\lol"

both paths are invalid, yet normalise makes them valid.

Maybe normalise should be a no-op for invalid paths.

@Bodigrim
Copy link
Contributor

Bodigrim commented Jan 5, 2023

Should C:\\\\ be normalised to C:\? Windows API (e.g. CreateFile) seems treat them the same.

Just hit this issue today. Could we possibly fix at least this inconsistency, if a larger rewrite is not on cards?

Another helpful option would be at least to document known caveats with existing normalise.

hasufell added a commit that referenced this issue Jan 5, 2023
@hasufell
Copy link
Member

hasufell commented Jan 6, 2023

Normalise is a mess. I'm not even sure what are all the expected properties.

Rewriting it in-place isn't possible. We'll have to rename the function and deprecate the old.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants