[Breaking] Make basename and dirname ignore a trailing path separator #43137

Open
giordano opened this issue Nov 18, 2021 · 8 comments
Labels
breaking (This change will break code), filesystem (Underlying file system and functions that use it)

@giordano
Contributor

giordano commented Nov 18, 2021

Statement of the problem

Currently, basename and dirname have this documented, yet odd, behaviour:

julia> basename("/tmp/bar")
"bar"

julia> basename("/tmp/bar/")
""

julia> dirname("/tmp/bar")
"/tmp"

julia> dirname("/tmp/bar/")
"/tmp/bar"

What these functions do when there is a trailing path separator doesn't make any sense. For example, the Single Unix Specification, §3.170 doesn't allow empty file names, which are defined as follows:

A sequence of bytes consisting of 1 to {NAME_MAX} bytes used to name a file. The bytes composing the name shall not contain the <NUL> or <slash> characters.

Additionally, this doesn't match what the Unix utilities with the same names do:

% basename /tmp/bar
bar
% basename /tmp/bar/
bar
% dirname /tmp/bar
/tmp
% dirname /tmp/bar/
/tmp

I was told that this behaviour was borrowed from Python's os.path.basename and os.path.dirname, but this doesn't change the fact it's meaningless.

This also makes it harder than necessary to automatically handle paths received from other functions, which may or may not have a trailing path separator. If you are aware of the oddness of basename and dirname, you have to add checks like

isdirpath(path) && (path = dirname(path))

and a similar one is used in BinaryBuilder as well.
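For illustration, the check above could be wrapped in a small helper. This is only a sketch: `parent_dir` is a hypothetical name, not a function from Base or BinaryBuilder, and it relies solely on the documented `isdirpath` and `dirname`.

```julia
# Hypothetical helper (not part of Base): return the parent directory of
# `path`, whether or not `path` ends with a path separator.
function parent_dir(path::AbstractString)
    p = String(path)
    # dirname("/tmp/bar/") is "/tmp/bar", so drop the trailing separator first.
    isdirpath(p) && (p = dirname(p))
    return dirname(p)
end

parent_dir("/tmp/bar")   # "/tmp"
parent_dir("/tmp/bar/")  # "/tmp"
```

Note that `isdirpath` is also true for paths ending in `.` or `..`, so a real implementation would need to decide how to treat those.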

Solution

My suggestion is to make basename and dirname ignore trailing path separators.

This was discussed already in #33000 and resolved by explicitly documenting the current behaviour in #37580. Here I'd like to ask to consider making the breaking change for Julia v2.0, basically what #33021 tried to do. I'd argue that no one in their sane mind should rely on basename reporting an empty string: if they want to know whether a path ends with a path separator, that's the job of isdirpath. Therefore, my expectation is that this change will cause very little breakage in practice.
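As a rough sketch of what the proposed behaviour might look like in user code today, the helpers below strip trailing separators before delegating to the existing functions. The names `posix_basename` and `posix_dirname` are hypothetical, and the sketch assumes `/` is the only path separator, so it does not cover Windows paths or drive prefixes.

```julia
# Hypothetical helpers (not part of Base). Assumption: '/' is the only separator.

function posix_basename(path::AbstractString)
    isempty(path) && return "."           # mirrors the C basename() for empty input
    stripped = String(rstrip(path, '/'))  # ignore any trailing separators
    isempty(stripped) && return "/"       # the path consisted only of slashes
    return basename(stripped)
end

function posix_dirname(path::AbstractString)
    stripped = String(rstrip(path, '/'))
    isempty(stripped) && return isempty(path) ? "." : "/"
    dir = dirname(stripped)
    return isempty(dir) ? "." : dir       # no directory part, e.g. "bar/"
end

posix_basename("/tmp/bar/")  # "bar"
posix_dirname("/tmp/bar/")   # "/tmp"
```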

giordano added the breaking and filesystem labels on May 21, 2022
giordano added this to the 2.0 milestone on Jul 20, 2022
@elextr

elextr commented Jul 21, 2022

The "insane" behaviour you mention allows a path to be a directory and have it returned by dirname() without losing its last segment. The odd Unix behaviour does not allow that.

If we are talking about breaking changes, my suggestion would be to rename basename() to filenamepart() and dirname() to dirpathpart() (or similar names). The current behaviour would then correctly match the names, and it is entirely useful for manipulating strings that contain file paths.

Then if anybody still wants the odd (but standardised) Unix behaviour the names dirname() and basename() are available for that use.
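A minimal sketch of that renaming, assuming the names suggested above (they do not exist in Base) simply forward to the current behaviour:

```julia
# Hypothetical names taken from the suggestion above; they only forward to
# the current Base functions and add no behaviour of their own.
filenamepart(path::AbstractString) = basename(path)
dirpathpart(path::AbstractString) = dirname(path)

# A trailing separator marks the whole string as a directory path:
dirpathpart("/tmp/bar/")           # "/tmp/bar"
filenamepart("/tmp/bar/")          # ""

# Without it, the last segment is treated as the file name:
dirpathpart("/tmp/bar/file.txt")   # "/tmp/bar"
filenamepart("/tmp/bar/file.txt")  # "file.txt"
```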

@giordano
Contributor Author

giordano commented Jul 21, 2022

> The "insane" behaviour you mention allows a path to be a directory and have it returned by dirname() without losing its last segment. The odd Unix behaviour does not allow that.

Are you referring to

julia> dirname("/tmp/bar/")
"/tmp/bar"

?

@elextr

elextr commented Jul 21, 2022

Yes, compared to Unix:

$ dirname /tmp/bar/
/tmp

PS I quoted "insane" to imply that I do not agree with the OP on that assessment, sorry if that confused anyone.

@giordano
Contributor Author

Well, I specifically don't like that behaviour, and I provided examples where you need extra checks to prevent dirname from giving the wrong result. When you get a path programmatically, you can't control whether there is a trailing separator, and the fact that its presence affects the result of dirname/basename is a complication.

@elextr

elextr commented Jul 21, 2022

So how do I pass a directory path as a string? The current behaviour allows indicating that to dirname() so it returns the whole thing.

Basically, there are two behaviours with differing use cases. Given the choice, why not have both, as I suggested? Why cut off behaviours other people rely on just because you don't want them?

@giordano
Contributor Author

I'm not opposed to having a different function which does something else. I'm still missing the use case (honestly I don't understand what you mean by "pass a directory as a string"), but that's not my concern, I'll probably not use it and that's it. I'd just like dirname/basename to follow the standard and consistent behaviour.

@elextr

elextr commented Jul 21, 2022

Good, we agree, and I hope others do too. Just please don't call behaviour for use cases you don't know or understand "odd", "not sane", etc.

@tecosaur
Contributor

tecosaur commented Oct 3, 2024

If we're going to think about this as a "Julia 2.0" change, I can't help but mention actually having a path type to work around a large class of "file paths are just strings" issues.
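To make the idea concrete, here is a deliberately tiny sketch of what such a path type could look like. Everything in it (`SketchPath`, `filename`, `parentpath`) is hypothetical and not an existing Julia API, and it assumes `/` as the only separator. Because the path is stored as segments, a trailing separator cannot change the answer.

```julia
# Hypothetical minimal path type; not an existing Julia API.
# Assumption: '/' is the only separator.
struct SketchPath
    absolute::Bool
    segments::Vector{String}
end

# Parse a string into segments, discarding empty ones, so "/tmp/bar" and
# "/tmp/bar/" produce the same value.
function SketchPath(s::AbstractString)
    segs = String[String(x) for x in split(s, '/'; keepempty=false)]
    return SketchPath(startswith(s, "/"), segs)
end

# Last segment, or "" for a path with no segments (such as "/").
filename(p::SketchPath) = isempty(p.segments) ? "" : last(p.segments)

# The same path with the last segment dropped.
parentpath(p::SketchPath) = SketchPath(p.absolute, p.segments[1:end-1])

filename(SketchPath("/tmp/bar"))   # "bar"
filename(SketchPath("/tmp/bar/"))  # "bar"
```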

3 participants