Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preview 7 inconsistent FileSystemInfo LinkTarget and ResolveLinkTarget results when using virtual drives #57582

Closed
carlossanlop opened this issue Aug 17, 2021 · 11 comments · Fixed by #57996
Assignees
Labels
area-System.IO documentation Documentation bug or enhancement, does not impact product or test code
Milestone

Comments

@carlossanlop
Copy link
Member

Originally reported by @fitdev in #57575

On Windows, there is an inconsistency of final targets returned depending on which APIs are used, and whether or not virtual drives created with subst command are used in the path (both the link and the target residing on such virtual drives):

The DirectoryInfo.ResolveLinkTarget(returnFinalTarget: true) API (and other APIs you added that resolve final target) resolve the final target to non-virtual drives, while the DirectoryInfo.LinkTarget (and other similar APIs that resolve the immediate target), when called repeatedly while LinkTarget != null stop at the virtual drive level.

So, given a structure like this:

C:\RootDirectoryForVirtualDriveX
X:\SomeRealFolder
X:\SymlinkToRealFolderOnX -> X:\SomeRealFolder

Calling DirectoryInfo.ResolveLinkTarget(returnFinalTarget: true) on X:\SymlinkToRealFolderOnX will result in: C:\RootDirectoryForVirtualDriveX\SomeRealFolder

While calling DirectoryInfo.LinkTarget (repeatedly while the result is non-null) on X:\SymlinkToRealFolderOnX will result in: X:\SomeRealFolder

Even though, logically, and for consistency results should be the same.

@carlossanlop carlossanlop added this to the 6.0.0 milestone Aug 17, 2021
@carlossanlop carlossanlop self-assigned this Aug 17, 2021
@ghost
Copy link

ghost commented Aug 17, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Originally reported by @fitdev in #57575

On Windows, there is an inconsistency of final targets returned depending on which APIs are used, and whether or not virtual drives created with subst command are used in the path (both the link and the target residing on such virtual drives):

The DirectoryInfo.ResolveLinkTarget(returnFinalTarget: true) API (and other APIs you added that resolve final target) resolve the final target to non-virtual drives, while the DirectoryInfo.LinkTarget (and other similar APIs that resolve the immediate target), when called repeatedly while LinkTarget != null stop at the virtual drive level.

So, given a structure like this:

C:\RootDirectoryForVirtualDriveX
X:\SomeRealFolder
X:\SymlinkToRealFolderOnX -> X:\SomeRealFolder

Calling DirectoryInfo.ResolveLinkTarget(returnFinalTarget: true) on X:\SymlinkToRealFolderOnX will result in: C:\RootDirectoryForVirtualDriveX\SomeRealFolder

While calling DirectoryInfo.LinkTarget (repeatedly while the result is non-null) on X:\SymlinkToRealFolderOnX will result in: X:\SomeRealFolder

Even though, logically, and for consistency results should be the same.

Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.IO

Milestone: 6.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 17, 2021
@carlossanlop carlossanlop changed the title Preview 7 inconsistent FileSystemInfo.LinkTarget result when using virtual drives Preview 7 inconsistent FileSystemInfo LinkTarget and ResolveLinkTarget results when using virtual drives Aug 17, 2021
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Aug 17, 2021
@carlossanlop
Copy link
Member Author

I manually tested this and was able to reproduce it. Here are my findings:

A) ResolveLinkTarget(returnFinalTarget: true) internally calls the Windows function GetFinalPathNameByHandleW to retrieve the final target:

return Interop.Kernel32.GetFinalPathNameByHandle(handle, bufPtr, (uint)buffer.Length, Interop.Kernel32.FILE_NAME_NORMALIZED);

We currently pass the FILE_NAME_NORMALIZED flag to ensure we resolve all path segments properly. The documentation says:

A file name is considered normalized if all of the following are true:

  • It contains the full directory path for the file, including the volume name, unless the user opened the file by file ID but does not have traverse privilege for the entire path. (For more information, see FltGetFileNameInformation.)
  • The volume name is the volume's nonpersistent device object name (for example, "\Device\HarddiskVolume1").
  • All short names are expanded to the equivalent long names.
  • Any trailing ":$DATA" or "::$DATA" strings are removed from the stream name.
  • All mount points are resolved.

Interestingly, I tested passing the FILE_NAME_OPENED flag but nothing changed and I can still repro the reported "issue" (the virtual drive is resolved to its real path). I'll follow up on this with the Windows team to ask why this is happening, but in the end, I think the correct flag to use to retrieve the final target should be the current one we have, FILE_NAME_NORMALIZED.

Now, on the other hand, LinkTarget retrieves the immediate target (not the final one) by using the DeviceIoControl function:

bool success = Interop.Kernel32.DeviceIoControl(

The example of this issue happens to only have one symlink (no chain of 2 or more), so passing true to ResolveLinkTarget(returnFinalTarget) is unnecessary.

If you call ResolveLinkTarget(returnFinalTarget: false), you should get the exact same path as LinkTarget, because of this:

GetImmediateLinkTarget(linkPath, isDirectory, throwOnError: true, returnFullPath: true);

I do not think this is a bug. This seems to be expected behavior coming from Windows: When retrieving the final target of a symlink, we want to resolve all the path segments (especially because symlinks can be relative). As part of the normalization, Windows happens to also resolve a virtual drive to its real path.

I'd like to leave this issue open in case @fitdev or the community have some additional thoughts.

@carlossanlop carlossanlop modified the milestones: 6.0.0, Future Aug 17, 2021
@fitdev
Copy link

fitdev commented Aug 17, 2021

@carlossanlop Thank you for looking into this.

When looking through your code I saw that you indeed use 2 different Windows APIs to resolve targets, and I did suspect that the inconsistency comes from the way GetFinalPathNameByHandleW handles paths. And I understand why it happens. My issue is just that this behavior is unexpected (unless you literally read the source code, which most users will not) and confusing, and can potentially lead to bugs in user code.

The problem being that you would expect the final targets obtained both ways to be the same, and currently nowhere in the API names or the docs / comments it is suggested that it may not be so. Furthermore, on Linux or other platforms, these 2 ways of getting final target may actually produce the same result (Windows being the exception), which again would be inconsistent.

You may ask why would anyone be using LinkTarget repeatedly to obtain final target when ResolveLinkTarget(returnFinalTarget: true) is available, but there are scenarios when this may be needed, such as when one is implementing their own FileSystem library/framework/APIs (my case) and needs to have access to the full chain of links while having a consistent behavior provided by BCL across platforms.

It may also be important in addressing somewhat niche questions like finding all symlinks whose targets are on the same logical drive.

But I guess my biggest concern is just inconsistency. Perhaps you could address that with another bool/enum flag for the ResolveLinkTarget public API which informs the user of the fact that the resolved target maybe further normalized (or not, depending on the value) by the OS. At least that way this inconsistent behavior will be discoverable.

An enum may be (names are just an example):

public enum LinkTargetType {
Immediate = 0,
Final,
FinalNormalized
}

@fitdev
Copy link

fitdev commented Aug 17, 2021

Also, I have not yet tested a case without Virtual Disks, but where a symlink's target is set throgh another symlink:

C:\RealFolder\RealFile.txt
C:\SymlinkToRealFolder -> C:\RealFolder
C:\SymlinkToRealFile.txt -> C:\SymlinkToRealFolder\RealFile.txt // <- Note how the target is set using the symlink in the path

I suspect that in this case, once again results will be inconsistent, and the GetFinalPathNameByHandleW would return C:\RealFolder\RealFile.txt for C:\SymlinkToRealFile.txt while the other way would return C:\SymlinkToRealFolder\RealFile.txt.

Which is quite serious discrepancy, because this way the resulting paths may look totally different between the two methods.

@fitdev
Copy link

fitdev commented Aug 17, 2021

And one last thought: somehow a distinction between 2 concepts in the API should be surfaced or at least noted either in API names, comments, or docs (later perhaps even specific APIs for the 2 concepts could be made available):

  1. The link target - that is the actual File/Directory in the File System
    vs
  2. Path segments within the path to link target, which themselves maybe symbolic links / reparse points.

And so the trouble is that the GetFinalPathNameByHandleW presumably normalizes-away all "non-real" path segments, while the other APIs preserve those and only care whether or not the very last path segment is a "real" file/directory.

So in my opinion, it is not ideal that the current API effectively mixes these 2 related, but slightly different concepts.

@jozkee
Copy link
Member

jozkee commented Aug 17, 2021

IIRC, virtual drives are another kind of Reparse Point (Mount Point I think), so it's understandable that ResolveLinkTarget(true) also resolves the virtual drive to the actual real path. I inclined to say that we should address the discrepancy by making ResolveFinalTarget(false) able to traverse the chain of links manually and be capable to reach the same path that ResolveLinkTarget(true) returns.

With that said, we should improve the API ResolveFinalTarget(false) so it can support returning the target of junctions and mount points in general.
@fitdev does that sound good to you?

@fitdev
Copy link

fitdev commented Aug 18, 2021

@jozkee

  1. What's ResolveFinalTarget(bool)? Is this a new API you are proposing? There's just ResolveLinkTarget(bool) and LinkTarget APIs at present.

  2. Regarding

With that said, we should improve the API ResolveFinalTarget(false) so it can support returning the target of junctions and mount points in general.

If you do not fully support junctions / mount points (at least in as far as "reading" them), it does not make sense to partially support them through a single, presumably new API - "ResolveFinalTarget" (while the other APIs like LinkTarget still do not support them at all (i.e. see #57575 and the conclusion we reached there for 6.0 milestone).

So my take on this, either you support these 2 kinds of reparse points in all your "read" APIs consistently, or not at all, i.e. treat them as regular directories. Writing support for these is another matter which is not part of this issue.

  1. Turns out, as I suspected yesterday, this issue is bigger than what I originally wrote, because it is not about just virtual disks (may not even be about Windows only either - but I do not know about other platforms).

So, today I did this test:

C:\RealFolder\RealFile.txt
C:\SymlinkToRealFolder -> C:\RealFolder
C:\SymlinkToRealFile.txt -> C:\SymlinkToRealFolder\RealFile.txt // <- Note how the target is set using the symlink in the path

Note, there are no junctions or virtual drives involved - only symlinks.

And, just as I thought results are very different for ResolveLinkTarget(true) and LinkTarget while LinkTarget != null:

ResolveLinkTarget(true) returns final, normalized path, that does not contain any redirection segments (i.e. no symlinks, junctions, mount points): C:\RealFolder\RealFile.txt

Whereas LinkTarget while LinkTarget != null returns just C:\SymlinkToRealFolder\RealFile.txt, i.e. the returned target path still contains the SymlinkToRealFolder redirection segment.

And this is the issue! It's confusing and inconsistent, currently seen on Windows only, though perhaps a similar thing happens on Linux, but I don't know.

And the reason it happens is because the 2 APIs ResolveLinkTarget(true) and LinkTarget effectively surface 2 different things:
ResolveLinkTarget(true) in its present form is about returning the final, ultimate target, without any redirection path segements, while LinkTarget only cares about the last path segment (whether a file or a directory) to be a "real" one, and hence will stop (return null) when the last path segment is not a link of any kind.

It's a subtle but an important issue and should somehow be addressed to avoid confusion. As I suggested the distinction in the behavior, perhaps, can be surfaced though another flag or an enum that tells what kind of final target should be returned - merely the one so that the last path segment is not a link (consistent with LinkTarget API) or the one such that there are no links of any kind in the whole path (i.e. it is fully resolved):

public enum LinkTargetType {
// Immediate target of the link as it was set
Immediate = 0,
// Final target of the link, so that the last path segment is not a link, consistent with calling LinkTarget repeatedly until it is null
Final,
// The ultimate, fully resolved, target path without any links in the path. NOT consistent with calling LinkTarget repeatedly until it is null, and may produce completely different paths (depending on specific cases)
FinalResolved
}

@jozkee jozkee self-assigned this Aug 18, 2021
@jozkee
Copy link
Member

jozkee commented Aug 18, 2021

What's ResolveFinalTarget(bool)? Is this a new API you are proposing? There's just ResolveLinkTarget(bool) and LinkTarget APIs at present.

I meant ResolveLinkTarget(bool), I crossed the words there, sorry.

And the reason it happens is because the 2 APIs ResolveLinkTarget(true) and LinkTarget effectively surface 2 different things

Exactly, you are trying to get the same result from two different APIs. One is for resolution and the other, LinkTarget, is for getting the raw target of a link, a practical example is traversing through the links (in case there's more than one) manually, it's name doesn't convey resolution of the targets. If you care about not resolving paths, then you should stick to LinkTarget. I don't think there's something to do regarding that discrepancy.

@fitdev
Copy link

fitdev commented Aug 18, 2021

Right. It does make sense. However I still think that at the very least in the comments, and in the docs it should be explained that in general: recursive LinkTarget != ResolveLinkTarget(true)

Also note that the way ResolveLinkTarget(true) is currently implemented (via GetFinalPathNameByHandleW) will allow it to resolve non-symlink reparse points too (i.e. Junctions), which are unsupported at present. Therefore this may have inconsistent results in a different way: LinkTarget will report null for Junction (since they are unsupported), and so will ResolveLinkTarget(false), but ResolveLinkTarget(true) will actually report some real physical path. That's ok, if we are supposed to think of ResolveLinkTarget(true) as returning the ultimate physical non-redirectable-further location, but then again this should somehow be noted more clearly in the comments / docs.

@jozkee
Copy link
Member

jozkee commented Aug 18, 2021

Yes, we can fix Junctions returning null LinkTarget, that will be tracked in #57575. And for the noted discrepancy, we can document the behavior in the API remarks. Thanks for reporting your experience with the new API, very valuable!

@jozkee jozkee added the documentation Documentation bug or enhancement, does not impact product or test code label Aug 18, 2021
@jozkee jozkee modified the milestones: Future, 6.0.0 Aug 18, 2021
@fitdev
Copy link

fitdev commented Aug 19, 2021

Just wanted to clarify though, with respect to Junctions. The current behavior of ResolveLinkTarget(true) should then be preserved, i.e. it should be able to resolve links even if there are junctions in the link target chain, and even if the path itself represents a Junction. But since this requires no change, that behavior will be upheld.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 24, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 27, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO documentation Documentation bug or enhancement, does not impact product or test code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants