Hardlinks not correctly detected on SMB network share (NTFS?) #3080

Closed
rxxg opened this issue Jan 7, 2020 · 24 comments · Fixed by #3088
Labels: awaiting response (we are waiting for your reply, please respond! :)), bug (Did we break something?), enhancement (Enhances DVC)

Comments

@rxxg
Contributor

rxxg commented Jan 7, 2020

Using DVC version 0.80, with repo and cache both on the same network drive.
Windows setup.

I'm testing the unprotect command, but I assume other commands relying on the same check will also exhibit strange failures.

PS Q:\dvc-test> dvc --version
0.80.0
PS Q:\dvc-test> cat .dvc\config
[cache]
dir = 'Q:\DVC cache (test)'
type = "reflink,symlink,hardlink,copy"
protected = true
PS Q:\dvc-test> dir


    Directory: Q:\dvc-test


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       06/01/2020     16:58                .dvc
-a----       06/01/2020     14:57             24 .gitignore
-a----       06/01/2020     14:57            155 big_file.dvc
-ar---       27/12/2019     16:42      134217728 big_file

PS Q:\dvc-test> fsutil.exe hardlink list big_file
Error:  The request is not supported.

As you can see, I checked out my files from git at 14:57 today; dvc checkout created a protected link to the original 128 MB file created 10 days ago, rather than a copy (as requested), but Windows can't tell me that it is a link. Trust me, it is 😉

PS Q:\dvc-test> dvc unprotect big_file
PS Q:\dvc-test> dir


    Directory: Q:\dvc-test


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       06/01/2020     16:58                .dvc
-a----       06/01/2020     14:57             24 .gitignore
-a----       06/01/2020     14:57            155 big_file.dvc
-a----       27/12/2019     16:42      134217728 big_file

Strange - the unprotect operation was very fast, and the creation date of the file is still last year rather than today.

PS Q:\dvc-test> echo Hello >> big_file
PS Q:\dvc-test> dvc status
WARNING: corrupted cache file '..\DVC cache (test)\1d\cca63ad430e16fa12716d1a9bb3a6c'.
big_file_3.dvc:
     changed outs:
             not in cache:       big_file_3

Sure enough, modifying big_file corrupts the cache, since I was not modifying a copy.
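
For readers unfamiliar with why the append damaged the cache: a hard link is a second name for the same data, so a write through either name is visible through both. A minimal sketch on a local temporary directory; the file names are hypothetical stand-ins and not DVC's actual cache layout:

```python
import os
import tempfile


def append_through_hardlink(payload: bytes, extra: bytes) -> bytes:
    """Create a file, hard-link it, append via the link, and return
    what the original name now contains."""
    with tempfile.TemporaryDirectory() as tmp:
        cache_file = os.path.join(tmp, "cache_entry")   # hypothetical cache file
        workspace_file = os.path.join(tmp, "big_file")  # hypothetical workspace file
        with open(cache_file, "wb") as f:
            f.write(payload)
        os.link(cache_file, workspace_file)  # second name, same underlying data
        with open(workspace_file, "ab") as f:
            f.write(extra)  # the "echo Hello >> big_file" step
        with open(cache_file, "rb") as f:
            return f.read()


print(append_through_hardlink(b"data", b"Hello"))
```

The bytes read back through the cache name include the appended text, which is the same effect dvc status reports as a corrupted cache file.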

triage-new-issues bot added the "triage" (Needs to be triaged) label on Jan 7, 2020
@pared
Contributor

pared commented Jan 7, 2020

@rxxg I am unable to reproduce the issue.
Can I ask you to run the following script?

rmdir /s repo
mkdir repo
pushd repo
git init --quiet
dvc init -q
dvc config cache.type hardlink
dvc config cache.protected true
git commit -am "init dvc"
fsutil file createnew data 10485760
dvc add data
git add .gitignore data.dvc
git commit -am "add data"
fsutil hardlink list data
dvc unprotect data
fsutil hardlink list data
echo hello >> data
dvc status
popd

Does the status display the corrupted cache WARNING?

[EDIT]
Also, can I ask you to provide the output of the dvc version command? (Note that it's without --.)

[EDIT2]
Sorry, forgot it's an NFS drive, let me try to reproduce that again.

@pared
Contributor

pared commented Jan 7, 2020

@rxxg Also, as a temporary workaround you can change cache type to copy (dvc config cache.type copy) and use dvc checkout --relink big_file.dvc.

@rxxg
Contributor Author

rxxg commented Jan 7, 2020

@pared Thanks for looking at this. Yes, the use of a network drive (repo and cache) is essential.

(I had been using the copy cache type, but it is very slow for our use case, since the copy that DVC does involves reading from and then writing to the network drive in 16k chunks. A native Windows copy takes on the order of 3 seconds for a 128 MB file, versus 30 seconds for shutil.copyfileobj. There may be a separate bug report or PR for this.)
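
On the chunk size: shutil.copyfileobj accepts an optional length argument, so a larger buffer can be passed explicitly. A minimal sketch, assuming a 1 MiB buffer (an arbitrary value picked for illustration, not something DVC is known to use) and a hypothetical helper name:

```python
import shutil


def copy_with_buffer(src: str, dst: str, bufsize: int = 1024 * 1024) -> None:
    """Copy src to dst, reading and writing `bufsize` bytes at a time."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        # The third argument is the chunk length used for each read/write.
        shutil.copyfileobj(fsrc, fdst, bufsize)
```

Whether a bigger buffer closes the gap to the native copy on a given share would need to be measured.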

@rxxg
Contributor Author

rxxg commented Jan 7, 2020

For the record:

Q:\repo> git init --quiet
Q:\repo> dvc init -q
Q:\repo> dvc config cache.type hardlink
WARNING: You have changed the 'cache.type' option. This doesn't update any existing workspace file links, but it can be done with:
             dvc checkout --relink
Q:\repo> dvc config cache.protected true
Q:\repo> git commit -am "init dvc"
[master (root-commit) 7e8ba77] init dvc
 2 files changed, 12 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
Q:\repo> fsutil file createnew data 10485760
File Q:\repo\data is created
Q:\repo> dvc add data
100% Add|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|1.00/1.00 [00:01<00:00,  1.48s/file]

To track the changes with git, run:

     git add .gitignore data.dvc
Q:\repo> git add .gitignore data.dvc
Q:\repo> git commit -am "add data"
[master d7e862f] add data
 2 files changed, 8 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 data.dvc
Q:\repo> fsutil hardlink list data
Error:  The request is not supported.
Q:\repo> dvc unprotect data
Q:\repo> fsutil hardlink list data
Error:  The request is not supported.
Q:\repo> echo hello >> data
Q:\repo> dvc status
WARNING: corrupted cache file '.dvc\cache\f1\c9645dbc14efddc7d8a322685f26eb'.
data.dvc:
     changed outs:
             not in cache:       data
Q:\repo> dvc version
DVC version: 0.80.0
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - False

@pared
Contributor

pared commented Jan 7, 2020

@rxxg Ok, thank you very much, I am trying to reproduce it on my machine.

efiop added the "awaiting response" label on Jan 7, 2020
triage-new-issues bot removed the "triage" (Needs to be triaged) label on Jan 7, 2020
@efiop
Contributor

efiop commented Jan 7, 2020

@rxxg Could you please install psutil with pip install psutil and then run dvc version again and show us the output?

Side note for us: we need to improve the way dvc version tests for link types by doing an additional check on the created links. E.g. create a hardlink and then do a sanity check with System.is_hardlink.
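
A rough sketch of that idea: create a throwaway link and only report hardlink support if the link can also be confirmed afterwards. It uses os.stat().st_nlink as a stand-in for System.is_hardlink, and the function name is hypothetical; this is not DVC's actual code:

```python
import os
import tempfile


def hardlink_is_verifiable(directory: str) -> bool:
    """Create a temporary hard link in `directory` and check that the
    OS reports more than one link for it afterwards."""
    src = tempfile.NamedTemporaryFile(dir=directory, delete=False)
    src.close()
    dst = src.name + ".link"
    try:
        os.link(src.name, dst)
        # Creation succeeding is not enough; the link must be detectable too.
        return os.stat(dst).st_nlink > 1
    except OSError:
        return False
    finally:
        # Clean up both probe files regardless of the outcome.
        for path in (dst, src.name):
            if os.path.exists(path):
                os.remove(path)
```

If the share reports an incomplete link count, as suspected in this issue, a probe like this would presumably return False even though the link itself was created.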

efiop added the "bug" and "enhancement" labels on Jan 7, 2020
@rxxg
Contributor Author

rxxg commented Jan 8, 2020

Sure.

DVC version: 0.80.0
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - False
Filesystem type (cache directory): ('NTFS', 'Q:\\')
Filesystem type (workspace): ('NTFS', 'Q:\\')

@efiop
Contributor

efiop commented Jan 8, 2020

@rxxg I see that it is reporting NTFS, but you were saying you are on NFS. Was it a typo or am I missing something?

@efiop
Contributor

efiop commented Jan 8, 2020

@rxxg Btw, is it your work machine or your personal one? We've seen something similar on NTFS in #2831 , but weren't able to find the cause for such a strange FS behavior at that time.

@rxxg
Contributor Author

rxxg commented Jan 8, 2020

Sorry, typo 😳 Windows NTFS network share
It's my work machine, so I have zero control over the servers and even finding out info about the hardware/network protocol is hard work.
I had locking failures which came from the same cause (#2944) but things are fine since the change to the locking system.

efiop changed the title from "Hardlinks not correctly detected on NFS drive" to "Hardlinks not correctly detected on NTFS network share" on Jan 8, 2020
@efiop
Contributor

efiop commented Jan 8, 2020

@rxxg Thanks for clarifying! Makes more sense now. Btw, I suppose you don't have WSL enabled either, right? That would explain why fsutil doesn't work for you. That won't explain the original issue though, so we are still researching...

@efiop
Contributor

efiop commented Jan 8, 2020

The issue might be caused by us using GetFileInformationByHandle, which could return incomplete data https://stackoverflow.com/questions/3523271/get-windows-hardlink-count-without-getfileinformationbyhandle. Looks like FindFirstFileNameW and FindNextFileNameW are the alternatives. And ansible is actually using it as well https://github.com/ansible/ansible/blob/105f60cf480572fb5547794cda1f9a05559ae636/lib/ansible/module_utils/powershell/Ansible.ModuleUtils.LinkUtil.psm1#L230 .

@rxxg
Contributor Author

rxxg commented Jan 8, 2020

fsutil does work as expected on my local drive. I don't know what WSL is I'm afraid.

@efiop
Contributor

efiop commented Jan 8, 2020

So we need to make our is_hardlink https://github.com/iterative/dvc/blob/0.80.0/dvc/system.py#L235 use FindFirstFileNameW and FindNextFileNameW to count hardlinks instead of relying on nNumberOfLinks. And then give you the dev version to check if it works for you. 🙂
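
A sketch of what that could look like, assuming pywin32's win32file.FindFileNames is the wrapper around those two calls (it is the function that appears in the traceback further down this thread). The fallback to st_nlink when pywin32 is not importable is an addition for illustration, not part of the actual patch:

```python
import os


def count_hardlinks(path: str) -> int:
    """Count the names that point at `path`, preferring name enumeration
    on Windows over the reported link count."""
    try:
        # pywin32; only importable on Windows with pywin32 installed.
        from win32file import FindFileNames
    except ImportError:
        # Assumption for this sketch: elsewhere, trust the stat link count.
        return os.stat(path).st_nlink
    return len(FindFileNames(path))


def is_hardlink(path: str) -> bool:
    """A file with more than one name is treated as a hard link."""
    return count_hardlinks(path) > 1
```

Note that the enumeration call can itself fail on filesystems that do not support it, so a caller would still need to handle that error.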

@efiop
Contributor

efiop commented Jan 8, 2020

@rxxg Created a POC patch for it. Please run

pip uninstall -y dvc
pip install git+https://github.com/efiop/dvc@3080

to install it and then run

dvc version

and share its output.

@rxxg
Contributor Author

rxxg commented Jan 9, 2020

Bad news 😞

(dvc-3080) PS Q:\dvc-test> dvc -v version
ERROR: unexpected error - (50, 'FindFileNames', 'The request is not supported.')


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

pared removed the "awaiting response" label on Jan 9, 2020
@efiop
Contributor

efiop commented Jan 9, 2020

@rxxg Could you show dvc version -v (in that particular order), please?

efiop added the "awaiting response" label on Jan 9, 2020
@rxxg
Contributor Author

rxxg commented Jan 9, 2020

Oops, sorry.

(dvc-3080) PS Q:\dvc-test> dvc version -v
ERROR: unexpected error - (50, 'FindFileNames', 'The request is not supported.')
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\main.py", line 48, in main
    ret = cmd.run()
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 46, in run
    "Cache: {}".format(self.get_linktype_support_info(repo))
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 103, in get_linktype_support_info
    link(src, dst)
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\system.py", line 48, in hardlink
    assert System.is_hardlink(link_name)
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\system.py", line 250, in is_hardlink
    return System._count_hardlinks(path) > 1
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\system.py", line 241, in _count_hardlinks
    return len(FindFileNames(path))
pywintypes.error: (50, 'FindFileNames', 'The request is not supported.')
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

@efiop
Contributor

efiop commented Jan 9, 2020

Thanks @rxxg! Interesting. Btw, are you aware of how the network share is set up? I'm not really a Windows guy, and google didn't help much 🙁 Is there a central server that you are connected to? If that is so, my only explanation right now is that it is running something old which doesn't support FindFirstFileNameW.

Looks like we ran out of options here: the fs is returning incomplete data, and the alternative ways of counting links are not supported. Another option that might work for you is enabling symlink support on your machine and using dvc config cache.type symlink.

@rxxg
Contributor Author

rxxg commented Jan 10, 2020

I don't have many details about the network server, sorry. Windows tells me that there is a cluster running NTFS + DFS, but I don't know what's on the other side.

My biggest concern at this point is that DVC is detecting that hardlinks are available (which they are, kind of) and trying to use them, but then failing to detect that the links have been correctly created. So if there are no other options for checking links, DVC should refuse to try to create them and fall back to the next cache type?
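
To make the suggestion concrete, here is a minimal sketch of "verify or fall back". All names (LinkVerificationError, LINKERS, link_with_fallback) are hypothetical and not taken from DVC; reflink is left out because the standard library has no call for it:

```python
import os
import shutil


class LinkVerificationError(Exception):
    """Raised when a link was created but cannot be confirmed."""


def _hardlink(src: str, dst: str) -> None:
    os.link(src, dst)
    if os.stat(dst).st_nlink < 2:
        # The link exists but cannot be detected later, so undo it.
        os.remove(dst)
        raise LinkVerificationError(f"cannot verify hard link: {dst}")


def _symlink(src: str, dst: str) -> None:
    os.symlink(src, dst)  # raises OSError when the privilege is missing


def _copy(src: str, dst: str) -> None:
    shutil.copyfile(src, dst)


LINKERS = {"hardlink": _hardlink, "symlink": _symlink, "copy": _copy}


def link_with_fallback(src: str, dst: str, cache_types: list) -> str:
    """Try each cache type in order and return the first one that both
    succeeds and verifies."""
    for name in cache_types:
        try:
            LINKERS[name](src, dst)
            return name
        except (OSError, LinkVerificationError):
            continue  # fall through to the next configured type
    raise RuntimeError("no usable cache type among: " + ", ".join(cache_types))
```

With cache types ["hardlink", "copy"], a share that cannot confirm hard links would end up with a plain copy instead of an undetectable link.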

I will try symlinks next.

[EDIT]
So under Windows, symlinks require special workstation configuration which means it's a non-starter for me unfortunately.

@efiop
Contributor

efiop commented Jan 11, 2020

My biggest concern at this point is that DVC is detecting that hardlinks are available (which they are, kind of) and trying to use them, but then failing to detect that the links have been correctly created. So if there are no other options for checking links, DVC should refuse to try to create them and fall back to the next cache type?

Yes, will update my PR to do precisely that. Currently using simple asserts in it, but it should actually raise a proper exception instead. Thanks for the reminder! 🙂

So under Windows, symlinks require special workstation configuration which means it's a non-starter for me unfortunately.

Have you tried installing our exe? Or do you have very limited rights on your machine?

@rxxg
Contributor Author

rxxg commented Jan 11, 2020

So under Windows, symlinks require special workstation configuration which means it's a non-starter for me unfortunately.

Have you tried installing our exe? Or do you have very limited rights on your machine?

I'm working on a utility which combines git + DVC in one UI for our particular workflow, to be internally redistributed as one package. My user base doesn't have admin rights on their machines. (I'm assuming the dvc exe needs admin to set up symlinks?)

rxxg changed the title from "Hardlinks not correctly detected on NTFS network share" to "Hardlinks not correctly detected on SMB network share (NTFS?)" on Jan 11, 2020
@rxxg
Contributor Author

rxxg commented Jan 11, 2020

Sorry, typo 😳 Windows NTFS network share

I've edited the issue description; I'm not 100% convinced that there actually is an NTFS system on the other side of the network. Windows seems to always report NTFS in the UI, even when there is something else (like an OSX share).

@efiop
Contributor

efiop commented Jan 11, 2020

My user base doesn't have admin rights on their machines. (I'm assuming the dvc exe needs admin to set up symlinks?)

Yes, but I think there is a way to do that without those rights https://www.google.com/search?q=windows+symlink+without+admin , but I haven't tried that myself 🙁 Or, well, you could ask your admin to enable those for your machines 🙂

3 participants