Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: RemoveAll does not always work on very large directories #20841

Closed
mattfarina opened this issue Jun 29, 2017 · 50 comments
Closed

os: RemoveAll does not always work on very large directories #20841

mattfarina opened this issue Jun 29, 2017 · 50 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@mattfarina
Copy link

os.RemoveAll and other os operations (e.g., os.Rename) fail with large sets of files on Windows.

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.8.3

What operating system and processor architecture are you using (go env)?

Windows amd64 and Windows Subsystem for Linux amd64

What did you do?

An example repo can be found at https://github.com/mattfarina/go-test-windows-files. It's a repo with 10,000 files to be deleted to use as an example.

What did you expect to see?

I expect os.RemoveAll to delete all the files

What did you see instead?

Some of the files are deleted and then an error is thrown.

@mattfarina
Copy link
Author

Note, using the system commands on windows and bash (WSL) work without issue. It's just in the os package.

mattfarina added a commit to Masterminds/glide that referenced this issue Jun 29, 2017
For an example of the issue and status on Go itself see
golang/go#20841

When working with large file sets such as those in a large dependency
tree (e.g. thousands of files) functions such as os.RemoveAll and
os.Rename can fail on Windows and Windows Subsystem for Linux. The
system commands work without issue.

This change detects Windows and the WSL environments then drops
down to system commands to make these changes. On mac, linux, and
other the os package functions are still used.
@alexbrainman
Copy link
Member

I tried running https://github.com/mattfarina/go-test-windows-files on my Windows XP and Windows 7. It works as expected - "go run main.go" prints "nil", and "files" directory is deleted. I tried running it couple of times, and it always succeeds. I tried both go1.8.3 and current tip.

Alex

@bradfitz
Copy link
Contributor

@mattfarina, the question for Windows quirks is always: which virus scanner(s) are you using?

@bradfitz bradfitz added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows labels Jun 29, 2017
@bradfitz bradfitz added this to the Go1.10 milestone Jun 29, 2017
@bradfitz bradfitz added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jun 29, 2017
@mattfarina
Copy link
Author

mattfarina commented Jun 29, 2017

@bradfitz great question. I used Windows 10 Pro Creators Edition to reproduce this. Some Glide users ran into a problem and this was the underlying cause.

Windows Defender that came with the system by default is all I have for antivirus. The system has a minimal install. VSCode, git, go, and WSL are all that I installed to reproduce the issue.

This may be a new issue to Windows 10 Creators Edition. I initially reproduced the Glide issue on the k8s/helm project. I'd previously worked on helm under Windows Anniversary using Glide without issue.

Happy to supply other details that may help.

@alexbrainman
Copy link
Member

Some of the files are deleted and then an error is thrown.

What is the error message?

Alex

@mattfarina
Copy link
Author

@alexbrainman See the example at https://github.com/mattfarina/go-test-windows-files. Readme has that detail for this case.

@alexbrainman
Copy link
Member

See the example at https://github.com/mattfarina/go-test-windows-files.

Thank you. But the "remove files: directory not empty" message looks strange to me.

Searching for "directory not empty" in Go repo, I find ENOTEMPTY. But I don't see how windows code in Go repo can return ENOTEMPTY. Can you try and see why your program returns "directory not empty" message? I am trying to understand which windows syscall fails and what error it returns.

Thank you.

Alex

@kumarharsh
Copy link

@alexbrainman just to clarify the error message, it's not remove files: ..., but this:

remove C:\Workspace...\path-to-project\folder1: The directory is not empty.
remove C:\Workspace...\path-to-project\folder2: The directory is not empty.
...
etc

@alexbrainman
Copy link
Member

remove C:\Workspace...\path-to-project\folder1: The directory is not empty.

The "The directory is not empty" message must be ERROR_DIR_NOT_EMPTY. The only way you can get this message (as far as I know), if you are trying to delete directory with some files inside. Is it possible that some other program is creating files as you are running os.RemoveAll ?

Something like:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

func runParent() error {
	tmpdir, err := ioutil.TempDir("", "ALEX")
	if err != nil {
		return err
	}

	exe, err := os.Executable()
	if err != nil {
		return err
	}

	cmd := exec.Command(exe, tmpdir)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	err = cmd.Start()
	if err != nil {
		return err
	}

	// wait before clent starts creating new files
	time.Sleep(100 * time.Millisecond)

	err = os.RemoveAll(tmpdir)
	if err != nil {
		return err
	}

	return nil
}

func runChild(dir string) error {
	for i := 0; i < 10000; i++ {
		path := filepath.Join(dir, fmt.Sprintf("file%d.txt", i))
		f, err := os.Create(path)
		if err != nil {
			return err
		}
		f.Close()
	}
	return nil
}

func main() {
	if len(os.Args) == 1 {
		err := runParent()
		if err != nil {
			fmt.Printf("PARENT: %v\n", err)
		}
	} else {
		err := runChild(os.Args[1])
		if err != nil {
			fmt.Printf("CHILD: %v\n", err)
		}
	}
}

If I run this program on my Windows PC, I get this:

C:\>u:\test
PARENT: remove C:\DOCUME~1\brainman\LOCALS~1\Temp\ALEX711754103: The directory is not empty.

C:\>

Alex

@kumarharsh
Copy link

I'm facing this error through glide, the developer of which is the OP (matt). No idea what goes on inside glide itself, but apart from glide, no other program would be modifying or touching anything within the directory being deleted.

... except maybe Windows Defender ...

@djdv
Copy link
Contributor

djdv commented Jun 30, 2017

I can only give the not so helpful "works on my machine" response here, tested on W10(Ver:1703 Build:15063.447) with Go 1.8.3 and 1.9beta2. I have Windows defender disabled on this machine so maybe it's related. It might be worth running the tests with at least "Real-time protection" disabled since defender may have open handles when the files are fresh, although I'd imagine if this was the cause the results would be inconsistent.

@mattn
Copy link
Member

mattn commented Jun 30, 2017

I tried this on Windows7 64bit. And I got return nil.

I use Symantec Endpoint Protection. I'll try in later on Windows10 64bit.

@brendandburns
Copy link

@mattfarina

Check out:

https://glennsarti.github.io/blog/wsl-ruby-puppet/

For some instructions about disabling Defender on WSL files, it would be interesting to see if that fixes things on the WSL side...

@mattn
Copy link
Member

mattn commented Jun 30, 2017

I don't repro on Windows10. I use Windows Defender.

@thomastaylor312
Copy link

thomastaylor312 commented Jun 30, 2017

@brendandburns I just tried testing with the exclusion added and still get the issue. Both with the test repo from @mattfarina and trying it with glide. Let me know if I can test something else

@thomastaylor312
Copy link

I also just tried turning off real-time protection to no avail

@alexbrainman
Copy link
Member

@kumarharsh and @thomastaylor312 if you can reproduce what @mattfarina described (the https://github.com/mattfarina/go-test-windows-files ), then you should be able to work out what is going on with os.RemoteAll.

For example, does this program https://github.com/mattfarina/go-test-windows-files/blob/master/main.go fails? What is the error message? Can you see any files remain in the "files" directory after program completes? Why did the program skipped them while deleting others? Did the program even attempted to delete them? Were the files in the "files" folder before you run the program? I would be interested in every error that occurs inside of os.RemoveAll, so I would be adding println everywhere to print every error. What are these errors? Can you see any errors related to the files that are left in the "files" directory.

Alex

@thomastaylor312
Copy link

@alexbrainman I will keep debugging it. But here are the answers to your questions:
Error message: remove files: directory not empty
Files deleted: It deletes about 5100 of them. I can't seem to find a pattern in which files are deleted and those that are stuck behind

I'll add some more debugging and see what I can find out

@alexbrainman
Copy link
Member

Error message: remove files: directory not empty

Is the error message "directory not empty" or "The directory is not empty"?

Files deleted: It deletes about 5100 of them. I can't seem to find a pattern in which files are deleted and those that are stuck behind

There are 10001 files in that directory. So you are left with about 4901 remaining files. Out of remaining 4901 files, did os.RemoveAll even attempted to delete them? You can put println before each file is removed and log println output into a file, and than look in that file. If you discover that os.RemoveAll did attempt to delete them all, then each file deletion must have failed. What are the error messages for each failure? Again you could print them all with println.

Thank you.

Alex

@thomastaylor312
Copy link

@alexbrainman That is what I meant by "debugging" in this case. 🙂 I am planning on sprinkling println around the RemoveAll function to see what is going on. I will let you know my results as soon as I do.

As for the error message, I copy pasted it directly from the output. It is remove files: directory not empty

@mattfarina
Copy link
Author

In the case the error remove files: directory not empty is in the pattern of remove [LOCATION]: directory not empty where [LOCATION] is the directory passed into os.RemoveAll.

In my case the example at https://github.com/mattfarina/go-test-windows-files removes a different number of files on each run. The last two runs removed 5118 and 5101 files.

I plan to merge in some changes to Glide this week to fix this. Hopefully it's just a temporary work around until we have a better fix.

@brendandburns Thanks for the Ruby WSL link. I'll dig into that a little more.

@thomastaylor312
Copy link

Ok, getting a little further on this. The names, err1 := fd.Readdirnames(100) call in RemoveAll is returning io.EOF. I am trying to figure out in Readdirnames and the functions it calls where the error is coming from. I haven't been able to yet

@jessfraz
Copy link
Contributor

jessfraz commented Sep 23, 2017 via email

@ianlancetaylor
Copy link
Contributor

I think the fix is in https://golang.org/cl/62970 .

@alexbrainman
Copy link
Member

CustomRename function in windows also has bug.
if my project root directory not in c:, glide get command get error.
it because move can't move directory between two partitions. eg:
cmd.exe /c move "C:\Users\admin\AppData\Local\Temp\1624_30103" "D:/project/vendor"
拒绝访问。
移动了  0 个目录。

@zhengtg you are correct, CustomRename function has a bug (I can reproduce it here - Windows move command refuses to move directory if source and destination are on different volumes), but CustomRename function is part of glide project, and you should use https://github.com/Masterminds/glide/issues to report problems with glide command.

Alex

@tyru
Copy link

tyru commented Dec 11, 2017

This seems to occur again in go1.10beta1.

Sorry, deleted my comment because it was totally wrong.
After checking the source code, it appears this patch has not been applied yet to go1.9.1, go1.9.2, go1.10beta1.

@tyru
Copy link

tyru commented Dec 11, 2017

For go1.10beta1, I confirmed this patch fixed the problem for my environment at least.

Environment

  • go version go1.10beta1 linux/amd64
  • OS name: Microsoft Windows 10 Pro (from systeminfo command)
  • OS version: 10.0.15063 N/A Build 15063 (from systeminfo command)
  • Ubuntu (WSL): 16.04.3 LTS (from lsb_release -a command)

@bradfitz
Copy link
Contributor

@tyru correct, there is no fix for this yet, in any branch, and it will probably not be fixed in either Go 1.9.x or Go 1.10.x.

It's really Microsoft's job to fix this in WSL if they want to fully act like Linux.

But we might consider a fix for Go 1.11. The change above (https://golang.org/cl/62970) has merge conflicts and unaddressed feedback. Ping @raggi. :)

@jessfraz
Copy link
Contributor

ping @jstarks

@jstarks
Copy link

jstarks commented Dec 11, 2017

Interesting. We (WSL team) will look into this.

@jstarks
Copy link

jstarks commented Dec 11, 2017

OK, that was fast. We believe this has been fixed in WSL in the recently released Windows 10 Fall Creators Update ("RS3", build 16299). Please let us know if you're able to reproduce this after updating to the latest Windows release.

@raggi
Copy link
Contributor

raggi commented Dec 11, 2017

Sorry, catching up with the comments on the original CL has been falling down my priority list. I won't be picking it up until January at the earliest.

It's important to point out here that the title is misleading and makes me sad that I mentioned Windows in the original bug report, because it's too easy to focus on that too much.

This behavior exists for many filesystems on many platforms. The spec allows for concurrent modification and for unbounded directory sizes. There is no tractable implementation for very large directories other than the behavior the original CL aims to fix. Platform doesn't really matter. You can almost certainly replicate the original bug with sufficiently large directories on both OSX and NFS today.

@bradfitz
Copy link
Contributor

@raggi, thanks for reminding us of that. I remember you did mention that in the CL review comments. Will retitle.

@bradfitz bradfitz changed the title os: RemoveAll does not work for large file sets on WSL os: RemoveAll does not always work on very large directories Dec 12, 2017
@bradfitz bradfitz removed OS-Windows WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Dec 12, 2017
@cwen0
Copy link

cwen0 commented Jan 15, 2018

I also encounter this error when using os.RemoveAll, and I don't use WSL.

os : CentOS Linux release 7.3.1611 (Core) 
kernel: 3.10.0-514.el7
go version: go version go1.9.2 linux/amd64

Is there any good way to replace os.RemoveAll?

@davecheney
Copy link
Contributor

@cwen0 please file a new issue. If it turns out it’s a duplicate we’ll link the two issues.

@cwen0
Copy link

cwen0 commented Jan 15, 2018

ok, Thanks

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/121255 mentions this issue: os: when looping in RemoveAll, close and re-open directory

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/171099 mentions this issue: os: fix RemoveAll hangs on large directory

@golang golang locked and limited conversation to collaborators Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests