Improve CreateDirectory on Windows #61954
Tagging subscribers to this area: @dotnet/area-system-io. Issue Details: The CreateDirectory implementation was improved on Unix by #58799 and #61777. The first PR eliminates exists syscalls by deducing that information from the directory creation syscall. The second PR reduces allocations when creating parent directories by allocating the parent paths on the stack instead of creating strings. Similar improvements can be made to the Windows implementation.
|
@iSazonov you thumbsed up this, any interest in a PR? Seems relevant to Powershell. 😸 |
@danmoseley The PowerShell MSFT team agreed that the PS community will start experimental work on a new File System Provider in the next milestone. We already have some API enhancement requests (important for PowerShell) in the .NET repository, and I hope we get them early so that we have time for adoption. I pinged .NET in one issue but got no answer; I remain hopeful. |
Would like to have a look :) I hope I can improve the performance in FileSystem.DirectoryCreation.Windows.cs |
@deeprobin it's yours. |
Unfortunately, I could not achieve any relevant performance improvement here. I think the poor performance is mainly due to the WinAPI calls. The rest is just minimal "validation overhead". We could perhaps make minimal improvements by using a ReadOnlySpan instead of a string in Path & PathInternal. We could also analyze whether I would look at it again this afternoon, however if anyone has any ideas please let me know. |
We are already using runtime/src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CreateDirectory.cs Lines 15 to 26 in f0b7773
and we can't use |
Have you tried not checking if the directory exists but just trying to create it? With the other optimization (similar to #61777) you should see a drop in memory usage, which is always nice to have, even if it does not provide a major CPU time reduction. |
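As an illustration of the "just try to create it" idea, here is a minimal language-neutral sketch in Python. It is not the .NET implementation; `create_directory` is a hypothetical helper. It deduces "already exists" and "parent missing" from the failure of the creation call itself instead of probing first.

```python
import os


def create_directory(path):
    """Create `path` and any missing parents without probing for existence first.

    Hypothetical helper for illustration only: the outcome of each mkdir call
    tells us whether the directory already existed or a parent is missing.
    """
    current = os.path.abspath(path)
    pending = []  # directories that could not be created yet (parent missing)

    # Walk upwards until a mkdir succeeds or hits an existing directory.
    while True:
        try:
            os.mkdir(current)
            break
        except FileExistsError:
            # Something is already there; make sure it is a directory.
            if not os.path.isdir(current):
                raise
            break
        except FileNotFoundError:
            parent = os.path.dirname(current)
            if parent == current:  # reached the root, nothing left to try
                raise
            pending.append(current)
            current = parent

    # Walk back down, creating the directories that were deferred.
    for directory in reversed(pending):
        try:
            os.mkdir(directory)
        except FileExistsError:
            # Another thread or process may have created it in the meantime.
            if not os.path.isdir(directory):
                raise
```

Calling the helper twice on the same path is harmless: the second call gets the "already exists" answer from the single failed creation attempt plus one directory check.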
So encoding would then probably only make a minimal performance difference.
I'll take a look at it later today :D |
We just have to use the |
*A methods are just wrappers around the *W methods in all cases I'm aware of. They are guaranteed to be slower as well as being inappropriate |
runtime/src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CreateDirectory.cs Lines 21 to 27 in 42c3a98
In my opinion we should call Is there anything against this change? |
Is this check still up to date? I don't really understand it. runtime/src/libraries/Common/src/System/IO/FileSystem.DirectoryCreation.Windows.cs Lines 120 to 132 in 0db9ff6
|
@deeprobin probably if you remove that code, then try to do Another thing you could do is put in a Debug.Fail there somewhere temporarily, and run the tests. You will see which test hits it. |
I was only able to achieve a performance boost of 7 us (and an allocation reduction of 47 B). Before
After
Maybe one of you can find opportunities to improve the performance. |
Thanks for the measurements. I'll let IO owners comment, but wanted to note that the standard deviation in those measurements is such that it's borderline whether it's meaningful improvement or not. |
Given the slim improvements, we would not want to make the code less maintainable. However I had a chance to take a look at your changes @deeprobin and several of them seem like worthwhile cleanup to remove unnecessary work. I like how you kept things in their own commits so they could be reviewed individually. Do you want to throw up a PR, and we can comment there? If there are changes you believe are dubious you could keep those out of the PR. |
I'd speculate that a common use case is to create a subdirectory in the current folder (the parent folders already exist). |
Looking at this code more closely, I think we can close this issue. Creating a large number of new directories, and even more so new nested directories (where the intermediate directories do not exist yet), are clearly not common scenarios. The common scenario is:
The current Windows code addresses this scenario very well. It first checks whether a directory exists and only then tries to create it. The Unix code first tries to create a directory and then checks whether it exists, i.e. it makes an extra call in this common scenario. I believe we don't want to follow that pattern and make an extra P/Invoke call on Windows in the common scenario. Using the Unix code pattern and replacing the string stack with an int stack makes little sense either. Today all (5 or 6) of the helper methods used take a string for the path parameter. And in the common scenario this would only avoid one allocation, when a trailing slash is present. |
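To make the trade-off between the two orderings concrete, here is a small Python sketch (an illustration only, not the .NET code; all function names are hypothetical). It records how many file-system calls each ordering issues when the directory is new and when it already exists.

```python
import os
import tempfile


def ensure_dir_check_first(path, calls):
    """Check-first ordering: probe for the directory, create only if missing.

    The gap between the probe and the mkdir is ignored here for brevity.
    """
    calls.append("exists")
    if os.path.isdir(path):
        return
    calls.append("mkdir")
    os.mkdir(path)


def ensure_dir_create_first(path, calls):
    """Create-first ordering: attempt the creation, inspect only on failure."""
    calls.append("mkdir")
    try:
        os.mkdir(path)
    except FileExistsError:
        calls.append("exists")
        if not os.path.isdir(path):
            raise


def count_calls(strategy):
    """Return (calls for a new directory, calls for an existing directory)."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "sub")
        new_calls, existing_calls = [], []
        strategy(target, new_calls)       # directory does not exist yet
        strategy(target, existing_calls)  # directory now exists
        return len(new_calls), len(existing_calls)
```

In this sketch, check-first costs two calls for a new directory and one for an existing directory, while create-first costs one and two respectively. Which ordering is cheaper therefore depends on which case dominates, which is the point under debate in this thread.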
When I changed the Unix implementation to omit the existence checks, the rationale was that perf-sensitive code shouldn't try to create the same directory over and over again in a loop. That's what you describe here as the common scenario. |
I don't understand what your comment is about. If you're talking about a high-performance scenario, there's only one way --- to create a directory and block it from being deleted - only then this code can stop worrying about recreating the directory. Optimizations made earlier in CreateDirectory (or which may have been made) have no effect on this scenario, since it excludes this code altogether. |
I mean that if the new files created in a loop will always be written to the same directory, it's best to move directory creation out of the loop.
|
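A minimal Python sketch of what hoisting the directory creation out of the loop could look like (the function names are hypothetical and this is not code from the thread). It assumes nothing else removes the directory while the loop runs.

```python
import os


def write_files_per_iteration(directory, names):
    """Ensures the directory exists before every single file write."""
    for name in names:
        os.makedirs(directory, exist_ok=True)  # repeated on each iteration
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(name)


def write_files_hoisted(directory, names):
    """Ensures the directory exists once, then only writes files.

    Assumption: no other thread or process deletes the directory mid-loop.
    """
    os.makedirs(directory, exist_ok=True)  # done once, outside the loop
    for name in names:
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(name)
```

Both variants produce the same files; the hoisted one issues a single directory-creation call regardless of how many files are written.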
Only if we lock the directory. Otherwise another thread/process can remove it. So having CreateDirectory in the first place is more reliable and is common practice. |
Most code doesn't account for directories that were just created going missing. Moving the |
You say the obvious things and again I don't understand how this relates to improving CreateDirectory on Windows requested in the issue.
Based on these two points, I conclude that we can close this issue. A more promising follow-up would be to analyze the entire Path API stack to see whether the number of allocations can be reduced. |
You're trying to optimize for the case of ensuring that directories exist, while I'm trying to optimize for the case where new directories get created. Both are valid use cases. |
@danmoseley w/ the profiling provided by deeprobin I see room for improvement on the runtime side & in the flow of directory creation... I have some promising mockups, so I'd love to pick this up! |
Very good. If this is also okay with the others, I am curious about your results! One more thought that occurred to me: perhaps we could choose between the WinAPI calls depending on whether the path is representable in ANSI (A call) or needs Unicode (W call). |
FYI, .NET strings are already in the encoding used by CreateDirectoryW (UTF-16), and CreateDirectoryA is a wrapper over CreateDirectoryW (converting the ANSI string to Unicode under the hood), so it would only ever be slower. The same is true at least for other file-system related Win32 calls. |
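A rough Python illustration of why choosing the A call based on the path contents is unlikely to pay off. The helper name is hypothetical, and `cp1252` is only an assumed example of an ANSI code page. Deciding whether a path fits the code page already requires scanning every character, whereas a string can always be encoded as UTF-16.

```python
def representable_in_code_page(path, code_page="cp1252"):
    """Return True if every character of `path` exists in the given code page.

    Note that answering this question means walking the whole string, which is
    work the W call never needs.
    """
    try:
        path.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


ascii_path = "C:\\temp\\data"
cyrillic_path = "C:\\temp\\данные"

# Both paths encode to UTF-16 without loss; only one fits cp1252.
ascii_utf16 = ascii_path.encode("utf-16-le")
cyrillic_utf16 = cyrillic_path.encode("utf-16-le")
```

Here `representable_in_code_page(ascii_path)` is true and `representable_in_code_page(cyrillic_path)` is false, while both UTF-16 encodings succeed.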
The CreateDirectory implementation was improved on Unix by #58799 and #61777.
The first PR eliminates exists syscalls by deducing that information from the directory creation syscall. The second PR reduces allocations when creating parent directories by allocating the parent paths on the stack instead of creating strings.
Similar improvements can be made to the Windows implementation.
cc @adamsitnik @danmoseley