Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use UUID's to generate random tempname on Windows #33785

Merged
merged 5 commits into from
Jan 18, 2020

Conversation

musm
Copy link
Contributor

@musm musm commented Nov 7, 2019

cc @StefanKarpinski @vtjnash @davidanthoff

I wrote up this quick implementation that use UIUD's to generate random file names as recommended in the MSFT docs. Note, we already load the library "Rpcrt4.dll" so it shouldn't be an issue to use it here. (According to Libdl.dll() this library is already loaded on Windows when using Julia. )

base/file.jl Outdated Show resolved Hide resolved
base/file.jl Outdated
@assert ccall((:RpcStringFreeW, :Rpcrt4), Cint, (Ref{Ptr{Cwchar_t}},), nameptr) == 0

filename = joinpath(parent, name)
if !ispath(filename)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we ok with this. I feel like this should just error (maybe?), since it will be an incredible rare scenario. In the previous code, we did keep looping until a unique name was generated. Here, in the unlikely scenario it is a path, we recursively call the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe if cleanup is false we could not call ispath at all? It would be nice if this function would actually not touch the file system at all.

I tend to think that just assuming we'll never run into a conflict is ok here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe if cleanup is false we could not call ispath at all?

Agreed, in fact it's not guaranteed to return a unique filename (even if very unlikely)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since @StefanKarpinski worked on this, could you comment on ?

It would be nice if this function would actually not touch the file system at all.

I don't see how it would be possible to do this in the scenario, since if we don't check !ispath(filename) , temp_cleanup_later could accidentally remove the file which we did not generate.

base/file.jl Outdated

id = Ref{GUID}()
r = ccall((:UuidCreate,:Rpcrt4), Cint, (Ref{GUID},), id)
r == 0 || (r == RPC_S_UUID_LOCAL_ONLY && (@warn "UIUD guaranteed to be unique to this computer only"; true)) || (r == RPC_S_UUID_NO_ADDRESS && error("Cannot get Ethernet or token-ring hardware address for this computer."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the warning? It is not clear when that case would actually ever happen (no network card installed?), but even if it does, it seems that a machine unique filename is good enough and doesn't need to emit a warning?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine not to emit this warning to me, but I'm not an expert in this platform.

Copy link
Contributor Author

@musm musm Nov 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UuidCreate

For security reasons, it is often desirable to keep ethernet addresses on networks from becoming available outside a company or organization. The UuidCreate function generates a UUID that cannot be traced to the ethernet address of the computer on which it was generated. It also cannot be associated with other UUIDs created on the same computer. If you do not need this level of security, your application can use the UuidCreateSequential function, which behaves exactly as the UuidCreate function does on all other versions of the operating system.

UuidCreateSequential

The UuidCreateSequential function returns RPC_S_UUID_LOCAL_ONLY when the originating computer does not have an ethernet/token ring (IEEE 802.x) address. In this case, the generated UUID is a valid identifier, and is guaranteed to be unique among all UUIDs generated on the computer. However, the possibility exists that another computer without an ethernet/token ring address generated the identical UUID. Therefore you should never use this UUID to identify an object that is not strictly local to your computer. Computers with ethernet/token ring addresses generate UUIDs that are guaranteed to be globally unique.

In my PR I use UuidCreate

Reading things over again, the intention of UuidCreate/Sequential are more general than the intended purpose here which is to use them to generate a random temporary filename on the local computer. Therefore is probably more appropriate to simply use UuidCreateSequential and forgo these warnings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think UuidCreate is what is used by default to create GUIDS on Windows, so I think I would just stick with that, and treat a return value of RPC_S_UUID_LOCAL_ONLY as fine and not worth a warning.

base/file.jl Outdated

@assert ccall((:RpcStringFreeW, :Rpcrt4), Cint, (Ref{Ptr{Cwchar_t}},), nameptr) == 0

filename = joinpath(parent, name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we prefix the filename with something like jl_? I think there is already a constant defined in the code somewhere. That way one could easily identify the temp files created by Julia.

@davidanthoff
Copy link
Contributor

Also pinging @JeffBezanson, who wrote the current version of tempname for Windows according to blame.

Context is #33741 and #33765.

@StefanKarpinski
Copy link
Sponsor Member

Also pinging @JeffBezanson, who wrote the current version of tempname for Windows according to blame.

Seems unlikely—probably just an artifact of him being the person to move the code to the current location.

@davidanthoff
Copy link
Contributor

Seems unlikely—probably just an artifact of him being the person to move the code to the current location.

It looks to me as if he really wrote it: 5e2ff12

@musm musm closed this Nov 11, 2019
@musm musm reopened this Nov 11, 2019
@musm
Copy link
Contributor Author

musm commented Nov 12, 2019

we have unrelated failures here

@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 12, 2019

package_win32 is a segfault in tempname

@musm
Copy link
Contributor Author

musm commented Nov 12, 2019

oops your right. Might be a fluke. The previous build both ran into the same unrelated repl kill test errors

test/file.jl Outdated
@@ -50,6 +50,20 @@ end

using Random

@testset "that temp names are actually unique" begin
temps = [tempname(cleanup=false) for _ = 1:1000]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has an expected failure rate on our CI of 0.001% (1 - e^(-1000^2 / 62^6 / 2)) which means 1 in 100k CI runs (or about 1 in 10k commits) would be expected to fail this test (over the current lifetime of this project, that would mean there'd have been about 2-3 failures)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. The filenames that are generated in this PR are 128 bit UUIDs, which, as far as I can tell, will lead to a way, way, way lower collision rate than what you compute here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point that Jameson is making is that this test is unnecessarily strict, which I agree (I only used 1000 to test the PR). Something like 100, is more than sufficient.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least this does not use Julia's RNG, which gets reset on every @testset.

base/file.jl Outdated
return filename
else
# in the unlikley scenario filename is a path, call this function again regenerate a unique filename
return !ispath(filename) ? (temp_cleanup_later(filename); filename) : tempname(parent; cleanup=cleanup)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to get rid of ispath if possible. Is it ok to replace these last statements with

cleanup && temp_cleanup_later(filename)
return filename

Like in the non-windows version?

@musm
Copy link
Contributor Author

musm commented Nov 21, 2019

Ok 32-bit windows is now fixed (calling convention should've been stdcall)

@musm
Copy link
Contributor Author

musm commented Nov 21, 2019

This is now ready for review

@musm musm closed this Nov 21, 2019
@musm musm reopened this Nov 21, 2019
@musm musm force-pushed the wintempname branch 2 times, most recently from a0fec89 to 96e6dac Compare November 22, 2019 05:05
@musm
Copy link
Contributor Author

musm commented Nov 22, 2019

Note, this PR does lead to a bigger difference than what we currently have between the linux and windows behavior for tempname; I'm not sure if we care about this. On linux we have names like jl_XXXXXX and on windows now jl_c1b1ff4c-c90a-46a9-8348-85ad8170ff04

@musm
Copy link
Contributor Author

musm commented Nov 23, 2019

I think the jl_ prefix doesn't really make too much sense here. These are temporary filenames and thus the name shouldn't have meaning, so why bother with the unnecessary prefix here, where we are generating a UUID anyways. I think it makes more sense in the linux version with jl_XXXXXX, but with the UUID's I'd argue it's not needed.

@davidanthoff
Copy link
Contributor

I sometimes go into my temp folder to delete stuff, and I find it quite helpful in those cases to be able to tell which file originated from what application, so that I can do select deletes, so I kind of like the jl_ prefix. Is there any real downside to it?

@musm
Copy link
Contributor Author

musm commented Nov 23, 2019

Ok just wondered if someone had thoughts about that.

This PR from my standpoint is ready. Tests are all passing.

test/file.jl Show resolved Hide resolved
@musm
Copy link
Contributor Author

musm commented Nov 25, 2019

All green on the Western front....edit: apparently not, looks like still running tests...

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach seems good. Some open-ended questions:

  1. Is it OK to have filenames with - in them? Seems like it should be a pretty well supported character, but if not, we could add name = filter(!=('-'), name) (although, for that matter we could also use the existing code in uuid.jl for turning these into strings, as that definition of struct UUID is valid as a drop-in replacement for this struct GUID usage)
  2. Is it OK for the filenames to be so long? Since the tempdir may already be pretty deep in the path (C:\Documents and Settings\Your Full Username Here\AppData\Local\Temp\), this'll reduce the maximum supported username length from 201 to 174 utf16 code points.
  3. Is it OK to use filenames without suffixes?

All of these seem OK to me, but I just wanted to voice them.

base/file.jl Outdated Show resolved Hide resolved
base/file.jl Outdated Show resolved Hide resolved
base/file.jl Outdated Show resolved Hide resolved
base/file.jl Show resolved Hide resolved
@JeffBezanson
Copy link
Sponsor Member

Triage thinks this is a bug fix and so does not block feature freeze.

@Keno
Copy link
Member

Keno commented Jan 14, 2020

Seems like this didn't work on win64?

@musm
Copy link
Contributor Author

musm commented Jan 15, 2020

It was passing before I made the change to use underscores.
I need to update the PR with the some of the requested changes, which I should get to later today.

@StefanKarpinski
Copy link
Sponsor Member

I’ll repeat: putting a UUID in the temp name is too long. This should not be longer that than 10 or so chars.

@musm
Copy link
Contributor Author

musm commented Jan 16, 2020

import Base.StringVector

function _rand_string()
	nchars = 10
	A = Vector{UInt8}(undef, nchars)
	ccall((:SystemFunction036, :Advapi32), stdcall, UInt8, (Ptr{Cvoid}, UInt32), A, sizeof(A))

	slug = StringVector(10)
	chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
	for i = 1:nchars
	    slug[i] = chars[(A[i] % length(chars)) + 1]
	end
	return name = String(slug)
end

which approach is better to use the above, or the one in the last commit?

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jan 17, 2020

The implementation in the comment seems better to me. It's slightly biased because 62 doesn't divide 256, but that seems fine. @vtjnash, any opinion on the matter?

@musm
Copy link
Contributor Author

musm commented Jan 17, 2020

Does that actually create a bias?

@musm
Copy link
Contributor Author

musm commented Jan 17, 2020

Anyways I tend to agree that it is probably a better approach, and is similar in approach to that used in the .Net Core that @davidanthoff linked.

@musm musm requested a review from vtjnash January 17, 2020 19:00
@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jan 17, 2020

Does that actually create a bias?

Yes: since divrem(256, 62) == (4, 8) the first 8 characters (0-7) are more likely than the rest by odds of 5:4 (so 25% more likely). However, that's ok since for maximum distinguishability on a case-insensitive filesystem we should be choosing the digits twice as often as the letters anyway. So the bias actually makes the resulting slug slightly more likely to be distinguishable. I think we should proceed with this.

function _rand_string()
nchars = 10
A = Vector{UInt8}(undef, nchars)
ccall((:SystemFunction036, :Advapi32), stdcall, UInt8, (Ptr{Cvoid}, UInt32), A, sizeof(A))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can fail. The return value should be checked, but there's not much we can do to recover to just throw an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we do the same thing in the stdlib, so I'm gonna merge this and then fix them both at once.

@Keno Keno merged commit d759b5b into JuliaLang:master Jan 18, 2020
KristofferC pushed a commit that referenced this pull request Jan 20, 2020
* Use UIUD to create random tempname on Windows

* Use underscores and remove extension

* Truncate to 10 chars the UUID

* Generate the random name from a random byte array

* Update file.jl

(cherry picked from commit d759b5b)
@rfourquet
Copy link
Member

Given that this implementation is biased, in a non-problematic way, wouldn't if be fine to even use Libc.rand() then, and simplify slightly _rand_string() ? (not that I suggest doing it, now that it's already merged; just a genuine question).

@musm musm deleted the wintempname branch January 20, 2020 12:55
@musm
Copy link
Contributor Author

musm commented Jan 20, 2020

wouldn't hurt to open a PR @rfourquet if you have something in mind, thanks.

KristofferC pushed a commit that referenced this pull request Apr 11, 2020
* Use UIUD to create random tempname on Windows

* Use underscores and remove extension

* Truncate to 10 chars the UUID

* Generate the random name from a random byte array

* Update file.jl
BioTurboNick pushed a commit to BioTurboNick/julia that referenced this pull request Apr 13, 2020
* Use UIUD to create random tempname on Windows

* Use underscores and remove extension

* Truncate to 10 chars the UUID

* Generate the random name from a random byte array

* Update file.jl

(cherry picked from commit d759b5b)
@simeonschaub simeonschaub removed the forget me not PRs that one wants to make sure aren't forgotten label May 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants