-
-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_common: make as_file return the real file on os.PathLike #232
Conversation
af4b710
to
5538060
Compare
6d9757b
to
bf93ff5
Compare
We currently have custom handling for pathlib.Path, but that means that custom readers that do not using pathlib.Path will have their traversables copied to a temporary directory in as_file, which can be an expensive operation. By adding a dispatch for os.PathLike, we extend this behavior for all path-like objects. Signed-off-by: Filipe Laíns <[email protected]>
I know I had some comments on this elsewhere. Can you provide an example of what use-case this benefits? In what situations will a resource provider have a path-like object that's not a |
|
I just don't see where CompatbilityFiles exposes a PathLike. Can you show me a demo from an actual loader where the file would end up being temporary due to the current implementation, but would be a Path object after this change? |
#233 😄 |
Sorry, on phone, misclicked. That PR makes |
Okay. I see that now. Thanks for the explanation. I can see the value in these changes in principle. I'm still unconvinced this change is worth the additional complexity. The approach is a clean one, but it still adds new expectations and increases cognitive and maintenance burden. I'd like to see a use-case that demonstrates the value of this change in practice. Is there actually a loader that implements resource_path that would benefit from this change? And is that use-case worth the extra trouble of adding this complexity, especially since this code is a compatibility layer? Another way to think about it - by leaving the code simpler, legacy loaders get a sub-optimal but adequate experience and are incentivized to implement I'm reluctant to add code that's likely never going to be used except maybe in compatibility code, especially if that code doesn't expire with that compatibility code. Once CompatibilityFiles is deprecated and removed, this behavior would remain, likely unused by anybody, for future generations to wonder if it's needed. What about a more surgical approach where @singledispatch(CompatibilityFiles.ChildPath)
def _(path):
return path.actual_path or _tempfile(path.read_bytes, suffix=path.name) At least that would obviously go away when CompatibilityFiles does. Still I wonder if it's needed. |
Okay, before we get into all that, do you have any reasoning for not wanting to give preferential treatment to |
What's special about it is that If there were a resource provider out there whose I do worry that the tests don't adequately test the intended behavior. When I remove the new behavior, the tests still pass as long as the object passed to as_file implements the Traversable protocol (as required):
It's not obvious to me that PathLikeTraversable needs to be a broken traversable to capture the failed expectation. Moreover, Probably instead, the test should explicitly construct a non-Path PathLike object and pass it to as_file and assert that the file is the same filename as the passed object. Maybe in But I'm still unconvinced this functionality is needed. Is there a use-case where a resource provider other than CompatibilityFiles would supply a non-Path PathLike object for |
Okay, so I guess you and me have different opinions on this. I dislike this bypass is done using a type check instead of a protocol that anyone can implement. And that is especially relevant considering this code is in the standard library, meaning this decision will affect people for several years, so I think we should try to make sure the behavior we put in now will be as future-proof as it can. There may be use-cases where using For eg. I have some code locally where I was playing with a custom loader for editable packages. The goal was to minimize and delay rebuilding/compilation steps until the very last moment, and only do them if absolutely necessary (eg. I have an auto generated 3GB data file, I only want to re-generate it if anyone actually tries to access it). That alone is not enough for me to tell you that we should have this functionality, as I am not actually shipping any code, but it does demonstrate the sort use-cases that may arise and we are not considering. But anyway, this is a fairly minor issue, so I don't want to be wasting more of your time with it. Feel free to close this PR if you feel like that is the best decision.
Oh, yes, I forgot to add the fs path tests. I only added the
I would probably just add a new test in |
I don't want you to be discouraged. I do think we are converging on a solution.
That's a compelling point, but I believe the converse is true as well - if this behavior turns out to be difficult to support or have unintended consequences, it's difficult to remove. I used to be of the opinion that an interface should be lenient and support as many plausible use-cases as possible. After working on CPython for a long time, I've found the opposite to be true, that it's preferable to limit the interface surface and specifically deny unintended uses. That's why I feel strongly that this behavior should not be added until there is at least one use-case that strongly calls for this behavior. I do sympathize with the argument that requiring a use-case could lead to a chicken/egg problem, that by not supporting the interface, it incentivizes alternate implementations, preventing otherwise elegant use-cases from emerging.
This example is helpful. I think I can imagine why this custom loader would not be able to present a So then the question becomes, could this use-case be served by a Traversable that generates the content on demand? Could this resource reader return a traversable that never puts the content on disk? In other words, could the loader rely on the temporary file generation of I'm still leaning toward the two presented options (file already exists on disk and thus can be a The thing that would push me to change my position to supporting this change would be to have a docstring in the test that describes the conditions under which this behavior is useful. Something like: def test_as_file_pathlke():
"""
as_file may also accept a non-Path PathLike object resolved from a
ResourceReader.files() in cases where the content is not on the
file system at the time ``files()`` is called but is present on the file
system by the time :func:`as_file` is called on one of the children. For
example, consider `FFY00's custom loader <https://github.com/...>`_
in which files are only generated on demand at the last minute
at their canonical location and re-used for subsequent calls and
interpreter instances. See python/importlib_resources#232 for more
details.
""" |
That makes sense, and I respect your opinion here. The chicken/egg issue does concern me a bit.
That'd trigger the content generation when
FWIW, I am working on such a loader for https://github.com/FFY00/mesonpy. We want to use it for scipy and numpy, which are heavy projects, so triggering unneeded rebuilds should be avoided. I don't have any code pushed yet, but I hopefully I will soon. |
I'm going to close this for now, but happy to revisit at any time. |
We currently only have custom handling for pathlib.Path, but that means
that custom readers that do not using pathlib.Path will have their
traversables copied to a temporary directory in as_file, which can be an
expensive operation.
By adding a dispatch for os.PathLike, we extend this behavior for all
path-like objects.
Signed-off-by: Filipe Laíns [email protected]