Raise exception when open with write mode in call stack #140

jayqi · 2021-04-03T01:16:51Z

A pass at trying to detect when __fspath__ is called by open with a write mode. IMO it doesn't work all that well. It appears open is a weird function that doesn't have its own frame in the call stack. It jumps from __fspath__'s frame directly to the frame that calls open. I don't think there's a straightforward way to get object references from the frames (it's bytecode or source), so also hard to reliably detect if open is being called on the cloud path instance, or to pick up on what mode open is being called with.

github-actions · 2021-04-03T01:18:23Z

🚀 Deployed on https://deploy-preview-140--gallant-agnesi-5f7bb3.netlify.app

codecov · 2021-04-03T01:18:38Z

Codecov Report

Merging #140 (e6c2bae) into master (ecf787e) will decrease coverage by 0.0%.
The diff coverage is 91.8%.

@@           Coverage Diff            @@
##           master    #140     +/-   ##
========================================
- Coverage    93.5%   93.5%   -0.1%     
========================================
  Files          21      21             
  Lines        1119    1154     +35     
========================================
+ Hits         1047    1079     +32     
- Misses         72      75      +3

Impacted Files	Coverage Δ
cloudpathlib/cloudpath.py	`90.6% <91.1%> (+<0.1%)`	⬆️
cloudpathlib/client.py	`87.5% <100.0%> (ø)`
cloudpathlib/exceptions.py	`100.0% <100.0%> (ø)`
cloudpathlib/s3/s3client.py	`92.8% <0.0%> (ø)`

jayqi · 2021-04-03T01:30:40Z

tests/test_cloudpath_file_io.py

+    #         p,
+    #         "w",
+    #     ) as f:
+    #         pass


This case fails because the test frame's code_context looks like this:

[' with open(\n']

Not sure what to do about it.

pjbull · 2021-04-03T18:08:44Z

Ok @jayqi, this really sent me down a 🐰 🕳️ ...

Here's a notebook showing a few more options:
https://gist.github.com/pjbull/1b8f92b84a40cd6fe2033468f98594a4

Thoughts on those?

jayqi · 2021-04-03T18:42:37Z

Wow, well done. 👏 It looks to me like you're capturing the main cases. I think grabbing the entire source of the frame and parsing the AST is probably the only reliable way to do this. One concern I have is how much overhead this adds to each call, and how it scales with the frame's size.

pjbull · 2021-04-03T18:52:32Z

Yeah, that's exactly my concern too. I'll see if I can add some benchmarking. No clue how long those things take....

pjbull · 2021-04-04T07:14:44Z

OK, I updated that gist above with some benchmarking. Here are my takeaways:

Adding the check makes the __fspath__ take ~10x the time of having no check
On my machine, the slowdown is in the same ballpark as writing a 48KB file or reading a 1.6MB on. As file sizes grow larger than those, the slowdown will start to be overwhelmed by the cost of reading/writing.
inspect.getsource is the most expensive call. I didn't track down a faster call for this in some poking around.
Neither AST parsing nor regex change execution time appreciably, even for reasonably large code contexts, so choosing one of those should not be based on performance.

So, I think if we go with Option 1 as outlined in #128 we should:
(1) add the AST version based on getcurrentframe which is fast and robust as compared to the other options, and (2) implement an environment variable to allow opting out of the performance hit if you are sure you don't use open with cloudpaths.

For me, options 2-4 as outlined in #128 are also still on the table given this research.

jayqi · 2021-04-04T20:30:43Z

Do you think inspect.getsource might have a different speed if it's getting the source from an installed version of cloudpathlib? (Editable and normal)

6 ms wall time per run isn't untenably slow but I guess not trivial. Adds a minute to 10,000 files.

pjbull · 2021-04-04T20:58:06Z

Do you think inspect.getsource might have a different speed if it's getting the source from an installed version of cloudpathlib? (Editable and normal)

I think since the frame we want to look at is from the caller's code, it will depend on how their code is installed so not sure that we can speed it up.

I did just poke at bytecode disassembly with dis as well, and that gave me the same order of magnitude as getsource (~5ms).

Agreed that 6ms isn't large enough to rule this approach out.

I think to improve the user friendliness of working with paths, I'm inclined towards this change.

pjbull · 2021-04-12T06:51:18Z

@jayqi Check out the latest commits. This seems to work, but it does add a ton of complex code for a narrow-ish use case.

jayqi

"Pull request authors can't request changes on their own pull request." 😂 🙄

cloudpathlib/cloudpath.py

jayqi · 2021-04-13T04:58:11Z

tests/test_cloudpath_file_io.py

+
+
+# entire function is passed as source, so check separately
+# that all of the built in open write modes fail


So this is interesting. I thought it was a feature of my previous version of the test that the function would have both safe and unsafe opens, and only fail for the unsafe one. Is it the case that if we had a safe open first and unsafeopen second in the same function, it would fail on the first one because the checking code would see the second open?

Anyways, I think some additional things we want to test:

open(pathlib_path, "w") and open(cloud_path, "w")` in the same function

monkeypatching that constant to False disables the check

cloudpathlib/cloudpath.py

pjbull · 2021-04-14T17:44:14Z

Another rev here. I thought that with the AST we lost the line numbers for matching writeable opens we detected and the entry point from calls, but it turns out that with a little bit of tracking, we can use both the AST approach and raise the error on the right line.

For example, here is the right trace in a notebook:

Tests path locally on 3.8 on linux as well, so there is likely some fix needed for Python<3.8

Small bonus fix: silence warnings in tests by checking for attributes in __del__ methods

jayqi · 2021-04-14T23:26:48Z

Uh oh, did we end up using a Python 3.8-only feature? It looks like the 3.8 build passed but 3.6 and 3.7 failed.

pjbull · 2021-04-14T23:29:21Z

Yeah, see above:

Tests path locally on 3.8 on linux as well, so there is likely some fix needed for Python<3.8

I'm seeing differences in what line inspect reports for a given frame across versions. Need to dig in more to find a fix.

pjbull · 2021-05-14T20:47:30Z

OK, I think this is ready for review. It is passing all the tests.

I do think that this is a pretty substantial amount of complexity to add for maitenance, so we may want to turn this off by default if we get lots of bug reports that come up in this code block.

I wouldn't be surprised if there are cases where the inspect calls don't do what we expect given all the different ways one can run Python.

Performance update

Here's info on timing for this implementation:

Script:

from cloudpathlib import CloudPath
import statistics
import time


if __name__ == "__main__":
    p = CloudPath('s3://cloudpathlib-test-bucket/5bmg9iTs74wmewasLTP44W-test_cloudpath_file_io-test_file_discovery/dir_0/file0_0.txt')

    it = 100
    res = []

    for _ in range(it):
        start = time.time()

        with open(p, "r"):
            pass

        res.append(time.time() - start)

    print(f"Median time {it} iterations: {statistics.median(res)}")

Results:

Around +0.05s on average for each call to open

❯ CLOUDPATHLIB_CHECK_UNSAFE_OPEN=True python timing.py
Median time 100 iterations: 0.2683759927749634

❯ CLOUDPATHLIB_CHECK_UNSAFE_OPEN=False python timing.py
Median time 100 iterations: 0.2952016592025757

Raise exception when open with write mode in call stack

a7ac50f

jayqi commented Apr 3, 2021

View reviewed changes

pjbull added 2 commits April 11, 2021 23:46

Implement AST-based writeable open detection

81e92d3

docs update

5b135a9

Remove extra code and exception update message

ee69c9b

jayqi commented Apr 13, 2021

View reviewed changes

pjbull added 2 commits April 14, 2021 10:26

Update to track and use line numbers for proper error

35edbb0

Use writable assert for proper error message

6395eb5

Fix line mix-up in python<=3.7

e6c2bae

pjbull marked this pull request as ready for review May 14, 2021 20:47

pjbull changed the title ~~[WIP] Raise exception when open with write mode in call stack~~ Raise exception when open with write mode in call stack May 15, 2021

pjbull mentioned this pull request Jun 5, 2021

Add ability to detect if cache is dirty and should be uploaded even if file was not opened by cloudpathlib #73

Open

This was referenced Jan 22, 2023

Hello from fsspec! #96

Closed

Some libraries generate confusing errors when opening CloudPath objects #315

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raise exception when open with write mode in call stack #140

Raise exception when open with write mode in call stack #140

jayqi commented Apr 3, 2021

github-actions bot commented Apr 3, 2021 •

edited

Loading

codecov bot commented Apr 3, 2021 •

edited

Loading

jayqi Apr 3, 2021 •

edited

Loading

pjbull commented Apr 3, 2021

jayqi commented Apr 3, 2021

pjbull commented Apr 3, 2021

pjbull commented Apr 4, 2021

jayqi commented Apr 4, 2021

pjbull commented Apr 4, 2021

pjbull commented Apr 12, 2021

jayqi left a comment

jayqi Apr 13, 2021

pjbull commented Apr 14, 2021 •

edited

Loading

jayqi commented Apr 14, 2021

pjbull commented Apr 14, 2021

pjbull commented May 14, 2021



		# entire function is passed as source, so check separately
		# that all of the built in open write modes fail

Raise exception when open with write mode in call stack #140

Are you sure you want to change the base?

Raise exception when open with write mode in call stack #140

Conversation

jayqi commented Apr 3, 2021

github-actions bot commented Apr 3, 2021 • edited Loading

codecov bot commented Apr 3, 2021 • edited Loading

Codecov Report

jayqi Apr 3, 2021 • edited Loading

Choose a reason for hiding this comment

pjbull commented Apr 3, 2021

jayqi commented Apr 3, 2021

pjbull commented Apr 3, 2021

pjbull commented Apr 4, 2021

jayqi commented Apr 4, 2021

pjbull commented Apr 4, 2021

pjbull commented Apr 12, 2021

jayqi left a comment

Choose a reason for hiding this comment

jayqi Apr 13, 2021

Choose a reason for hiding this comment

pjbull commented Apr 14, 2021 • edited Loading

jayqi commented Apr 14, 2021

pjbull commented Apr 14, 2021

pjbull commented May 14, 2021

Performance update

github-actions bot commented Apr 3, 2021 •

edited

Loading

codecov bot commented Apr 3, 2021 •

edited

Loading

jayqi Apr 3, 2021 •

edited

Loading

pjbull commented Apr 14, 2021 •

edited

Loading