Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

depend on phantomjs_bin to provide the phantomjs binaries #78

Merged
merged 5 commits into from
Jan 11, 2021

Conversation

cutz
Copy link
Contributor

@cutz cutz commented Jan 11, 2021

The current release of phantomjs_bin has binaries that aren't executable. jayjiahua/phantomjs-bin-pip#1

At import time we check the binary we are going to use is user executable and attempt to fix it if it is not. This feels a bit dangerous, perhaps there is a better way to go about this?

Fixes Issue #77

@cutz
Copy link
Contributor Author

cutz commented Jan 11, 2021

Tests run in isolation with this change.

setup.py Show resolved Hide resolved
@@ -27,6 +31,15 @@

logger = __import__('logging').getLogger(__name__)

# Make sure our phantomjs executable is, in fact, user executable
# https://github.com/jayjiahua/phantomjs-bin-pip/issues/1
st = os.stat(phantomjs_exe)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest deferring all this until it is used (e.g., inside run_phantom_on_page), and only if running fails.

  1. Import-time side-effects are generally bad; and
  2. Especially when they can fail; and
  3. This may not be necessary at all, better to let the OS tell us if it is.

@cutz
Copy link
Contributor Author

cutz commented Jan 11, 2021

@jamadden out of curiosity is this code using warnings instead of the logger just so that it always shows up? Is that still appropriate here?

I noticed warnings was imported when used. Should we do something similar with errno and stat and only import those in the except block if we fall down that case?

@jamadden
Copy link
Member

out of curiosity is this code using warnings instead of the logger just so that it always shows up?

Yes.

Is that still appropriate here?

Probably not.

I noticed warnings was imported when used. Should we do something similar with errno and stat and only import those in the except block if we fall down that case?

I probably would, yes. I think initially the delayed import of warnings here was probably premature optimization, perhaps coupled with a desire to avoid polluting the namespace (although that argument doesn't really hold up because it doesn't delete it).

But for really-unexpected-exceptions-that-almost-certainly-only-happen-once, like this one, there is something to be said for not polluting the namespace and keeping the global import list small (small import lists help readers and tools both). As soon as they were needed more than once, though, move them to the top. (There's no argument to be made for optimization, though, as many other dependencies as get pulled in.)

# Make sure our phantomjs executable is, in fact, user executable
# https://github.com/jayjiahua/phantomjs-bin-pip/issues/1
import warnings
warnings.warn('Phantomjs executable %s is not executable. Updating permissions' % (phantomjs_exe),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have mentioned this before, but one should essentially never use a dynamic warning string. The old code did, and the old code was wrong.

A dynamic warning string is both hard to filter using the provided mechanisms (e.g., string matching based on an environment variable), and also can be a leak.

The warnings module strives to only emit a warning one time for any given location. It does this by keeping track of the stack location and the string that was emitted. If it sees the same (location, string) pair again, it doesn't emit the warning. Otherwise, it makes a new entry and emits the warning.

The old code didn't have a problem here because it could only ever be emitted once anyway (at import time). This code could have an issue because it can be run multiple times from different threads simultaneously.

Now, the dynamic string is typically going to format to the same value every time. Will the warnings cache account for that, or will it be based on object identity? No idea (actually, in CPython, warnings is implemented in C, and the easiest way to do such a cache would be to use a standard dictionary, so it would be value based, but lets pretend we don't know that).

How about using the logger here? Logging wasn't set up at module import time, but it should be by the time this executes. (The old code should have used a custom Warning object subclass.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL.

Works for me. Thanks.

Copy link
Member

@jamadden jamadden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@cutz cutz merged commit 78d8633 into master Jan 11, 2021
@cutz cutz deleted the byo-phantomjs branch January 11, 2021 21:49
@cutz
Copy link
Contributor Author

cutz commented Jan 11, 2021

The best part of this is I get to mark another project as running under compattest....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants