Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

depend on phantomjs_bin to provide the phantomjs binaries #78

Merged
merged 5 commits into from
Jan 11, 2021
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
'fudge',
'nti.testing',
'zope.testrunner',
'nose >= 1.3.0'
jamadden marked this conversation as resolved.
Show resolved Hide resolved
]


Expand Down Expand Up @@ -70,6 +71,7 @@ def _read(fname):
'nti.wref',
'Paste',
'PasteDeploy',
'phantomjs-binary==2.1.3',
'PyPDF2',
'pyquery',
'pytz',
Expand Down
37 changes: 22 additions & 15 deletions src/nti/contentrendering/phantom.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@
import contextlib
from six.moves import urllib_parse

from phantomjs_bin import executable_path as phantomjs_exe

try:
from urllib.request import pathname2url
except ImportError:
Expand All @@ -27,7 +29,6 @@

logger = __import__('logging').getLogger(__name__)


def javascript_path(js_name):
"""
:return: A path to a javascript resource of this package, suitable for passing to phantomjs.
Expand All @@ -39,18 +40,6 @@ def javascript_path(js_name):
return resource_filename(__name__, js_name)


if not os.getenv('DATASERVER_DIR_IS_BUILDOUT'):
# Buildout puts it first on the path
import warnings
try:
warnings.warn("Using whatever phantomjs is on the PATH; supported version 2.1.1; version found at %s is %s"
% (subprocess.check_output(['which', 'phantomjs']).strip(),
subprocess.check_output(['phantomjs', '-v']).strip()),
UserWarning, stacklevel=1)
except subprocess.CalledProcessError:
warnings.warn("Phantomjs not found on the PATH")


class _PhantomProducedUnexpectedOutputError(subprocess.CalledProcessError):

def __str__(self):
Expand Down Expand Up @@ -105,7 +94,7 @@ def run_phantom_on_page(htmlFile, scriptName, args=(), key=_none_key,
# over a socket as opposed to stdout/stderr? As of 1.6, I think this is the
# recommended approach

process = ['phantomjs', scriptName, htmlFile]
process = [phantomjs_exe, scriptName, htmlFile]
process.extend(args)
__traceback_info__ = process
logger.debug("Executing %s", process)
Expand All @@ -118,7 +107,25 @@ def run_phantom_on_page(htmlFile, scriptName, args=(), key=_none_key,
stderr = open('/dev/null', 'wb')

with _closing(stderr):
jsonStr = subprocess.check_output(process, stderr=stderr).strip()
try:
jsonStr = subprocess.check_output(process, stderr=stderr).strip()
except OSError as ex:
import errno
if ex.errno != errno.EACCES:
raise

# Make sure our phantomjs executable is, in fact, user executable
# https://github.com/jayjiahua/phantomjs-bin-pip/issues/1
import warnings
warnings.warn('Phantomjs executable %s is not executable. Updating permissions' % (phantomjs_exe),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have mentioned this before, but one should essentially never use a dynamic warning string. The old code did, and the old code was wrong.

A dynamic warning string is both hard to filter using the provided mechanisms (e.g., string matching based on an environment variable), and also can be a leak.

The warnings module strives to only emit a warning one time for any given location. It does this by keeping track of the stack location and the string that was emitted. If it sees the same (location, string) pair again, it doesn't emit the warning. Otherwise, it makes a new entry and emits the warning.

The old code didn't have a problem here because it could only ever be emitted once anyway (at import time). This code could have an issue because it can be run multiple times from different threads simultaneously.

Now, the dynamic string is typically going to format to the same value every time. Will the warnings cache account for that, or will it be based on object identity? No idea (actually, in CPython, warnings is implemented in C, and the easiest way to do such a cache would be to use a standard dictionary, so it would be value based, but lets pretend we don't know that).

How about using the logger here? Logging wasn't set up at module import time, but it should be by the time this executes. (The old code should have used a custom Warning object subclass.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL.

Works for me. Thanks.

UserWarning, stacklevel=1)

import stat
mode = os.stat(process[0]).st_mode
os.chmod(process[0], mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Try to execute again now that we have updated permissions
jsonStr = subprocess.check_output(process, stderr=stderr).strip()
__traceback_info__ += (jsonStr,)

if expect_no_output:
Expand Down