
fall back to downloading using the requests Python package (if installed) when urllib2 fails due to SSL error #2538

Conversation

bartoldeman (Contributor):

This fixes #2522 for me, on a new enough CentOS 6 with these packages
installed:
python-requests-2.6.0-4.el6.noarch
pyOpenSSL-0.13.1-2.el6.x86_64

@@ -460,20 +467,31 @@ def download_file(filename, url, path, forced=False):
attempt_cnt = 0

# use custom HTTP header
url_req = urllib2.Request(url, headers={'User-Agent': 'EasyBuild', "Accept" : "*/*"})
headers = {'User-Agent': 'EasyBuild', "Accept" : "*/*"}


whitespace before ':'

except HTTPError as err:
if not HAVE_REQUESTS:
status_code = err.code
if 400 <= status_code <= 499:
Member:

Isn't status_code potentially undefined here?

bartoldeman (Contributor, Author):

No: when HAVE_REQUESTS is set, status_code is assigned as
status_code = url_req.status_code
before
url_req.raise_for_status()
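The ordering bartoldeman describes can be sketched without a live request (this FakeResponse and HTTPError are stand-ins, not the real requests objects): because the assignment happens before raise_for_status() can throw, status_code is always bound inside the exception handler.

```python
class HTTPError(Exception):
    """Stand-in for requests.exceptions.HTTPError."""
    pass

class FakeResponse:
    """Stand-in for a requests.Response carrying a 4xx status."""
    status_code = 404

    def raise_for_status(self):
        if 400 <= self.status_code <= 599:
            raise HTTPError("%s Client Error" % self.status_code)

url_req = FakeResponse()
try:
    status_code = url_req.status_code   # assigned first ...
    url_req.raise_for_status()          # ... then this may raise
except HTTPError:
    pass

print(status_code)  # 404: defined even though the "request" failed
```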

from requests.exceptions import HTTPError
HAVE_REQUESTS = True
except ImportError:
import urllib2
Member:

Does it make sense to add another catch here, to give more useful feedback? Something like "Can't import requests or urllib2; if you are on an old system, please install python-requests".

bartoldeman (Contributor, Author):

urllib2 is always available (standard library). But perhaps the urlopen exception handler can check for something like <urlopen error [Errno 1] _ssl.c:492: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure> and if so, report
"SSL issues with urllib2. If you are using RHEL/CentOS 6.x please install the python-requests and pyOpenSSL RPM packages" and try again?

@boegel boegel added this to the next release milestone Sep 4, 2018
@bartoldeman (Contributor, Author):

I changed the commit so that the requests package is only used if the error occurs, following the principle of least surprise.

boegel (Member) left a comment:

Also a test would be nice, although that may not be trivial (we'd somehow need to force the fallback to requests?)

@@ -50,6 +50,8 @@
import tempfile
import time
import urllib2
from urllib2 import HTTPError
HAVE_REQUESTS = False
Member:

@bartoldeman Please move this down below, right above where _log is created:

try:
    import requests
    from requests.exceptions import HTTPError
    HAVE_REQUESTS = True
except ImportError:
    from urllib2 import HTTPError
    HAVE_REQUESTS = False
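The suggested fallback import degrades gracefully whether or not requests is installed. A runnable sketch, with one adaptation beyond the PR: the inner try/except also covers Python 3, where urllib2's HTTPError lives in urllib.error (the PR itself targets Python 2).

```python
# Fallback import in the style boegel suggests; the urllib.error branch is a
# Python 3 adaptation added here for illustration.
try:
    import requests
    from requests.exceptions import HTTPError
    HAVE_REQUESTS = True
except ImportError:
    try:
        from urllib2 import HTTPError       # Python 2
    except ImportError:
        from urllib.error import HTTPError  # Python 3 equivalent
    HAVE_REQUESTS = False

# either way, an HTTPError class is now bound for except-clauses to use
print(HTTPError.__name__)
```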

@@ -440,6 +442,7 @@ def derive_alt_pypi_url(url):

def download_file(filename, url, path, forced=False):
"""Download a file from the given URL, to the specified path."""
global HAVE_REQUESTS, HTTPError
Member:

no need for this with the import construct above

url_fd = urllib2.urlopen(url_req, timeout=timeout)
_log.debug('response code for given url %s: %s' % (url, url_fd.getcode()))
if HAVE_REQUESTS:
url_req = requests.get(url, headers=headers, stream=True, timeout=timeout)
Member:

move this up where url_req is defined without requests:

if HAVE_REQUESTS:
    url_req = requests.get(url, headers=headers, stream=True, timeout=timeout)
else:
    url_req = urllib2.Request(url, headers=headers)

# urllib2 does the right thing for http proxy setups, urllib does not!
url_fd = urllib2.urlopen(url_req, timeout=timeout)
status_code = url_fd.getcode()
_log.debug('response code for given url %s: %s' % (url, status_code))
Member:

@bartoldeman Hmm, this whole block is becoming a bit much now, especially since download_file is already quite messy... Can we move this up into a dedicated function, something like:

status_code = download_from_url_to(...)

_log.warning("URL %s was not found (HTTP response code %s), not trying again" % (url, err.code))
except HTTPError as err:
if not HAVE_REQUESTS:
status_code = err.code
Member:

This can be done above?

status_code = url_fd.getcode().code

break
else:
_log.warning("HTTPError occurred while trying to download %s to %s: %s" % (url, path, err))
attempt_cnt += 1
except IOError as err:
_log.warning("IOError occurred while trying to download %s to %s: %s" % (url, path, err))
error_re = re.compile(r"<urlopen error \[Errno 1\] _ssl.c:.*: error:.*:"
"SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>")
if error_re.match(str(err)):
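As a minimal sketch, the pattern from the hunk above can be checked against the exact error string quoted earlier in the thread (the CentOS 6 urllib2 SSL handshake failure):

```python
import re

# pattern copied from the PR; the second string is concatenated onto the first
error_re = re.compile(r"<urlopen error \[Errno 1\] _ssl.c:.*: error:.*:"
                      "SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>")

# sample error string as quoted by bartoldeman in this thread
sample = ("<urlopen error [Errno 1] _ssl.c:492: error:14077410:"
          "SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>")

print(bool(error_re.match(sample)))  # True
```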
Member:

This construct is used to switch to trying with requests, but it's a bit messy, especially with the inline import statements...
I think we need a use_requests = True or maybe broken_urllib = True here which can be checked above?

# for backward compatibility, and to avoid relying on 3rd party Python library 'requests'
use_requests = False
while not downloaded and attempt_cnt < max_attempts:
    if use_requests:
        if not HAVE_REQUESTS:
           raise EasyBuildError("...")
        # do requests...
    else:
        # do what we do now
        try:
            ...
        except IOError as err:
            if error_re.match(str(err)):
                use_requests = True
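The control flow sketched above can be made runnable with the download calls stubbed out (EasyBuildError and both download stubs are stand-ins invented here, not the real EasyBuild API): the first attempt fails with the SSL error, flips use_requests, and the second attempt succeeds via the requests branch.

```python
class EasyBuildError(Exception):
    """Stand-in for EasyBuild's own error class."""
    pass

HAVE_REQUESTS = True   # pretend 'requests' imported successfully
max_attempts = 3

def urllib2_download():
    # stand-in for urllib2.urlopen(); simulate the CentOS 6 SSL failure
    raise IOError("<urlopen error [Errno 1] _ssl.c:492: ... "
                  "sslv3 alert handshake failure>")

downloaded = False
use_requests = False
attempt_cnt = 0
while not downloaded and attempt_cnt < max_attempts:
    attempt_cnt += 1
    if use_requests:
        if not HAVE_REQUESTS:
            raise EasyBuildError("SSL issues with urllib2; please install "
                                 "the python-requests package")
        downloaded = True   # stand-in for requests.get(...) succeeding
    else:
        try:
            urllib2_download()
            downloaded = True
        except IOError as err:
            if 'handshake failure' in str(err):
                use_requests = True   # retry via requests on the next pass

print(downloaded, attempt_cnt)  # True 2
```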

@bartoldeman (Contributor, Author):

This is the cleanest I can make it without making a separate function. If you still want a function please let me know.

@bartoldeman (Contributor, Author):

Triggering a Travis rebuild; it seems to have been a glitch...

@bartoldeman bartoldeman closed this Sep 6, 2018
@bartoldeman bartoldeman reopened this Sep 6, 2018
if used_urllib is urllib2:
# urllib2 does the right thing for http proxy setups, urllib does not!
url_fd = urllib2.urlopen(url_req, timeout=timeout)
status_code = url_fd.getcode()
Member:

@bartoldeman add .code here rather than below?

status_code = url_fd.getcode().code

or maybe with

if hasattr(status_code, 'code'):
    status_code = status_code.code

bartoldeman (Contributor, Author):

How can that work? .code is a field of the exception err for urllib2, and getcode() returns an integer.

One can use

if hasattr(err, 'code'):
    status_code = err.code

but I prefer to use use_urllib2 then.

Member:

Sorry, I'm not sure what I was smoking here, I was clearly mixing things up. Thanks for clarifying.

headers = {'User-Agent': 'EasyBuild', 'Accept': '*/*'}
# for backward compatibility, and to avoid relying on 3rd party Python library 'requests'
url_req = urllib2.Request(url, headers=headers)
used_urllib = urllib2
Member:

@bartoldeman I'd prefer using a boolean here (use_urllib2) and handling HTTPError via the import above

bartoldeman (Contributor, Author):

I cannot do from requests.exceptions import HTTPError before it is actually used (because at the first download attempt HTTPError is still urllib2.HTTPError); that is why
I changed to used_urllib. If I do both:

use_urllib2 = True
HTTPError = requests.HTTPError

that's a bit redundant...

Perhaps I should just write a urlopen() emulation using requests (including raising the urllib2.HTTPError exception manually), which might be the cleanest then.

Member:

Right, I overlooked that you first need HTTPError from urllib2...

url_fd = urllib2.urlopen(url_req, timeout=timeout)
status_code = url_fd.getcode()
else:
r = requests.get(url, headers=headers, stream=True, timeout=timeout)
Member:

@bartoldeman don't use single-letter variables outside of list comprehensions please

@boegel boegel changed the title Try to download using the requests Python package, if installed. fall back to downloading using the requests Python package (if installed) when urllib2 fails due to SSL error Sep 7, 2018
@easybuilders easybuilders deleted a comment from boegelbot Sep 7, 2018
@boegel boegel dismissed damianam’s stale review September 9, 2018 07:05

questions answered, suggestions implemented

@boegel boegel removed the change label Sep 9, 2018
boegel previously approved these changes Sep 9, 2018
boegel (Member) commented Sep 9, 2018:

@bartoldeman Looks good to go taking your clarifications into account, but we should add a test to verify that this fallback isn't broken in the future, see bartoldeman#7 (which also fixes a broken URL in test_download_url).

…ackage

add dedicated test for fallback to requests in download_file function