This repository has been archived by the owner on Jun 18, 2024. It is now read-only.
While crawling a site that requires form-based authentication, without supplying any authentication information, the following error was thrown: org.apache.hc.client5.http.ClientProtocolException: Target host is not specified.
The root cause is that the server responded with status code 303 and the URL in the redirect header was the relative path /inloggen instead of an absolute URL. The URL normalizer's filter() method in turn transforms this input into http://inloggen, which the HTTP client cannot handle (hence "Target host is not specified").
Simply resolving the returned URL against the current web URL (which is a seed URL and therefore always absolute) avoids the confusing exception. The login form is then crawled without errors, and although no authentication takes place, it is at least immediately clear while debugging or when looking at the result that some kind of authentication is needed.
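The proposed resolution step could look roughly like the sketch below. It is only an illustration under stated assumptions, not the crawler's actual code: the class RedirectResolver, the method resolveRedirect and the host example.com are hypothetical names, and the only library call used is java.net.URI.resolve from the JDK.

```java
import java.net.URI;

// Hypothetical helper, not part of the crawler's code base.
public final class RedirectResolver {

    private RedirectResolver() {
    }

    /**
     * Resolves the value of a redirect header against the URL of the page
     * that returned it. A relative value such as "/inloggen" becomes an
     * absolute URL on the same host; an absolute value is returned as-is.
     */
    public static String resolveRedirect(String currentUrl, String location) {
        URI base = URI.create(currentUrl);            // seed URL, assumed absolute
        return base.resolve(location.trim()).toString();
    }

    public static void main(String[] args) {
        // example.com is a placeholder host chosen for this sketch.
        System.out.println(resolveRedirect("https://example.com/start", "/inloggen"));
        // prints "https://example.com/inloggen"
    }
}
```

Doing the resolution before the URL reaches the normalizer means the normalizer only ever sees absolute URLs, so it has no opportunity to mistake the path segment for a host name.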