Fix crawl redirect errors #826

Merged: 5 commits, Oct 12, 2021

bookwyrm
Contributor

Allow two redirect hops to support new WP instance crawl behavior.
Report on errors from too many redirects in CLI for debugging.

This fixes a "Too Many Redirects" error when a fresh site redirects twice to go from `/` to the actual post front page.
Reporting these errors in the CLI gives us a place to start debugging when we encounter a "Too Many Redirects" error.
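
For illustration only, here is a minimal Python sketch of a redirect hop limit. It is not the code from this PR: the `resolve` function, the `TooManyRedirects` class, and the redirect targets are all hypothetical, and the one-hop limit is shown purely for contrast with the two-hop limit described above.

```python
class TooManyRedirects(Exception):
    """Raised when a URL needs more redirect hops than the limit allows."""


def resolve(start, redirects, max_hops=2):
    """Follow entries in `redirects` from `start`, allowing at most `max_hops` hops."""
    url = start
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise TooManyRedirects(f"{start} needs more than {max_hops} redirect hop(s)")
        url = redirects[url]
        hops += 1
    return url


# Hypothetical fresh site: "/" redirects twice before reaching the front page.
redirects = {"/": "/index.php", "/index.php": "/hello-world/"}

# Two hops allowed: the crawl reaches the final page.
print(resolve("/", redirects, max_hops=2))  # /hello-world/

# One hop allowed: the second redirect exceeds the limit and is reported.
try:
    resolve("/", redirects, max_hops=1)
except TooManyRedirects as err:
    print(f"Too Many Redirects: {err}")
```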
bookwyrm referenced this pull request Oct 11, 2021
When creating a Psr7 request with a base_uri set in the Client and a
path that starts with // (like '//wp-sitemap.xml'), the path is
interpreted as an absolute URL and we get a ClientException for
'unknown host: wp-sitemap.xml'. This is fixed by removing base_uri
from the client and always making requests with the full URL.

This was discovered while debugging #824 and may address that issue.
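
The `//` behavior described in this commit message follows from standard URI reference resolution (RFC 3986), which can be reproduced outside the project. The sketch below uses Python's `urllib.parse` as a stand-in, not Guzzle or Psr7; `BASE_URI` and the `full_url` helper are hypothetical names chosen for the example.

```python
from urllib.parse import urljoin, urlsplit

BASE_URI = "https://example.com/"  # hypothetical site URL


def full_url(site_url, path):
    """Hypothetical helper: build the full URL by string joining, not reference resolution."""
    return site_url.rstrip("/") + "/" + path.lstrip("/")


# "//wp-sitemap.xml" is a network-path reference, so resolving it against the base
# keeps only the scheme and treats "wp-sitemap.xml" as the host.
resolved = urljoin(BASE_URI, "//wp-sitemap.xml")
print(resolved)                      # https://wp-sitemap.xml
print(urlsplit(resolved).hostname)   # wp-sitemap.xml

# Building the full URL up front sidesteps base-URI resolution entirely.
print(full_url(BASE_URI, "//wp-sitemap.xml"))  # https://example.com/wp-sitemap.xml
```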
Review thread on src/Crawler.php (outdated, resolved)
Based on feedback from the team, it makes more sense to simply log the error in the standard way and keep going with the crawl.
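
A rough Python sketch of the log-and-continue approach, under stated assumptions: the `crawl` function, the `fake_fetch` stub, and the `TooManyRedirects` class are hypothetical and do not mirror the project's Crawler code or its logging facility.

```python
import logging

logger = logging.getLogger("crawler")


class TooManyRedirects(Exception):
    """Hypothetical error type standing in for the HTTP client's redirect failure."""


def crawl(urls, fetch):
    """Fetch each URL in turn; log redirect failures and continue with the rest."""
    crawled = []
    for url in urls:
        try:
            crawled.append(fetch(url))
        except TooManyRedirects as err:
            # Record the failure for later debugging instead of aborting the crawl.
            logger.warning("Skipping %s: %s", url, err)
    return crawled


def fake_fetch(url):
    """Hypothetical fetcher: one URL is stuck behind too many redirects."""
    if url == "/loop/":
        raise TooManyRedirects("redirect limit exceeded")
    return f"contents of {url}"


pages = crawl(["/", "/loop/", "/about/"], fake_fetch)
print(pages)  # ['contents of /', 'contents of /about/']
```

The design choice here is that a single unreachable URL costs one log entry rather than the whole crawl.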
john-shaffer merged commit 2bd8d5b into develop on Oct 12, 2021